Event Diary

CRiSM Seminar - Rajen Shah (Cambridge)

Location: A1.01

Rajen Shah (Cambridge)

Random Intersection Trees for finding interactions in large, sparse datasets

Many large-scale datasets are characterised by a large number (possibly tens of thousands or millions) of sparse variables. Examples range from medical insurance data to text analysis. While estimating main effects in regression problems involving such data is now a reasonably well-studied problem, finding interactions between variables remains a serious computational challenge. As brute force searches through all possible interactions are infeasible, most approaches build up interaction sets incrementally, adding variables in a greedy fashion. The drawback is that potentially informative high-order interactions may be overlooked. Here, we propose an alternative approach for classification problems with binary predictor variables, called Random Intersection Trees. It works by starting with a maximal interaction that includes all variables, and then gradually removing variables if they fail to appear in randomly chosen observations of a class of interest. We show that with this method, under some weak assumptions, interactions can be found with high probability, and that the computational complexity of our procedure is much smaller than for a brute force search.
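
The intersection idea in the abstract can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not the speaker's full procedure: each observation from the class of interest is represented as the set of its active (nonzero) binary variables, and the names `random_intersection` and `candidate_interactions`, along with the parameters `depth`, `n_repeats`, and `seed`, are hypothetical. Taking the first random observation as the starting set is equivalent to starting from all variables and removing those that observation lacks.

```python
import random


def random_intersection(observations, depth, rng):
    """Intersect the active-variable sets of `depth` randomly chosen observations.

    `observations` is a list of sets, one per observation in the class of
    interest, each holding the indices of that observation's active variables.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    # Starting from "all variables" and intersecting with one observation
    # leaves exactly that observation's active set.
    candidate = set(rng.choice(observations))
    for _ in range(depth - 1):
        # Drop every variable that is absent from another random observation.
        candidate &= rng.choice(observations)
        if not candidate:
            break
    return frozenset(candidate)


def candidate_interactions(observations, depth=3, n_repeats=200, seed=0):
    """Count how often each non-empty variable set survives the intersections."""
    rng = random.Random(seed)
    counts = {}
    for _ in range(n_repeats):
        survivor = random_intersection(observations, depth, rng)
        if survivor:
            counts[survivor] = counts.get(survivor, 0) + 1
    return counts


# Toy data (made up for illustration): variables 0 and 1 co-occur in every
# observation, while each observation also carries one unrelated variable.
toy_class = [frozenset({0, 1, k}) for k in range(2, 7)]
toy_counts = candidate_interactions(toy_class)
most_frequent = max(toy_counts, key=toy_counts.get)
print(sorted(most_frequent), toy_counts[most_frequent])
```

Variable sets that survive many repeats are those whose members co-occur frequently in the class of interest. The cost of each repeat depends on `depth` and on the number of active variables per observation, not on the number of possible variable subsets.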