Mini-project 1 - A method for test statistic selection via an urn analogy (Supervised by Dr. Stefan Grosskinsky and Dr. Sach Mukherjee)
Abstract: For problems in unsupervised learning, a test function, or test statistic T, is often used without much consideration of how suitable that particular test is for the given data set. We introduce a tool that makes the choice of test function more methodical. This is done by resampling and then comparing two or more realizations of the outputs that the tests give. The method is very general and can be used in a large number of unsupervised learning problems. We think of this process as analogous to drawing balls from an urn with certain properties. We derive theoretically a result that shows, in terms of measurable variables, for which test statistic our utility is greater, together with a necessary condition for this to be true, and we also present results from simulation aimed at measuring the effectiveness of our approach.
Mini-project 2 - Stochastic models of infectious disease on homogeneous and heterogeneous networks (Supervised by Dr. Thomas House)
Abstract: The inclusion of stochasticity and network structure in models of infectious disease is becoming increasingly prevalent, in an attempt to capture more accurately the variability of the infection dynamics. Most commonly these are included through simulation, though there has also been progress in tackling these issues analytically, allowing better prediction and intervention. In this project we have focused on the analytical approach and have looked at the susceptible-infectious-recovered (SIR) and susceptible-infectious-susceptible (SIS) models. In the first part, we have looked at the early growth of the SIR model over a heterogeneous network, and at how the variance of the number of infected individuals is affected by the structure of the network that we consider. We find that the skew of the network distribution is important for this, as are the lower moments of the distribution. In the second part, we consider the SIS model and calculate how the temporal variability of the number of infected individuals about the endemic equilibrium is spread over different frequencies. This is done for both mean-field models and homogeneous pairwise models, in an attempt to show that the limit of the pairwise model is the mean-field model for this measurement. During this project we have shown a qualitative agreement here, though the two models give quantitatively different results.
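The project itself is analytical, but the kind of variability discussed in the second part can be illustrated by simulation. The following is a minimal sketch, not the method used in the project: a Gillespie simulation of a mean-field stochastic SIS model, assuming a per-capita transmission rate `beta` and recovery rate `gamma`. The function names and all parameter values are hypothetical choices made for illustration. The time-averaged number of infected individuals is compared with the deterministic endemic equilibrium N(1 - gamma/beta).

```python
import random


def simulate_sis(n, beta, gamma, i0, t_max, seed=0):
    """Gillespie simulation of a mean-field stochastic SIS model.

    n, beta, gamma, i0 and t_max are illustrative assumptions.
    Returns the event times and the number of infected after each event.
    """
    rng = random.Random(seed)
    t, i = 0.0, i0
    times, infected = [t], [i]
    while t < t_max and i > 0:
        rate_inf = beta * i * (n - i) / n  # S + I -> 2I
        rate_rec = gamma * i               # I -> S
        total = rate_inf + rate_rec
        t += rng.expovariate(total)        # waiting time to the next event
        if rng.random() * total < rate_inf:
            i += 1
        else:
            i -= 1
        times.append(t)
        infected.append(i)
    return times, infected


def time_average(times, values, t_start):
    """Time-weighted mean of a piecewise-constant trajectory after t_start."""
    total, weight = 0.0, 0.0
    for k in range(len(times) - 1):
        lo = max(times[k], t_start)
        hi = times[k + 1]
        if hi > lo:
            total += values[k] * (hi - lo)
            weight += hi - lo
    return total / weight


n, beta, gamma = 1000, 2.0, 1.0
times, infected = simulate_sis(n, beta, gamma, i0=500, t_max=50.0)
endemic = n * (1 - gamma / beta)  # deterministic endemic equilibrium
mean_i = time_average(times, infected, t_start=10.0)
print("endemic equilibrium:", endemic)
print("time-averaged infected:", round(mean_i, 1))
```

The trajectory fluctuates around the endemic level rather than settling on it. A frequency decomposition of those fluctuations (for example, a power spectrum of the trajectory resampled on a regular time grid) is the simulated counterpart of the quantity the abstract describes calculating analytically.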