
Talk Abstracts

Florian Markowetz. “Reconstructing networks from experimental and natural genetic perturbations”


Functional genomics has demonstrated considerable success in inferring the inner workings of a cell through analysis of its response to various perturbations. Perturbations can take the form of experimental interventions, like gene deletions or RNA interference, or natural perturbations, like SNPs or copy-number alterations. In my talk I will describe methods my lab has developed to reconstruct networks from the phenotypic effects of gene perturbations. In particular, I will (1) describe Nested Effects Models, a class of probabilistic graphical models to reconstruct signaling pathways from downstream effects, and (2) introduce methods to correlate the impact of copy-number variation on gene expression with different sub-types of breast cancer.


Nicolo Fusi and Neil Lawrence. “Estimating the contribution of non-genetic factors to gene expression using Gaussian Process Latent Variable Models”


Thanks to the recent increase in the amount of genetic profiling data available and to the ability to characterize disease activity through gene expression, it is possible to understand in more detail the multitude of causal factors linked with each disease. This is a challenging task because the integration of different sources of biological data is not straightforward and because non-genetic factors (such as differences in the experimental setting, or individual characteristics such as gender and ethnicity) are not always artificially controlled. Since these non-genetic factors may cause most of the variation in gene expression, reducing the accuracy of genetic studies, there is a pressing need for models that take them explicitly into account. We present a model in which non-genetic factors are unobserved latent variables, and the gene expression levels can be described as linear functions of both these latent variables and Single Nucleotide Polymorphisms (SNPs). From a generative point of view, we can see the gene expression levels Y as

Y = SV + XW + mu 1^T + epsilon

where S is the matrix containing the SNPs, X are the latent variables, V and W are mapping matrices, epsilon is a Gaussian-distributed isotropic error term, and mu allows the model to have a non-zero mean.

The model is inspired by the one proposed by Stegle et al. [1], but instead of optimising parameters and marginalising latent variables (as in Probabilistic PCA), we marginalise the parameters and optimise the latent variables. For a particular choice of prior over the mapping matrices W and V, the two approaches are equivalent.

This kind of model is called dual Probabilistic PCA, and it belongs to a wider class of models called Gaussian Process Latent Variable Models (GP-LVMs). Indeed, dual PPCA is the special case in which the output dimensions are assumed to be linear, independent and identically distributed. Each of these assumptions can be relaxed, yielding new probabilistic models. Many extensions of this model are possible, but even in its simplest form the eQTL study results are extremely promising in terms of the number of significant associations found.
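Below is a minimal sketch of this dual view, assuming unit-variance Gaussian priors on the rows of V and W and mean-centred expression data, so that each column of Y is Gaussian with covariance S S^T + X X^T + sigma^2 I; the latent positions X are then found by maximising the marginal likelihood. All names, sizes and parameter values are illustrative, not taken from the authors' implementation.

```python
# Minimal sketch of the marginalised linear model (dual PPCA view).
# Assumptions: unit-variance Gaussian priors on the rows of V and W,
# known noise variance sigma2, mean-centred data.
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal(x_flat, Y, S, q, sigma2):
    n, p = Y.shape
    X = x_flat.reshape(n, q)
    # Marginalising V and W gives, per output column of Y:
    #   y_j ~ N(0, S S^T + X X^T + sigma2 I)
    K = S @ S.T + X @ X.T + sigma2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))
    # log-determinant via Cholesky; the likelihood factorises over
    # the p output dimensions
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (p * logdet + np.sum(Y * alpha) + n * p * np.log(2 * np.pi))

rng = np.random.default_rng(0)
n, p, s, q = 50, 200, 10, 2   # samples, genes, SNPs, latent factors (toy sizes)
S = rng.integers(0, 3, size=(n, s)).astype(float)  # toy SNP matrix (0/1/2)
Y = rng.standard_normal((n, p))   # stand-in for centred expression data
res = minimize(neg_log_marginal, rng.standard_normal(n * q),
               args=(Y, S, q, 1.0), method="L-BFGS-B")
X_hat = res.x.reshape(n, q)       # estimated non-genetic latent factors
```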


Chris Barnes, Xia Sheng and Michael Stumpf. “Using Sequential Monte Carlo Approaches as a Design Tool in Synthetic Biology”


In many engineering contexts it is easy to state what we want but hard to achieve our desired outcomes. The more potential solutions exist, the harder it becomes to identify optimal ones. Here we show how this problem can be approached in an approximate Bayesian computation framework. Our approach has the advantage that it builds on the powerful Bayesian model selection formalism, includes sensitivity and robustness analysis at no extra cost, and flexibly incorporates diverse design objectives. We illustrate the performance of this approach in the context of bacterial two-component systems (TCSs). These systems enable prokaryotes (and some simple eukaryotes and plants) to sense their environments and adapt their internal state to changing circumstances. We present a detailed analysis of orthodox and unorthodox TCSs and show how we can rationally construct TCSs that show robust and optimal response characteristics to different stimuli encountered during bacterial infections or in biotechnological applications (e.g. biofuel production and bioremediation). We conclude by elaborating on the connections between our approach and maximum-entropy procedures, and on the advantages over traditional engineering strategies.
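To make the idea concrete, here is a minimal sketch using plain rejection ABC, the simplest variant of the sequential Monte Carlo scheme the talk describes: the desired response plays the role of the "data", and parameter draws whose simulated output falls within a tolerance of it are retained. The toy saturating-response model and all values are illustrative assumptions, not the authors' TCS model.

```python
# Rejection-ABC sketch of the design idea: treat the *desired* behaviour
# as the "data" and keep parameter draws whose simulated output comes
# close to it. The toy model below is purely illustrative.
import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, t):
    # Toy stand-in for a two-component-system response: saturating rise
    # with rate theta[0] and steady-state level theta[1].
    return theta[1] * (1.0 - np.exp(-theta[0] * t))

t = np.linspace(0.0, 10.0, 50)
target = 1.0 * (1.0 - np.exp(-2.0 * t))   # specified design objective

accepted = []
eps = 0.5
for _ in range(20000):
    theta = rng.uniform([0.1, 0.1], [5.0, 5.0])   # prior over designs
    d = np.linalg.norm(simulate(theta, t) - target)
    if d < eps:
        accepted.append(theta)

posterior = np.array(accepted)
# The spread of the accepted region doubles as a robustness/sensitivity
# analysis: broad marginals mean the objective is insensitive to that
# parameter.
print(posterior.mean(axis=0), posterior.std(axis=0))
```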


Diego di Bernardo. “Identification of gene regulatory networks and drug mode of action from Large-scale Experimental Data”


A gene regulatory network, where two genes are connected if they directly or functionally regulate each other, can be 'reverse-engineered' from large-scale experimental data such as gene expression profiles. Here we used a simple but effective reverse-engineering approach using all the available gene expression profiles in mammals, solving along the way the problems of handling, normalizing and analysing such a massive dataset. We reverse-engineered a coexpression network for Homo sapiens (Mus musculus) from a set of 20,255 (8,895) gene expression profiles. The human (mouse) network is characterized by a set of 22,283 (45,101) nodes (i.e. genes) and a set of 4,817,629 (14,641,095) edges, where each edge is weighted by the Mutual Information (MI) between the two genes.
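As an illustration of the edge-weighting step, the sketch below estimates mutual information between pairs of expression profiles by simple histogram binning and keeps edges above a threshold; the bin count, threshold, and toy data are illustrative choices, not those used in the actual study.

```python
# Sketch of the edge-weighting step: pairwise mutual information between
# expression profiles, estimated by simple histogram binning.
import numpy as np

def mutual_information(x, y, bins=10):
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = pxy / pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0   # skip empty bins in the sum
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(2)
expr = rng.standard_normal((100, 500))   # genes x profiles (toy stand-in)
threshold = 0.2                          # illustrative significance cut-off
edges = [(i, j, mutual_information(expr[i], expr[j]))
         for i in range(len(expr)) for j in range(i + 1, len(expr))]
network = [(i, j, w) for i, j, w in edges if w > threshold]
```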


We show how the resulting network can then be used to understand the function of a gene and the modularity of gene regulation, as well as a tool to analyse "gene signatures" to identify the mode of action of a drug.


We will also show how it is possible to use gene expression profiles to build a "drug network", in which drugs are automatically grouped into subnetworks ('communities') of drugs sharing a similar mode of action.


Georgia Chan and Michael P.H. Stumpf. “Conditional Relevance Networks and Assessment of Statistical Significance”


The 'ab initio' inference of gene regulatory networks from static microarray data remains a challenging task for the bioinformatics community. We propose a scheme that seeks to de-convolute the dependence loops that inflate the total number of significant pairwise interactions between genes, and to identify specific regulatory patterns that arise frequently in biological networks.

The method starts with the estimation of the mutual information matrix of all pairs of genes in the sample. Mutual information accounts for both linear and non-linear associations and is an established tool for modelling dependencies between variables for which we are not able, or do not wish, to assume any parametric form. In theory, unrelated pairs of genes should have zero mutual information; however, sample point estimates usually deviate from zero due to noise in the data. Moreover, the criterion for true dependency between two variables needs to be estimated from the data, as mutual information scores do not conform to any known distributional family.

To address these issues, as well as filtering out the most significant dependence scores, we employ non-parametric and semi-parametric approaches that allow us to rationally assess the significance of edges. Gene expression values are the result of the interplay of different regulatory factors and more often than not form loops or chain-like interactions. Using mutual information and its conditional equivalent, we have compiled a set of heuristics that discriminate among a set of common regulatory patterns: co-regulation, Markov chains, and other synergistic effects such as XOR and OR. We apply this formalism to a vast dataset generated in Neisseria meningitidis, a largely unexplored organism where little prior knowledge is available but non-linear interactions are expected to be near-ubiquitous.
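The sketch below illustrates one such heuristic under simple assumptions: comparing I(X;Y) with the conditional I(X;Y|Z) distinguishes a Markov chain X -> Z -> Y (where conditioning on Z drives the score towards zero) from co-regulation or synergistic patterns (where it does not). The histogram-based estimator and the toy chain are illustrative, not the authors' exact procedure.

```python
# Heuristic sketch: for a Markov chain X -> Z -> Y, I(X;Y|Z) should be
# close to zero; for co-regulation or XOR-like synergy it should not.
import numpy as np

def cond_mutual_information(x, y, z, bins=5):
    # I(X;Y|Z) = sum p(x,y,z) * log[ p(x,y,z) p(z) / (p(x,z) p(y,z)) ],
    # estimated from a 3-way histogram.
    pxyz, _ = np.histogramdd(np.column_stack([x, y, z]), bins=bins)
    pxyz = pxyz / pxyz.sum()
    pz = pxyz.sum(axis=(0, 1))
    pxz = pxyz.sum(axis=1)
    pyz = pxyz.sum(axis=0)
    cmi = 0.0
    for i in range(bins):
        for j in range(bins):
            for k in range(bins):
                if pxyz[i, j, k] > 0:
                    cmi += pxyz[i, j, k] * np.log(
                        pxyz[i, j, k] * pz[k] / (pxz[i, k] * pyz[j, k]))
    return cmi

rng = np.random.default_rng(3)
x = rng.standard_normal(2000)
z = x + 0.3 * rng.standard_normal(2000)   # chain: X -> Z
y = z + 0.3 * rng.standard_normal(2000)   # chain: Z -> Y
print(cond_mutual_information(x, y, z))   # near zero: chain, not co-regulation
```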


Antti Honkela, Neil Lawrence and Magnus Rattray. “Decoding Underlying Behaviour from Destructive Time Series Experiments through Gaussian Process Models”


A major problem for biological time series is that the experiments (such as gene expression measurements using microarrays or RNA-seq) often require the organism or cells to be destroyed. This means that a particular time series is often a series of measurements of different organisms (or batches of cells) at different times. Biological replicates normally consist of a separate biological sample measured at the same time. With the advent of single-cell expression experiments, where it is not currently conceivable to make genome-wide gene expression measurements without destroying the cell, we expect such set-ups to persist.

Many existing approaches to modelling transcriptional data postulate a differential equation model for the continuous-time expression profiles from which the repeated observations arise. Two ways of modelling repeat experiments would be to handle repeated observations as coming either from a shared profile or from completely independent profiles. The former approach assumes that the gene expression profile does not vary between experiments, whilst the latter assumes no relationship between the gene expression profiles. For many experimental set-ups we might expect something in between these two extremes: whilst each individual measurement comes from a different collection of cells or a different organism, the experimental set-up is broadly the same. We therefore expect some shared effects and some independent effects across the experiments.

In this work we propose an integrated Gaussian process framework for analysis of such experiments. In our approach, independent aspects of the experiments are modelled as independent Gaussian process draws, while the common profile across the experiments is modelled by a separate Gaussian process. The method adds power through sharing of replicates for the common profile while being robust to outliers from individual rogue experiments.
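A minimal sketch of this covariance structure follows, under illustrative squared-exponential kernels and hyperparameters (not the authors' choices): observations from replicates r and r' at times t and t' covary as k_shared(t, t') plus k_indep(t, t') when r = r', and the posterior mean of the shared profile follows from standard Gaussian process algebra.

```python
# Shared-plus-independent GP structure: observations from replicates r, r'
# at times t, t' have covariance
#   k_shared(t, t') + [r == r'] * k_indep(t, t') + [same point] * noise.
# Kernel form and hyperparameters are illustrative.
import numpy as np

def rbf(t1, t2, variance, lengthscale):
    d = t1[:, None] - t2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def replicate_cov(times, reps, var_shared=1.0, var_indep=0.3,
                  ls=2.0, noise=0.05):
    K = rbf(times, times, var_shared, ls)            # common profile
    same = (reps[:, None] == reps[None, :])
    K = K + same * rbf(times, times, var_indep, ls)  # per-replicate draw
    return K + noise * np.eye(len(times))

# Destructive sampling: each time point comes from a different batch, but
# measurements are grouped into replicate series sharing the same set-up.
times = np.tile(np.arange(0.0, 10.0, 1.0), 3)
reps = np.repeat([0, 1, 2], 10)
K = replicate_cov(times, reps)
y = np.sin(times / 2.0) + 0.3 * np.random.default_rng(4).standard_normal(30)
# Posterior mean of the *shared* profile at the training times:
K_shared = rbf(times, times, 1.0, 2.0)
mean_shared = K_shared @ np.linalg.solve(K, y)
```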


Christopher Penfold and David Wild. “How to infer gene networks from expression profiles, revisited”


Reconstructing gene-regulatory networks from time-course expression profiles represents a major goal within the systems biology community [1]. Currently a significant number of linear approaches to network reconstruction exist, including methods based upon Ordinary Differential Equations (NIR, tSNI), dynamic Bayesian networks (BANJO, G1DBN) and Bayesian State Space models (VBSSM). Here a number of such algorithms, including a nonlinear method based upon Gaussian process regression (CSI), are benchmarked against known datasets, including the in silico network of Zak et al. [2] and the synthetic yeast network of Cantone et al. [3].

Whilst the nonlinear Gaussian process based approach appears to perform better than random on all networks tested, it does not outperform many standard linear methods. Furthermore, whilst previous benchmarking studies of the synthetic yeast network [3] suggest that dynamic Bayesian networks (BANJO) significantly underperform compared to ODE-based approaches, in some cases performing no better than random, alternative dynamic Bayesian network methods (G1DBN, VBSSM) are competitive with ODE-based methods in these benchmarks. This appears to be true of other networks, suggesting that expert knowledge of a single algorithm may be preferable to incomplete knowledge of several.
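As a sketch of how such benchmarks are typically scored, the snippet below computes the area under the ROC curve for a ranked edge list against a gold-standard network via the rank-sum (Mann-Whitney) identity; an AUROC near 0.5 corresponds to "no better than random". The metric is a common choice assumed here, not taken from the abstract, and the data are toy.

```python
# Score a ranked edge list against a known ("gold standard") network with
# the area under the ROC curve.
import numpy as np

def auroc(scores, truth):
    # Rank-based AUROC: probability that a true edge outranks a false one.
    order = np.argsort(scores)
    ranks = np.empty(len(scores), dtype=float)
    ranks[order] = np.arange(1, len(scores) + 1)
    pos = truth.astype(bool)
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return (ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(5)
truth = rng.random(1000) < 0.05                    # gold-standard edge labels
scores = truth + 0.8 * rng.standard_normal(1000)   # toy inferred edge scores
print(auroc(scores, truth))   # ~0.5 would mean "no better than random"
```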


Richard Bonneau. “Learning biological networks: from modules to dynamics”


Cunlu Zou, Christophe Ladroue, Shuixia Guo and Jianfeng Feng. “Identifying interactions in the time and frequency domains in local and global networks”


Reverse-engineering approaches such as Bayesian network inference, ordinary differential equations (ODEs) and information theory are widely applied to deriving causal relationships among different elements such as genes, proteins, metabolites, neurons and brain areas, based upon multi-dimensional spatial and temporal data. Here we focused on the Granger causality approach in both the time and frequency domains, in local and global networks, and applied our approach to experimental data (genes and proteins). For a small gene network, Granger causality outperformed the three other approaches mentioned above. A global network of 812 proteins was reconstructed using a novel approach. The obtained results fitted well with known experimental findings and opened up many experimentally testable predictions. In addition to interactions in the time domain, interactions in the frequency domain were also recovered. Our approach is general and can easily be applied to other types of temporal data.
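A minimal sketch of pairwise time-domain Granger causality, with an illustrative lag order and toy data: x is said to Granger-cause y if adding lagged values of x to an autoregressive model of y reduces the residual variance, with the log variance ratio as the score. This is the textbook construction, not the authors' exact implementation.

```python
# Pairwise time-domain Granger causality via two nested autoregressions.
import numpy as np

def granger_score(x, y, lag=2):
    n = len(y)
    Y = y[lag:]
    lagged_y = np.column_stack([y[lag - k:n - k] for k in range(1, lag + 1)])
    lagged_x = np.column_stack([x[lag - k:n - k] for k in range(1, lag + 1)])
    A_r = np.column_stack([np.ones(n - lag), lagged_y])            # y history only
    A_f = np.column_stack([np.ones(n - lag), lagged_y, lagged_x])  # + x history
    res_r = Y - A_r @ np.linalg.lstsq(A_r, Y, rcond=None)[0]
    res_f = Y - A_f @ np.linalg.lstsq(A_f, Y, rcond=None)[0]
    return np.log(res_r.var() / res_f.var())   # > 0 suggests x -> y

rng = np.random.default_rng(6)
x = rng.standard_normal(500)
y = np.zeros(500)
for t in range(1, 500):                        # toy series: y driven by lagged x
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.standard_normal()
print(granger_score(x, y), granger_score(y, x))   # first should be clearly larger
```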


Svetlana Amirova, Declan Bates, Claudia Rato da Silva, Ian Stansfield and Heather Wallace. “Uncovering the design principles of polyamine regulation: an integrated modelling and experimental study”


A new, complete predictive model of polyamine metabolism in the yeast Saccharomyces cerevisiae is developed using a Systems Biology approach incorporating enzyme kinetics, statistical analysis, control engineering and the experimental molecular biology of translation. The polyamine molecules putrescine, spermidine and spermine are involved in a number of important cellular processes, such as transcriptional silencing, translation, protection from reactive oxygen species and coenzyme A synthesis, and components of the polyamine pathway are potential targets for cancer therapeutics. Unregulated polyamine synthesis can trigger uncontrolled cell proliferation. Conversely, polyamine depletion can cause apoptosis and, during development, defects leading to mental retardation in humans. Our approach uncovers the multiple feedback control mechanisms in the polyamine metabolic pathway; it also identifies the sources of robustness and their associated dynamical properties. The main focus is the highly conserved negative feedback loop in which the protein Antizyme, synthesized by a polyamine-dependent translational frameshifting mechanism, regulates the level of Spe1, the enzyme catalysing the first step in the polyamine biosynthesis pathway.


The non-linear dynamical model is based on data obtained from specially designed experiments on translational frameshifting and readthrough. The experimental data are analyzed and incorporated via statistical functions into the corresponding Antizyme synthesis and Antizyme/Spe1 degradation modules of the model. The model structure also includes a polyamine biosynthesis pathway module based on kinetic data for six enzymes, adapted for use with the two other modules. This quantitative model of the polyamine "controller" reproduces experimental data and predicts polyamine content under normal conditions and in various disease-induced scenarios that cannot be observed experimentally. Possible applications lie in pharmacology, toxicology, and preclinical drug development for cancer and neurodegenerative disorders (e.g. the anti-cancer drug DFMO, Snyder-Robinson syndrome).
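To illustrate the shape of the feedback loop described above (not the authors' actual model), here is a toy ODE sketch: polyamine P induces Antizyme A via a frameshifting term, and A in turn degrades the biosynthetic enzyme Spe1 (E), closing the negative loop. All functional forms and rate constants are illustrative assumptions.

```python
# Toy sketch of the Antizyme/Spe1 negative feedback loop; Hill-type
# frameshifting term and all rate constants are illustrative only.
import numpy as np
from scipy.integrate import solve_ivp

def polyamine_loop(t, state, k_syn=1.0, k_fs=0.5, K=1.0, n=2,
                   d_e=0.1, d_a=0.2, d_p=0.1, k_az=1.0):
    E, A, P = state
    frameshift = k_fs * P**n / (K**n + P**n)   # polyamine-dependent Antizyme synthesis
    dE = k_syn - d_e * E - k_az * A * E        # Antizyme-mediated Spe1 degradation
    dA = frameshift - d_a * A
    dP = E - d_p * P                           # Spe1 drives polyamine production
    return [dE, dA, dP]

sol = solve_ivp(polyamine_loop, (0.0, 200.0), [1.0, 0.0, 0.0])
E, A, P = sol.y[:, -1]   # steady-state levels under the toy parameters
print(E, A, P)
```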


Wei Liu and Mahesan Niranjan. “Deterministic and Stochastic Models of Bicoid Protein Gradient Formation in Drosophila Embryos: Modelling Maternal mRNA degradation”


Turing [1] proposed passive diffusion of a class of molecules known as morphogens as a mechanism that helps to establish spatial patterns of gene expression during embryonic development. This mechanism is usually modelled as passive diffusion of morphogen proteins translated from maternally deposited messenger RNAs. Such diffusion models assume a constant supply of morphogens at the source throughout the establishment of the required profile at steady state [2]. Working with the bicoid morphogen, which establishes the anterior-posterior axis in the Drosophila embryo, we note that this constant-source assumption is unrealistic, since the maternal mRNA is known to decay a certain time after egg laying. In [3], we incorporated a more realistic model of the morphogen source: we explicitly model the source as a constant supply followed by exponential decay and solve the reaction-diffusion equation numerically for one-dimensional morphogen propagation. By minimising the squared error between model outputs and measurements published in the FlyEx database, we show how the diffusion rate, the mRNA and protein decay constants, and the onset of maternal mRNA decay can be assigned sensible values. We also extend this work to show how such a realistic source model may be combined with a recently published flow model [4] that takes advective transport into account. Moreover, a stochastic simulation based model [5], which includes Bicoid molecule reactions, has also been implemented with the new source model in our work.
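A minimal numerical sketch of such a decaying-source reaction-diffusion model follows, with all parameter values illustrative rather than fitted: the source term is constant until an onset time t0 and decays exponentially afterwards, and the one-dimensional equation du/dt = D d2u/dx2 - lam*u + s(x,t) is stepped forward with explicit finite differences.

```python
# Explicit finite-difference scheme for
#   du/dt = D d2u/dx2 - lam * u + s(x, t),
# with an anterior-localised source that is constant until t0 and decays
# exponentially afterwards. All parameter values are illustrative.
import numpy as np

D, lam, t0, mu = 1.0, 0.05, 50.0, 0.1   # diffusion, protein decay, onset, mRNA decay
L, nx, dt, T = 100.0, 200, 0.01, 200.0  # domain length, grid, time step, horizon
dx = L / nx                              # dt*D/dx^2 = 0.04 < 0.5: scheme is stable
u = np.zeros(nx)
src = np.zeros(nx)
src[: nx // 20] = 1.0                    # source confined to the anterior 5%

t = 0.0
while t < T:
    s_t = src * (1.0 if t < t0 else np.exp(-mu * (t - t0)))
    lap = np.zeros(nx)
    lap[1:-1] = (u[2:] - 2 * u[1:-1] + u[:-2]) / dx**2
    lap[0] = (u[1] - u[0]) / dx**2       # zero-flux boundaries
    lap[-1] = (u[-2] - u[-1]) / dx**2
    u += dt * (D * lap - lam * u + s_t)
    t += dt

# u now holds the anterior-posterior Bicoid-like profile at time T; parameters
# would be fitted by least squares against FlyEx measurements as in the abstract.
```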


Vladimir Miloserdov and Nigel Burroughs. “Statistical analysis of protein patternation on cell membranes during immunological synapse formation”


A statistical analysis of two different experiments is considered. Both are related to understanding the mechanism behind the distribution of molecules involved in the formation of organized patterns of protein complexes and molecules in the contact interface between the membranes of an immune cell and an antigen-presenting cell. Such patterns are called immunological synapses.

In the first experiment a T-cell adheres to the flat surface of a lipid bilayer. There are molecules of two types on the surface of the bilayer, fluorescently labelled with different colours so that their distribution can be observed under the microscope. During the contact, molecules of one type bind while molecules of the second type stay unbound. This results in segregation of the two molecule types and the formation of a synapse pattern that can be observed and scanned using confocal microscopy. In the case of the lipid bilayer the contact interface is flat, so the whole interface can be scanned as a single image.

The second experiment deals with NK-cells forming synapses with target antigen-presenting cells. Two-colour fluorescent labelling is used again, and a similar protein patternation on the cell-cell contact interface can be observed using confocal microscopy. The main difference from the first experiment is the imaging technique: instead of a single image, a series of confocal images is taken along the same axis, which is approximately parallel to the synapse interface. As a result, a stack of cross-section fluorescence images of the interacting cells is used for the quantitative analysis.

In both experiments it is possible to observe the segregation of labelled molecules during the formation of the synapse pattern. In terms of fluorescence intensity values this is expressed as a strong negative correlation between the fluorescence of the two colours. We introduce a model based on the hypothesis of exclusion by size, which explains the mutual segregation of molecules as a result of the elastic properties of single molecules and bonds combined with the properties of the cell membrane. Based on this model, a computational algorithm for the Bayesian statistical analysis of fluorescence images is developed in order to estimate relevant physical parameters that cannot be measured directly.
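The segregation signature itself is simple to compute. The sketch below, on toy two-channel images, measures the per-pixel Pearson correlation between the two fluorescence channels; strongly segregated patterns give values close to -1. This illustrates only the correlation statistic, not the Bayesian estimation procedure.

```python
# Per-pixel Pearson correlation between the two fluorescence channels as a
# segregation signature. Toy images only; real data would come from the
# confocal scans described above.
import numpy as np

rng = np.random.default_rng(7)
pattern = rng.random((128, 128)) > 0.5                      # toy segregated domains
green = pattern * 1.0 + 0.1 * rng.standard_normal((128, 128))
red = (~pattern) * 1.0 + 0.1 * rng.standard_normal((128, 128))

r = np.corrcoef(green.ravel(), red.ravel())[0, 1]
print(r)   # close to -1 for strongly segregated patterns
```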


Elias Manolakos. “Machine Learning Methods for Effective Proteomics Image Analysis”


Two-dimensional gel electrophoresis (2DGE) remains the most widely used method for protein identification and differential expression analysis, due to its lower cost and the existence of mature commercial software tools for 2DGE image analysis, despite the fact that non-gel-based methods are gaining in popularity. Although there are several software packages that promise automation of the whole protein spot detection and quantification process, the hard reality remains today [1] that, as Fey and Larsen stated in 2001, "There is no program that is remotely automatic when presented with complex 2-DE images" ... "most programs require often more than a day of user hands-on time to edit the image before it can be fully entered into the database" [2].

To address these limitations and develop an automated 2DGE image analysis workflow, we have developed in previous work an effective image analysis methodology that first denoises the 2DGE image using the Contourlet transform [3] and then separates the parts of the denoised image that contain true protein spots (called Regions of Interest, ROIs) from the background-only areas, using Active Contours (AC) without edges [4].

In this work we complete the image analysis workflow by adding a well-tuned pipeline of operations based on unsupervised machine learning methods that further analyses each isolated ROI in order to locate the centers and estimate the quantities of the individual "hidden" spots. One-dimensional mixture modeling of the histogram of ROI pixel intensities is applied first to identify and remove any remaining background pixels. The surviving ROI pixels are then used as "molecule generators": by appropriate random sampling, the processed ROI image is converted into an isomorphic dataset representing the distribution of molecules of the underlying protein species (which are "projected" as spots on the gel image). This reverse-engineering action rooted in machine learning constitutes a unique innovation of this work that, to the best of our knowledge, has not been applied before in 2DGE image analysis. The candidate protein spot centers are then located by applying hierarchical clustering. Finally, the individual spot boundaries are delineated by fitting 2D Gaussian models to the data using generalized mixture modeling, with the Minimum Message Length (MML) criterion controlling the model complexity.

An extensive evaluation of this novel spot modeling methodology using both real and synthetic 2DGE images reveals that it is more precise and more specific than PDQuest in terms of spot detection, while both methods achieve comparably high sensitivity. Furthermore, it can estimate the volumes of the extracted spots more reliably, even in the presence of substantial noise and in areas of the image where faint and overlapping (or saturated) spots are located close to each other. It should be noted that the end-to-end workflow we have developed for 2DGE image analysis does not require any recalibration of parameters each time a new gel image is presented for analysis. This desirable characteristic makes it a suitable candidate for the automatic processing of image stacks, as needed in high-throughput proteomics analysis to support systems biology projects.
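A minimal sketch of the "molecule generator" idea on a toy ROI: pixel coordinates are resampled with probability proportional to intensity, and a 2D Gaussian mixture is fitted so that component means act as candidate spot centres and mixture weights as rough volume proxies. Note that model complexity is chosen here by BIC as a readily available stand-in for the MML criterion used in the actual work, and all sizes and shapes are illustrative.

```python
# "Molecule generator" sketch: convert an intensity image into a point cloud
# by intensity-weighted resampling, then fit a 2-D Gaussian mixture.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(8)
# Toy ROI: two overlapping Gaussian "spots" on a 64x64 patch
yy, xx = np.mgrid[0:64, 0:64]
roi = (np.exp(-((xx - 20) ** 2 + (yy - 30) ** 2) / 40.0)
       + 0.7 * np.exp(-((xx - 40) ** 2 + (yy - 32) ** 2) / 60.0))

# Resample pixel coordinates with probability proportional to intensity,
# producing pseudo-"molecules" of the underlying protein species.
p = roi.ravel() / roi.sum()
idx = rng.choice(roi.size, size=5000, p=p)
points = np.column_stack([xx.ravel()[idx], yy.ravel()[idx]]).astype(float)

# Pick the number of spots by BIC (stand-in for MML), then read off
# candidate centres and relative volumes.
fits = [GaussianMixture(k, random_state=0).fit(points) for k in range(1, 5)]
best = min(fits, key=lambda g: g.bic(points))
print(best.means_)     # candidate spot centres
print(best.weights_)   # relative spot "volumes"
```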