Statistical methods for integrating multi-omics sequence data and unveiling molecular underpinnings of non-small cell lung cancer

Principal Supervisor: Professor Zewei Luo – School of Biosciences

Co-supervisor: Dr Lindsey Leach – School of Biosciences

PhD project title: Statistical methods for integrating multi-omics sequence data and unveiling molecular underpinnings of non-small cell lung cancer

University of Registration: University of Birmingham

Project outline:

The past decade has witnessed tremendous advances in sequencing technologies, enabling the collection of highly accurate, high-resolution data on the structure and function of the genome. This opens up major opportunities to tackle many fundamental questions in medicine, agriculture and environmental biology.

Funded by an international collaborative project, we have collected genomic DNA, mRNA, microRNA and MeDIP (methylated DNA immunoprecipitation) sequence data from about 200 carefully scrutinised pairs of non-small cell lung cancer (NSCLC) tissue and corresponding paratumorous tissue samples, as well as from 20 cell lines of 3 major NSCLC pathological types. Using these sequencing datasets, the project aims to develop statistical methods and computational tools for integrating the multiple layers of omics sequence data, in order to identify the molecules and interactions that significantly influence the development and progression of this disease. The project will involve case-control genome-wide association analysis (GWAS) with genomic DNA, RNA and microRNA expression data, multi-dimensional network construction, causal relationship prediction, and detection of functional modules in the constructed expression and regulation networks.
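As a rough illustration of the simplest building block of a case-control association analysis, the sketch below runs a Pearson chi-square test on allele counts for a single variant. It is a minimal example under stated assumptions, not the project's method: the function name `allelic_chi_square` and the counts are hypothetical, cases and controls are treated as independent groups (the pairing of tumour and paratumorous samples is ignored), and no continuity or multiple-testing correction is applied.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele table.

    case_counts and control_counts are (reference, alternative) allele
    counts. Returns (chi-square statistic, p-value).
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    row_case, row_ctrl = a + b, c + d
    col_ref, col_alt = a + c, b + d
    if 0 in (row_case, row_ctrl, col_ref, col_alt):
        raise ValueError("every row and column total must be positive")
    # Shortcut formula for a 2x2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / (row_case * row_ctrl * col_ref * col_alt)
    # Upper tail of a chi-square with 1 df, written via the normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Made-up allele counts for one hypothetical variant
# (200 cases and 200 controls, two alleles each).
chi2, p = allelic_chi_square((240, 160), (200, 200))
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```

In a genome-wide setting a test of this kind would be repeated across many variants, so the resulting p-values would need adjustment for multiple testing before any variant is called significant.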

Although the project uses sequencing datasets from lung cancer samples, it addresses a generic bioinformatics and statistical question: how different omics sequencing data can be integrated to differentiate biological case and control treatments. The focus of the project will be on developing efficient statistical algorithms and the corresponding bioinformatic tools for modelling and analysing such biological datasets. It is therefore highly relevant to a broad range of genomic analyses in human, plant and animal species.

Relevance to BBSRC:

This project will develop novel methods for integrating multiple layers of omics data into biological networks capable of predicting cellular behaviour. It addresses the bottleneck of developing statistical methods and computational tools for handling the complex, high-dimensional datasets that are rapidly accruing across the modern biosciences.

BBSRC Strategic Research Priority: Molecules, cells and systems

Techniques that will be undertaken during the project:

Analytical/computational skills:

(i) developing statistical models and algorithms for analysing the multi-omics datasets collected by next-generation sequencing (NGS),

(ii) developing the ability to write computer programs and scripts for computer simulation studies and for the analysis of real experimental datasets (e.g. using Fortran 90/95 with IMSL libraries, C++, Perl), and

(iii) developing skills in using mainstream computer software for mathematical and statistical analyses (e.g. Mathematica, R) and for data manipulation (e.g. Perl) on Windows or Linux platforms.

We will provide applicants from a biology background with training in statistics and computer programming, and applicants from a mathematics or physics background with training in molecular biology and genomics.

Contact: Professor Zewei Luo, University of Birmingham