Gene copy number variation and recent human evolution

Principal Supervisor: Dr Ed Hollox - Department of Genetics

Co-supervisor: Dr Richard Badge

PhD project title: Gene copy number variation and recent human evolution

University of Registration: University of Leicester

Project outline:

Over the past 60,000 years, modern humans have dispersed from Africa into the rest of the world. In the process, humans have adapted to new environments both culturally and genetically. Subsequently, humans have themselves altered the environment through species domestication, agriculture and the spread of urbanisation.

The role of natural selection in recent human evolution remains unclear. There are some well-founded examples, where there is strong genetic evidence for natural selection, a molecular basis for the phenotypic change, and a convincing adaptive evolutionary explanation for the phenotype. One example is the lactase persistence allele, which enables adults to digest the lactose in milk. Another is the sickle-cell haemoglobin allele, which is at higher frequencies in populations with endemic malaria because of the protection it confers against severe malarial symptoms. However, such well-validated examples are few.

Because of the short timescales involved in recent human evolution, it has been argued that subtle changes in allele frequency at multiple loci mediate adaptation. This is likely to be true, certainly in African populations where there is a high level of standing variation, but validating any phenotypic effect of such subtle changes is challenging. Alongside single nucleotide polymorphism (SNP), multiallelic copy number variation (mCNV) is another source of genomic variation, where individuals differ in the number of copies of a gene, with several alleles within a population. mCNV is extensive and its evolutionary role is underexplored. Despite this, there are two notable instances where mCNV may underlie a recent adaptation. The first is the amylase locus, where it has been claimed that higher copy number of the amylase gene in certain populations is an adaptation to a starch-rich diet. The second is the salivary agglutinin gene, which encodes a protein that binds teeth and Streptococcus mutans (the causative agent of dental caries), and where two mCNVs covary with a history of a starch-rich diet. Such findings suggest that this is an area that will yield new discoveries.

There are three reasons why copy number variable regions are likely to contain genes that have responded rapidly during recent human evolution. Firstly, they have a high mutation rate, which both maintains a high level of genetic variation upon which selection can act and allows copy number to adapt rapidly to environmental change. Secondly, copy number changes can produce phenotypic changes through simple gross mechanisms, such as altering levels of gene expression by a gene dosage effect, or generating fusion genes with novel function. Thirdly, inspection of such CNV regions shows that they are enriched for genes involved in cell-cell interaction, metabolism and immune response, phenotypes that are likely to have been under selection in recent human evolution.

Large genomic atlases of copy number variation have appeared only recently, stimulated by new methods of analysing second-generation sequencing data. These atlases make it possible to identify mCNVs that show unusually high differentiation in copy number between populations, which suggests that natural selection has operated.
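As an illustration of what "differentiation in copy number between populations" can mean in practice, the sketch below computes V_ST, one commonly used statistic for copy number data, defined as (V_T - V_S) / V_T, where V_T is the variance in copy number across all individuals and V_S is the size-weighted mean of the within-population variances. The choice of V_ST is an assumption made for illustration; the project outline does not name a specific statistic. The population labels and copy numbers are invented.

```python
from statistics import pvariance


def vst(copy_numbers_by_pop):
    """Return V_ST = (V_T - V_S) / V_T for a single copy number variable locus.

    copy_numbers_by_pop maps a population label to a list of diploid copy
    numbers, one per individual. Labels and values here are hypothetical.
    """
    all_values = [cn for pop in copy_numbers_by_pop.values() for cn in pop]
    v_total = pvariance(all_values)
    if v_total == 0:
        # No variation at all, so there is no differentiation to measure.
        return 0.0
    n = len(all_values)
    # Within-population variance, weighted by population sample size.
    v_within = sum(
        len(pop) * pvariance(pop) for pop in copy_numbers_by_pop.values()
    ) / n
    return (v_total - v_within) / v_total


if __name__ == "__main__":
    # Invented example data: two populations with different typical copy number.
    example = {"pop_A": [2, 2, 3, 2, 2], "pop_B": [5, 6, 6, 7, 6]}
    print(f"V_ST = {vst(example):.3f}")
```

Values near 1 indicate that most of the copy number variance lies between populations, and values near 0 indicate that populations have similar copy number distributions. A high value alone does not demonstrate selection, since demographic history can also produce differentiation.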

This project will focus on these highly differentiated mCNVs, and the genes within them. In particular we will apply new bioinformatics approaches developed in EJH’s group to utilise short read sequencing to call sequence variation within copy number variable regions. This will open up not only the 1000 Genomes samples, which are currently being sequenced to high depth, but other samples, including those in the 100,000 Genomes Project. EJH is a member of the Population genetics Genomics England Clinical Interpretation partnership, led by Richard Durbin. We will then use these data to fully explore the evolutionary dynamics of these CNV regions and investigate evidence of recent selection of particular paralogues.

BBSRC Strategic Research Priority: Molecules, cells and systems

Techniques that will be undertaken during the project:

  • Exact details are liable to change – this project will not start until October 2018 (two years' time) and the field is moving fast.
  • Remapping of short sequence reads to a custom reference sequence
  • Genotype calling in a copy-number aware manner
  • Reconstruction of copy number haplotypes
  • Evolutionary inferences from haplotypes
  • Validation of inferences using long PCR and high throughput sequencing/long read sequencing
  • Analysis of publicly available RNA-seq data for fusion transcripts or paralog-specific transcripts
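To give a flavour of what "genotype calling in a copy-number aware manner" involves, the sketch below picks the most likely number of variant-carrying copies at a site, given the total copy number of the region and the counts of reference and variant reads. It uses a simple binomial model with a fixed per-read error rate. This is a generic textbook-style illustration, not the approach developed in EJH's group, and the function name, parameters and error rate are all hypothetical.

```python
import math


def call_variant_copies(total_copies, ref_reads, alt_reads, error_rate=0.01):
    """Return the most likely number of copies carrying the variant allele.

    total_copies: diploid copy number of the region in this individual
                  (assumed to be known already, e.g. from read depth).
    ref_reads, alt_reads: counts of reads supporting each allele.
    error_rate: assumed probability that a read reports the wrong allele.
    """
    if total_copies < 1:
        raise ValueError("total_copies must be at least 1")
    if not 0 < error_rate < 0.5:
        raise ValueError("error_rate must be between 0 and 0.5")

    best_k, best_loglik = 0, -math.inf
    for k in range(total_copies + 1):
        frac = k / total_copies
        # Expected fraction of variant reads, allowing for read errors.
        p_alt = frac * (1 - error_rate) + (1 - frac) * error_rate
        # Binomial log-likelihood; the binomial coefficient is the same for
        # every k, so it is omitted.
        loglik = alt_reads * math.log(p_alt) + ref_reads * math.log(1 - p_alt)
        if loglik > best_loglik:
            best_k, best_loglik = k, loglik
    return best_k


if __name__ == "__main__":
    # With four copies, 25% variant reads is best explained by one variant copy.
    print(call_variant_copies(total_copies=4, ref_reads=75, alt_reads=25))
```

A standard diploid caller would expect variant read fractions of only 0, 0.5 or 1. Allowing any fraction k/n for a region with n copies is what makes the call copy-number aware.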

Contact: Dr Ed Hollox, University of Leicester