Feature: What is Data Science?

Data Science combines elements of Mathematics, Statistics and Computer Science to turn unprecedented volumes of data into actionable knowledge. Though it has been widely covered in the media and has found applications in almost every sector of business, it remains a subject that is often misunderstood. To help make sense of Data Science and understand a little about how it's changing the world, we asked a leading Warwick academic what his work is all about.

Professor Graham Cormode

Data Science involves both applications to real-world problems and the development of new techniques to help analyse data. My work aims to bridge theory and practice by looking for ways to create very small summaries of very large amounts of data that capture the essential features.

What are your main areas of research?
I'm interested in the whole data lifecycle, from capture and cleaning through to mining and analytics. In terms of specific topics, I look at fundamental problems in data mining, privacy and streaming algorithms.

What does this research entail?
Much of my work is about finding approximate answers to questions that are too costly to solve exactly. Our capacity for creating ever more information is outpacing improvements in computational power. More and more devices are creating data: sensors recording GPS locations of vehicles in cities, smartphones capturing information about their users, social networks with millions of posts and photos. There's an increasing mismatch between our ability to create information and our ability to digest it and make sense of it. Much of my research involves coming up with ways to quickly summarise information, so that the summaries, rather than the raw data, can be used to understand what is going on.

How does it actually work?
Suppose you are doing a survey of all the cars on a stretch of road, over several weeks. For any car, you want to be able to count how many times you have seen it before. But you only have one sheet of A4 to keep notes on! There's no room to record a count for every number plate. Instead, you could just keep track of counts for the last few digits of each plate. Then you estimate the number of times you've seen a car as the number of times you've seen those digits. This will be quite a rough estimate, as you'll get different cars mixed up, but it's not a bad start. You can refine this approach by choosing a more unpredictable way to go from a number plate to a line on the paper. And you can reduce the number of cases of mistaken identity by mapping each plate a few times, using a different mapping function each time, and adding one to the count at every line it maps to. To estimate the number of times you have seen a plate, you look at all the places it is counted and take the smallest count, as this has the fewest contributions from other vehicles.
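
The survey scheme just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the production algorithm: the class name `PlateCounter`, the parameters `rows` and `width`, and the use of a salted SHA-256 hash as the "unpredictable" mapping are all choices made here for the example, and the number plates are made up.

```python
import hashlib


class PlateCounter:
    """Fixed-size summary of plate sightings (illustrative sketch only)."""

    def __init__(self, rows=4, width=64):
        # 'rows' is the number of different mappings; 'width' is the number
        # of lines on the sheet available to each mapping. Both are
        # hypothetical parameter names chosen for this example.
        self.rows = rows
        self.width = width
        self.table = [[0] * width for _ in range(rows)]

    def _line(self, row, plate):
        # Salting the hash with the row number gives a different mapping
        # from plate to line for each row (an assumption of this sketch).
        digest = hashlib.sha256(f"{row}:{plate}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def record(self, plate):
        # Add one to the count at the plate's line in every row.
        for row in range(self.rows):
            self.table[row][self._line(row, plate)] += 1

    def estimate(self, plate):
        # Other plates can only add to a line, never subtract, so the
        # smallest count is the one least inflated by mistaken identity.
        return min(
            self.table[row][self._line(row, plate)]
            for row in range(self.rows)
        )


counter = PlateCounter()
for plate in ["AB12 CDE"] * 3 + ["XY34 ZZZ"]:
    counter.record(plate)

# The estimate can be too high if plates collide, but never too low.
print(counter.estimate("AB12 CDE"))
```

The memory used depends only on `rows` and `width`, not on how many distinct plates pass by, which is the point of keeping everything on one sheet of paper.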

Is this research that gets used?
This scheme, with more maths behind it, is the idea behind the "Count-Min sketch" algorithm I invented for tracking counts in rapid streams of data. It has been used in a variety of settings where many events are happening very quickly, and we want to find which are most popular. For example, when Twitter tracks which websites are showing tweets, they use it to see which sites attract the most hits. The algorithm is taught at universities around the world, including Warwick, as a way to summarise big data.
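
For readers who want a flavour of the maths, the accuracy guarantee usually quoted for the Count-Min sketch can be written as follows. It applies to streams where counts only ever increase and assumes the mapping functions are chosen at random from a pairwise-independent family. The symbols ($w$, $d$, $\varepsilon$, $\delta$, $a_i$, $\hat{a}_i$, $N$) are notation introduced here for illustration and do not come from the interview.

```latex
% C is a table of d rows and w columns; h_1, ..., h_d are the mapping (hash) functions.
% a_i is the true count of item i, and N = \sum_i a_i is the total number of events.
\[
  \hat{a}_i \;=\; \min_{1 \le j \le d} C\bigl[j,\, h_j(i)\bigr],
  \qquad
  w = \left\lceil \frac{e}{\varepsilon} \right\rceil,
  \qquad
  d = \left\lceil \ln\frac{1}{\delta} \right\rceil
\]
\[
  a_i \;\le\; \hat{a}_i
  \quad\text{always, and}\quad
  \Pr\bigl[\hat{a}_i \le a_i + \varepsilon N\bigr] \;\ge\; 1 - \delta .
\]
```

In the survey analogy, $w$ is the number of lines on the sheet available to each mapping, $d$ is the number of different mappings, and $N$ is the total number of sightings recorded.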

Big grants from the European Research Council and the Royal Society are supporting me to extend this approach to large matrices and networks. This will help to find patterns in big social networks and search engines, and to give better recommendations in social networks and e-commerce sites.

Does Warwick teach Data Science?
Warwick was the first university in the UK to offer an undergraduate Data Science degree. The Department also runs a very popular Data Analytics Master's degree, which has lots of links with business and government. Data Science techniques also feature throughout our core Computer Science degree, with applications to digital forensics, social media analytics and the design of autonomous vehicles.

Warwick is also a founding partner, alongside the universities of Cambridge, Edinburgh and Oxford and UCL, in the Alan Turing Institute in London, the national institute for research in Data Science.