
Data Science Challenge 2014

Are you interested in using statistics and computers to dive deep into data? Can you find the patterns behind thousands of individual users' activities to see the big picture? If so, maybe you can win the Warwick Data Science Challenge 2014.

Data Science

Data Science combines Computer Science and Statistics: it uses computational tools to understand large data sets and draw interesting conclusions from them. The topic is attracting increasing attention, with many companies looking to recruit people with exactly this combination of skills. Students choosing to study Statistics, Computer Science, or courses that cover both, are preparing themselves well to embark on this exciting career path.

The best way to appreciate data science is to try it for yourself, which is precisely why we have set up the Warwick Data Science Challenge. This challenge is designed to give A-level students (and those a little older if they like) the opportunity to explore and understand an interesting dataset.

The Data

The dataset is information on views of news stories from the BBC News Website (from the BBC Open RUM project). This covers information about 36,000 article loads from mobile devices relating to a period of just 10 minutes on Tuesday October 29th 2013. For each article load, there is a record containing quite a lot of information: the article that was viewed, the location of the user, the type of device and operating system used and more. Below you will find a few suggestions for how you might approach the problem of finding some trends or patterns in this data set, as well as a link to download the data and a description of it.
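As an illustration only, here is one way you might load records like these in Python using the standard `csv` module. The column names (`article`, `country`, `device`, `os`), the `ArticleLoad` class and the sample rows are all invented for this sketch; check the data description for the real field names and file format before adapting it.

```python
import csv
import io
from dataclasses import dataclass


# Hypothetical record layout: the real dataset's columns may differ.
@dataclass
class ArticleLoad:
    article: str  # which story was viewed
    country: str  # approximate location of the user
    device: str   # handset model, e.g. "iPhone"
    os: str       # operating system reported by the device


def load_records(text: str) -> list[ArticleLoad]:
    """Parse CSV text (with a header row) into ArticleLoad records."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        ArticleLoad(
            article=row["article"],
            country=row["country"],
            device=row["device"],
            os=row["os"],
        )
        for row in reader
    ]


# Invented sample rows standing in for the real file.
sample = """article,country,device,os
uk-politics-123,GB,iPhone,iOS
world-europe-456,US,Galaxy S4,Android
uk-politics-123,GB,Nexus 4,Android
"""

records = load_records(sample)
print(len(records), records[0].article)  # 3 uk-politics-123
```

For a real file you would pass the file's contents (opened with `newline=""`, as the `csv` documentation recommends) instead of the `sample` string.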

The Science

Data Science is about finding structure and patterns within the data. Your task is to explore this data set, and report back on what you find. Don't worry too much about exactly what you will find; coming up with something to say about the information in the data is all part of the challenge!

You can use whatever tools you like to perform your analysis: spreadsheet programs like Excel and Google Docs; programming languages like Python, Java and Basic; or packages like R, Weka, SPSS or MATLAB.

The aim is to find patterns or build models of the data that explain the existing behaviour or predict future trends.
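To make the idea of a predictive model concrete, here is a deliberately simple baseline sketched in Python. It predicts a user's country from their operating system by picking the country seen most often with that operating system in a training portion of the data, then measures accuracy on rows it has not seen. The `(os, country)` pairs and the function names are hypothetical; a real entry would probably use more fields and a stronger method, but a baseline like this gives you something to beat.

```python
import random
from collections import Counter, defaultdict


def train_majority_model(rows):
    """For each OS, remember the most common country in the training rows."""
    by_os = defaultdict(Counter)
    overall = Counter()
    for os_name, country in rows:
        by_os[os_name][country] += 1
        overall[country] += 1
    # Fallback for operating systems never seen during training.
    fallback = overall.most_common(1)[0][0]
    table = {os_name: counts.most_common(1)[0][0] for os_name, counts in by_os.items()}
    return table, fallback


def predict(model, os_name):
    table, fallback = model
    return table.get(os_name, fallback)


def accuracy(model, rows):
    """Fraction of rows whose country is predicted correctly."""
    correct = sum(predict(model, os_name) == country for os_name, country in rows)
    return correct / len(rows)


# Invented (os, country) pairs standing in for real records.
data = [("iOS", "GB")] * 6 + [("iOS", "US")] * 2 + [("Android", "US")] * 5 + [("Android", "GB")] * 3

random.seed(0)  # fixed seed so the train/test split can be repeated exactly
random.shuffle(data)
split = int(0.75 * len(data))
train, test = data[:split], data[split:]

model = train_majority_model(train)
print(f"held-out accuracy: {accuracy(model, test):.2f}")
```

Fixing the random seed is one simple way to make the result repeatable, which is one of the qualities the judges may look for.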

The Challenge

To enter the competition you will need to write a short summary of your findings that will impress our panel of expert judges. Although there are no fixed criteria for how to impress them, you might like to consider what you find impressive when reading about scientific results, e.g., depth, originality, repeatability. Including plots and charts may help to get your message across.

Here are some possible questions you could choose from and use the data to address:

  • Which news stories were the most popular during the span of the data set?
  • What categories of story attracted most interest? How does this vary around the world?
  • What type of phone do news readers use: iPhone or Android? Are iPhone users more interested in world news than Android users?
  • Can you build a model from part of the data that takes in a new example, and gives a good prediction for where the user is based?
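As a starting point for the first and third questions, here is a small Python sketch that counts loads per story and compares the share of world-news loads on each operating system. The `(article, section, os)` tuples, the `section` values used as story categories, and the `world_share` helper are invented for illustration; the real dataset may record categories differently.

```python
from collections import Counter

# Invented sample rows: (article, section, os). Real field names may differ.
views = [
    ("story-a", "world", "iOS"),
    ("story-b", "uk", "Android"),
    ("story-a", "world", "Android"),
    ("story-c", "business", "iOS"),
    ("story-a", "world", "iOS"),
    ("story-b", "uk", "iOS"),
]

# Which stories were loaded most often? Count loads per article.
top_stories = Counter(article for article, _, _ in views).most_common(2)
print(top_stories)  # [('story-a', 3), ('story-b', 2)]


def world_share(rows, os_name):
    """Fraction of loads on the given OS whose section is "world" (0.0 if none)."""
    sections = [section for _, section, os_tag in rows if os_tag == os_name]
    if not sections:
        return 0.0
    return sections.count("world") / len(sections)


# Are iPhone (iOS) users more interested in world news than Android users?
for os_name in ("iOS", "Android"):
    print(os_name, world_share(views, os_name))
```

On the real data you would replace `views` with the full set of records. With only ten minutes of data, it is also worth asking whether any difference between the two shares is large enough to be more than chance.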

Prizes will be awarded for the best analysis, the best presentation of results (visualisation), and the most innovative approach.


Challenge Rules

The Warwick Data Science Challenge is being run by the Departments of Statistics and Computer Science at the University of Warwick. Warwick has an undergraduate degree in Data Science jointly taught by both departments, and the challenge is intended to raise awareness of this in-demand area as a degree subject for the most capable students. The winners and runners-up will be invited to the University of Warwick to present their results and receive their awards.

  1. The competition is open to A-level (and equivalent) students based in the UK
  2. Submission format: up to 4 pages of A4 (minimum 11pt font)
  3. You may work in teams of up to 4 people
  4. You can receive input from friends, teachers, parents etc., but the work must be your own
  5. We will operate a message board for questions and clarifications
  6. You may use any additional data
  7. You will be judged on the quality of your written submission
  8. Address your own hypothesis or one (or more) of the questions above
  9. You must submit via the submission page (to appear shortly) before 31st March 2014



We hope you enjoy the Warwick Data Science Challenge 2014! Good luck!

All images used on this page are under a Creative Commons license, with full source details provided here.

The dataset is information on views of news stories from the BBC News Website (from the BBC Open RUM project). This covers information about 50,000 views from mobile devices, covering a span of just 10 minutes (on Tuesday October 29th 2013, to be precise). For each article view, there is a lot of information: the article that was viewed, the approximate location of the user, the type of device and operating system used, and the time taken to complete the request.