UCL Summer School: Data Science and Big Data Analytics
Key Information
Module Overview
Data Science is an exciting new area that combines scientific inquiry, statistical knowledge, substantive expertise,
and computer programming. One of the main challenges for businesses and policy makers when using big data
is to find people with the appropriate skills. Students taking this module will be introduced to the most
fundamental data analytic tools and techniques, and learn how to use specialised software to analyse real-world
data and answer policy-relevant questions.
Week One
• Principles of research design;
• Probability framework and statistical inference;
• Linear regression models.
Week Two
• Classification models;
• Resampling methods;
• Model selection.
Week Three
• Non-linear models;
• Tree-based models;
• Unsupervised learning;
• Unstructured data analysis.
Please note that this module description is indicative and may be subject to change.
Module Aims
This course aims to provide an introduction to the data science approach to the quantitative analysis of data
using the methods of statistical learning, an approach blending classical statistical methods with recent advances
in computational and machine learning. The course will cover the main analytical methods from this field with
hands-on applications using example datasets, so that students gain experience with and confidence in using
the methods we cover. It also covers data preparation and processing, including working with structured
databases, key-value formatted data (JSON), and unstructured textual data.
Teaching Methods
The module is fully taught online through a blend of daily live teaching sessions, guided learning activities,
structured group work, and independent learning.
Students will be expected to complete a series of learning activities before attending a live teaching session each
day. Learners will also be divided into study groups, which may meet live online and/or communicate on
discussion forums to help provide additional support for learning.
Teaching and learning will take place on UCL’s Virtual Learning Environment, Moodle, where students will have
on-demand access to all teaching materials, to reading lists available through UCL’s online library resources, and
to discussion forums.
Learning Outcomes
Upon successful completion of this module, students will:
• Have a sound understanding of the field of data science and the ability to analyse real-world
data using some of its main methods;
• Be comfortable applying regression models for continuous and limited outcome variables;
• Have explored more complex models, such as the widely used panel data models;
• Be familiar with descriptive and predictive analytics and their application to big data problems;
• Have explored methods of text analytics and automated data acquisition;
• Have a solid foundation for more advanced or more specialised study.
Assessment Methods
• Practical assessments (50%)
• Final test (50%)
Key Texts
The primary texts are:
James, G., Witten, D., Hastie, T. and Tibshirani, R. (2013). An Introduction to Statistical Learning: With Applications in R. Springer. The book is available
from the authors’ page: http://www-bcf.usc.edu/~gareth/ISL/
Stock, J. H. and Watson, M. W. (2015). Introduction to Econometrics, 3rd edition (updated). Pearson.
The following are supplemental texts which you may also find useful:
Conway, D. and White, J. (2012). Machine Learning for Hackers. O’Reilly Media.
Zumel, N. and Mount, J. (2014). Practical Data Science with R. Manning Publications.
Lantz, B. (2013). Machine Learning with R. Packt Publishing.
Module Lead
Dr Philip Lewis works in the Department of Cell and Developmental Biology at UCL but originally studied for his
PhD in the field of High Energy Physics. He worked on the analysis of the massive datasets generated by the Tevatron
collider, and on the computing infrastructure needed to store, retrieve and analyse the data. For the last five
years he has worked in the field of Computational Biology, and he currently helps to deliver the SysMIC course,
which trains doctoral students across the UK in the computational skills increasingly necessary for cutting-edge
biology research.