Ultra Mega Data Science Curriculum: Ugrad/25052017 - Dsa - Req - Ay1617 - v17-05 PDF
This list should not be taken as an exhaustive collection of all of the topics which span the diverse
landscape of data science. There are many areas that are not covered, such as deep learning
or natural language processing. However, once a person has an understanding of the listed
subject matter, they should have the knowledge base to study additional topics which interest
them, or which they are required to understand for work or personal projects.
Being a good data scientist requires more than theoretical understanding of the subject matter.
Data science is learnt by doing data science. By this, I mean that people who are learning data
science should engage in hands-on projects to solve real-world problems.
Example structure
Here is a good example of how a data science undergraduate degree could be structured. I have
based it on the National University of Singapore’s B.Sc. (Hons.) with a major in Data Science
and Analytics program. However, there are some alterations. The original structure can be
found here:
https://www.stat.nus.edu.sg/images/DSAP/UnderGrad_Dox/ProgRequirementPage-Ugrad/25052017_dsa_req_ay1617_v17-05.pdf
This may give readers an idea of the required knowledge areas, and of how to structure their
self-study.
Level 1
- Programming
- Currently learning the basics of R and Python, with a focus on Python
- Statistics
- Developed an understanding of statistics material over the course of my college
academic career; however, application of statistics needs improvement
- Linear algebra
- Newly developing skill
- Single variable calculus
- Newly developing skill
- Intro to computer science
- Course taken; includes Python basics and data entry basics. Completed task
- Intro to data science
- Rmotr: currently finishing courses on Python and an introduction to data science
Level 2
- Multivariable calculus
- Discrete maths
- Probability
- Data visualisation
- Numerical computation
- Data structures and algorithms
Level 3
- Machine learning
- Software engineering
- Database design and programming
- Big data
Level 4 - This is where things get interesting. Once a student has worked their way through the
three previous levels, they should possess a foundational understanding of data science. In the
original degree, a student is required to take six elective subjects and to complete a practical
project. Here is a sample of the elective subjects:
I would suggest that anyone following my curriculum do something similar. Pick a project which
relates to an area of interest, and pick some ‘extra credit’ subjects which complement the
project you are working on. A list of potential courses can be found in the extra credit
section.
Core skills
The sections below cover the subject areas that I consider the core skills of data science. You should
look at the courses in each section and decide which course best suits your learning style. There
are other learning resources which can help you learn these subjects; I would suggest looking
into them if you can’t find a resource which suits your particular learning style.
Not all of the courses have the same level of depth and breadth in their coverage of the subject.
Some of the shorter courses will be insufficient to cover the subject matter properly, and thus a
follow-up course should be taken. For example, in the section on data structures and algorithms,
the Khan Academy course is excellent; however, it should only be used as an introduction to
the topic, and a follow-up course should be taken, such as those offered by Stanford or the
University of Pennsylvania.
Many of the courses have significant overlap with other courses. For example, 6.00.1x for
programming will overlap with the introduction to computer science courses; thus, if you did
6.00.1x, you may choose to skip the intro to computer science courses.
Additionally, please be aware that some of the courses should be taken together. For example,
the probability course I have listed by Purdue University is a two-part series consisting of
Probability: Basic Concepts & Discrete Random Variables and its follow-up course, Probability:
Distribution Models & Continuous Random Variables.
Please be aware of the order of the subjects listed. While the subjects don’t strictly need to be
taken sequentially, some subjects have prerequisites. For example, one would need an
understanding of calculus before studying probability; similarly, one should have an
understanding of programming, introductory level computer science, probability, and discrete
maths before studying data structures and algorithms.
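As a rough illustration of how these prerequisites constrain the order of study, the sketch below
encodes only the two examples given above and asks Python's standard graphlib module for one
valid ordering. The subject names and the study_order helper are illustrative assumptions, not
part of any listed course, and the result is just one of several acceptable orders.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Map each subject to the subjects it depends on.
# Only the two prerequisite examples stated in the text are encoded here.
prerequisites = {
    "probability": {"calculus"},
    "data structures and algorithms": {
        "programming",
        "intro to computer science",
        "probability",
        "discrete maths",
    },
}


def study_order(prereqs):
    """Return one order in which every subject comes after its prerequisites."""
    return list(TopologicalSorter(prereqs).static_order())


order = study_order(prerequisites)
for position, subject in enumerate(order, start=1):
    print(position, subject)
```

Subjects with no stated prerequisite can appear anywhere before the subjects that depend on
them, so the exact output may differ between runs or Python versions.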
Programming - For this section I have chosen to focus on Python courses. Although the R
programming language is extremely popular for data science, I feel that Python is a better
choice for the scope of this guide.
- https://www.edx.org/course/introduction-computer-science-mitx-6-00-1x-11
(If you choose to do this course, I would highly recommend taking the follow-up course -
https://www.edx.org/course/introduction-computational-thinking-data-mitx-6-00-2x-6)
- https://www.edx.org/course/introduction-computing-using-python-gtx-cs1301x
- https://www.codecademy.com/learn/learn-python
- https://docs.python.org/3.5/tutorial/
- https://www.udacity.com/course/programming-foundations-with-python--ud036
- https://www.udacity.com/course/introduction-to-python--ud1110
Statistics
- https://www.udemy.com/statshelp/
- https://lagunita.stanford.edu/courses/course-v1:OLI+ProbStat+Open_Jan2017/about
- https://www.coursera.org/specializations/statistics (This course is done in R)
- http://onlinestatbook.com/
Linear algebra
- https://www.edx.org/course/linear-algebra-foundations-frontiers-utaustinx-ut-5-05x-0
- https://www.edx.org/course/applications-linear-algebra-part-1-davidsonx-d003x-1
- https://www.edx.org/course/applications-linear-algebra-part-2-davidsonx-d003x-2
- https://www.udemy.com/linear-algebra-an-introduction/
- https://www.udemy.com/linear-algebra-for-beginners-open-doors-to-great-careers-2/
- https://www.khanacademy.org/math/linear-algebra
Multivariable calculus
- https://ocw.mit.edu/courses/mathematics/18-02sc-multivariable-calculus-fall-2010/
- https://www.udemy.com/calculus-3/
Discrete maths
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-042j-mathematics-for-computer-science-fall-2010/
- https://www.coursera.org/specializations/discrete-mathematics
Probability
- https://www.edx.org/course/introduction-probability-science-mitx-6-041x-2
- https://www.edx.org/course/probability-basic-concepts-discrete-purduex-416-1x-1
- https://www.edx.org/course/probability-distribution-models-purduex-416-2x-1
- https://www.khanacademy.org/math/statistics-probability/probability-library
Data visualisation
- https://www.edx.org/course/data-analysis-visualization-dashboard-delftx-ex102x-2#!
- https://www.udacity.com/course/data-analysis-and-visualization--ud404 (Requires R
programming; however, the course is excellent)
- http://www.cs171.org/2017/index.html
- https://www.coursera.org/specializations/data-visualization
- http://vis.berkeley.edu/courses/cs294-10-sp11/wiki/index.php/CS294-10_Visualization
Numerical computation - This section was difficult. I wanted the learner to gain an
understanding of how computers can be used to perform numerical analysis. While neither of
the courses spans the whole topic of numerical analysis, they will hopefully provide the student
with some exposure to it.
- https://www.edx.org/course/stochastic-processes-data-analysis-kyotoux-009x-0
- https://www.udacity.com/course/differential-equations-in-action--cs222
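To give a flavour of what using a computer for numerical analysis can look like, here is a
minimal sketch of the forward Euler method applied to a toy differential equation. The equation,
the step count, and the euler helper are assumptions chosen for illustration; they are not taken
from either of the courses above.

```python
import math


def euler(f, y0, t0, t1, steps):
    """Approximate y(t1) for dy/dt = f(t, y) with y(t0) = y0, using forward Euler."""
    h = (t1 - t0) / steps  # step size
    t, y = t0, y0
    for _ in range(steps):
        y += h * f(t, y)  # follow the current slope for one small step
        t += h
    return y


# Toy problem (an assumption for illustration): dy/dt = -y with y(0) = 1,
# whose exact solution is y(t) = exp(-t).
approx = euler(lambda t, y: -y, y0=1.0, t0=0.0, t1=1.0, steps=1000)
exact = math.exp(-1.0)
print("approximation:", approx)
print("exact value:  ", exact)
print("abs. error:   ", abs(approx - exact))
```

Increasing the number of steps should shrink the error, which is the kind of trade-off between
accuracy and computation that numerical analysis studies.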
Data structures and algorithms - Note that this is also covered in the University of
Pennsylvania course series in the Software engineering section.
- https://lagunita.stanford.edu/courses/course-v1:Engineering+Algorithms1+SelfPaced/about
- http://interactivepython.org/runestone/static/pythonds/index.html
Machine learning
- https://lagunita.stanford.edu/courses/HumanitiesSciences/StatLearning/Winter2016/about
- https://www.udemy.com/python-for-data-science-and-machine-learning-bootcamp/
- https://www.edx.org/course/learning-data-introductory-machine-caltechx-cs1156x-0
- https://www.coursera.org/learn/machine-learning
- https://www.udacity.com/course/machine-learning--ud262
Software engineering
- https://www.edx.org/course/software-development-fundamentals-pennx-sd1x
- https://www.edx.org/course/data-structures-software-design-pennx-sd2x
- https://www.edx.org/course/algorithm-design-analysis-pennx-sd3x
(All three courses above should be taken together, as they form a three-part series run by
the University of Pennsylvania called Computer Science Essentials for Software
Development)
- https://www.edx.org/course/software-engineering-introduction-ubcx-softeng1x
- https://www.edx.org/course/software-engineering-essentials-tumx-seecx
- https://www.coursera.org/specializations/java-programming (Excellent course; however, it
will require you to learn Java)
Databases
- https://lagunita.stanford.edu/courses/Home/Databases/Engineering/about
- https://www.codecademy.com/learn/learn-sql
- https://www.udacity.com/course/database-systems-concepts-design--ud150
Big data - This is an area of data science which is becoming increasingly important; however, it
is difficult to find good online resources for the subject. Some of the books published by O'Reilly
Media, such as Learning Spark: Lightning-Fast Big Data Analysis, may serve as a useful guide in
lieu of a comprehensive online course.
- https://www.udemy.com/scala-and-spark-for-big-data-and-machine-learning/
- https://www.udemy.com/spark-and-python-for-big-data-with-pyspark/
- https://www.udacity.com/course/intro-to-hadoop-and-mapreduce--ud617
- https://www.udacity.com/course/deploying-a-hadoop-cluster--ud1000
Extra credit
Here is a collection of courses which I think are good but which don’t fall neatly into one of
the categories I have listed. To gain a good understanding of the field of data
science, I would suggest that the reader take some of the courses listed below and try to fit
them into the structure outlined in the previous section.
- https://lagunita.stanford.edu/courses/Engineering/CVX101/Winter2014/about
- https://www.udacity.com/course/introduction-to-computer-vision--ud810
- https://www.udacity.com/course/reinforcement-learning--ud600
- https://www.udacity.com/course/machine-learning-for-trading--ud501
- https://www.udacity.com/course/data-wrangling-with-mongodb--ud032
- https://www.udacity.com/course/knowledge-based-ai-cognitive-systems--ud409
- https://www.udacity.com/course/data-analysis-and-visualization--ud404
- https://ocw.mit.edu/courses/media-arts-and-sciences/mas-622j-pattern-recognition-and-analysis-fall-2006/
- https://ocw.mit.edu/courses/brain-and-cognitive-sciences/9-916-the-neural-basis-of-visual-object-recognition-in-monkeys-and-humans-spring-2005/
- https://ocw.mit.edu/courses/brain-and-cognitive-sciences/9-591j-language-processing-fall-2004/
- https://ocw.mit.edu/courses/brain-and-cognitive-sciences/9-520-a-networks-for-learning-regression-and-classification-spring-2001/
- https://ocw.mit.edu/courses/brain-and-cognitive-sciences/9-520-statistical-learning-theory-and-applications-spring-2006/
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-867-machine-learning-fall-2006/
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-864-advanced-natural-language-processing-fall-2005/
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-863j-natural-language-and-the-computer-representation-of-knowledge-spring-2003/
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-825-techniques-in-artificial-intelligence-sma-5504-fall-2002/
- https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-801-machine-vision-fall-2004/
- https://ocw.mit.edu/courses/mathematics/18-657-mathematics-of-machine-learning-fall-2015/
- https://www.edx.org/course/robotics-vision-intelligence-machine-pennx-robo2x
- https://www.edx.org/course/analytics-edge-mitx-15-071x-3
- https://www.edx.org/course/developing-big-data-solutions-azure-microsoft-dat228x-0
- https://www.edx.org/course/processing-big-data-hadoop-azure-microsoft-dat202-1x-3
- https://www.edx.org/course/analyzing-big-data-microsoft-r-server-microsoft-dat213x-2
- https://www.edx.org/course/big-data-analytics-using-spark-uc-san-diegox-dse230x
- https://www.edx.org/course/introduction-nosql-data-solutions-microsoft-dat221x
- https://www.edx.org/course/implementing-real-time-analytics-hadoop-microsoft-dat202-2x-3
- https://www.edx.org/course/implementing-predictive-analytics-spark-microsoft-dat202-3x-2
- https://www.edx.org/course/introduction-apache-hadoop-linuxfoundationx-lfs103x
- https://www.coursera.org/specializations/jhu-data-science
- https://www.coursera.org/specializations/data-science-python
- https://www.coursera.org/specializations/data-mining
- https://www.coursera.org/specializations/big-data
- https://www.coursera.org/specializations/pwc-analytics
- https://www.coursera.org/specializations/data-science
- https://www.coursera.org/learn/linear-models
- https://www.coursera.org/learn/linear-models-2
- https://www.coursera.org/specializations/data-collection
- https://www.coursera.org/specializations/data-analysis
- https://www.coursera.org/learn/unix
- https://university.mongodb.com/courses/M101P/about
- https://lagunita.stanford.edu/courses/course-v1:ComputerScience+MMDS+SelfPaced/about
- https://www.edx.org/course/statistical-modeling-regression-analysis-gtx-isye6414x