Data Science from Scratch: First Principles with Python
To really learn data science, you should not only master the tools―data science libraries, frameworks, modules, and toolkits―but also understand the ideas and principles underlying them. Updated for Python 3.6, this second edition of Data Science from Scratch shows you how these tools and algorithms work by implementing them from scratch.
If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Packed with new material on deep learning, statistics, and natural language processing, this updated book shows you how to find the gems in today’s messy glut of data.
- Get a crash course in Python
- Learn the basics of linear algebra, statistics, and probability―and how and when they’re used in data science
- Collect, explore, clean, munge, and manipulate data
- Dive into the fundamentals of machine learning
- Implement models such as k-nearest neighbors, Naïve Bayes, linear and logistic regression, decision trees, neural networks, and clustering
- Explore recommender systems, natural language processing, network analysis, MapReduce, and databases
- ISBN-10: 1492041130
- ISBN-13: 978-1492041139
- Edition: 2nd
- Publisher: O'Reilly Media
- Publication date: June 11, 2019
- Language: English
- Dimensions: 6.9 x 0.9 x 9.1 inches
- Print length: 403 pages
Editorial Reviews
About the Author
Joel Grus is a research engineer at the Allen Institute for Artificial Intelligence. Previously he worked as a software engineer at Google and a data scientist at several startups. He lives in Seattle, where he regularly attends data science happy hours.
Product details
- Publisher : O'Reilly Media
- Publication date : June 11, 2019
- Edition : 2nd
- Language : English
- Print length : 403 pages
- ISBN-10 : 1492041130
- ISBN-13 : 978-1492041139
- Item Weight : 1.57 pounds
- Dimensions : 6.9 x 0.9 x 9.1 inches
- Best Sellers Rank: #73,679 in Books
- #17 in Data Processing
- #18 in Data Mining (Books)
- #49 in Python Programming
About the author

Joel Grus is Principal Engineer at Capital Group, where he leads a small team that designs and implements machine learning and data products. Before that he was a software engineer at the Allen Institute for AI and Google, and a data scientist at a variety of startups.
He's the author of the beloved "Data Science from Scratch", the quirky "Ten Essays on Fizz Buzz", and the polarizing JupyterCon talk "I Don't Like Notebooks".
He lives in Seattle, where he regularly attends data science happy hours. He blogs infrequently at joelgrus.com.
Customer reviews
Top reviews from the United States
- 5 out of 5 stars
The BEST book for learning how many data science functions work under the hood - START HERE!
Reviewed in the United States on April 17, 2023
Did you see something on the news about ChatGPT, Stable Diffusion, or some other big development that made you want to look into machine learning?
Maybe you truly plan on entering data science as a field but don't know where to start?
Or perhaps you've seen one of the author's brilliant/hilarious talks about why he doesn't like Jupyter Notebooks or how to answer the infamous "FizzBuzz" programming interview question using TensorFlow neural networks (seriously, look up Joel Grus on YouTube).
If you know a little bit of Python, a little bit of relevant math, and want to go into any data science or machine learning path, then this book is a must-have. It certainly won't be the only resource you'll need, but it helps you get the most out of other content you'll likely look into later (like how to code up a machine learning pipeline, or maybe a large language model if you're really adventurous).
Far too many machine learning lessons out there just tell you to import certain Python libraries (scikit-learn for example) and start using them without giving you any basic understanding of how those imported functions even work to begin with. Even to this day there are still college courses and coding bootcamps that ask you to download a Jupyter Notebook file and just hit "Shift + Enter" and look at the output.
You're not going to learn how to code that way!!!
Joel Grus does an excellent job of filling in this gap by teaching you more Python than what a statistics professional would usually know and more math than what a typical software developer would know. And that's key if you want to go into a field that relies on both.
All the information for Python and math that you need to get started is here. It's 27 chapters that get you familiar with Python and how to use it, as well as the math used in data science and ML (linear algebra, probability and statistics, algorithms, etc).
You eventually learn enough of both as you go through the chapters to start applying what you learn for some real-world usage.
I've had this book for years and it's still as useful as when it first came out. The only exception I've seen is the Twitter API tutorial, which no longer applies now that Twitter charges for access to that feature. The tutorial is still good for learning how APIs get put to use.
Once you've read this book and have gotten familiar with all it has to offer, your next step will probably involve looking into a book about how to actually use pre-built data science libraries (like what you find in the Anaconda distribution of Python).
This book may turn out to be heavily responsible for my first startup, but that's a story for later.
22 people found this helpful
- 5 out of 5 stars
Amazing introduction to Data Science
Reviewed in the United States on May 15, 2020
Let me start this review by explaining clearly who this book is for: anyone who has had some form of introduction (even if concise) to programming in Python, algebra, statistics, and probability will find this book a great introduction to Data Science. While the author does a great job at having a crash course on these topics (and I even learned a thing or two here and there), I can see the contents being a bit overwhelming if this is your first point of contact with these subjects. However, should you meet the requirements I mentioned above, you'll find this book a breeze! Joel does a good job at explaining the topics using his signature brand of humor, keeping the read entertaining even in the most advanced areas. I'd even say that this is a must read if you are considering going into machine learning, since it teaches you a thing or two in the topic as well. Please keep in mind that the book is monochrome. If that bothers you, consider viewing the electronic version.
TLDR: If you're looking for a concise introduction to data science and have a bit of knowledge of basic Python, algebra, statistics and probability, look no further than this book! Otherwise, come back once you've picked up those tools and you'll feel right at home :)
30 people found this helpful
- 4 out of 5 stars
Good book for starters on AI/ML
Reviewed in the United States on December 26, 2020
Good book for someone starting to learn the basics of AI/ML.
2 people found this helpful
- 5 out of 5 stars
Very very good book!
Reviewed in the United States on March 17, 2020
This book is suitable for people with basic Python programming skills. It is very good for beginners and advanced users alike. The code is very clear and free of errors. This book teaches you the basics and introduces some expert-level topics for you to explore further if you are keen. If you are a novice data analyst and some of the harder topics throw you off, you should probably revisit them after you have gained more knowledge of data science.
I highly recommend this as your first book on data science because the code and thought processes are very clear. 70-80% of the book is data science foundations and basics that prepare you to tackle harder topics later.
8 people found this helpful
- 5 out of 5 stars
Amazing book on Data Science
Reviewed in the United States on May 6, 2025
Great book when you want to get into the field of Data Science
One person found this helpful
- 3 out of 5 stars
Great book about the how of Data Science, but not the why.
Reviewed in the United States on June 30, 2020
In my personal opinion, this is a book to bridge the gap between an experienced mathematician/statistician and practical machine learning. The book is more focused on describing how to implement mathematical formulas in Python than on actually explaining the math behind them.
I am a proficient Python engineer and I can read the code and understand what is being done, but the author makes no effort to explain how he reached a given conclusion, or why it matters. The author does implement the mathematical formulas in Python skillfully, but that misses the point of the book.
Another big problem with this book is that it assumes you can learn mathematics by just doing mathematics without understanding the why. It is frustrating to read and follow the author as he implements mathematical formulas without explaining why. I believe this is the case because the author DOES assume you have the required math to follow the book. I believe the author should add a section in the preface that lists the prerequisites for this book:
- Linear Algebra
- Statistics
- Probability
- Vector Calculus
- Continuous Optimization
Above all, it is a good book if used as an index on where to start to understand Data Science, but it definitely doesn't fulfill the promise of being "from scratch". From scratch IMHO means you dive into the internals of Data Science algorithms. I had the expectation that this book was going to be more like "Designing Data-Intensive Applications" for Data Science, where the "why" is as important as the "how". Data Science from Scratch is a book about the how, with no effort to dive into the why. The book does provide the vocabulary for me to discuss Data Science with practitioners, but I didn't feel it got me any closer to becoming a practitioner myself.
BTW, the fact that the book is monochrome doesn't matter; the font and figures are very clear and readable.
73 people found this helpful
- 5 out of 5 stars
Great book, really means from scratch
Reviewed in the United States on July 28, 2021
This is a great book. Doing everything from scratch and not just using numpy, sklearn, etc is a great way to learn what's really going on underneath. I'm surprised how far he gets along this path. By the end, you will have implemented a keras-like deep learning setup. It won't be fast enough for production use since it's all using Lists underneath, but you'll be able to see how it all fits together. Also, coming from a more typed language background, I loved the type annotations.
7 people found this helpful
- 5 out of 5 stars
Good Coverage of the "Bare Metal" of basic Data Science
Reviewed in the United States on August 18, 2019
If you need a good broad brush to learn from, the second revision (in monochrome) is the book for you!
Yes, there is numpy, pandas, and a host of other packages and frameworks available to perform many of the examples of what is explained in the book. But you need to broaden your knowledge with this material that touches the "bare metal" of Data Science.
Excellent use is made of clear, concise verbiage to make things "black and white". (save the color images and other crutches for the board room stakeholders!).
32 people found this helpful
Top reviews from other countries
- Bal C.: 5 out of 5 stars
Good product, arrived in good condition, it just isn't very shiny
Reviewed in Mexico on December 13, 2025
Good product, arrived in good condition
- Jalal Derakhshani: 5 out of 5 stars
Highly recommended
Reviewed in Germany on April 6, 2026
A must-read in this era.
- Hans P. Heinrich: 5 out of 5 stars
Start with this book right now!
Reviewed in Canada on August 13, 2019
Joel's method of explaining is both entertaining and very useful
- Debora Bonini: 2 out of 5 stars
Not bad, but not good either
Reviewed in Italy on October 29, 2021
The book is useful for grasping the basic concepts behind data science. However, it gets pretty messy as the topics become more complex, especially when the Python code is shown without much explanation. If you need a book to learn Python for data science, there are many other alternatives.
- Velvetytoast: 5 out of 5 stars
Very good ground up approach to the subject
Reviewed in the United Kingdom on January 25, 2020
It's definitely from the ground up. I found it useful to revisit the maths as well as seeing the code. Well worth the price of the book.