Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython 2nd Edition
Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You’ll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process.
Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It’s ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.
- Use the IPython shell and Jupyter notebook for exploratory computing
- Learn basic and advanced features in NumPy (Numerical Python)
- Get started with data analysis tools in the pandas library
- Use flexible tools to load, clean, transform, merge, and reshape data
- Create informative visualizations with matplotlib
- Apply the pandas groupby facility to slice, dice, and summarize datasets
- Analyze and manipulate regular and irregular time series data
- Learn how to solve real-world data analysis problems with thorough, detailed examples
- ISBN-10: 1491957662
- ISBN-13: 978-1491957660
- Edition: 2nd
- Publisher: O'Reilly Media
- Publication date: November 14, 2017
- Language: English
- Dimensions: 7.25 x 1 x 9.5 inches
- Print length: 550 pages
From the Publisher
What Is This Book About?
This book is concerned with the nuts and bolts of manipulating, processing, cleaning, and crunching data in Python. My goal is to offer a guide to the parts of the Python programming language and its data-oriented library ecosystem and tools that will equip you to become an effective data analyst. While 'data analysis' is in the title of the book, the focus is specifically on Python programming, libraries, and tools as opposed to data analysis methodology. This is the Python programming you need for data analysis.
New for the Second Edition
The first edition of this book was published in 2012, during a time when open source data analysis libraries for Python (such as pandas) were very new and developing rapidly. In this updated and expanded second edition, I have overhauled the chapters to account both for incompatible changes and deprecations as well as new features that have occurred in the last five years.
I’ve also added fresh content to introduce tools that either did not exist in 2012 or had not matured enough to make the first cut. Finally, I have tried to avoid writing about new or cutting-edge open source projects that may not have had a chance to mature. I would like readers of this edition to find that the content is still almost as relevant in 2020 or 2021 as it is in 2017.
The major updates in this second edition include:
- All code, including the Python tutorial, updated for Python 3.6 (the first edition used Python 2.7)
- Updated Python install instructions for the Anaconda Python Distribution & other Python packages
- Updates for the latest versions of the pandas library in 2017
- A new chapter on some more advanced pandas tools, and some other usage tips
- A brief introduction to using statsmodels and scikit-learn
- Content reorganized from the first edition to make the book more accessible to newcomers
Editorial Reviews
About the Author
Wes McKinney is a New York-based software developer and entrepreneur. After finishing his undergraduate degree in mathematics at MIT in 2007, he went on to do quantitative finance work at AQR Capital Management in Greenwich, CT. Frustrated by cumbersome data analysis tools, he learned Python and started building what would later become the pandas project. He's now an active member of the Python data community and is an advocate for the use of Python in data analysis, finance, and statistical computing applications.
Wes was later the co-founder and CEO of DataPad, whose technology assets and team were acquired by Cloudera in 2014. He has since become involved in big data technology, joining the Project Management Committees for the Apache Arrow and Apache Parquet projects in the Apache Software Foundation. In 2016, he joined Two Sigma Investments in New York City, where he continues working to make data analysis faster and easier through open source software.
Product details
- Publisher : O'Reilly Media
- Publication date : November 14, 2017
- Edition : 2nd
- Language : English
- Print length : 550 pages
- ISBN-10 : 1491957662
- ISBN-13 : 978-1491957660
- Item Weight : 1.85 pounds
- Dimensions : 7.25 x 1 x 9.5 inches
- Best Sellers Rank: #769,232 in Books (See Top 100 in Books)
- #220 in Data Modeling & Design (Books)
- #331 in Data Processing
- #594 in Python Programming
About the author

Since 2007, I have been creating fast, easy-to-use data wrangling and statistical computing tools, mostly in the Python programming language. I am best known for creating the pandas project and writing the book Python for Data Analysis. I am also a contributor to the Apache Arrow, Kudu, and Parquet projects within the Apache Software Foundation. I am currently the CTO and Co-founder of Voltron Data, which builds accelerated computing technologies powered by Apache Arrow. I previously worked for Ursa Labs (within RStudio / Posit), Two Sigma, Cloudera, DataPad, and AQR Capital Management.
Customer reviews
Top reviews from the United States
- 5 out of 5 stars
Awesome Book to Gain Practical Data Skills with Python
Reviewed in the United States on April 5, 2019
This book has been my foundation for using Python as a data analyst.
This book primarily focuses on the pandas Python library, which is awesome at processing and organizing data (Python pandas is like MS Excel times 100. This is not an exaggeration). It also introduces the reader to NumPy (lower-level number crunching and arrays), matplotlib (data visualizations), scikit-learn (machine learning), and other useful data science libraries. The book contains other book recommendations for continuing education.
Although this would be a challenging book for a brand new Python user, I would still recommend it, especially if you are currently doing a lot of work in MS Excel and/or exporting data from databases. I had a few false starts learning Python, and my biggest stumbling block was lack of application in what I was learning. This book puts practical tools in the reader's hands very quickly. I personally don't have time to make goofy games etc. that other books have used as practice examples. Despite other reviews criticizing the use of random data throughout the book, I found the examples easy to follow and useful. I would also argue that learning how to generate random data is useful in itself (thus the purpose of the NumPy random library), and that there are practical examples throughout the book. Chapter 14 is devoted to real-world data analysis examples.
I am almost finished with my second time through the book, this time working through every example. This book has been well worth the hours spent in it. For context, I previously relied on Excel, SQL, and some AutoHotKey. This book has significantly improved how I work.
Thanks, Wes and team.
36 people found this helpful
- 5 out of 5 stars
A slog, but well worth it
Reviewed in the United States on November 24, 2021
I got this book when I was transitioning to doing data science with Python and was struggling to become familiar with standard tools. It's written by the creator of Pandas, and follows the style of the Pandas documentation: dense, telegraphic, peppered with examples.
It's hard work because Wes McKinney often does not articulate why you would need to do something (assuming you are already knowledgeable on the underlying process), and writes like an impatient person who would rather be doing something else. Additionally, examples often suffer from being both too long and too short - too long in that almost every example is on a toy dataset created from scratch, too short in that most of those datasets have only 5 or 10 elements and do not always showcase complex operations. Other examples (particularly involving time series) have an overabundance of data that makes the critical results hard to spot. Frankly, my first month with Pandas was a miserable one.
But I give the book 5 stars both because I came to love Pandas as I got more familiar with it, and because while McKinney is not fun to read, he does pack the book with useful information and it is (mostly) well organized. If anything, it would benefit from being longer and with a more patient treatment of larger and more concrete datasets (e.g., the Titanic passenger dataset used in the Pandas documentation). The initial chapter on the basics of using Python could go - if you need this book, then you don't want to be trying to learn the rudiments of Python from it. If you can accept that you'll need a lot of bookmarks or margin notes to get through a rather steep learning curve, it will reward your persistence.
11 people found this helpful
- 4 out of 5 stars
Examples could be improved.
Reviewed in the United States on January 26, 2019
This book covers all of the basics that you would want to know to get started in programming in Python for data analysis, as the title implies, but it doesn't really offer compelling real-world examples. The data seem to be made up and the analyses don't go into enough detail to help you really learn how pandas and NumPy work. Overall this is a decent starter book, but you will have to bookmark the Python and pandas documentation online if you want to have a reference to all of the functionality those tools have, and there are many places online where you can get better examples to learn from. If you haven't made your mind up about which tool to use for data analysis, I highly recommend checking out dplyr in R, which has an excellent free book online (R for Data Science, Hadley Wickham). I find it very easy to learn and it is much easier to set up R and RStudio than it is to set up Python, even though I love Python and pandas.
13 people found this helpful
- 5 out of 5 stars
Very simple, well-designed practical book. Recommended for beginners and intermediates
Reviewed in the United States on December 6, 2017
This book gave me my first job, and I am still learning from it. It is simple, gives some general idea of why functions are designed the way they are, and introduces some practical functions. Because in a real job you always need to look up documentation or google certain functions, I think understanding why Wes designs functions and variables this way, and what he wants to develop in the future, is very important. Anyway, I think this book is for data analysis beginners and some intermediate users. I learned Python first, so I recommend that beginners who want to use Python as a data analyst or scientist learn Python programming first or simultaneously. At least understand lambda and Python expressions; otherwise, you can't feel the full magic.
30 people found this helpful
- 5 out of 5 stars
Practical CS Classics for Data Science Age
Reviewed in the United States on July 21, 2019
So far, this book has been an inspiring read. It contains a huge number of code snippets for data cleansing, transformation, analysis, etc. The code is very clean and, for the most part, self-explanatory (at least for a seasoned software developer). Step by step, the book lays out the motivations behind the design and functionality of centerpiece Python modules, and you would not expect anything less from the original designer of pandas. I see this wonderful book as a natural extension of the ageless practical CS classics by Niklaus Wirth, Kernighan-Ritchie, and B. Stroustrup for the data science age.
- 5 out of 5 stars
Great book to master Pandas
Reviewed in the United States on August 15, 2018
I was looking for a book that could help me to learn Python. I gave this book a try and I realized that the data analysis that I learnt from this book is pretty good from a pandas viewpoint (mostly).
It does explain the NumPy, matplotlib, and seaborn libraries, but most of the time it is oriented toward the pandas perspective.
Nevertheless, if you want to learn machine learning, NumPy, and other libraries, consider buying another book.
All in all, I liked the book because it teaches you really well how to wrangle data. I only wish it had more on NumPy and other libraries.
One person found this helpful
- 5 out of 5 stars
Written by the creator of Pandas
Reviewed in the United States on February 14, 2022
Well written by the creator of Pandas. The author's copious use of code snippets to illustrate his points makes the material very usable. The snippets are short enough to type by hand so you get the frequent opportunity to play with the code and really understand the tools being presented. And Pandas is awesome!
One person found this helpful
- 5 out of 5 stars
Great introduction
Reviewed in the United States on May 16, 2019
I am not a programmer, but have been trying to learn Python for data analysis for a while. This book does a great job of explaining some basics that other books/programs tend to skip over. It also seems like Python is even easier to work with now than it was just a few years ago. If you have tried to pick up these skills without success before, this book might be a good way to re-enter.
2 people found this helpful
Top reviews from other countries
- Kumar Saksham: 5 out of 5 stars
Quality of the product is awesome (as described)
Reviewed in India on August 13, 2021
The quality of the book is awesome (as described), and so is the quality of the packaging.
Nice book, covers all the topics gradually and thoroughly. Just started and liking it already. Will post another review after having read a couple of chapters.
- Jovial GBA-GOMBO: 5 out of 5 stars
Fast and reliable
Reviewed in France on May 20, 2022
Purchased to improve my skills as a data analyst.
- Tamer: 5 out of 5 stars
Best pandas reference book
Reviewed in the United Arab Emirates on December 18, 2019
This is the best reference I use for dealing with Python, NumPy, and mainly pandas. A must-have for anyone learning or using pandas. The author, who actually wrote pandas, has a style that is to the point and clear, with simple examples that demonstrate real-world usage.
This book also has all the info to help you prepare data for scikit-learn and TensorFlow.
- Pedro Dias: 5 out of 5 stars
Must have
Reviewed in Spain on January 8, 2021
You must have this book if you want to learn Pandas and Data Science.
- Lukas Wunderlich: 5 out of 5 stars
Excellent book
Reviewed in Germany on October 12, 2019
1. Logically structured. It is still possible to skip around while reading.
2. Alongside the libraries covered, Python itself is taught in a way that lets beginners pick up the essentials and more advanced readers learn something new.
3. I don't find it dry, and it doesn't go into too much depth. What is needed is presented as compactly as possible.