Hands-On Machine Learning with
Scikit-Learn and TensorFlow
Concepts, Tools, and Techniques to
Build Intelligent Systems

Aurélien Géron

Beijing Boston Farnham Sebastopol Tokyo


Hands-On Machine Learning with Scikit-Learn and TensorFlow
by Aurélien Géron
Copyright © 2017 Aurélien Géron. All rights reserved.
Printed in the United States of America.
Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.
O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Editor: Nicole Tache
Production Editor: Nicholas Adams
Copyeditor: Rachel Monaghan
Proofreader: Charles Roumeliotis
Indexer: Wendy Catalano
Interior Designer: David Futato
Cover Designer: Randy Comer
Illustrator: Rebecca Demarest

March 2017: First Edition

Revision History for the First Edition


2017-03-10: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781491962299 for release details.

The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Hands-On Machine Learning with
Scikit-Learn and TensorFlow, the cover image, and related trade dress are trademarks of O’Reilly Media,
Inc.
While the publisher and the author have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the author disclaim all responsibility
for errors or omissions, including without limitation responsibility for damages resulting from the use of
or reliance on this work. Use of the information and instructions contained in this work is at your own
risk. If any code samples or other technology this work contains or describes is subject to open source
licenses or the intellectual property rights of others, it is your responsibility to ensure that your use
thereof complies with such licenses and/or rights.

978-1-491-96229-9
[LSI]
Table of Contents

Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xiii

Part I. The Fundamentals of Machine Learning


1. The Machine Learning Landscape. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
What Is Machine Learning? 4
Why Use Machine Learning? 4
Types of Machine Learning Systems 7
Supervised/Unsupervised Learning 8
Batch and Online Learning 14
Instance-Based Versus Model-Based Learning 17
Main Challenges of Machine Learning 22
Insufficient Quantity of Training Data 22
Nonrepresentative Training Data 24
Poor-Quality Data 25
Irrelevant Features 25
Overfitting the Training Data 26
Underfitting the Training Data 28
Stepping Back 28
Testing and Validating 29
Exercises 31

2. End-to-End Machine Learning Project. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33


Working with Real Data 33
Look at the Big Picture 35
Frame the Problem 35
Select a Performance Measure 37

Check the Assumptions 40
Get the Data 40
Create the Workspace 40
Download the Data 43
Take a Quick Look at the Data Structure 45
Create a Test Set 49
Discover and Visualize the Data to Gain Insights 53
Visualizing Geographical Data 53
Looking for Correlations 55
Experimenting with Attribute Combinations 58
Prepare the Data for Machine Learning Algorithms 59
Data Cleaning 60
Handling Text and Categorical Attributes 62
Custom Transformers 64
Feature Scaling 65
Transformation Pipelines 66
Select and Train a Model 68
Training and Evaluating on the Training Set 68
Better Evaluation Using Cross-Validation 69
Fine-Tune Your Model 71
Grid Search 72
Randomized Search 74
Ensemble Methods 74
Analyze the Best Models and Their Errors 74
Evaluate Your System on the Test Set 75
Launch, Monitor, and Maintain Your System 76
Try It Out! 77
Exercises 77

3. Classification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
MNIST 79
Training a Binary Classifier 82
Performance Measures 82
Measuring Accuracy Using Cross-Validation 83
Confusion Matrix 84
Precision and Recall 86
Precision/Recall Tradeoff 87
The ROC Curve 91
Multiclass Classification 93
Error Analysis 96
Multilabel Classification 100
Multioutput Classification 101

Exercises 102

4. Training Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105


Linear Regression 106
The Normal Equation 108
Computational Complexity 110
Gradient Descent 111
Batch Gradient Descent 114
Stochastic Gradient Descent 117
Mini-batch Gradient Descent 119
Polynomial Regression 121
Learning Curves 123
Regularized Linear Models 127
Ridge Regression 127
Lasso Regression 130
Elastic Net 132
Early Stopping 133
Logistic Regression 134
Estimating Probabilities 134
Training and Cost Function 135
Decision Boundaries 136
Softmax Regression 139
Exercises 142

5. Support Vector Machines. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145


Linear SVM Classification 145
Soft Margin Classification 146
Nonlinear SVM Classification 149
Polynomial Kernel 150
Adding Similarity Features 151
Gaussian RBF Kernel 152
Computational Complexity 153
SVM Regression 154
Under the Hood 156
Decision Function and Predictions 156
Training Objective 157
Quadratic Programming 159
The Dual Problem 160
Kernelized SVM 161
Online SVMs 164
Exercises 165

6. Decision Trees. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
Training and Visualizing a Decision Tree 167
Making Predictions 169
Estimating Class Probabilities 171
The CART Training Algorithm 171
Computational Complexity 172
Gini Impurity or Entropy? 172
Regularization Hyperparameters 173
Regression 175
Instability 177
Exercises 178

7. Ensemble Learning and Random Forests. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 181


Voting Classifiers 181
Bagging and Pasting 185
Bagging and Pasting in Scikit-Learn 186
Out-of-Bag Evaluation 187
Random Patches and Random Subspaces 188
Random Forests 189
Extra-Trees 190
Feature Importance 190
Boosting 191
AdaBoost 192
Gradient Boosting 195
Stacking 200
Exercises 202

8. Dimensionality Reduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205


The Curse of Dimensionality 206
Main Approaches for Dimensionality Reduction 207
Projection 207
Manifold Learning 210
PCA 211
Preserving the Variance 211
Principal Components 212
Projecting Down to d Dimensions 213
Using Scikit-Learn 214
Explained Variance Ratio 214
Choosing the Right Number of Dimensions 215
PCA for Compression 216
Incremental PCA 217
Randomized PCA 218

Kernel PCA 218
Selecting a Kernel and Tuning Hyperparameters 219
LLE 221
Other Dimensionality Reduction Techniques 223
Exercises 224

Part II. Neural Networks and Deep Learning


9. Up and Running with TensorFlow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
Installation 232
Creating Your First Graph and Running It in a Session 232
Managing Graphs 234
Lifecycle of a Node Value 235
Linear Regression with TensorFlow 235
Implementing Gradient Descent 237
Manually Computing the Gradients 237
Using autodiff 238
Using an Optimizer 239
Feeding Data to the Training Algorithm 239
Saving and Restoring Models 241
Visualizing the Graph and Training Curves Using TensorBoard 242
Name Scopes 245
Modularity 246
Sharing Variables 248
Exercises 251

10. Introduction to Artificial Neural Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253


From Biological to Artificial Neurons 254
Biological Neurons 255
Logical Computations with Neurons 256
The Perceptron 257
Multi-Layer Perceptron and Backpropagation 261
Training an MLP with TensorFlow’s High-Level API 264
Training a DNN Using Plain TensorFlow 265
Construction Phase 265
Execution Phase 269
Using the Neural Network 270
Fine-Tuning Neural Network Hyperparameters 270
Number of Hidden Layers 270
Number of Neurons per Hidden Layer 272
Activation Functions 272

Exercises 273

11. Training Deep Neural Nets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 275


Vanishing/Exploding Gradients Problems 275
Xavier and He Initialization 277
Nonsaturating Activation Functions 279
Batch Normalization 282
Gradient Clipping 286
Reusing Pretrained Layers 286
Reusing a TensorFlow Model 287
Reusing Models from Other Frameworks 288
Freezing the Lower Layers 289
Caching the Frozen Layers 290
Tweaking, Dropping, or Replacing the Upper Layers 290
Model Zoos 291
Unsupervised Pretraining 291
Pretraining on an Auxiliary Task 292
Faster Optimizers 293
Momentum optimization 294
Nesterov Accelerated Gradient 295
AdaGrad 296
RMSProp 298
Adam Optimization 298
Learning Rate Scheduling 300
Avoiding Overfitting Through Regularization 302
Early Stopping 303
ℓ1 and ℓ2 Regularization 303
Dropout 304
Max-Norm Regularization 307
Data Augmentation 309
Practical Guidelines 310
Exercises 311

12. Distributing TensorFlow Across Devices and Servers. . . . . . . . . . . . . . . . . . . . . . . . . . . 313


Multiple Devices on a Single Machine 314
Installation 314
Managing the GPU RAM 317
Placing Operations on Devices 318
Parallel Execution 321
Control Dependencies 323
Multiple Devices Across Multiple Servers 323
Opening a Session 325

The Master and Worker Services 325
Pinning Operations Across Tasks 326
Sharding Variables Across Multiple Parameter Servers 327
Sharing State Across Sessions Using Resource Containers 328
Asynchronous Communication Using TensorFlow Queues 329
Loading Data Directly from the Graph 335
Parallelizing Neural Networks on a TensorFlow Cluster 342
One Neural Network per Device 342
In-Graph Versus Between-Graph Replication 343
Model Parallelism 345
Data Parallelism 347
Exercises 352

13. Convolutional Neural Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353


The Architecture of the Visual Cortex 354
Convolutional Layer 355
Filters 357
Stacking Multiple Feature Maps 358
TensorFlow Implementation 360
Memory Requirements 362
Pooling Layer 363
CNN Architectures 365
LeNet-5 366
AlexNet 367
GoogLeNet 368
ResNet 372
Exercises 376

14. Recurrent Neural Networks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 379


Recurrent Neurons 380
Memory Cells 382
Input and Output Sequences 382
Basic RNNs in TensorFlow 384
Static Unrolling Through Time 385
Dynamic Unrolling Through Time 387
Handling Variable Length Input Sequences 387
Handling Variable-Length Output Sequences 388
Training RNNs 389
Training a Sequence Classifier 389
Training to Predict Time Series 392
Creative RNN 396
Deep RNNs 396

Distributing a Deep RNN Across Multiple GPUs 397
Applying Dropout 399
The Difficulty of Training over Many Time Steps 400
LSTM Cell 401
Peephole Connections 403
GRU Cell 404
Natural Language Processing 405
Word Embeddings 405
An Encoder–Decoder Network for Machine Translation 407
Exercises 410

15. Autoencoders. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 411


Efficient Data Representations 412
Performing PCA with an Undercomplete Linear Autoencoder 413
Stacked Autoencoders 415
TensorFlow Implementation 416
Tying Weights 417
Training One Autoencoder at a Time 418
Visualizing the Reconstructions 420
Visualizing Features 421
Unsupervised Pretraining Using Stacked Autoencoders 422
Denoising Autoencoders 424
TensorFlow Implementation 425
Sparse Autoencoders 426
TensorFlow Implementation 427
Variational Autoencoders 428
Generating Digits 431
Other Autoencoders 432
Exercises 433

16. Reinforcement Learning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 437


Learning to Optimize Rewards 438
Policy Search 440
Introduction to OpenAI Gym 441
Neural Network Policies 444
Evaluating Actions: The Credit Assignment Problem 447
Policy Gradients 448
Markov Decision Processes 453
Temporal Difference Learning and Q-Learning 457
Exploration Policies 459
Approximate Q-Learning 460
Learning to Play Ms. Pac-Man Using Deep Q-Learning 460

Exercises 469
Thank You! 470

A. Exercise Solutions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471

B. Machine Learning Project Checklist. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497

C. SVM Dual Problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 503

D. Autodiff. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 507

E. Other Popular ANN Architectures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515

Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525

Preface

The Machine Learning Tsunami


In 2006, Geoffrey Hinton et al. published a paper1 showing how to train a deep neural
network capable of recognizing handwritten digits with state-of-the-art precision
(>98%). They branded this technique “Deep Learning.” Training a deep neural net
was widely considered impossible at the time,2 and most researchers had abandoned
the idea since the 1990s. This paper revived the interest of the scientific community
and before long many new papers demonstrated that Deep Learning was not only
possible, but capable of mind-blowing achievements that no other Machine Learning
(ML) technique could hope to match (with the help of tremendous computing power
and great amounts of data). This enthusiasm soon extended to many other areas of
Machine Learning.
Fast-forward 10 years and Machine Learning has conquered the industry: it is now at the heart of much of the magic in today’s high-tech products, ranking your web search results, powering your smartphone’s speech recognition, recommending videos, and beating the world champion at the game of Go. Before you know it, it will be driving your car.

Machine Learning in Your Projects


So naturally you are excited about Machine Learning and you would love to join the
party!
Perhaps you would like to give your homemade robot a brain of its own? Make it rec‐
ognize faces? Or learn to walk around?

1 Available on Hinton’s home page at http://www.cs.toronto.edu/~hinton/.


2 This was despite the fact that Yann LeCun’s deep convolutional neural networks had worked well for image recognition since the 1990s, although they were not as general purpose.

Or maybe your company has tons of data (user logs, financial data, production data,
machine sensor data, hotline stats, HR reports, etc.), and more than likely you could
unearth some hidden gems if you just knew where to look; for example:

• Segment customers and find the best marketing strategy for each group
• Recommend products for each client based on what similar clients bought
• Detect which transactions are likely to be fraudulent
• Predict next year’s revenue
• And more

Whatever the reason, you have decided to learn Machine Learning and implement it
in your projects. Great idea!

Objective and Approach


This book assumes that you know close to nothing about Machine Learning. Its goal
is to give you the concepts, the intuitions, and the tools you need to actually imple‐
ment programs capable of learning from data.
We will cover a large number of techniques, from the simplest and most commonly
used (such as linear regression) to some of the Deep Learning techniques that regu‐
larly win competitions.
Rather than implementing our own toy versions of each algorithm, we will be using
actual production-ready Python frameworks:

• Scikit-Learn is very easy to use, yet it implements many Machine Learning algo‐
rithms efficiently, so it makes for a great entry point to learn Machine Learning.
• TensorFlow is a more complex library for distributed numerical computation
using data flow graphs. It makes it possible to train and run very large neural net‐
works efficiently by distributing the computations across potentially thousands
of multi-GPU servers. TensorFlow was created at Google and supports many of
their large-scale Machine Learning applications. It was open-sourced in Novem‐
ber 2015.

The book favors a hands-on approach, growing an intuitive understanding of


Machine Learning through concrete working examples and just a little bit of theory.
While you can read this book without picking up your laptop, we highly recommend
you experiment with the code examples available online as Jupyter notebooks at
https://github.com/ageron/handson-ml.

Prerequisites
This book assumes that you have some Python programming experience and that you
are familiar with Python’s main scientific libraries, in particular NumPy, Pandas, and
Matplotlib.
Also, if you care about what’s under the hood you should have a reasonable under‐
standing of college-level math as well (calculus, linear algebra, probabilities, and sta‐
tistics).
If you don’t know Python yet, http://learnpython.org/ is a great place to start. The offi‐
cial tutorial on python.org is also quite good.
If you have never used Jupyter, Chapter 2 will guide you through installation and the
basics: it is a great tool to have in your toolbox.
If you are not familiar with Python’s scientific libraries, the provided Jupyter note‐
books include a few tutorials. There is also a quick math tutorial for linear algebra.

Roadmap
This book is organized in two parts. Part I, The Fundamentals of Machine Learning,
covers the following topics:

• What is Machine Learning? What problems does it try to solve? What are the
main categories and fundamental concepts of Machine Learning systems?
• The main steps in a typical Machine Learning project.
• Learning by fitting a model to data.
• Optimizing a cost function.
• Handling, cleaning, and preparing data.
• Selecting and engineering features.
• Selecting a model and tuning hyperparameters using cross-validation.
• The main challenges of Machine Learning, in particular underfitting and overfit‐
ting (the bias/variance tradeoff).
• Reducing the dimensionality of the training data to fight the curse of dimension‐
ality.
• The most common learning algorithms: Linear and Polynomial Regression,
Logistic Regression, k-Nearest Neighbors, Support Vector Machines, Decision
Trees, Random Forests, and Ensemble methods.

Part II, Neural Networks and Deep Learning, covers the following topics:

• What are neural nets? What are they good for?


• Building and training neural nets using TensorFlow.
• The most important neural net architectures: feedforward neural nets, convolu‐
tional nets, recurrent nets, long short-term memory (LSTM) nets, and autoen‐
coders.
• Techniques for training deep neural nets.
• Scaling neural networks for huge datasets.
• Reinforcement learning.

The first part is based mostly on Scikit-Learn while the second part uses TensorFlow.

Don’t jump into deep waters too hastily: while Deep Learning is no
doubt one of the most exciting areas in Machine Learning, you
should master the fundamentals first. Moreover, most problems
can be solved quite well using simpler techniques such as Random
Forests and Ensemble methods (discussed in Part I). Deep Learn‐
ing is best suited for complex problems such as image recognition,
speech recognition, or natural language processing, provided you
have enough data, computing power, and patience.

Other Resources
Many resources are available to learn about Machine Learning. Andrew Ng’s ML
course on Coursera and Geoffrey Hinton’s course on neural networks and Deep
Learning are amazing, although they both require a significant time investment
(think months).
There are also many interesting websites about Machine Learning, including of
course Scikit-Learn’s exceptional User Guide. You may also enjoy Dataquest, which
provides very nice interactive tutorials, and ML blogs such as those listed on Quora.
Finally, the Deep Learning website has a good list of resources to learn more.
Of course there are also many other introductory books about Machine Learning, in
particular:

• Joel Grus, Data Science from Scratch (O’Reilly). This book presents the funda‐
mentals of Machine Learning, and implements some of the main algorithms in
pure Python (from scratch, as the name suggests).
• Stephen Marsland, Machine Learning: An Algorithmic Perspective (Chapman and
Hall). This book is a great introduction to Machine Learning, covering a wide
range of topics in depth, with code examples in Python (also from scratch, but
using NumPy).
• Sebastian Raschka, Python Machine Learning (Packt Publishing). Also a great
introduction to Machine Learning, this book leverages Python open source libra‐
ries (Pylearn 2 and Theano).
• Yaser S. Abu-Mostafa, Malik Magdon-Ismail, and Hsuan-Tien Lin, Learning from
Data (AMLBook). A rather theoretical approach to ML, this book provides deep
insights, in particular on the bias/variance tradeoff (see Chapter 4).
• Stuart Russell and Peter Norvig, Artificial Intelligence: A Modern Approach, 3rd
Edition (Pearson). This is a great (and huge) book covering an incredible amount
of topics, including Machine Learning. It helps put ML into perspective.

Finally, a great way to learn is to join ML competition websites such as Kaggle.com: this will allow you to practice your skills on real-world problems, with help and insights from some of the best ML professionals out there.

Conventions Used in This Book


The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program ele‐
ments such as variable or function names, databases, data types, environment
variables, statements and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values deter‐
mined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples


Supplemental material (code examples, exercises, etc.) is available for download at
https://github.com/ageron/handson-ml.
This book is here to help you get your job done. In general, if example code is offered
with this book, you may use it in your programs and documentation. You do not
need to contact us for permission unless you’re reproducing a significant portion of
the code. For example, writing a program that uses several chunks of code from this
book does not require permission. Selling or distributing a CD-ROM of examples
from O’Reilly books does require permission. Answering a question by citing this
book and quoting example code does not require permission. Incorporating a signifi‐
cant amount of example code from this book into your product’s documentation does
require permission.
We appreciate, but do not require, attribution. An attribution usually includes the
title, author, publisher, and ISBN. For example: “Hands-On Machine Learning with
Scikit-Learn and TensorFlow by Aurélien Géron (O’Reilly). Copyright 2017 Aurélien
Géron, 978-1-491-96229-9.”
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O’Reilly Safari
Safari (formerly Safari Books Online) is a membership-based
training and reference platform for enterprise, government,
educators, and individuals.

Members have access to thousands of books, training videos, Learning Paths, interac‐
tive tutorials, and curated playlists from over 250 publishers, including O’Reilly
Media, Harvard Business Review, Prentice Hall Professional, Addison-Wesley Profes‐
sional, Microsoft Press, Sams, Que, Peachpit Press, Adobe, Focal Press, Cisco Press,
John Wiley & Sons, Syngress, Morgan Kaufmann, IBM Redbooks, Packt, Adobe
Press, FT Press, Apress, Manning, New Riders, McGraw-Hill, Jones & Bartlett, and
Course Technology, among others.
For more information, please visit http://oreilly.com/safari.

How to Contact Us
Please address comments and questions concerning this book to the publisher:

O’Reilly Media, Inc.


1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at http://bit.ly/hands-on-machine-learning-with-scikit-learn-and-tensorflow.
To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.
For more information about our books, courses, conferences, and news, see our web‐
site at http://www.oreilly.com.
Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments
I would like to thank my Google colleagues, in particular the YouTube video classifi‐
cation team, for teaching me so much about Machine Learning. I could never have
started this project without them. Special thanks to my personal ML gurus: Clément
Courbet, Julien Dubois, Mathias Kende, Daniel Kitachewsky, James Pack, Alexander
Pak, Anosh Raj, Vitor Sessak, Wiktor Tomczak, Ingrid von Glehn, Rich Washington,
and everyone at YouTube Paris.
I am incredibly grateful to all the amazing people who took time out of their busy
lives to review my book in so much detail. Thanks to Pete Warden for answering all
my TensorFlow questions, reviewing Part II, providing many interesting insights, and
of course for being part of the core TensorFlow team. You should definitely check out
his blog! Many thanks to Lukas Biewald for his very thorough review of Part II: he left
no stone unturned, tested all the code (and caught a few errors), made many great
suggestions, and his enthusiasm was contagious. You should check out his blog and
his cool robots! Thanks to Justin Francis, who also reviewed Part II very thoroughly,
catching errors and providing great insights, in particular in Chapter 16. Check out
his posts on TensorFlow!
Huge thanks as well to David Andrzejewski, who reviewed Part I and provided
incredibly useful feedback, identifying unclear sections and suggesting how to
improve them. Check out his website! Thanks to Grégoire Mesnil, who reviewed
Part II and contributed very interesting practical advice on training neural networks.
Thanks as well to Eddy Hung, Salim Sémaoune, Karim Matrah, Ingrid von Glehn,
Iain Smears, and Vincent Guilbeau for reviewing Part I and making many useful sug‐
gestions. And I also wish to thank my father-in-law, Michel Tessier, former mathe‐
matics teacher and now a great translator of Anton Chekhov, for helping me iron out
some of the mathematics and notations in this book and reviewing the linear algebra
Jupyter notebook.
And of course, a gigantic “thank you” to my dear brother Sylvain, who reviewed every
single chapter, tested every line of code, provided feedback on virtually every section,
and encouraged me from the first line to the last. Love you, bro!
Many thanks as well to O’Reilly’s fantastic staff, in particular Nicole Tache, who gave
me insightful feedback, always cheerful, encouraging, and helpful. Thanks as well to
Marie Beaugureau, Ben Lorica, Mike Loukides, and Laurel Ruma for believing in this
project and helping me define its scope. Thanks to Matt Hacker and all of the Atlas
team for answering all my technical questions regarding formatting, asciidoc, and
LaTeX, and thanks to Rachel Monaghan, Nick Adams, and all of the production team
for their final review and their hundreds of corrections.
Last but not least, I am infinitely grateful to my beloved wife, Emmanuelle, and to our
three wonderful kids, Alexandre, Rémi, and Gabrielle, for encouraging me to work
hard on this book, asking many questions (who said you can’t teach neural networks
to a seven-year-old?), and even bringing me cookies and coffee. What more can one
dream of?

PART I
The Fundamentals of Machine Learning
CHAPTER 1
The Machine Learning Landscape

When most people hear “Machine Learning,” they picture a robot: a dependable but‐
ler or a deadly Terminator depending on who you ask. But Machine Learning is not
just a futuristic fantasy, it’s already here. In fact, it has been around for decades in
some specialized applications, such as Optical Character Recognition (OCR). But the
first ML application that really became mainstream, improving the lives of hundreds
of millions of people, took over the world back in the 1990s: it was the spam filter.
Not exactly a self-aware Skynet, but it does technically qualify as Machine Learning
(it has actually learned so well that you seldom need to flag an email as spam any‐
more). It was followed by hundreds of ML applications that now quietly power hun‐
dreds of products and features that you use regularly, from better recommendations
to voice search.
Where does Machine Learning start and where does it end? What exactly does it
mean for a machine to learn something? If I download a copy of Wikipedia, has my
computer really “learned” something? Is it suddenly smarter? In this chapter we will
start by clarifying what Machine Learning is and why you may want to use it.
Then, before we set out to explore the Machine Learning continent, we will take a
look at the map and learn about the main regions and the most notable landmarks:
supervised versus unsupervised learning, online versus batch learning, instance-
based versus model-based learning. Then we will look at the workflow of a typical ML
project, discuss the main challenges you may face, and cover how to evaluate and
fine-tune a Machine Learning system.
This chapter introduces a lot of fundamental concepts (and jargon) that every data
scientist should know by heart. It will be a high-level overview (the only chapter
without much code), all rather simple, but you should make sure everything is
crystal-clear to you before continuing to the rest of the book. So grab a coffee and let’s
get started!

If you already know all the Machine Learning basics, you may want
to skip directly to Chapter 2. If you are not sure, try to answer all
the questions listed at the end of the chapter before moving on.

What Is Machine Learning?


Machine Learning is the science (and art) of programming computers so they can
learn from data.
Here is a slightly more general definition:
[Machine Learning is the] field of study that gives computers the ability to learn
without being explicitly programmed.
—Arthur Samuel, 1959

And a more engineering-oriented one:


A computer program is said to learn from experience E with respect to some task T
and some performance measure P, if its performance on T, as measured by P, improves
with experience E.
—Tom Mitchell, 1997

For example, your spam filter is a Machine Learning program that can learn to flag
spam given examples of spam emails (e.g., flagged by users) and examples of regular
(nonspam, also called “ham”) emails. The examples that the system uses to learn are
called the training set. Each training example is called a training instance (or sample).
In this case, the task T is to flag spam for new emails, the experience E is the training
data, and the performance measure P needs to be defined; for example, you can use
the ratio of correctly classified emails. This particular performance measure is called
accuracy and it is often used in classification tasks.
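To make the performance measure P concrete, here is a minimal Python sketch (with made-up labels) of how accuracy would be computed for such a filter:

# Accuracy = fraction of emails classified correctly (hypothetical labels).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = spam, 0 = ham (ground truth)
y_pred = [1, 0, 0, 1, 0, 0, 1, 1]   # what a hypothetical filter predicted

correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)
print(accuracy)  # 0.75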
If you just download a copy of Wikipedia, your computer has a lot more data, but it is
not suddenly better at any task. Thus, it is not Machine Learning.

Why Use Machine Learning?


Consider how you would write a spam filter using traditional programming techni‐
ques (Figure 1-1):

1. First you would look at what spam typically looks like. You might notice that
some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”) tend to
come up a lot in the subject. Perhaps you would also notice a few other patterns
in the sender’s name, the email’s body, and so on.

2. You would write a detection algorithm for each of the patterns that you noticed,
and your program would flag emails as spam if a number of these patterns are
detected.
3. You would test your program, and repeat steps 1 and 2 until it is good enough.

Figure 1-1. The traditional approach

Since the problem is not trivial, your program will likely become a long list of com‐
plex rules—pretty hard to maintain.
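As a rough illustration, a hand-coded filter of this kind might look like the following sketch (the keyword list and scoring threshold are purely hypothetical, and a real one would keep growing):

# A hypothetical rule-based spam filter: a hand-maintained list of patterns.
SPAM_PATTERNS = ["4U", "credit card", "free", "amazing"]  # grows forever...

def looks_like_spam(subject, body, min_hits=2):
    text = (subject + " " + body).lower()
    hits = sum(pattern.lower() in text for pattern in SPAM_PATTERNS)
    return hits >= min_hits  # flag if enough hand-written rules fire

print(looks_like_spam("Amazing offer 4U", "Get a free credit card today!"))  # True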
In contrast, a spam filter based on Machine Learning techniques automatically learns
which words and phrases are good predictors of spam by detecting unusually fre‐
quent patterns of words in the spam examples compared to the ham examples
(Figure 1-2). The program is much shorter, easier to maintain, and most likely more
accurate.

Figure 1-2. Machine Learning approach
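Here is a minimal sketch of this learning-based approach, assuming Scikit-Learn is installed and using a tiny toy training set in place of real emails; the word-count features and Naive Bayes classifier are just one reasonable choice, not the only one:

# A minimal learned spam filter: the predictive word patterns are discovered, not hand-coded.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

emails = ["Free credit card 4U", "Meeting agenda for Monday",
          "Amazing free offer", "Lunch tomorrow?"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = ham (toy data)

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(emails, labels)
print(model.predict(["Free amazing offer 4U"]))  # most likely [1]

Retraining on newly flagged emails simply means calling fit again on the updated data, which is what makes the filter adapt without new hand-written rules.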

Moreover, if spammers notice that all their emails containing “4U” are blocked, they
might start writing “For U” instead. A spam filter using traditional programming
techniques would need to be updated to flag “For U” emails. If spammers keep work‐
ing around your spam filter, you will need to keep writing new rules forever.
In contrast, a spam filter based on Machine Learning techniques automatically noti‐
ces that “For U” has become unusually frequent in spam flagged by users, and it starts
flagging them without your intervention (Figure 1-3).

Figure 1-3. Automatically adapting to change

Another area where Machine Learning shines is for problems that either are too com‐
plex for traditional approaches or have no known algorithm. For example, consider
speech recognition: say you want to start simple and write a program capable of dis‐
tinguishing the words “one” and “two.” You might notice that the word “two” starts
with a high-pitch sound (“T”), so you could hardcode an algorithm that measures
high-pitch sound intensity and use that to distinguish ones and twos. Obviously this
technique will not scale to thousands of words spoken by millions of very different
people in noisy environments and in dozens of languages. The best solution (at least
today) is to write an algorithm that learns by itself, given many example recordings
for each word.
Finally, Machine Learning can help humans learn (Figure 1-4): ML algorithms can be
inspected to see what they have learned (although for some algorithms this can be
tricky). For instance, once the spam filter has been trained on enough spam, it can
easily be inspected to reveal the list of words and combinations of words that it
believes are the best predictors of spam. Sometimes this will reveal unsuspected cor‐
relations or new trends, and thereby lead to a better understanding of the problem.
Applying ML techniques to dig into large amounts of data can help discover patterns
that were not immediately apparent. This is called data mining.

Figure 1-4. Machine Learning can help humans learn

To summarize, Machine Learning is great for:

• Problems for which existing solutions require a lot of hand-tuning or long lists of
rules: one Machine Learning algorithm can often simplify code and perform bet‐
ter.
• Complex problems for which there is no good solution at all using a traditional
approach: the best Machine Learning techniques can find a solution.
• Fluctuating environments: a Machine Learning system can adapt to new data.
• Getting insights about complex problems and large amounts of data.

Types of Machine Learning Systems


There are so many different types of Machine Learning systems that it is useful to
classify them in broad categories based on:

• Whether or not they are trained with human supervision (supervised, unsuper‐
vised, semisupervised, and Reinforcement Learning)
• Whether or not they can learn incrementally on the fly (online versus batch
learning)
• Whether they work by simply comparing new data points to known data points,
or instead detect patterns in the training data and build a predictive model, much
like scientists do (instance-based versus model-based learning)

These criteria are not exclusive; you can combine them in any way you like. For
example, a state-of-the-art spam filter may learn on the fly using a deep neural net‐
work model trained using examples of spam and ham; this makes it an online, model-
based, supervised learning system.
Let’s look at each of these criteria a bit more closely.

Supervised/Unsupervised Learning
Machine Learning systems can be classified according to the amount and type of
supervision they get during training. There are four major categories: supervised
learning, unsupervised learning, semisupervised learning, and Reinforcement Learn‐
ing.

Supervised learning
In supervised learning, the training data you feed to the algorithm includes the desired
solutions, called labels (Figure 1-5).

Figure 1-5. A labeled training set for supervised learning (e.g., spam classification)

A typical supervised learning task is classification. The spam filter is a good example
of this: it is trained with many example emails along with their class (spam or ham),
and it must learn how to classify new emails.
Another typical task is to predict a target numeric value, such as the price of a car,
given a set of features (mileage, age, brand, etc.) called predictors. This sort of task is
called regression (Figure 1-6).1 To train the system, you need to give it many examples
of cars, including both their predictors and their labels (i.e., their prices).
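For instance, here is a minimal Scikit-Learn sketch of such a regression task, using a few made-up (mileage, price) pairs in place of a real dataset:

# Predicting a numeric target (price) from a predictor (mileage), toy data only.
from sklearn.linear_model import LinearRegression

X = [[15000], [30000], [60000], [90000]]   # mileage (hypothetical)
y = [22000, 18500, 14000, 9500]            # price in dollars (hypothetical)

model = LinearRegression()
model.fit(X, y)
print(model.predict([[45000]]))  # estimated price for a 45,000-mile car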

1 Fun fact: this odd-sounding name is a statistics term introduced by Francis Galton while he was studying the
fact that the children of tall people tend to be shorter than their parents. Since children were shorter, he called
this regression to the mean. This name was then applied to the methods he used to analyze correlations
between variables.

In Machine Learning an attribute is a data type (e.g., “Mileage”),
while a feature has several meanings depending on the context, but
generally means an attribute plus its value (e.g., “Mileage =
15,000”). Many people use the words attribute and feature inter‐
changeably, though.

Figure 1-6. Regression

Note that some regression algorithms can be used for classification as well, and vice
versa. For example, Logistic Regression is commonly used for classification, as it can
output a value that corresponds to the probability of belonging to a given class (e.g.,
20% chance of being spam).
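A minimal sketch of this, with a single made-up feature (say, the proportion of suspicious words in an email):

# Logistic Regression used as a classifier that outputs class probabilities.
from sklearn.linear_model import LogisticRegression

X = [[0.1], [0.4], [0.6], [0.9]]  # fraction of "spammy" words (hypothetical)
y = [0, 0, 1, 1]                  # 1 = spam, 0 = ham

clf = LogisticRegression()
clf.fit(X, y)
print(clf.predict_proba([[0.5]]))  # [[P(ham), P(spam)]]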
Here are some of the most important supervised learning algorithms (covered in this
book):

• k-Nearest Neighbors
• Linear Regression
• Logistic Regression
• Support Vector Machines (SVMs)
• Decision Trees and Random Forests
• Neural networks2

2 Some neural network architectures can be unsupervised, such as autoencoders and restricted Boltzmann
machines. They can also be semisupervised, such as in deep belief networks and unsupervised pretraining.

Unsupervised learning
In unsupervised learning, as you might guess, the training data is unlabeled
(Figure 1-7). The system tries to learn without a teacher.

Figure 1-7. An unlabeled training set for unsupervised learning

Here are some of the most important unsupervised learning algorithms (we will
cover dimensionality reduction in Chapter 8):

• Clustering
— k-Means
— Hierarchical Cluster Analysis (HCA)
— Expectation Maximization
• Visualization and dimensionality reduction
— Principal Component Analysis (PCA)
— Kernel PCA
— Locally-Linear Embedding (LLE)
— t-distributed Stochastic Neighbor Embedding (t-SNE)
• Association rule learning
— Apriori
— Eclat

For example, say you have a lot of data about your blog’s visitors. You may want to
run a clustering algorithm to try to detect groups of similar visitors (Figure 1-8). At
no point do you tell the algorithm which group a visitor belongs to: it finds those
connections without your help. For example, it might notice that 40% of your visitors
are males who love comic books and generally read your blog in the evening, while
20% are young sci-fi lovers who visit during the weekends, and so on. If you use a
hierarchical clustering algorithm, it may also subdivide each group into smaller
groups. This may help you target your posts for each group.
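A minimal clustering sketch with Scikit-Learn, using a handful of made-up visitor features (the feature choice and number of clusters are arbitrary):

# Clustering blog visitors into groups without any labels (toy features).
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical features: [age, visits per week, evening-visit ratio]
visitors = np.array([[25, 5, 0.9], [31, 4, 0.8], [19, 2, 0.1],
                     [22, 3, 0.2], [45, 6, 0.85], [18, 1, 0.15]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
groups = kmeans.fit_predict(visitors)
print(groups)  # cluster index assigned to each visitor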

Figure 1-8. Clustering

Visualization algorithms are also good examples of unsupervised learning algorithms:


you feed them a lot of complex and unlabeled data, and they output a 2D or 3D rep‐
resentation of your data that can easily be plotted (Figure 1-9). These algorithms try
to preserve as much structure as they can (e.g., trying to keep separate clusters in the
input space from overlapping in the visualization), so you can understand how the
data is organized and perhaps identify unsuspected patterns.
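For example, a minimal sketch that projects some placeholder high-dimensional data down to 2D with Scikit-Learn's t-SNE implementation, ready to be scatter-plotted:

# Projecting high-dimensional data down to 2D for plotting (random placeholder data).
import numpy as np
from sklearn.manifold import TSNE

X = np.random.rand(100, 50)            # 100 instances, 50 features (placeholder)
X_2d = TSNE(n_components=2, random_state=42).fit_transform(X)
print(X_2d.shape)                      # (100, 2)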

Figure 1-9. Example of a t-SNE visualization highlighting semantic clusters3

3 Notice how animals are rather well separated from vehicles, how horses are close to deer but far from birds,
and so on. Figure reproduced with permission from Socher, Ganjoo, Manning, and Ng (2013), “T-SNE visual‐
ization of the semantic word space.”

A related task is dimensionality reduction, in which the goal is to simplify the data
without losing too much information. One way to do this is to merge several correla‐
ted features into one. For example, a car’s mileage may be very correlated with its age,
so the dimensionality reduction algorithm will merge them into one feature that rep‐
resents the car’s wear and tear. This is called feature extraction.

It is often a good idea to try to reduce the dimension of your train‐


ing data using a dimensionality reduction algorithm before you
feed it to another Machine Learning algorithm (such as a super‐
vised learning algorithm). It will run much faster, the data will take
up less disk and memory space, and in some cases it may also per‐
form better.
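A minimal sketch of that workflow, chaining PCA with a classifier on placeholder data (the number of components kept here is arbitrary):

# Reducing dimensionality with PCA before feeding a supervised learner (toy data).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X = np.random.rand(200, 30)                # placeholder high-dimensional data
y = (X[:, 0] + X[:, 1] > 1).astype(int)    # placeholder labels

model = make_pipeline(PCA(n_components=5), LogisticRegression())
model.fit(X, y)
print(model.score(X, y))  # training accuracy, just to check it runs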

Yet another important unsupervised task is anomaly detection—for example, detect‐


ing unusual credit card transactions to prevent fraud, catching manufacturing defects,
or automatically removing outliers from a dataset before feeding it to another learn‐
ing algorithm. The system is trained with normal instances, and when it sees a new
instance it can tell whether it looks like a normal one or whether it is likely an anom‐
aly (see Figure 1-10).
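As one possible illustration (Scikit-Learn offers several anomaly-detection algorithms; Isolation Forest is used here with placeholder data):

# Flagging unusual instances: train on (mostly) normal data, then score new ones.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(42)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))  # placeholder "normal" transactions

detector = IsolationForest(random_state=42)
detector.fit(normal)
print(detector.predict([[0.1, -0.2], [8.0, 8.0]]))  # +1 = looks normal, -1 = anomaly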

Figure 1-10. Anomaly detection

Finally, another common unsupervised task is association rule learning, in which the
goal is to dig into large amounts of data and discover interesting relations between
attributes. For example, suppose you own a supermarket. Running an association rule
on your sales logs may reveal that people who purchase barbecue sauce and potato
chips also tend to buy steak. Thus, you may want to place these items close to each
other.
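Apriori and Eclat are not part of Scikit-Learn, but the flavor of the idea can be sketched in a few lines of plain Python that count which pairs of items show up together in the same (toy) baskets:

# Counting item pairs that frequently appear in the same basket (toy sales log).
from itertools import combinations
from collections import Counter

baskets = [{"bbq sauce", "potato chips", "steak"},
           {"bbq sauce", "steak", "beer"},
           {"potato chips", "steak"},
           {"milk", "bread"}]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

print(pair_counts.most_common(3))  # the most frequently co-purchased pairs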

Semisupervised learning
Some algorithms can deal with partially labeled training data, usually a lot of unla‐
beled data and a little bit of labeled data. This is called semisupervised learning
(Figure 1-11).
Some photo-hosting services, such as Google Photos, are good examples of this. Once
you upload all your family photos to the service, it automatically recognizes that the
same person A shows up in photos 1, 5, and 11, while another person B shows up in
photos 2, 5, and 7. This is the unsupervised part of the algorithm (clustering). Now all
the system needs is for you to tell it who these people are. Just one label per person,4
and it is able to name everyone in every photo, which is useful for searching photos.

Figure 1-11. Semisupervised learning

Most semisupervised learning algorithms are combinations of unsupervised and


supervised algorithms. For example, deep belief networks (DBNs) are based on unsu‐
pervised components called restricted Boltzmann machines (RBMs) stacked on top of
one another. RBMs are trained sequentially in an unsupervised manner, and then the
whole system is fine-tuned using supervised learning techniques.

Reinforcement Learning
Reinforcement Learning is a very different beast. The learning system, called an agent
in this context, can observe the environment, select and perform actions, and get
rewards in return (or penalties in the form of negative rewards, as in Figure 1-12). It
must then learn by itself what is the best strategy, called a policy, to get the most
reward over time. A policy defines what action the agent should choose when it is in a
given situation.

4 That’s when the system works perfectly. In practice it often creates a few clusters per person, and sometimes
mixes up two people who look alike, so you need to provide a few labels per person and manually clean up
some clusters.

Figure 1-12. Reinforcement Learning

For example, many robots implement Reinforcement Learning algorithms to learn


how to walk. DeepMind’s AlphaGo program is also a good example of Reinforcement
Learning: it made the headlines in March 2016 when it beat the world champion Lee
Sedol at the game of Go. It learned its winning policy by analyzing millions of games,
and then playing many games against itself. Note that learning was turned off during
the games against the champion; AlphaGo was just applying the policy it had learned.
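The observe/act/reward loop itself is simple to sketch. The snippet below assumes OpenAI Gym is installed (it is introduced properly in Chapter 16) and uses the classic Gym API of that era (reset returning an observation, step returning a 4-tuple; later releases changed these signatures), with a purely random policy, so no learning actually happens:

# The agent loop: observe the environment, act, collect a reward, repeat.
import gym  # assumes OpenAI Gym is installed

env = gym.make("CartPole-v0")
obs = env.reset()
total_reward = 0
done = False
while not done:
    action = env.action_space.sample()          # a real agent would apply its policy here
    obs, reward, done, info = env.step(action)  # the environment returns a reward
    total_reward += reward
print(total_reward)  # the agent's goal is to maximize this over time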

Batch and Online Learning


Another criterion used to classify Machine Learning systems is whether or not the
system can learn incrementally from a stream of incoming data.

Batch learning
In batch learning, the system is incapable of learning incrementally: it must be trained
using all the available data. This will generally take a lot of time and computing
resources, so it is typically done offline. First the system is trained, and then it is
launched into production and runs without learning anymore; it just applies what it
has learned. This is called offline learning.
If you want a batch learning system to know about new data (such as a new type of
spam), you need to train a new version of the system from scratch on the full dataset
(not just the new data, but also the old data), then stop the old system and replace it
with the new one.
Fortunately, the whole process of training, evaluating, and launching a Machine
Learning system can be automated fairly easily (as shown in Figure 1-3), so even a
batch learning system can adapt to change. Simply update the data and train a new
version of the system from scratch as often as needed.
This solution is simple and often works fine, but training using the full set of data can
take many hours, so you would typically train a new system only every 24 hours or
even just weekly. If your system needs to adapt to rapidly changing data (e.g., to pre‐
dict stock prices), then you need a more reactive solution.
Also, training on the full set of data requires a lot of computing resources (CPU,
memory space, disk space, disk I/O, network I/O, etc.). If you have a lot of data and
you automate your system to train from scratch every day, it will end up costing you a
lot of money. If the amount of data is huge, it may even be impossible to use a batch
learning algorithm.
Finally, if your system needs to be able to learn autonomously and it has limited
resources (e.g., a smartphone application or a rover on Mars), then carrying around
large amounts of training data and taking up a lot of resources to train for hours
every day is a showstopper.
Fortunately, a better option in all these cases is to use algorithms that are capable of
learning incrementally.

Online learning
In online learning, you train the system incrementally by feeding it data instances
sequentially, either individually or by small groups called mini-batches. Each learning
step is fast and cheap, so the system can learn about new data on the fly, as it arrives
(see Figure 1-13).

Figure 1-13. Online learning

Online learning is great for systems that receive data as a continuous flow (e.g., stock
prices) and need to adapt to change rapidly or autonomously. It is also a good option
if you have limited computing resources: once an online learning system has learned
about new data instances, it does not need them anymore, so you can discard them
(unless you want to be able to roll back to a previous state and “replay” the data). This
can save a huge amount of space.
Online learning algorithms can also be used to train systems on huge datasets that
cannot fit in one machine’s main memory (this is called out-of-core learning). The
algorithm loads part of the data, runs a training step on that data, and repeats the
process until it has run on all of the data (see Figure 1-14).

This whole process is usually done offline (i.e., not on the live sys‐
tem), so online learning can be a confusing name. Think of it as
incremental learning.

Figure 1-14. Using online learning to handle huge datasets
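A minimal out-of-core sketch with Scikit-Learn's SGDClassifier, pretending that each loop iteration loads a fresh mini-batch from disk (the data here is just random placeholder values):

# Incremental (out-of-core) learning: feed the model one mini-batch at a time.
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()          # a linear model trained with Stochastic Gradient Descent
classes = np.array([0, 1])     # all possible classes must be declared for partial_fit

for _ in range(100):                                   # pretend each loop loads a new mini-batch
    X_batch = np.random.rand(32, 10)                   # placeholder data
    y_batch = (X_batch[:, 0] > 0.5).astype(int)
    clf.partial_fit(X_batch, y_batch, classes=classes) # one fast, cheap learning step

print(clf.predict(np.random.rand(3, 10)))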

One important parameter of online learning systems is how fast they should adapt to
changing data: this is called the learning rate. If you set a high learning rate, then your
system will rapidly adapt to new data, but it will also tend to quickly forget the old
data (you don’t want a spam filter to flag only the latest kinds of spam it was shown).
Conversely, if you set a low learning rate, the system will have more inertia; that is, it
will learn more slowly, but it will also be less sensitive to noise in the new data or to
sequences of nonrepresentative data points.
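With Scikit-Learn's SGDRegressor, for example, this trade-off can be controlled through
the eta0 parameter when using a constant learning rate (the values below are arbitrary):

from sklearn.linear_model import SGDRegressor

# High learning rate: adapts quickly to new data, but also forgets old data quickly
fast_learner = SGDRegressor(learning_rate="constant", eta0=0.5)

# Low learning rate: more inertia, learns more slowly, but is less sensitive to noise
slow_learner = SGDRegressor(learning_rate="constant", eta0=0.001)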
A big challenge with online learning is that if bad data is fed to the system, the sys‐
tem’s performance will gradually decline. If we are talking about a live system, your
clients will notice. For example, bad data could come from a malfunctioning sensor
on a robot, or from someone spamming a search engine to try to rank high in search
results. To reduce this risk, you need to monitor your system closely and promptly
switch learning off (and possibly revert to a previously working state) if you detect a
drop in performance. You may also want to monitor the input data and react to
abnormal data (e.g., using an anomaly detection algorithm).

Instance-Based Versus Model-Based Learning


One more way to categorize Machine Learning systems is by how they generalize.
Most Machine Learning tasks are about making predictions. This means that given a
number of training examples, the system needs to be able to generalize to examples it
has never seen before. Having a good performance measure on the training data is
good, but insufficient; the true goal is to perform well on new instances.
There are two main approaches to generalization: instance-based learning and
model-based learning.

Instance-based learning
Possibly the most trivial form of learning is simply to learn by heart. If you were to
create a spam filter this way, it would just flag all emails that are identical to emails
that have already been flagged by users—not the worst solution, but certainly not the
best.
Instead of just flagging emails that are identical to known spam emails, your spam
filter could be programmed to also flag emails that are very similar to known spam
emails. This requires a measure of similarity between two emails. A (very basic) simi‐
larity measure between two emails could be to count the number of words they have
in common. The system would flag an email as spam if it has many words in com‐
mon with a known spam email.
This is called instance-based learning: the system learns the examples by heart, then
generalizes to new cases using a similarity measure (Figure 1-15).

Figure 1-15. Instance-based learning
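For illustration, such a crude word-overlap similarity measure can be written in a few
lines of plain Python (the example emails are invented):

def similarity(email_a, email_b):
    """Count how many distinct words two emails have in common."""
    return len(set(email_a.lower().split()) & set(email_b.lower().split()))

known_spam = "win a free prize now click here"
new_email = "click here now to win your free prize"
print(similarity(new_email, known_spam))  # a high count suggests the email is spam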

Model-based learning
Another way to generalize from a set of examples is to build a model of these exam‐
ples, then use that model to make predictions. This is called model-based learning
(Figure 1-16).

Figure 1-16. Model-based learning

For example, suppose you want to know if money makes people happy, so you down‐
load the Better Life Index data from the OECD’s website as well as stats about GDP
per capita from the IMF’s website. Then you join the tables and sort by GDP per cap‐
ita. Table 1-1 shows an excerpt of what you get.

Table 1-1. Does money make people happier?


Country GDP per capita (USD) Life satisfaction
Hungary 12,240 4.9
Korea 27,195 5.8
France 37,675 6.5
Australia 50,962 7.3
United States 55,805 7.2

Let’s plot the data for a few random countries (Figure 1-17).

Figure 1-17. Do you see a trend here?

There does seem to be a trend here! Although the data is noisy (i.e., partly random), it
looks like life satisfaction goes up more or less linearly as the country’s GDP per cap‐
ita increases. So you decide to model life satisfaction as a linear function of GDP per
capita. This step is called model selection: you selected a linear model of life satisfac‐
tion with just one attribute, GDP per capita (Equation 1-1).

Equation 1-1. A simple linear model


life_satisfaction = θ0 + θ1 × GDP_per_capita

This model has two model parameters, θ0 and θ1.5 By tweaking these parameters, you
can make your model represent any linear function, as shown in Figure 1-18.

Figure 1-18. A few possible linear models

5 By convention, the Greek letter θ (theta) is frequently used to represent model parameters.

Before you can use your model, you need to define the parameter values θ0 and θ1.
How can you know which values will make your model perform best? To answer this
question, you need to specify a performance measure. You can either define a utility
function (or fitness function) that measures how good your model is, or you can define
a cost function that measures how bad it is. For linear regression problems, people
typically use a cost function that measures the distance between the linear model’s
predictions and the training examples; the objective is to minimize this distance.
This is where the Linear Regression algorithm comes in: you feed it your training
examples and it finds the parameters that make the linear model fit best to your data.
This is called training the model. In our case the algorithm finds that the optimal
parameter values are θ0 = 4.85 and θ1 = 4.91 × 10–5.
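For instance, a common cost function for Linear Regression is the mean squared error
(MSE). The following illustrative sketch evaluates it on the excerpt from Table 1-1,
using the parameter values just mentioned:

import numpy as np

def mse_cost(theta0, theta1, gdp_per_capita, life_satisfaction):
    """Mean squared error of the linear model on the training examples."""
    predictions = theta0 + theta1 * gdp_per_capita
    return np.mean((predictions - life_satisfaction) ** 2)

gdp = np.array([12240, 27195, 37675, 50962, 55805])   # from Table 1-1
sat = np.array([4.9, 5.8, 6.5, 7.3, 7.2])
print(mse_cost(4.85, 4.91e-5, gdp, sat))               # lower values mean a better fit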
Now the model fits the training data as closely as possible (for a linear model), as you
can see in Figure 1-19.

Figure 1-19. The linear model that fits the training data best

You are finally ready to run the model to make predictions. For example, say you
want to know how happy Cypriots are, and the OECD data does not have the answer.
Fortunately, you can use your model to make a good prediction: you look up Cyprus’s
GDP per capita, find $22,587, and then apply your model and find that life satisfac‐
tion is likely to be somewhere around 4.85 + 22,587 × 4.91 × 10–5 = 5.96.
To whet your appetite, Example 1-1 shows the Python code that loads the data, pre‐
pares it,6 creates a scatterplot for visualization, and then trains a linear model and
makes a prediction.7

6 The code assumes that prepare_country_stats() is already defined: it merges the GDP and life satisfaction
data into a single Pandas dataframe.
7 It’s okay if you don’t understand all the code yet; we will present Scikit-Learn in the following chapters.

Example 1-1. Training and running a linear model using Scikit-Learn
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.linear_model

# Load the data
oecd_bli = pd.read_csv("oecd_bli_2015.csv", thousands=',')
gdp_per_capita = pd.read_csv("gdp_per_capita.csv", thousands=',', delimiter='\t',
                             encoding='latin1', na_values="n/a")

# Prepare the data
country_stats = prepare_country_stats(oecd_bli, gdp_per_capita)
X = np.c_[country_stats["GDP per capita"]]
y = np.c_[country_stats["Life satisfaction"]]

# Visualize the data
country_stats.plot(kind='scatter', x="GDP per capita", y="Life satisfaction")
plt.show()

# Select a linear model
model = sklearn.linear_model.LinearRegression()

# Train the model
model.fit(X, y)

# Make a prediction for Cyprus
X_new = [[22587]]  # Cyprus's GDP per capita
print(model.predict(X_new))  # outputs [[ 5.96242338]]

If you had used an instance-based learning algorithm instead, you
would have found that Slovenia has the closest GDP per capita to
that of Cyprus ($20,732), and since the OECD data tells us that
Slovenians' life satisfaction is 5.7, you would have predicted a life
satisfaction of 5.7 for Cyprus. If you zoom out a bit and look at the
two next closest countries, you will find Portugal and Spain with
life satisfactions of 5.1 and 6.5, respectively. Averaging these three
values, you get 5.77, which is pretty close to your model-based
prediction. This simple algorithm is called k-Nearest Neighbors
regression (in this example, k = 3).
Replacing the Linear Regression model with k-Nearest Neighbors
regression in the previous code is as simple as replacing this line:
model = sklearn.linear_model.LinearRegression()
with this one (and importing sklearn.neighbors at the top):
model = sklearn.neighbors.KNeighborsRegressor(n_neighbors=3)

If all went well, your model will make good predictions. If not, you may need to use
more attributes (employment rate, health, air pollution, etc.), get more or better qual‐
ity training data, or perhaps select a more powerful model (e.g., a Polynomial Regres‐
sion model).
In summary:

• You studied the data.
• You selected a model.
• You trained it on the training data (i.e., the learning algorithm searched for the
model parameter values that minimize a cost function).
• Finally, you applied the model to make predictions on new cases (this is called
inference), hoping that this model will generalize well.

This is what a typical Machine Learning project looks like. In Chapter 2 you will
experience this first-hand by going through an end-to-end project.
We have covered a lot of ground so far: you now know what Machine Learning is
really about, why it is useful, what some of the most common categories of ML sys‐
tems are, and what a typical project workflow looks like. Now let’s look at what can go
wrong in learning and prevent you from making accurate predictions.

Main Challenges of Machine Learning


In short, since your main task is to select a learning algorithm and train it on some
data, the two things that can go wrong are “bad algorithm” and “bad data.” Let’s start
with examples of bad data.

Insufficient Quantity of Training Data


For a toddler to learn what an apple is, all it takes is for you to point to an apple and
say “apple” (possibly repeating this procedure a few times). Now the child is able to
recognize apples in all sorts of colors and shapes. Genius.
Machine Learning is not quite there yet; it takes a lot of data for most Machine Learn‐
ing algorithms to work properly. Even for very simple problems you typically need
thousands of examples, and for complex problems such as image or speech recogni‐
tion you may need millions of examples (unless you can reuse parts of an existing
model).

The Unreasonable Effectiveness of Data
In a famous paper published in 2001, Microsoft researchers Michele Banko and Eric
Brill showed that very different Machine Learning algorithms, including fairly simple
ones, performed almost identically well on a complex problem of natural language
disambiguation8 once they were given enough data (as you can see in Figure 1-20).

Figure 1-20. The importance of data versus algorithms9

As the authors put it: “these results suggest that we may want to reconsider the trade-
off between spending time and money on algorithm development versus spending it
on corpus development.”
The idea that data matters more than algorithms for complex problems was further
popularized by Peter Norvig et al. in a paper titled “The Unreasonable Effectiveness
of Data” published in 2009.10 It should be noted, however, that small- and medium-
sized datasets are still very common, and it is not always easy or cheap to get extra
training data, so don’t abandon algorithms just yet.

8 For example, knowing whether to write “to,” “two,” or “too” depending on the context.
9 Figure reproduced with permission from Banko and Brill (2001), “Learning Curves for Confusion Set Disam‐
biguation.”
10 “The Unreasonable Effectiveness of Data,” Peter Norvig et al. (2009).

Nonrepresentative Training Data
In order to generalize well, it is crucial that your training data be representative of the
new cases you want to generalize to. This is true whether you use instance-based
learning or model-based learning.
For example, the set of countries we used earlier for training the linear model was not
perfectly representative; a few countries were missing. Figure 1-21 shows what the
data looks like when you add the missing countries.

Figure 1-21. A more representative training sample

If you train a linear model on this data, you get the solid line, while the old model is
represented by the dotted line. As you can see, not only does adding a few missing
countries significantly alter the model, but it makes it clear that such a simple linear
model is probably never going to work well. It seems that very rich countries are not
happier than moderately rich countries (in fact they seem unhappier), and conversely
some poor countries seem happier than many rich countries.
By using a nonrepresentative training set, we trained a model that is unlikely to make
accurate predictions, especially for very poor and very rich countries.
It is crucial to use a training set that is representative of the cases you want to general‐
ize to. This is often harder than it sounds: if the sample is too small, you will have
sampling noise (i.e., nonrepresentative data as a result of chance), but even very large
samples can be nonrepresentative if the sampling method is flawed. This is called
sampling bias.

A Famous Example of Sampling Bias


Perhaps the most famous example of sampling bias happened during the US presi‐
dential election in 1936, which pitted Landon against Roosevelt: the Literary Digest
conducted a very large poll, sending mail to about 10 million people. It got 2.4 million
answers, and predicted with high confidence that Landon would get 57% of the votes.
Instead, Roosevelt won with 62% of the votes. The flaw was in the Literary Digest’s
sampling method:

• First, to obtain the addresses to send the polls to, the Literary Digest used tele‐
phone directories, lists of magazine subscribers, club membership lists, and the
like. All of these lists tend to favor wealthier people, who are more likely to vote
Republican (hence Landon).
• Second, less than 25% of the people who received the poll answered. Again, this
introduces a sampling bias, by ruling out people who don’t care much about poli‐
tics, people who don’t like the Literary Digest, and other key groups. This is a spe‐
cial type of sampling bias called nonresponse bias.

Here is another example: say you want to build a system to recognize funk music vid‐
eos. One way to build your training set is to search “funk music” on YouTube and use
the resulting videos. But this assumes that YouTube’s search engine returns a set of
videos that are representative of all the funk music videos on YouTube. In reality, the
search results are likely to be biased toward popular artists (and if you live in Brazil
you will get a lot of “funk carioca” videos, which sound nothing like James Brown).
On the other hand, how else can you get a large training set?

Poor-Quality Data
Obviously, if your training data is full of errors, outliers, and noise (e.g., due to poor-
quality measurements), it will make it harder for the system to detect the underlying
patterns, so your system is less likely to perform well. It is often well worth the effort
to spend time cleaning up your training data. The truth is, most data scientists spend
a significant part of their time doing just that. For example:

• If some instances are clearly outliers, it may help to simply discard them or try to
fix the errors manually.
• If some instances are missing a few features (e.g., 5% of your customers did not
specify their age), you must decide whether you want to ignore this attribute
altogether, ignore these instances, fill in the missing values (e.g., with the median
age), or train one model with the feature and one model without it, and so on (see
the sketch after this list).
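As a quick sketch of the median-filling option (the tiny DataFrame and its age column
are invented for illustration), SimpleImputer in recent versions of Scikit-Learn can do
the job:

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

customers = pd.DataFrame({"age": [25, 38, np.nan, 51, np.nan, 44]})

imputer = SimpleImputer(strategy="median")   # replace missing ages with the median age
customers["age"] = imputer.fit_transform(customers[["age"]]).ravel()
print(customers)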

Irrelevant Features
As the saying goes: garbage in, garbage out. Your system will only be capable of learn‐
ing if the training data contains enough relevant features and not too many irrelevant
ones. A critical part of the success of a Machine Learning project is coming up with a
good set of features to train on. This process, called feature engineering, involves:

• Feature selection: selecting the most useful features to train on among existing
features.
• Feature extraction: combining existing features to produce a more useful one (as
we saw earlier, dimensionality reduction algorithms can help; see the sketch after
this list).
• Creating new features by gathering new data.
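For illustration, here is a minimal feature extraction sketch using Principal Component
Analysis to combine many partly redundant features into a few more informative ones
(the random matrix stands in for a real feature set):

import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.rand(200, 10)          # 200 instances described by 10 (partly redundant) features

pca = PCA(n_components=3)      # extract 3 combined features
X_reduced = pca.fit_transform(X)
print(X_reduced.shape)         # (200, 3)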

Now that we have looked at many examples of bad data, let’s look at a couple of exam‐
ples of bad algorithms.

Overfitting the Training Data


Say you are visiting a foreign country and the taxi driver rips you off. You might be
tempted to say that all taxi drivers in that country are thieves. Overgeneralizing is
something that we humans do all too often, and unfortunately machines can fall into
the same trap if we are not careful. In Machine Learning this is called overfitting: it
means that the model performs well on the training data, but it does not generalize
well.
Figure 1-22 shows an example of a high-degree polynomial life satisfaction model
that strongly overfits the training data. Even though it performs much better on the
training data than the simple linear model, would you really trust its predictions?

Figure 1-22. Overfitting the training data

Complex models such as deep neural networks can detect subtle patterns in the data,
but if the training set is noisy, or if it is too small (which introduces sampling noise),
then the model is likely to detect patterns in the noise itself. Obviously these patterns
will not generalize to new instances. For example, say you feed your life satisfaction
model many more attributes, including uninformative ones such as the country’s
name. In that case, a complex model may detect patterns like the fact that all coun‐
tries in the training data with a w in their name have a life satisfaction greater than 7:
New Zealand (7.3), Norway (7.4), Sweden (7.2), and Switzerland (7.5). How confident
are you that the W-satisfaction rule generalizes to Rwanda or Zimbabwe? Obviously
this pattern occurred in the training data by pure chance, but the model has no way
to tell whether a pattern is real or simply the result of noise in the data.

Overfitting happens when the model is too complex relative to the
amount and noisiness of the training data. The possible solutions are:

• To simplify the model by selecting one with fewer parameters
(e.g., a linear model rather than a high-degree polynomial
model), by reducing the number of attributes in the training
data or by constraining the model
• To gather more training data
• To reduce the noise in the training data (e.g., fix data errors
and remove outliers)

Constraining a model to make it simpler and reduce the risk of overfitting is called
regularization. For example, the linear model we defined earlier has two parameters,
θ0 and θ1. This gives the learning algorithm two degrees of freedom to adapt the model
to the training data: it can tweak both the height (θ0) and the slope (θ1) of the line. If
we forced θ1 = 0, the algorithm would have only one degree of freedom and would
have a much harder time fitting the data properly: all it could do is move the line up
or down to get as close as possible to the training instances, so it would end up
around the mean. A very simple model indeed! If we allow the algorithm to modify θ1
but we force it to keep it small, then the learning algorithm will effectively have some‐
where in between one and two degrees of freedom. It will produce a simpler model
than with two degrees of freedom, but more complex than with just one. You want to
find the right balance between fitting the data perfectly and keeping the model simple
enough to ensure that it will generalize well.
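For illustration, here is a sketch of this idea on synthetic data, using Ridge regression
as one standard way to constrain a linear model: a large regularization penalty keeps the
slope (θ1) small while leaving the height (θ0) free.

import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(42)
X = rng.rand(30, 1) * 10
y = 4 + 0.5 * X.ravel() + rng.randn(30) * 0.5       # noisy, roughly linear data

unconstrained = LinearRegression().fit(X, y)        # free to tweak both height and slope
regularized = Ridge(alpha=1000).fit(X, y)           # penalizes a large slope

print(unconstrained.coef_[0], regularized.coef_[0]) # the regularized slope is much smaller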
Figure 1-23 shows three models: the dotted line represents the original model that
was trained with a few countries missing, the dashed line is our second model trained
with all countries, and the solid line is a linear model trained with the same data as
the first model but with a regularization constraint. You can see that regularization
forced the model to have a smaller slope: it fits the training data a bit less well, but it
generalizes better to new examples.

Figure 1-23. Regularization reduces the risk of overfitting

The amount of regularization to apply during learning can be controlled by a
hyperparameter. A hyperparameter is a parameter of a learning algorithm (not of the
model). As such, it is not affected by the learning algorithm itself; it must be set prior
to training and remains constant during training. If you set the regularization hyper‐
parameter to a very large value, you will get an almost flat model (a slope close to
zero); the learning algorithm will almost certainly not overfit the training data, but it
will be less likely to find a good solution. Tuning hyperparameters is an important
part of building a Machine Learning system (you will see a detailed example in the
next chapter).
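For illustration, the regularization hyperparameter of a Ridge model can be tuned by
trying several candidate values with cross-validation (the data below is made up):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

rng = np.random.RandomState(0)
X = rng.rand(100, 3)
y = X @ np.array([1.5, -2.0, 0.7]) + rng.randn(100) * 0.1

# alpha is a hyperparameter: it is set before training and stays constant during it
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]}
search = GridSearchCV(Ridge(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)   # the alpha value that generalized best in cross-validation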

Underfitting the Training Data


As you might guess, underfitting is the opposite of overfitting: it occurs when your
model is too simple to learn the underlying structure of the data. For example, a lin‐
ear model of life satisfaction is prone to underfit; reality is just more complex than
the model, so its predictions are bound to be inaccurate, even on the training exam‐
ples.
The main options to fix this problem are:

• Selecting a more powerful model, with more parameters (see the sketch after this
list)
• Feeding better features to the learning algorithm (feature engineering)
• Reducing the constraints on the model (e.g., reducing the regularization hyper‐
parameter)
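For illustration, here is a sketch of the first option on synthetic, clearly nonlinear
data: a Polynomial Regression model built by chaining PolynomialFeatures with
LinearRegression.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(1)
X = rng.rand(100, 1) * 6 - 3
y = 0.5 * X.ravel() ** 2 + X.ravel() + 2 + rng.randn(100)   # quadratic data plus noise

linear_model = LinearRegression().fit(X, y)                  # too simple: it underfits
poly_model = make_pipeline(PolynomialFeatures(degree=2),     # more parameters, so it can
                           LinearRegression()).fit(X, y)     # capture the curvature

print(linear_model.score(X, y), poly_model.score(X, y))      # R² on the training set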

Stepping Back
By now you already know a lot about Machine Learning. However, we went through
so many concepts that you may be feeling a little lost, so let’s step back and look at the
big picture:

28 | Chapter 1: The Machine Learning Landscape


Other documents randomly have
different content
261
Cf. Narshakhi, ed. Schefer, p. 234.
262
We are told by this same author that they had caused
much depredation among the Mohammedans, which
seems inconsistent with what has been said of them
before.
263
S. Lane-Poole gives the date of Boghrā Khān’s death as
435, and makes no mention of his son Ibrāhīm.
264
Narshakhi, ed. Schefer, reads this name Tumghāch.
265
S. Lane-Poole (loc. cit.) says Ibrāhīm died in 460, and
was succeeded by his son Nasr, who died in 472. It will
be seen that great confusion exists with regard to these
Khāns. Major Raverty, in his translation of the Tabakāt-i-
Nāsiri, furnishes a long list of Ilik Khāns; but it is hard to
reconcile any two accounts, so much do the names and
dates differ.
266
S. Lane-Poole (Mohammedan Dynasties, p. 135) says
Mahmūd Khān II.
267
S. Lane-Poole (loc. cit.) reads Mahmūd Khān III., and
from this point the list he gives no longer corresponds
with Narshakhi’s account.
268
Mīrkhwānd (Vüllers, Historia Seldschukidarum, p. 176),
and Vambéry following him, say that Mohammad was
reinstated.
269
The modern Khiva.
270
See chap. XX.
271
This history, by Hamdullah Mustawfi, is one of the most
important Persian chronicles. The whole text has never
yet been published, but the portion relating to the
Seljūks was edited and translated by M. Defrémery.
272
There is some confusion as to the precise origin of this
branch of the Turks. Aug. Müller says that during the
disorders which attended the downfall of the Sāmānides
and the struggles between the Ghaznavides and the
Khāns of Kāshghar, the Ghuz, through internal
dissensions, became split up into subdivisions. The
foremost of these was a branch who in A.H. 345 (956)
settled down in Jend (east of Khwārazm). They received
the name of Seljūk from their chief, who had been
compelled to quit the court of his master Pighu Khān of
the Kipchāk Turks. He is said to have embraced Islām
(Müller, Islām, ii. 74).
273
He was the first prince to bear the title of Sultān. Cf.
Gibbon, chap. 47.
274
Malcolm, op. cit. i. p. 195.
275
Cf. Müller, op. cit. ii. p. 76.
276
The son of Altuntāsh mentioned above, p. 123.
277
Gibbon (chap, lvii.) speaks of this victory as the
“memorable day of Qandacan” which “founded in Persia
the dynasty of the shepherd kings.” He gives the date as
A.D. 1038.
278
Mohammad, who, as stated above, had been nominated
by his father Mahmūd to succeed him in Ghazna, had
been almost immediately deposed by his brother
Mas`ūd.
279
Malcolm, op. cit. i. p. 199.
280
Müller, op. cit. i. 77.
281
Vide supra, p. 112, note 1.
282
Cf. Gibbon, chap. lvii. De Guignes gives a somewhat
different version of the relations between the Emperor
and the Turk (vol. iii. p. 191). He says: “Constantin-
Monomaque qui regnoit alors à Constantinople, ne crut
pas devoir négliger l’alliance d’un prince qui faisoit
trembler toute l’Asie: il lui envoya des ambassadeurs
pour lui proposer de faire la paix, et Thogrulbegh y
consentit.” This difference is due to the fact that
Gibbon’s authorities were Byzantine, while De Guignes’
were Mohammedan.
283
It would, however, be wrong to regard these Turks as
uncultured people; for though few traces of their early
literature have come down to us, testimony is not
wanting to the fact that they had, long before they
began their westward migrations, a written language
and perhaps a literature.
284
He was not received in audience by the Caliph till A.H.
451 (1059). In 455 (1063), in spite of his outward show
of respect, Toghrul Beg practically forced the Caliph to
give him his daughter in marriage. But, in the same year,
as Toghrul was about to claim his bride, fortune
suddenly deserted him, and he died at the age of
seventy in Ray, where, according to Mīrkhwānd (see ed.
Vüllers, p. 65), he wished to celebrate his nuptials.
285
His name is familiar to the English public through the
medium of `Omar Khayyām. All who have read
Fitzgerald’s admirable translation of the Rubaiyāt know
the story of the three famous schoolfellows—`Omar
Khayyām, the poet; Nizām ul-Mulk, the statesman; and
Hasan ibn Sabbāh, “the Old Man of the Mountain.”
These three, as schoolboys at Nīshāpūr, had sworn that
whichever of them should rise highest in the world
should help the others. Of two of them we shall have to
speak below.
286
His was not actually their first expedition, for, in 1050,
parts of Armenia had been laid waste and countless
Christians massacred by the Turks. Cf. Gibbon, chap.
xlvii.
287
We refer the reader to Gibbon’s 57th chapter for a vivid
account of Alp Arslān’s dealings with the Romans (see
also Malcolm, op. cit. i. 209–213).
288
This was a chief named Yūsuf, who had long held out
against the Sultan in his fortress of Berzem in
Khwārazm. Cf. Malcolm, op. cit. i. 213; and De Guignes,
iii. 213.
289
Notably his uncle Kāwurd (see Müller, op. cit. ii. 94),—
whom Vambéry calls Kurd; and Vüllers (in Mīrkhwānd’s
Seljūks), Kādurd; and Malcolm (op. cit. i. 216), Cawder.
290
Müller, op. cit. ii. 94.
291
See below, chap. xix.
292
Vambéry (op. cit. p. 100) qualifies these statements as
the “mere fabrications of partial Arab and Persian
writers.”
293
Op. cit. ii. 95.
294
This assassin was one of the emissaries (or fadāwi) of
Hasan ibn Sabbāh, Nizām ul-Mulk’s old school friend. For
an account of the Assassins we refer the reader to the
article under that heading in the Encyclopædia
Britannica. For more than a century the devotees of the
Old Man of the Mountain played a part in politics not
dissimilar to that of the Jesuits at certain periods in
Europe. See J. von Hammer’s Hist. de l’Ordre des
Assassins (Paris, 1833); S. Guyard’s “Un Grand Maître
des Assassins,” Journal Asiatique, 1877; and an article
by Mr. E. G. Browne in St. Bartholomew’s Hosp. Journ.,
March 1897.
295
The history of the remaining Seljūk kings (of the original
branch) is so admirably epitomised by Malcolm that it
was considered unnecessary in this place to do more
than quote from his well-known History of Persia (vol. ii.
p. 222 et seq.). These sons were Berkiyāruk,
Mohammad, Sanjar, and Mahmūd.
296
He was himself but fourteen years of age at the time of
his father’s death.
297
A.H. 487–498 (1094–1104). Malcolm throughout his
otherwise excellent history scarcely ever condescends to
supply the reader with a date of any kind.
298
He died of consumption at the early age of twenty-seven
(perhaps even younger). Cf. Müller, op. cit. ii. 120.
299
He allowed his nephew the two `Irāks on condition that
his (Sanjar’s) name should be mentioned first in the
public prayers (cf. Habīb-us-Siyar).
300
The modern Khānate of Khiva.
301
The Khāns of Khiva still bear the title of Ewer-bearers to
the Sultan of Constantinople.
302
About A.H. 470 (1077).
303
He was a descendant in the eighth generation of T’ai-
tsu, or Apaoki, the first Liao emperor. Cf. Bretschneider,
op. cit. i. 211; Visdelou, p. 28. For the various forms his
name has taken, cf. Howorth on the “Kara-Khitāy,”
J.R.A.S., New Series VIII. 273, 274.
304
De Guignes called him Taigir.
305
Called by the Mohammedans Churché, which
corresponds to the Niuchi of Chinese historians. Cf.
Bretschneider, op. cit. i. 224, note.
306
Cf. d’Ohsson, Histoire des Mongols, i. 163.
307
Some scholars have wished to identify this name with
Kirmān in Persia, but this seems most improbable.
Bretschneider (op. cit. i. 216, note) suggests Kerminé,
which is the site of the summer quarters of the present
Amīr of Bokhārā. Cf. also Howorth, loc. cit.
308
P. 134.
309
Cf. De Guignes, iii. pt. ii. p. 253.
310
Some confusion exists as to whether Kāshghar or
Balāsāghūn was his residence. It seems improbable that
he should have changed in so short a space.
311
A.H. 521 (1127).
312
A.H. 533 (1138).
313
Il-Kilij, the son of Atsiz, perished in the battle.
314
Cf. d’Herbelot, article “Atsiz”; and De Guignes, vol. ii. pt.
ii. p. 254.
315
Thus, according to Narshakhi (p. 243). The statements
of historians are somewhat conflicting in this place. De
Guignes, following Abulfidā, says that Ye-liu Ta-shi
(whom he calls Taigir) died in 1136, when about to
abandon Kāshghar and return to his ancient settlements
in Tartary. The Khitāys then set upon the throne his
infant son, Y-li, with his mother Liao-chi as queen-
regent. Bretschneider has translated a Chinese work
which gives a list of all the line of Kara-Khitāy rulers,
whose dynasty became extinct about 1203. We have not
thought it necessary to reproduce a list of their names in
this place. It may be mentioned, however, that
Bretschneider’s account does not agree with De
Guignes.
316
Cf. De Guignes, vol. iii. pt. i. p. 254; Müller, op. cit. vol.
ii. p. 173. Rashīd ud-Dīn tells us he had drawn auxiliaries
from all parts of his dominions.
317
The Kara-Khitāys were Buddhists.
318
Cf. Müller, loc. cit.
319
A.H. 537 (1142).
320
Cf. De Guignes, loc. cit.; and Müller, ii. p. 174.
321
Cf. De Guignes, iii. pt. i. pp. 256, 257.
322
De Guignes (following Abulfidā) says A.H. 550 (1155).
323
Cf. Müller, op. cit. ii. 173.
324
Mīrkhwānd (ed. Vüllers, p. 183). Khwāndamīr (Habīb-us-
Siyar) adds “Kunduz and Baklān” to the list.
325
The word used is Khānsālār, which means the “Taster,”
or “Table-Decker of the Household.”
326
Mīrkhwānd (ed. Vüllers, p. 185) says that Kamāj and his
son perished in this battle, but Hamdullah Mustawfi, in
the Tārīkh-i-Guzīda, says they were spared.
327
De Guignes, vol. iii. pt. i. p. 256.
328
Mīrkhwānd relates (ed. Vüllers, p. 188) that when Sanjar
fled with his army, and was hotly pursued by the Ghuz, a
man who bore a striking resemblance to the Sultan was
captured. Say what he might, the Ghuz would not be
convinced that this was not Sanjar, and paid him all the
respect due to royalty, until finally some one recognised
him as the son of Sanjar’s cook, whereupon he was
beheaded.
329
Professor Shukovski, of St. Petersburg, published in
1894 an excellent and exhaustive monograph on the
ruins and past history of Merv, under the title Razvilini
starago Merva, “The Ruins of Old Merv.”
330
Ed. Vüllers, p. 189.
331
Mīrkhwānd has in this place evidently followed Hafiz
Abru (the author of the Zubdat-ut-Tawārīkh), who says
that the first day of plunder was devoted to articles of
gold, brass, and silver; the second to bronzes, carpets,
and vases; and the third to whatever of value was left,
such as cotton-stuffs, glass, wooden doors, and the like.
Cf. Professor Shukovski’s Ruins of Old Merv, pp. 29, 30.
332
He is said to have been kept in a cage at night. Cf. De
Guignes, iii. pt. i. 257. Mīrkhwānd has been followed in
this relation, and we have seen what he considered to
be the cause of the hostilities between the Ghuz and
Sanjar. From Ibn el-Athīr (Tārīkh-i-Kāmil, xi. 118, as
quoted by Professor Shukovski, Merv, p. 29) it would
appear that the cause of the conflict was Sanjar’s refusal
to give up Merv to the Ghuz, on the plea that he could
not be expected to abandon his royal residence. De
Guignes (iii. pt. i. p. 257) introduces this anecdote after
the capture of Sanjar.
333
Many say he died of an internal malady, A.H. 552 (1157).
He was in his seventy-third year.
334
The modern Chārjūy.
335
Cf. De Guignes, iii. pt. ii. p. 258.
336
Cf. De Guignes, loc. cit.
337
He entered into a union with the Khān of the Kipchāk,
named Ikrān, and married his daughter, who became
the mother of the famous Sultan Mohammad Khwārazm
Shāh; cf. Tabakāt-i-Nāsiri, Raverty’s translation, i. 240.
This Khān of the Kipchāks is called, on p. 254 of the
same work, Kadr Khān, a discrepancy which escaped the
notice of Major Raverty, who, however, calls attention to
three different Kadr Khāns in one chapter (see op. cit. p.
267, note).
338
Cf. Habīb-us-Siyar.
339
In this account of the reign of Tekish we have followed
the Habīb-us-Siyar. There is, however, a great
discrepancy in this part of the history, for in one place
Khwāndamīr says that the hostilities lasted only ten
years (A.H. 568–578), when they were brought to a close
by a treaty between the two brothers, in which Tekish
granted the rule of certain towns in Khorāsān to his
brother. An account of Sultan Shāh Mahmūd may be
found in the Tabakāt-i-Nāsiri, trans., i. 245–249.
340
There is a misprint in d’Ohsson, op. cit. i. 180, the date
being given as 1149. He also waged war on the
Assassins in `Irāk and Kūhistan, and took from them
their strongest fort, Arslān Kushāy.
341
Tārīkh-i-Jahān-Kushāy, as quoted by Bretschneider, op.
cit. i. 229, from d’Ohsson.
342
Cf. d’Ohsson, op. cit. i. 180; and Tabakāt-i-Nāsiri, trans.,
i. 253–260.
343
He had solicited the hand of a daughter of the Gūr-
Khān, and, having been refused, had become his secret
enemy. Howorth, J.R.A.S., New Series VIII. p. 282.
344
Cf. d’Ohsson (op. cit. i. 181), who does not quote his
authority.
345
Thus according to d’Ohsson. But De Guignes gives a
very different account of Mohammad’s first Eastern
campaign, which he dates A.H. 604 (1209). He says that
Bokhārā and Samarkand were delivered over to him by
the friendly Turkish princes, that on entering the Kara-
Khitāy territory he gained a splendid victory. Thus the
first disastrous campaign is wholly ignored. De Guignes,
op. cit. i. pt. ii. pp. 266, 267.
346
Cf. De Guignes, i. pt. ii. p. 267. d’Ohsson says as far as
Uzkend, op. cit. p. 182.
347
The name of this famous conqueror has been spelled in
many different ways,—e.g., Genghiz (De Guignes),
Gengis (Voltaire, in his tragedy of that name), Zingis
(Gibbon), Tchinguiz (d’Ohsson), etc. We have adopted
the one which most nearly approaches the Turkish and
Persian pronunciation of the name. For authorities we
would refer the reader to Sir H. Howorth’s History of the
Mongols, part i. (1876); R. K. Douglas, Life of Jinghiz
Khān (1877); an article by same author in the
Encyclopædia Britannica; Erdmann’s Temudschin der
Unerschütterliche (1862); and d’Ohsson and De Guignes
(vol. iv.). The principal original sources for the history of
Chingiz Khān are: (1) the Chinese account of a
contemporary named Men-Hun, which has been
translated into Russian by Professor Vassilief, and
published in his History and Antiquities of the Eastern
Part of Central Asia (see Transactions of Oriental Section
of the Russian Archæological Society, vol. iv.); and (2)
the Tabakāt-i-Nāsiri of Juzjānī, translated by Major
Raverty. This important work comprises a collection of
the accounts of Chingiz Khān written by his
Mohammedan contemporaries. Other Chinese and
Persian sources might be mentioned, but the above are
the most important.
One very important authority for the Mongol period is
the compilation, from Chinese sources, by Father
Hyacinth, entitled History of the first four Khāns of the
House of Chingiz, St. Petersburg, 1829. This Russian
work is comparatively little known outside Russia. Both
Erdmann and d’Ohsson often lay it under contribution. It
may be added that Sir Henry Howorth, in his first
volume on the Mongols (published in 1876), gives a
complete bibliography of all the available sources for the
history of Chingiz and his successors.
348
M. Barthold, of the St. Petersburg University, has
devoted much time to the study of the Mongol period in
Central Asia, the fruits of which he has not yet published
on an extended scale, though some shorter articles of
great value have appeared in Baron Rosen’s Zapiski. The
expeditions of Chingiz Khān and Tamerlane were
admirably treated by M. M. I. Ivanin in a work published
after his death, entitled On the Military Art and
Conquests of the Mongol-Tatars under Chingiz Khān and
Tamerlane, St. Petersburg.
349
Since the discovery and decipherment of the Orkon
inscriptions it may be regarded as certain that the form
Khitan, or Kidan, is but the Chinese transcription of the
word Kitai, which is the name of a people, most
probably of Manchurian origin, who, as is well known,
ruled over Northern China during the tenth, eleventh,
and twelfth centuries. It was borrowed by some of the
tribes inhabiting those parts. Cf. note on p. 106 of vol. x.
of Baron Rosen’s Zapiski, article by M. Barthold.
350
Precisely the same thing occurred in the case of the
Yué-Chi and the Kushans.
351
This admirable summary is taken from S. Lane-Poole’s
Catalogue of Oriental Coins in the British Museum, vol.
vi. (also reprinted in his Mohammedan Dynasties, pp.
201, 202). It is a condensation of what may be read in
great detail in Howorth’s Mongols, vol. i. pp. 27–50. Cf.
also De Guignes, vol. iv. p. 1 et seq.; and d’Ohsson, vol.
i. chaps. i. and ii.
352
For information with regard to this name, cf. d’Ohsson,
op. cit. vol. i. pp 36, 37, note.
353
Thus according to the Chinese authorities. The
Mohammedan historians give the date of his birth as A.H.
550 (1155).
354
The above remarks on the Mongols have been translated
from an article in Russian by M. Barthold in Baron
Rosen’s Zapiski, vol. x. (St. Petersburg, 1897) pp. 107–8.
355
Rashīd ud-Dīn, Jāmi`-ut-Tawārikh, Berezine’s ed. i. 89.
356
The Chinese and Persian authorities are here again at
variance.
357
They had been converted to Christianity by the
Nestorians at the beginning of the eleventh century. See
very interesting note in d’Ohsson, op. cit. i. p. 48. This
Toghrul received the title of Oang, or King, and called
himself Oang-Khān. The similarity of this in sound to the
name Johan, or Johannes (John), led to the fabulous
personage so familiar in Marco Polo and other travellers,
as Prester John. Cf. Yule’s Cathay and Marco Polo,
passim.
358
Cf. d’Ohsson, i. p. 47.
359
Cf. S. Lane-Poole, loc. cit.
360
The exact date is uncertain.
361
This word may be read either Kuriltāy or Kurultāy. Cf.
Pavet de Courteille, Dictionnaire Turk-Oriental, p. 429.
362
Cf. d’Ohsson, i. 86.
363
Ibid. p. 89.
364
Cf. Howorth, J.R.A.S., New Series VIII. p. 283.
365
The above facts are from the Jahān-Kushāy. Cf.
Bretschneider, op. cit. i. 230, 231; the Tarikh-i-Rashidi,
p. 289; and d’Ohsson, op. cit. i. 166 et seq.
366
Cf. d’Ohsson, i. 170 et seq.; Bretschneider, op. cit. i.
231.
367
This occupied him between the years 1210 and 1214.
368
S. Lane-Poole, loc. cit. See also Gibbon’s 64th chapter.
369
Cf. Bretschneider, loc. cit.; and on the subject of the
religious tolerance of Chingiz, Gibbon, chap. lxiv.
370
Cf. d’Ohsson, i. 204.
371
He had put his former ally `Othman to death in A.H. 607
(1210). See d’Ohsson, i. 183.
372
Abū-l-Ghāzi, ed. Desmaisons, p. 99.
373
Abū-l-Ghāzi, ed. Desmaisons, p. 100.
374
Abū-l-Ghāzi, loc. cit.
375
Abū-l-Ghāzi, pp. 101–103 of Desmaison’s text.
376
The route he took was Kazwīn, Gilān, and Māzenderān
(Tarikh-i-Mukīm Khānī).
377
He is said to have died a lunatic. The island in question
has long since been swallowed up by the sea. Cf.
Tabakāt-i-Nāsiri, Major Raverty’s trans., vol. i. p. 278,
note.
378
We refer the reader especially to Müller’s Geschichte des
Islams, pp. 213–225.
379
Mohammedan Dynasties, p. 204.
380
The best account of this offshoot is to be found in an
excellent paper entitled “The Chaghatai Mughals,” by
W. E. E. Oliver, in the Journal of the Royal Asiatic
Society, vol. xx. New Series, p. 72, sec. 9. It will be
found in a condensed form in Ney Elias and Ross’s
Introduction to the Tarikh-i-Rashidi, or “History of the
Mughals of Central Asia.”
381
Vide ante on p. 155.
382
In the valley of the Upper Ili, near the site of the
present Kulja.
383
During the reign of Chaghatāy Khān a curious rising
occurred in the province of Bokhārā. A half-witted sieve-
maker, from a village near Bokhārā, managed by various
impostures to gather round him a number of disciples
from among the common people, and so numerous and
powerful did they become that in 630 (1232) they drove
the Chaghatāy government out of the country, and,
assuming the government of Bokhārā, proceeded to put
to death many of its most distinguished citizens. They at
first successfully repulsed the Mongol forces sent against
them, but were finally vanquished, and order was again
restored in Bokhārā. For this episode consult Vambéry,
op. cit. p. 143 et seq.; Major Price’s Mohammedan
History, iii. 2.
384
Tarikh-i-Rashidi, Introduction, p. 32.
385
Chaghatāy is said to have died from grief at his brother’s
death (Habīb-us-Siyar).
386
For historical data we have already referred the reader
to Mr. Oliver’s paper and Vambéry’s Bokhara. S. Lane-
Poole, in his Mohammedan Dynasties, gives a list of
twenty-six Khāns of this house who ruled in Central Asia
from A.H. 624 to 771 (A.D. 1227 to 1358), i.e. 140 years.
The Zafar-Nāmé of Nizām Shāmī (see note below, p.
168) gives a list of thirty-one Khāns of this line.
387
Cf. Müller’s Geschichte des Islams, ii. p. 217.
388
In A.H. 671 (1273) Bokhārā was sacked by the Mongols
of Persia (Müller, op. cit. ii. p. 260).
389
Bokhara, pp. 159–60.
390
This Khānate embraced the present Zungaria and the
greater part of Eastern and Western Turkestān; but the
exact meaning of this geographical term is still
undetermined. The subject has been fully discussed in
the Tarikh-i-Rashidi (passim). Cf. also Bretschneider, op.
cit. ii. 225 et seq.
391
See Tarikh-i-Rashidi, Introduction, p. 37.
392
The Calcutta text of the Zafar-Nāmé of Sheref ud-Dīn
`Alī Yazdī, the famous biographer of Tīmūr, reads
throughout Karān. S. Lane-Poole, op. cit., gives the date
of his accession as 744 (A.D. 1343),—upon what
authority it is not clear. Price (following the Khulāsat ul-
Akhbār) is in agreement with the Zafar-Nāmé. We are,
moreover, expressly told that he ruled fourteen years,
and died in 747.
393
Zafar-Nāmé (ed. Calcutta), i. p. 27.
394
This took place in the plains round the village of Dara-
Zangi (Zafar-Nāmé, ii. p. 28).
395
The third son of Chingiz, who had inherited the kingdom
of Mongolia proper.
396
Zafar-Nāmé (ed. Calcutta) reads Dānishmand Oghlān.
397
Perhaps a corruption of the older form Berūlās.
398
The modern Shahr-i-Sabz.
399
Sheref ud-Dīn affirms that his love of wine was so
inveterate that he was not sober for a week in the whole
year (Zafar-Nāmé (Calcutta edition), i. p. 41).
400
He was born in A.H. 730. In 748 he became Khān of
Jatah; in 754 he was converted to Islām; in 764 he died.
His history, and the story of his conversion, is told at
some length in the Tarikh-i-Rashidi, pp. 5–23.
401
Our readers will have traced for themselves the parallel
afforded by France, exhausted by the horrors of the
Revolution at the outset of Napoleon’s career.
402
The sources for the biography of Tīmūr are plentiful. The
best known, both in the East and in Europe, is the Zafar-
Nāmé, by Sheref ud-Dīn `Alī, of Yezd. This was
completed in 1424 by the order of Ibrāhīm, the son of
Shāh Rukh, the son of Tīmūr. It was first translated into
French in 1722 by M. Petis de la Croix, whose work was
in turn englished shortly afterwards. It is this history
that has served as a basis for all European historians,
Gibbon included. There is, however, an older biography
of Tīmūr, which, owing to its scarcity, is very little
known. The only MS. in Europe is in the British Museum.
It, too, bears the title of Zafar-Nāmé, or Book of Victory.
It was compiled at Tīmūr’s own order by a certain Nizām
Shāmī, and is brought down to A.H. 806, i.e. one year
before Tīmūr’s death. The MS. itself bears the date of
A.H. 838 (1434). Owing to the vast interest attaching to
such a contemporary account, Professor Denison Ross
has undertaken to prepare an edition of the text for the
St. Petersburg Academy of Sciences.
403
He had gained the sobriquet “Leng” from a wound which
caused him to halt through life, inflicted during the siege
of Sīstān (Wolff, Bokhara, p. 243).
404
For example, the names Jalā´ir, Berūlās, and Seldūz are
those of well-known Turkish tribes.
405
According to the Zafar-Nāmé of Sheref ud-Dīn `Alī Yazdi,
and other historians who follow him, Hāji Birlās was the
uncle of Tīmūr. The Zafar-Nāmé of Nizām Shāmī,
however, states that he was Tīmūr’s brother.
406
He was at this period about twenty-seven years of age,
and had served with some distinction under Amīr
Kazghan (Wolff, Bokhara, p. 245).
407
We refer the reader to Gibbon’s 65th chapter for a
striking account of Tīmūr’s wanderings in the desert, and
to Petis de la Croix’s translation of the Zafar-Nāmé for
Tīmūr’s thrilling adventures with his friend Amīr Husayn.
408
Bokhara, p. 244.
409
The famous order of dervishes called Nakshabandi was
founded in Tīmūr’s reign by a certain Khwāja Bahā ud-
Dīn, who died in A.H. 791 (1388). The three saints held in
reverence by the dervishes next after him are Khwāja
Ahrār (whose mausoleum is to be seen a few miles
outside Samarkand), Ishān Mahzūm Kāshāni, and Sūfi
Allah Yār. It is a group of members of this mendicant
brotherhood which forms the subject of the frontispiece
to this work by M. Verestchagin. There are two other
sects of dervishes in Samarkand—(1) the Kādiriyya,
whose founder was `Abd el-Kādiri Gīlāni, and (2) the Alf
Tsāni, an order whereof the founder seems to be
unknown, and which is sparsely represented.
410
“He was of great stature, of an extraordinary large head,
open forehead, of a beautiful red and white complexion,
and with long hair—white from his birth, like Zal, the
renowned hero of Persian history. In his ears he wore
two diamonds of great value. He was of a serious and
gloomy expression of countenance; an enemy to every
kind of joke or jest, but especially to falsehood, which he
hated to such a degree that he preferred a disagreeable
truth to an agreeable lie,—in this respect far different
from the character of Alexander, who put to death Clitus,
his friend and companion in arms, as well as the
philosopher Callisthenes, for uttering disagreeable truths
to him. Tīmūr never relinquished his purpose or
countermanded his order; never regretted the past, nor
rejoiced in the anticipation of the future; he neither
loved poets nor buffoons, but physicians, astronomers,
and lawyers, whom he frequently desired to carry on
discussions in his presence; but most particularly he
loved those dervishes whose fame of sanctity paved his
way to victory by their blessing. His most darling books
were histories of wars and biographies of warriors and
other celebrated men. His learning was confined to the
knowledge of reading and writing, but he had such a
retentive memory that whatever he read or heard once
he never forgot. He was only acquainted with three
languages—the Turkish, Persian, and Mongolian. The
Arabic was foreign to him. He preferred the Tora of
Chingiz Khān to the Koran, so that the Ulemas found it
necessary to issue a Fetwa by which they declared those
to be infidels who preferred human laws to the divine.
He completed Chingiz Khān’s Tora by his own code,
called Tuzukat, which comprised the degrees and ranks
of his officers. Without the philosophy of Antonius or the
pedantry of Constantine, his laws exhibit a deep
knowledge of military art and political science. Such
principles were imitated successfully by his successors,
Shāh Baber and the great Shāh Akbar, in Hindustān. The
power of his civil as well as military government
consisted in a deep knowledge of other countries, which
he acquired by his interviews with travellers and
dervishes, so that he was fully acquainted with all the
plans, manœuvres, and political movements of foreign
courts and armies. He himself despatched travellers to
various parts, who were ordered to lay before him the
maps and descriptions of other foreign countries”
(Wolff’s Bokhara, p. 243).
411
Shāh Rukh was Tīmūr’s favourite son. He derived his
name, which means “King and Castle,” from a well-
known move in chess, which royal game was one of
Tīmūr’s few amusements (Wolff’s Bokhara, p. 244).
412
Cf. Price’s Mohammedan History, iii. 492, quoting the
Khulāsat-ul-Akhbār. As a fact, Pīr Mohammad only
obtained the government of Balkh, and was murdered in
Kandahār in A.H. 809 (1406). Cf. De Guignes, v. 79.
413
Cf. De Guignes, v. 81.
414
De Guignes, v. 81. Khalīl spent some years in
Moghūlistan, but, unable to bear a longer separation
from Shād Mulk, joined her in Herāt. Shāh Rukh gave
him the government of Khorāsān, and he died the same
year (A.H. 812).
415
His astronomical tables are amongst the most accurate
and complete that come down to us from Eastern
sources. They treat of the measurement of time, the
course of the planets, and of the position of fixed stars.
The best editions are those printed in Latin in 1642–48
by an Oxford professor named Greaves, and reprinted in
1767. The remains of his celebrated observatory still
crown the hill known as Chupān Ata in an eastern
suburb of Samarkand.
416
Shāh Rukh’s authority, to judge by the coins which have
come down to us, extended nearly as far as his more
celebrated father’s. We have his superscription on the
issues of mints as widely distant as Shīrāz, Kaswīn,
Sabzawār, Herāt, Kum, Shuster, and Astarābād.
417
Vambéry’s Bokhara, p. 223.
418
Ibid. p. 244.
419
The young prince was born in 1483, the son of `Omar
Shaykh Mīrzā, whom he succeeded in the sovereignty of
the eastern portion of Tīmūr’s dominions. His conquest
of India, and foundation of the Moghul dynasty of Delhi,
do not come within the scope of this work. He was
equally great in war, administration, and literature:
perhaps the most remarkable figure of his age.
420
A.H. 903 (1497).
421
An excellent table, showing the ramifications of the
Tīmūrides, will be found in vol. vii. of the Mohammedan
Coins of the British Museum.
422
In the case of possessive pronouns and verbal
inflexions, for example, we find direct and obvious
imitations of the Turkish grammar.
423
The “Great Caan” of Marco Polo.
424
Cf. Bretschneider, op. cit. ii. pp. 139, 140.
425
Cf. Bretschneider, loc. cit.
426
Idem. Tūkā Tīmūr, from whom sprang the Khāns of the
Crimea, was the youngest son of Jūjī. Cf. Lane-Poole’s
Mohammedan Dynasties, p. 233. Tokhtamish, the
inveterate foe of Tamerlane, belonged to the Crimean
branch of the Khāns of Dasht-i-Kipchāk. The Khānate of
Kazan was founded in 1439, on the remains of the
Bulgarian Empire, by Ulugh Mohammed of the same
line.
427
Bretschneider, loc. cit.
428
There seems some confusion on this point; I have
followed Veliaminof-Zernof, but Bretschneider does not
call this movement a migration of Uzbegs but a flight of
the White Horde, whom he says were expelled from
their original seats by Abū-l-Khayr. Cf. Tarikh-i-Rashidi, p.
82.
429
The results of M. Veliaminof-Zernof’s careful researches
into the history of the Kazāks were published in three
volumes of the Memoirs of the Eastern Branch of St.
Petersburg Archæological Society, under the title of The
Emperors and Princes of the Line of Kasim. He called
this dynasty the Kasimovski, after Kāsim Khān, the son
of Jānībeg. Cf. also Levshin’s Description of the Hordes
and Steppes of the Kirghiz-Kazaks, St. Petersburg, 1864.
Mīrzā Haydar says: “The Kazāk Sultans began to reign in
A.H. 870 (1465), and continued to enjoy absolute power
in the greater part of Uzbegistān till the year A.H. 940”
(1533). See Tarikh-i-Rashidi, p. 82.
430
Tarikh-i-Rashidi, pp. 82 and 92.
431
Thus according to both the Tārikh-i-Tīmūrī and the
Tārīkh-i-Abū-l-Khayr, quoted by Howorth, op. cit. ii. 695.
432
There is in the British Museum a silver coin of Shaybānī
Khān, dated A.H. 910: Merv.
433
An account of this campaign will be found in the Tarikh-
i-Rashidi, p. 243 et seq. The account of the Emperor
Bāber’s doings at this period are all the more interesting
and valuable from the fact that in the famous Memoirs
of Baber a break occurs from the year 1508 to the
beginning of the year 1519; though an account is also
given in the Tārīkh-i-Ālam-Ārāy of Mirza Sikandar, which
was used by Erskine in his History of India.
434
Lubb ut-Tawārīkh, book III. pt. iii. chap. vi.
435
Cf. Veliaminof-Zernof, op. cit. p. 247.
436
Tarikh-i-Rashidi, p. 245.
437
Cf. Tarikh-i-Rashidi, p. 259. Cf. also Veliaminof-Zernof (p.
353), who bases his statements on the `Abdullah Nāmé
of Hāfiz ibn Tānish. Copies of this valuable work are very
scarce. Its scope and contents have been described
(from a copy in the Imperial Academy in St. Petersburg)
by M. Veliaminof-Zernof. See Mélanges Asiatiques de St.
Petersburg, vol. iii. p. 258 et seq.
438
“The Seven Wells.” V.-Zernof reads Yati Kurūk, which
might mean “the Seven Walls.” The former reading
seems more probable.
439
On the locality of this place, cf. Vambéry’s Bokhara, p.
257.
440
Cf. Tarikh-i-Rashidi, p. 260.
441
Probably to be identified with Panjakand, in the
Zarafshān valley, forty miles east of Samarkand.
442
Some distance north of Bokhārā.
443
Cf. Tarikh-i-Rashidi, p. 261. Howorth (ii. 713) says
`Ubaydullah was in this fort.
444
Mirza Haydar does hesitate to speak thus of the fortunes
of his own cousin Bāber, who had in his opinion sold
himself to the heretic Persians.
445
As Grigorieff suggested, the name Abū-l-Khayride would
fit this dynasty far better than that of Shaybānide.
446
“Bokharan and Khivan Coins,” a monograph published in
the Memoirs of the Eastern Branch of the Russian
Archæological Society, vol. iv., St. Petersburg, 1859. This
excellent and original monograph is extensively laid
under contribution in the present chapter, as it was also
by Sir H. Howorth in his chapter on the Shaybānides, pt.
ii. div. ii. chap. ix.
447
See note, p. 190.
448
The Tazkira Mukīm Khānī, being a history of the
appanage of Bokhārā, makes no mention of Kuchunji, or
Abū Sa`īd, who ruled in Samarkand, though they both
attained the position of Khākān. Cf. Histoire de la Grande
Bokharie, par Mouhamed Joussouf el-Munshi, etc., par
Senkovsky, St. Petersburg, 1824.
449
Their names were—Abū Sa`īd, `Ubaydullah, `Abdullah
I., `Abd ul-Latīf, Nawrūz Ahmed, Pīr Mohammad, and
Iskandar. All are described at some length by Vambéry
and Howorth, the latter basing his account on a great
variety of authorities.
450
P. 284 et seq.
451
Cat. Coins Brit. Mus. vii.
452
Cf. Howorth, ii. 876.
453
Khwārazm had never properly belonged to Chaghatāy’s
territories in Transoxiana, and accordingly it is a
common mint name on coinage of the Golden Horde
(Cat. Orient. Coins Brit. Mus. vii. p. 26).
454
Vide ante, p. 169.
455
His genealogy is very doubtful; but, according to the
best authorities, his ancestor was Jūjī Khān, one of the
mighty conqueror’s sons, who had predeceased him
(note at p. 304 of Vambéry’s History of Bokhara). Cf.
Howorth’s Mongols, part ii. p. 744.
456
Vambéry relates that when, in the great mosque of
Bokhārā, the public prayers were read for the first time
for the new ruler, the whole congregation burst into sobs
and bitter tears (History of Bokhara, p. 319).
457
Vambéry, p. 323.
458
This prince was famed throughout the East for his love
of letters. He was a poet of no mean skill, and an adept
at prose composition. His end was untimely. Enticed to
give a private interview to some of his brother Subhān
Kulī Khān’s party, he was foully murdered by them
(Vambéry, P. 323).
459
Vambéry tells us that he was a man of amazing
corpulence; and one of his historians avers that a child
four years old could find accommodation in one of his
boots! (History of Bokhara, p. 325).
460
Vambéry, History of Bokhara, p. 333.
461
History of Bokhara, p. 339.
462
Page 95, History of Central Asia, by `Abd ul-Kerīm
Bokhārī; translated into French by Charles Schefer, Paris,
1876.
463
This throne was “so called from its having the figures of
two peacocks standing behind it, their tails being
expanded, and the whole so inlaid with sapphires,
rubies, emeralds, pearls, and other precious stones of
appropriate colours as to represent life. The throne itself
was six feet long by four broad; it stood on six massive
feet, which, with the body, were of solid gold, inlaid with
rubies, emeralds, and diamonds. It was surmounted by
a canopy of gold supported by twelve pillars, all richly
emblazoned with costly gems, and a fringe of pearls
ornamented the borders of the canopy. Between the two
peacocks stood the figure of a parrot of the ordinary
size, said to have been carved out of a single emerald.
On either side of the throne stood an umbrella, one of
the Oriental emblems of royalty. They were formed of
crimson velvet, richly embroidered and fringed with
pearls. The handles were eight feet high, of solid gold
and studded with diamonds. The cost of this superb
work of art has been variously stated at sums varying
from one to six millions sterling. It was planned and
executed under the supervision of Austin de Bordeaux,
already mentioned as the artist who executed the
Mosaic work in the Ám Khás” (Beresford’s Delhi, quoted
by Mr. H. G. Keene at p. 20 of the third edition of his
Handbook for Visitors to Delhi, Calcutta, 1876).
Tavernier, who was himself a jeweller, and visited India
in 1665, valued this piece of extravagance at two
hundred million of livres, £8,000,000; Jonas Hanway
estimated it as worth, with nine other thrones,
£11,250,000 (Travels, ii. 383). It stood on a white
marble plinth, on which is still to be deciphered the
world-renowned motto in flowing Persian characters: "If
there be a paradise on earth, it is even this, even this,
even this."
Agar Fardawsi ba ruyi zamīn ast:
Hamīn ast, hamīn ast, hamīn ast.
464
`Abd ul-Kerīm Bokhārī, p. 106.
465
Vambéry gives the date of this coup d’état as 1737 (p.
343); but `Abd ul-Kerīm Bokhārī makes it follow the
assassination of Nādir Shāh, the epoch of which is not
open to question (p. 110). The dates of events of the
eighteenth century in Bokhārā are strangely uncertain,
contemporary chroniclers rarely deigning to aid posterity
by recording them.
466
“Bi” is an Uzbeg word meaning “judge.” It is not spelt
“bai,” nor does it mean “superior grey-beard,” as M.
Vambéry supposes (History of Bokhara, p. 347).
467
There are many versions of the death of `Abd ul-
Mū`min. The most probable is that related by `Abd ul-
Kerīm of Bokhārā, at p. 115, which is to the effect that
Rahīm Bi had the young prince taken by his own
followers on a pleasure-party, and then pushed into a
well while he was dreamily peering into its depths.
468
This is the highest degree in the Bokhārān official
hierarchy (see Khanikoff’s Bokhara: its Amir and People,
p. 239; Meyendorff’s Voyage à Bokhara, p. 259).
469
Note at p. 120 of Schefer's edition of the Chronicle of
`Abd ul-Kerīm.
470
See note at p. 135, ibid. The editor corrects an obvious
lapsus calami,—A.H. 1148 for 1184.
471
With characteristic Pharisaism, `Abd ul-Kerīm tells us
that “fear and terror fell upon Ma´sūm’s brethren, even
as they had possessed the brethren of Joseph. He set
himself to repress their iniquities, and had their
accomplices in crime put to death. He suppressed
prostitution, and tolerated no disorders condemned by
law. Bokhārā became the image of Paradise!” (p. 125).
472
`Abd ul-Kerīm, p. 132.
473
His mother belonged to the noble Salor tribe (ibid.).
474
`Abd ul-Kerīm, p. 137. For descriptions of ancient Merv
the reader is referred to vol. v. Dictionnaire
géographique de la Perse, by C. Barbier de Meynard, p.
526; Burnes’ Travels into Bokhara, London, 1834;
Khanikoff’s Mémoire sur la partie Méridionale de l’Asie
Centrale, pp. 53, 57, 113, and 128; and Prof. Shukovski’s
exhaustive work referred to on p. 144—note 3, supra.
475
`Abd ul-Kerīm assures us that this prince was the Plato
of the century, a man full of wisdom and knowledge (p.
135).
476
`Abd ul-Kerīm tells us that the number of families then
deported was 17,000, which would give a total of about
85,000 individuals (p. 142).
477
Vambéry, History of Bokhara, p. 354.
478
`Abd ul-Kerīm (p. 151) gives the date as Friday, 14th
Rajab A.H. 1214. Vambéry is apparently in error in
placing it as 1802 (p. 360).
479
P. 151.
480
See Meyendorff’s Voyage d’Orenbourg à Boukhara en
1820, p. 281; Bokhara: its Amir and People, by
Khanikoff, p. 248; Vambéry, History of Bokhara, p. 360.
481
Amīr Haydar was the first of the present dynasty to
assume the title of Pādishāh.
482
`Abd ul-Kerīm, pp. 154–156. Vambéry gives a different
version (History of Bokhara, p. 462), but we prefer to
follow the native chronicler, who held high diplomatic
posts in Bokhārā at the commencement of the century,
and may be presumed to have had personal knowledge
of the events which he records (see M. Charles Schefer’s
Introduction to his Chronicle, p. iii).
483
`Abd ul-Kerīm, pp. 163, 164.
484
“He always has four legitimate wives: when he wishes to
espouse a new wife he divorces one of her
predecessors, giving her a house and pension
corresponding with her condition. Every month he
receives a young virgin, either as wife or slave. He
marries the slaves who have not given him children,
either to priests or soldiers” (`Abd ul-Kerīm, p. 163).
485
History of Bokhara, p. 365. A long chapter is devoted to
Amīr Nasrullah by Sir H. Howorth. See his History of the
Mongols, part ii. pp. 790–809.
486
“General of artillery.”
487
Khanikoff, Bokhara, p. 296.
488
The Kushbegi was vehemently suspected of removing
him by poison (Khanikoff, p. 298).