ML5 - Machine Learning Techniques

The document discusses dimensionality reduction techniques, which are used to reduce the number of features (columns) in a dataset. The methods covered are Principal Component Analysis (PCA), which finds new vectors that maximize variance, and t-distributed Stochastic Neighbor Embedding (t-SNE), which is commonly used for data visualization but also for machine learning tasks such as feature space reduction and clustering. An example using t-SNE projects the MNIST database of handwritten digits from 784 dimensions down to 2 for visualization.


Dimensionality Reduction

As the name suggests, we use dimensionality reduction to remove the least important features (columns) from a data set. In practice, I often see data sets with hundreds or even thousands of columns (also known as features), so reducing the total number is vital. For instance, images can include thousands of pixels, not all of which matter to your analysis. Or, when testing microchips within the manufacturing process, you might have thousands of measurements and tests applied to every chip, many of which provide redundant information. In these cases, you need dimensionality reduction algorithms to make the data set manageable.

The most popular dimensionality reduction method is Principal Component Analysis (PCA), which reduces the dimension of the feature space by finding new vectors that maximize the linear variation of the data. PCA can reduce the dimension of the data dramatically without losing too much information when the linear correlations of the data are strong. (And in fact, you can also measure the actual extent of the information loss and adjust accordingly.)
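As a minimal sketch of this idea, assuming scikit-learn is available (the synthetic data and component count below are illustrative, not from the original article): we build 10 features that really depend on only 2 underlying factors, project them with PCA, and check how much variance the 2 new vectors retain.

```python
import numpy as np
from sklearn.decomposition import PCA

# Illustrative data: 200 samples, 10 features driven by 2 hidden factors,
# so the features are strongly linearly correlated.
rng = np.random.default_rng(0)
factors = rng.normal(size=(200, 2))
X = factors @ rng.normal(size=(2, 10)) + 0.01 * rng.normal(size=(200, 10))

# Find the 2 new vectors (principal components) that maximize variance.
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)  # (200, 2)

# This measures the "actual extent of the information loss" mentioned above:
# the fraction of total variance the retained components explain.
print(pca.explained_variance_ratio_.sum())
```

Because the data is essentially two-dimensional, the explained variance ratio sums to nearly 1.0 here; on real data you would inspect this number to decide how many components to keep.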

Another popular method is t-distributed Stochastic Neighbor Embedding (t-SNE), which does non-linear dimensionality reduction. People typically use t-SNE for data visualization, but you can also use it for machine learning tasks like reducing the feature space and clustering, to mention just a few.

The next plot shows an analysis of the MNIST database of handwritten digits. MNIST contains thousands of images of digits from 0 to 9, which researchers use to test their clustering and classification algorithms. Each row of the data set is a vectorized version of the original image (size 28 x 28 = 784) and a label for each image. Note that we're therefore reducing the dimensionality from 784 (pixels) to 2 (dimensions in our visualization). Projecting to two dimensions allows us to visualize the high-dimensional original data set.
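A minimal sketch of this kind of projection, assuming scikit-learn. Note one deliberate substitution: scikit-learn's bundled digits set is a smaller 8 x 8 variant of MNIST, so the reduction here is 64 dimensions to 2 rather than 784 to 2, but the workflow is the same.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Load ~1,800 handwritten digits; each row is a flattened 8x8 image
# plus a label from 0 to 9.
digits = load_digits()
X, y = digits.data, digits.target
print(X.shape)  # (1797, 64)

# Non-linear projection down to 2 dimensions for visualization.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_2d = tsne.fit_transform(X)
print(X_2d.shape)  # (1797, 2)

# X_2d can now be scatter-plotted, coloring each point by its label y,
# to see the ten digit classes form clusters in the plane.
```

Plotting X_2d with each point colored by its label y typically shows the ten digit classes separating into distinct clusters, which is exactly what the plot described above illustrates for full MNIST.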
