Workhorse dimensionality reduction method: simple, fast, and effective. It can be thought of as freely rotating the axes to align with the directions of maximum variance. I like this summary from Analytics India Magazine:

> PCA (Principal Components Analysis) gives us our ‘ideal’ set of features. It creates a set of principal components that are rank ordered by variance (the first component has higher variance than the second, the second has higher variance than the third, and so on), uncorrelated (all components are orthogonal), and low in number (we can throw away the lower ranked components as they usually contain little signal).
But I particularly liked this exposition in Towards Data Science.
```python
from sklearn.decomposition import PCA

# Project 'data' (an n_samples x n_features array) onto its first two components
pca = PCA(n_components=2)
pca.fit(data)
print(pca.explained_variance_ratio_)  # share of variance captured by each component
print(pca.singular_values_)
reduced = pca.transform(data)         # the data expressed in the new basis
```
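To make the ‘throw away the lower ranked components’ idea concrete, a minimal sketch (assuming the same `data` array and an arbitrary 95% threshold) might inspect the cumulative explained variance:

```python
import numpy as np
from sklearn.decomposition import PCA

# Fit PCA with all components, then count how many are needed to
# capture (say) 95% of the variance in 'data'.
pca_full = PCA().fit(data)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
n_keep = int(np.argmax(cumulative >= 0.95)) + 1  # first k reaching the threshold
print(f"{n_keep} components explain {cumulative[n_keep - 1]:.1%} of the variance")
```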
See also: Kernel PCA for non-linear problems.
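A minimal Kernel PCA sketch (the RBF kernel and the `gamma` value here are purely illustrative, not recommendations):

```python
from sklearn.decomposition import KernelPCA

# A kernel lets PCA pick up non-linear structure that a plain rotation
# of the axes would miss; gamma controls the width of the RBF kernel.
kpca = KernelPCA(n_components=2, kernel='rbf', gamma=0.1)
projected = kpca.fit_transform(data)
```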
Why was I banging on about transformations? Well, what does this assume about the data?
Linear dimensionality reduction using Singular Value Decomposition projects data to a lower dimensional space. The input data is centered but not scaled for each feature before applying the SVD.
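Because the data is centred but not scaled, features measured in larger units will tend to dominate the first components. A common response, sketched here with the same (assumed) `data` array, is to standardise before applying PCA:

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Standardise each feature (zero mean, unit variance) before the rotation,
# so that features with large units don't dominate the components.
scaled_pca = make_pipeline(StandardScaler(), PCA(n_components=2))
scores = scaled_pca.fit_transform(data)
```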
PCA is a form of unsupervised learning that does not take output labels into account. Other approaches (such as Linear Discriminant Analysis [note: not Latent Dirichlet Allocation]) consider the output as part of the transformation. PCA is also deterministic.
See this discussion.
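For contrast with PCA’s unsupervised approach, a minimal Linear Discriminant Analysis sketch (assuming a feature matrix `X` and class labels `y` with at least three classes):

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Unlike PCA, LDA uses the labels: it looks for the projection that best
# separates the known classes, not the one with the greatest variance.
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
```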
t-Distributed Stochastic Neighbour Embedding (t-SNE) is best understood as a visualisation technique, not an analytical one. This is because it is probabilistic rather than deterministic: repeated runs on the same data can produce different embeddings.
The choice of `perplexity` and `n_iter` matters, and so does the `metric`. In practice you will need to experiment with these.
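A minimal t-SNE sketch (the parameter values are illustrative starting points only; note that recent scikit-learn releases rename `n_iter` to `max_iter`):

```python
from sklearn.manifold import TSNE

# t-SNE embeds the data in 2D for plotting; the result changes with the
# random seed as well as with perplexity, the iteration budget, and the metric.
tsne = TSNE(n_components=2, perplexity=30, n_iter=1000,
            metric='euclidean', init='pca', random_state=42)
embedding = tsne.fit_transform(data)
```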
UMAP (Uniform Manifold Approximation and Projection) is a non-linear dimensionality reduction technique that tries to preserve both local and global structure, which puts it somewhere between PCA and t-SNE.
The choice of `n_neighbors`, `min_dist`, and `metric` matters. In practice you may need to experiment with these.
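A minimal UMAP sketch using the umap-learn package (again, the values shown are only starting points):

```python
import umap  # the umap-learn package

# Larger n_neighbors emphasises global structure; smaller min_dist packs
# similar points more tightly together in the 2D embedding.
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, metric='euclidean',
                    random_state=42)
embedding = reducer.fit_transform(data)
```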
t-SNE (and, to a lesser extent, UMAP) requires very careful handling: both likely require repeated testing and experimentation.
Relevant scikit-learn modules and estimators:

- `sklearn.feature_selection` (see here)
- `sklearn.decomposition` (see here, especially SVD)
- `sklearn.manifold` (see here)
- `sklearn.random_projection` (see here)
- `sklearn.svm` (see here)
- `sklearn.ensemble.ExtraTreesClassifier` and `sklearn.ensemble.ExtraTreesRegressor` (see here and here, and the sketch below)
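As a sketch of why the tree ensembles appear in this list: their feature importances can drive feature selection, which reduces dimensionality while keeping the original features intact (assuming `X` and `y` as before; the estimator settings are illustrative):

```python
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

# Fit a tree ensemble, then keep only the features it rates as important;
# a route to fewer dimensions that preserves the original feature meanings.
forest = ExtraTreesClassifier(n_estimators=100, random_state=42).fit(X, y)
selector = SelectFromModel(forest, prefit=True)
X_reduced = selector.transform(X)
```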