Principal Component Analysis

The principal components of a collection of points in a real p-space are a sequence of p direction vectors, where the i-th vector is the direction of a line that best fits the data while being orthogonal to the first i-1 vectors.

Machine learningDimensionality reductionUnsupervised learning



Initial contribute: 2020-12-18


Method-focused categoriesData-perspectiveIntelligent computation analysis

Detailed Description

English {{currentDetailLanguage}} English

Quoted from:

PCA was invented in 1901 by Karl Pearson, as an analogue of the principal axis theorem in mechanics; it was later independently developed and named by Harold Hotelling in the 1930s. Depending on the field of application, it is also named the discrete Karhunen–Loève transform (KLT) in signal processing, the Hotelling transform in multivariate quality control, proper orthogonal decomposition (POD) in mechanical engineering, singular value decomposition (SVD) of X (Golub and Van Loan, 1983), eigenvalue decomposition (EVD) of XTX in linear algebra, factor analysis (for a discussion of the differences between PCA and factor analysis see Ch. 7 of Jolliffe's Principal Component Analysis), Eckart–Young theorem (Harman, 1960), or empirical orthogonal functions (EOF) in meteorological science, empirical eigenfunction decomposition (Sirovich, 1987), empirical component analysis (Lorenz, 1956), quasiharmonic modes (Brooks et al., 1988), spectral decomposition in noise and vibration, and empirical modal analysis in structural dynamics.

PCA can be thought of as fitting a p-dimensional ellipsoid to the data, where each axis of the ellipsoid represents a principal component. If some axis of the ellipsoid is small, then the variance along that axis is also small.

To find the axes of the ellipsoid, we must first subtract the mean of each variable from the dataset to center the data around the origin. Then, we compute the covariance matrix of the data and calculate the eigenvalues and corresponding eigenvectors of this covariance matrix. Then we must normalize each of the orthogonal eigenvectors to turn them into unit vectors. Once this is done, each of the mutually orthogonal, unit eigenvectors can be interpreted as an axis of the ellipsoid fitted to the data. This choice of basis will transform our covariance matrix into a diagonalised form with the diagonal elements representing the variance of each axis. The proportion of the variance that each eigenvector represents can be calculated by dividing the eigenvalue corresponding to that eigenvector by the sum of all eigenvalues.

PCA is defined as an orthogonal linear transformation that transforms the data to a new coordinate system such that the greatest variance by some scalar projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.

Consider an  data matrix, X, with column-wise zero empirical mean (the sample mean of each column has been shifted to zero), where each of the n rows represents a different repetition of the experiment, and each of the p columns gives a particular kind of feature (say, the results from a particular sensor).

Mathematically, the transformation is defined by a set of size  of p-dimensional vectors of weights or coefficients  that map each row vector  of X to a new vector of principal component scores , given by

in such a way that the individual variables  of t considered over the data set successively inherit the maximum possible variance from X, with each coefficient vector w constrained to be a unit vector (where  is usually selected to be less than  to reduce dimensionality).

First component

In order to maximize variance, the first weight vector w(1) thus has to satisfy

Equivalently, writing this in matrix form gives

Since w(1) has been defined to be a unit vector, it equivalently also satisfies

The quantity to be maximised can be recognised as a Rayleigh quotient. A standard result for a positive semidefinite matrix such as XTX is that the quotient's maximum possible value is the largest eigenvalue of the matrix, which occurs when w is the corresponding eigenvector.

With w(1) found, the first principal component of a data vector x(i) can then be given as a score t1(i) = x(i) ⋅ w(1) in the transformed co-ordinates, or as the corresponding vector in the original variables, {x(i) ⋅ w(1)w(1).

Further components

The kth component can be found by subtracting the first k − 1 principal components from X:

and then finding the weight vector which extracts the maximum variance from this new data matrix

It turns out that this gives the remaining eigenvectors of XTX, with the maximum values for the quantity in brackets given by their corresponding eigenvalues. Thus the weight vectors are eigenvectors of XTX.

The kth principal component of a data vector x(i) can therefore be given as a score tk(i) = x(i) ⋅ w(k) in the transformed co-ordinates, or as the corresponding vector in the space of the original variables, {x(i) ⋅ w(k)w(k), where w(k) is the kth eigenvector of XTX.

The full principal components decomposition of X can therefore be given as

where W is a p-by-p matrix of weights whose columns are the eigenvectors of XTX. The transpose of W is sometimes called the whitening or sphering transformation. Columns of W multiplied by the square root of corresponding eigenvalues, that is, eigenvectors scaled up by the variances, are called loadings in PCA or in Factor analysis.


XTX itself can be recognised as proportional to the empirical sample covariance matrix of the dataset XT.

The sample covariance Q between two of the different principal components over the dataset is given by:

where the eigenvalue property of w(k) has been used to move from line 2 to line 3. However eigenvectors w(j) and w(k) corresponding to eigenvalues of a symmetric matrix are orthogonal (if the eigenvalues are different), or can be orthogonalised (if the vectors happen to share an equal repeated value). The product in the final line is therefore zero; there is no sample covariance between different principal components over the dataset.

Another way to characterise the principal components transformation is therefore as the transformation to coordinates which diagonalise the empirical sample covariance matrix.

In matrix form, the empirical covariance matrix for the original variables can be written

The empirical covariance matrix between the principal components becomes

where Λ is the diagonal matrix of eigenvalues λ(k) of XTX. λ(k) is equal to the sum of the squares over the dataset associated with each component k, that is, λ(k) = Σi tk2(i) = Σi (x(i) ⋅ w(k))2.



Zhen Qian (2020). Principal Component Analysis, Model Item, OpenGMS,


Initial contribute : 2020-12-18


QR Code


{{'; ')}}



Drop the file here, orclick to upload.
Select From My Space
+ add


Cancel Submit
{{htmlJSON.Cancel}} {{htmlJSON.Submit}}
{{htmlJSON.Localizations}} + {{htmlJSON.Add}}
{{ item.label }} {{ item.value }}
{{htmlJSON.Cancel}} {{htmlJSON.Submit}}
名称 别名 {{tag}} +
系列名 版本号 目的 修改内容 创建/修改日期 作者
摘要 详细描述
{{tag}} + 添加关键字
* 时间参考系
* 空间参考系类型 * 空间参考系名称

起始日期 终止日期 进展 开发者
* 是否开源 * 访问方式 * 使用方式 开源协议 * 传输方式 * 获取地址 * 发布日期 * 发布者

编号 目的 修改内容 创建/修改日期 作者

时间分辨率 时间尺度 时间步长 时间范围 空间维度 格网类型 空间分辨率 空间尺度 空间范围
{{tag}} +
* 类型

* 名称 * 描述
示例描述 * 名称 * 类型 * 值/链接 上传

{{htmlJSON.Cancel}} {{htmlJSON.Submit}}
Title Author Date Journal Volume(Issue) Pages Links Doi Operation
{{htmlJSON.Cancel}} {{htmlJSON.Submit}}
{{htmlJSON.Add}} {{htmlJSON.Cancel}}


Authors:  {{articleUploading.authors[0]}}, {{articleUploading.authors[1]}}, {{articleUploading.authors[2]}}, et al.

Journal:   {{articleUploading.journal}}

Date:   {{}}

Page range:   {{articleUploading.pageRange}}

Link:   {{}}

DOI:   {{articleUploading.doi}}

Yes, this is it Cancel

The article {{articleUploading.title}} has been uploaded yet.

{{htmlJSON.Cancel}} {{htmlJSON.Confirm}}