I am reading about LDA on this website: LDA simple steps

It says:

[image from the website]

What is the meaning of mean? Is the mean in column or row?

The website says the mean is taken over the columns of X, but I think it is supposed to be taken over the rows.

aan

2 Answers

2

According to Standardizing features when using LDA as a pre-processing step, there is no reason for you to standardize data when computing LDA.

To answer your question, however: the mean is the column-wise average of X, so the vector $\mu_x$ has size 5 (there are 5 columns) regardless of how many rows you have. In other words, the mean is "the average value of your input samples".
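As a minimal sketch of that column-wise average in plain Python: the first row and the first column below are values quoted in this thread, the remaining entries are assumed for illustration, and `column_means` is a hypothetical helper name.

```python
# Rows are samples, columns are features.
# Row 0 and column 0 come from the thread; all other values are
# assumptions made up for this illustration.
X = [
    [6.7, 3.0, 5.2, 2.3, 2],
    [6.3, 2.5, 5.0, 1.9, 2],
    [6.5, 3.0, 5.2, 2.0, 2],
    [6.2, 3.4, 5.4, 2.3, 2],
    [5.9, 3.0, 5.1, 1.8, 2],
]


def column_means(rows):
    """Average each column over all rows (one value per feature)."""
    n = len(rows)
    # zip(*rows) transposes the table, so each `col` is one column.
    return [sum(col) / n for col in zip(*rows)]


mu_x = column_means(X)
print(len(mu_x))          # 5: one entry per column, not per row
print(round(mu_x[0], 2))  # 6.32 = (6.7 + 6.3 + 6.5 + 6.2 + 5.9) / 5
```

Adding more rows changes the values in `mu_x` but not its length, which stays equal to the number of columns.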

Renthal
  • So my input sample is `6.7 3.0 5.2 2.3 2`, and the mean is the average of my input samples? – aan May 14 '20 at 14:13
  • No, that is one input sample. Your input samples are the entire table shown there, and the average is the vertical (column-wise) mean of these values – Renthal May 14 '20 at 14:33
  • Thanks. But I could not understand why the standardized mean is the vertical/column average of the values `6.7 6.3 6.5 6.2 5.9`? – aan May 14 '20 at 15:18
  • The standardized mean must be $0$. The mean before standardization is computed as a vertical average: for column 0 it is $\frac{6.7 + 6.3 + 6.5 + 6.2 + 5.9}{5}$. The same applies to the other columns. These, however, are further questions; please mark accepted answers once your initial question is answered (and you should do it in your other, older questions) – Renthal May 14 '20 at 16:14
  • Thanks, so it is the mean over the `vertical/column`. Why do we need the standardized mean to be 0? – aan May 14 '20 at 16:46
  • that is the goal of the standardization, we don't "need" it to be the case. In fact, for LDA, there is no reason to do it (see my link in the main answer) – Renthal May 15 '20 at 17:02
  • I understand. I am asking about the `PCA` case, where I need to standardize my data – aan May 15 '20 at 20:27
1

What is the meaning of mean? Is the mean in column or row?

The "mean" is for each column (a.k.a "feature" or "parameter"): the average value for the given feature.

It is important to perform "standardization" (a.k.a "center-reduce") so that the features can be compared with each other: when performing LDA, the variance is analysed, so a feature that varies between $0$ and $100 000$ would be seen as having a better "discriminating power" than a feature that varies between $-1$ and $1$.
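To make the scale issue concrete, here is a small sketch in plain Python. The values are made up and `standardize` is a hypothetical helper. It assumes the population standard deviation (`statistics.pstdev`); some tools divide by n - 1 instead.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Center-reduce one feature: subtract its mean, divide by its std."""
    mu = fmean(values)
    sigma = pstdev(values)  # assumption: population std (divide by n)
    return [(v - mu) / sigma for v in values]


# Two made-up features on very different scales.
wide = [0.0, 25_000.0, 50_000.0, 75_000.0, 100_000.0]
narrow = [-1.0, -0.5, 0.0, 0.5, 1.0]

z_wide = standardize(wide)
z_narrow = standardize(narrow)

# After standardization each feature has mean 0 and standard deviation 1,
# whatever its original range was.
for z in (z_wide, z_narrow):
    print(round(fmean(z), 6), round(pstdev(z), 6))
```

Because both made-up features are evenly spaced, their standardized values coincide here; in general only the mean and the standard deviation are forced to match.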

See the answer linked to by @Renthal for the math behind it.

Matthieu
  • @Matthieu, thanks. Do you mean that standardizing the data makes it vary between -1 and 1? And is standardization also used in PCA? – aan May 14 '20 at 15:31
  • 1
    No, that is normalization. Standardization means making the mean 0 and the standard deviation 1. PCA centers the data by subtracting the mean, so that the mean becomes 0, but this does not affect the standard deviation (in the sense that it does not enforce a particular value). – Renthal May 14 '20 at 16:16
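The three operations mentioned in these comments can be compared on a single column. This is only a sketch: the values are the column 0 entries quoted earlier in the thread, the helper names are hypothetical, the [-1, 1] target range follows the question in the comment, and the population standard deviation is an assumption.

```python
from statistics import fmean, pstdev

# Column 0 values as quoted in the thread.
x = [6.7, 6.3, 6.5, 6.2, 5.9]


def center(values):
    """Centering: subtract the mean; the spread is left untouched."""
    mu = fmean(values)
    return [v - mu for v in values]


def standardize(values):
    """Standardization: mean 0 and standard deviation 1."""
    mu, sigma = fmean(values), pstdev(values)  # assumption: population std
    return [(v - mu) / sigma for v in values]


def min_max_scale(values, lo=-1.0, hi=1.0):
    """Normalization: rescale linearly so the values span [lo, hi]."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]


centered = center(x)
standardized = standardize(x)
scaled = min_max_scale(x)

print(round(pstdev(x), 4), round(pstdev(centered), 4))  # same spread
print(round(fmean(standardized), 6), round(pstdev(standardized), 6))
print(min(scaled), max(scaled))
```

Only the min-max version pins the range; centering and standardization say nothing about the minimum or the maximum.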