This is the histogram of my original data set, plotted with DataFrame.hist():

[Image: original data histogram]

After that, I applied the zscore function to my data set and plotted this histogram:

[Image: histogram after applying zscore]

After applying zscore, I applied StandardScaler to my data set and plotted this:

[Image: histogram after applying zscore and StandardScaler]

I have some questions:

  1. Why didn't my data distribution change?
  2. Why didn't the standard scaler work?
  3. Should I apply both a normalizing function and scaling function?
Glorfindel

1 Answer

Neither min-max scaling nor $z$-scaling changes the shape of the distribution. They do exactly what you see in the plots: they rescale the values (the $x$-axis). Min-max scaling maps the values to a fixed minimum and maximum, while $z$-scaling shifts and scales them so that the mean is zero and the standard deviation is one. Beyond that, the scalings don't change the relations between the variables either, since each one is just a linear transformation of a single variable.
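As a rough illustration of this point, the sketch below applies both scalings to a simulated right-skewed sample and compares the skewness before and after. The helper names (`z_scale`, `min_max_scale`, `skewness`) and the exponential sample are assumptions made up for this example, not part of the original question; the population standard deviation is used, which is also the default in `scipy.stats.zscore`.

```python
import random
import statistics


def z_scale(xs):
    """Subtract the mean and divide by the population standard deviation."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def min_max_scale(xs):
    """Map the values linearly onto the interval [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def skewness(xs):
    """Standardized third moment, a simple measure of distribution shape."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return statistics.fmean([((x - mu) / sigma) ** 3 for x in xs])


# Hypothetical right-skewed data standing in for the real data set.
random.seed(0)
data = [random.expovariate(1.0) for _ in range(1000)]

z = z_scale(data)
mm = min_max_scale(data)

# Location and scale change, but the shape (here: skewness) does not.
print("mean / sd after z-scaling:", statistics.fmean(z), statistics.pstdev(z))
print("min / max after min-max:  ", min(mm), max(mm))
print("skewness raw, z, min-max: ", skewness(data), skewness(z), skewness(mm))
```

The three skewness values agree up to floating-point error, which is why the histograms look identical apart from the tick labels on the $x$-axis.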

Doing both min-max scaling and $z$-scaling makes no sense, because whichever one you apply last overrides the other. Pick the one that is most appropriate.
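The claim about combining the two scalings can be checked numerically. The sketch below, using hypothetical helper functions and simulated data, shows that $z$-scaling after min-max scaling gives the same values as $z$-scaling alone, and vice versa.

```python
import random
import statistics


def z_scale(xs):
    """Subtract the mean and divide by the population standard deviation."""
    mu = statistics.fmean(xs)
    sigma = statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs]


def min_max_scale(xs):
    """Map the values linearly onto the interval [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


# Hypothetical data standing in for the real data set.
random.seed(1)
data = [random.gauss(50, 10) for _ in range(500)]

# z-scaling alone versus min-max followed by z-scaling.
z_only = z_scale(data)
z_after_mm = z_scale(min_max_scale(data))
max_diff_z = max(abs(a - b) for a, b in zip(z_only, z_after_mm))

# min-max alone versus z-scaling followed by min-max.
mm_only = min_max_scale(data)
mm_after_z = min_max_scale(z_scale(data))
max_diff_mm = max(abs(a - b) for a, b in zip(mm_only, mm_after_z))

# Both differences are zero up to floating-point error.
print(max_diff_z, max_diff_mm)
```

Only the last scaling determines the result, so the first one is wasted work.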

Tim
  • Thank you for your tips, but how can I change the distribution of my data? – arsalan game Jan 28 '22 at 16:01
  • @arsalangame what for..? – Tim Jan 28 '22 at 16:06
  • To normalize the data. – arsalan game Jan 28 '22 at 17:34
  • @arsalangame what for..? – Tim Jan 28 '22 at 17:58
  • @arsalangame do you think that variables need to have a normal distribution before performing an analysis like linear regression? That's a common misperception, perhaps arising from some ways of teaching regression that start with correlations among normally distributed variables. See [this thread](https://stats.stackexchange.com/q/86835/28500) among [many others on this site](https://stats.stackexchange.com/search?tab=votes&q=normality%20assumption%20linear%20regression), for why that's not necessary. – EdM Jan 28 '22 at 19:20