I'm really new to this. I've tried to normalize the data with log, sqrt, z-score, and a normalize function, but none of them works: the data stays strongly right-skewed. The data has many 0's in it. How should I deal with this?

saeedar
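
The behaviour described in the question can be reproduced with a small sketch. The `years` column below is a hypothetical stand-in (40% exact zeros, the rest right-skewed whole numbers), not the asker's data. Every transform tried is strictly increasing, so all the zeros still land on a single value, and the z-score is a linear rescaling, so it leaves the skewness unchanged.

```python
import math
import random


def skewness(values):
    """Skewness as the third standardized moment (population form)."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return sum((v - mean) ** 3 for v in values) / n / var ** 1.5


def share_at_minimum(values):
    """Fraction of observations tied at the smallest value."""
    lowest = min(values)
    return sum(1 for v in values if v == lowest) / len(values)


random.seed(0)
# Hypothetical data: 1200 rows, many exact zeros, right-skewed positive part.
years = [0 if random.random() < 0.4 else 1 + int(random.expovariate(0.3))
         for _ in range(1200)]

mean = sum(years) / len(years)
sd = math.sqrt(sum((v - mean) ** 2 for v in years) / len(years))

transforms = {
    "raw": lambda v: v,
    "sqrt": math.sqrt,
    "log1p": math.log1p,                  # log(1 + v), defined at v = 0
    "zscore": lambda v: (v - mean) / sd,  # linear, so the shape is unchanged
}

for name, fn in transforms.items():
    out = [fn(v) for v in years]
    print(f"{name:7s} skewness={skewness(out):+.3f} "
          f"share at minimum={share_at_minimum(out):.3f}")
```

The share of observations at the minimum is the same in every row of the output: a transformation can move the spike at zero, but it cannot spread it out.
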
  • There are many posts related to this on the site. Here are some that discuss essentially this issue: 1. https://stats.stackexchange.com/questions/120068/convert-poisson-distribution-to-normal-distribution and 2. https://stats.stackexchange.com/questions/124059/how-to-transform-continuous-data-with-extreme-bimodal-distribution and 3. https://stats.stackexchange.com/a/222188/805 and this comment: 4. https://stats.stackexchange.com/questions/113554/can-i-normalize-ordinal-data#comment217628_113554 – Glen_b Nov 18 '17 at 08:17
  • Suitable advice might be possible if you explain what you're trying to do and why you need to transform. (It would likely also be useful to have more information about what this variable measures and how it's distributed.) – Glen_b Nov 18 '17 at 08:20
  • OK, I found your links useful. I was thinking of transforming EmployeeExpYears for prediction. What I understand from the first link is that transformation is not needed for discrete data? – saeedar Nov 18 '17 at 08:28
  • The message is that attempting a transformation with many 0's is pointless ("You cannot make discrete data normal") not that it's not needed. It's unclear what you're doing that means it requires transformation. Please respond to my second comment above; in the absence of discussion of those issues, it's difficult to say much of value. Why are you trying to transform something with many 0's? – Glen_b Nov 18 '17 at 08:34
  • I don't know why we transform data. I only knew that if we have skewed data, we need to transform it. Is that right? – saeedar Nov 18 '17 at 08:44
  • No. That's not right. Or, at best, it's extremely incomplete. There are some occasions when you need to transform data. But they are a small subset of the situations in which people think they need to do so. – Peter Flom Nov 18 '17 at 12:56
  • @DICER45 It would be useful to indicate what it is you want to do with the data (what questions you're trying to answer with the data); we may be able to help you figure out what to do. – Glen_b Nov 18 '17 at 13:16
  • My data has 27 variables and 1200 observations. I have a variable called performance with 3 values (good, excellent, outstanding). I want to predict performance from the other correlated variables. I am able to log/sqrt all the other variables except this one. I'm dealing mostly with years (0,2..15 etc), and everything is discrete. On what occasions should the data be transformed? – saeedar Nov 18 '17 at 13:44
  • The variable EmployeeExpYears has an impact on the Performance rating: when I ran a glm, the p-value was significant. – saeedar Nov 18 '17 at 15:07
  • I don't know of any mathematical theories that explain how to take the logarithms or square roots of "good", "excellent", or "outstanding"! The point is that when you use numbers to encode these three adjectives, those numbers have no inherent meaning, and so neither does any transformation of them. You seem to be trying to treat your numerical codes as if they actually were numbers for which addition, multiplication, taking logs, etc., were a meaningful operation. Instead you need to apply statistical procedures for *ordered categorical data.* – whuber Nov 18 '17 at 19:43
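
The point about arbitrary numeric codes can be illustrated with a minimal sketch. The toy `exp_years` and `rating` values and both codings below are hypothetical, invented only for illustration. Goodman and Kruskal's gamma uses only the ordering of the categories, so it is the same under any order-preserving coding, whereas Pearson's correlation treats the codes as real numbers and changes with them. For actual prediction of an ordered outcome, an ordinal regression model (for example, a proportional-odds model) would be the usual choice; this sketch only shows why the codes themselves carry no numeric meaning.

```python
from itertools import combinations
from math import sqrt


def goodman_kruskal_gamma(x, y):
    """(concordant - discordant) / (concordant + discordant); tied pairs ignored."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


def pearson(x, y):
    """Pearson correlation, which treats both inputs as interval-scaled numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data, not taken from the thread.
exp_years = [0, 0, 1, 2, 3, 5, 7, 8, 10, 12, 15, 15]
rating = ["good", "good", "good", "excellent", "good", "excellent",
          "excellent", "outstanding", "excellent", "outstanding",
          "outstanding", "excellent"]

# Two codings that both respect good < excellent < outstanding.
codings = {
    "1/2/3": {"good": 1, "excellent": 2, "outstanding": 3},
    "1/2/50": {"good": 1, "excellent": 2, "outstanding": 50},
}

results = {}
for label, coding in codings.items():
    codes = [coding[r] for r in rating]
    results[label] = (goodman_kruskal_gamma(exp_years, codes),
                      pearson(exp_years, codes))
    print(f"coding {label}: gamma={results[label][0]:.3f} "
          f"pearson={results[label][1]:.3f}")
```

The gamma column is identical for both codings, while the Pearson column is not, which is why taking logs or square roots of the codes does not correspond to anything in the data.
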
