I'm new to statistics.
I created a data set of around 10,000 observations and want to examine the relationship between a variable A, which takes continuous values between 0 and 10, and another variable B, which takes continuous values between 0 and 1.
However, I noticed that the distribution of my independent variable is heavily left-skewed: over half of the observations are clumped at 9-10. This is due to the nature of the data, and I can't collect more observations.
I'm at a bit of a loss how to proceed. Here's what my naivete came up with:
- bin the independent variable into several classes and use under/oversampling techniques
- resample from the observations in a specific ratio to get to a more normal distribution
Would this make sense? Are there other ways to deal with this?
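
To make the two ideas concrete, here is a rough sketch in Python of what I have in mind. It assumes NumPy and pandas, and it assumes A is the independent variable. The function name `balance_by_bins`, the choice of five equal-width bins, the per-bin target size, and the synthetic Beta-distributed stand-in for A are all my own placeholders, not my real data.

```python
import numpy as np
import pandas as pd


def balance_by_bins(df, col="A", n_bins=5, n_per_bin=None, seed=0):
    """Bin `col` into equal-width classes on [0, 10], then resample every
    non-empty bin to the same size (hypothetical helper, for illustration)."""
    df = df.reset_index(drop=True)  # guarantee unique row labels
    edges = np.linspace(0.0, 10.0, n_bins + 1)
    codes = pd.cut(df[col], bins=edges, include_lowest=True, labels=False)
    groups = df.groupby(codes)
    if n_per_bin is None:
        # Assumption: use the median bin size as the common target.
        n_per_bin = int(groups.size().median())
    rng = np.random.default_rng(seed)
    parts = []
    for _, g in groups:
        # Undersample large bins without replacement;
        # oversample small bins with replacement.
        idx = rng.choice(
            g.index.to_numpy(), size=n_per_bin, replace=len(g) < n_per_bin
        )
        parts.append(df.loc[idx])
    return pd.concat(parts, ignore_index=True)


# Synthetic placeholder data: A is left-skewed on [0, 10], B is on [0, 1].
rng = np.random.default_rng(42)
df = pd.DataFrame(
    {
        "A": 10 * rng.beta(5, 0.5, size=10_000),
        "B": rng.uniform(0, 1, size=10_000),
    }
)

balanced = balance_by_bins(df, n_bins=5, n_per_bin=200)
print(len(df), len(balanced))
```

I realize that oversampling with replacement only duplicates existing rows in the sparse bins, so it adds no new information, which is part of why I'm unsure about this approach.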