Discretization of skewed data (time durations)

Question

I have data on how long a person views a webpage, in seconds. The durations vary widely and, in the context where I gathered them, they are very skewed: people mostly spent a short time on a webpage but occasionally spent much longer viewing it. I want to discretize the durations into short, medium and long, but I don't know how I should do this when the data is skewed.

Initially I just used tertiles, but the result seemed off. Tertiles force equal membership in each group, and I'm not sure that is right given the skew. Any ideas on a better way of discretizing the values?

EDIT: The reason I want to categorize the data is that I want to use it for reinforcement learning. Using the numerical values directly can increase the search space, so I thought of categorizing them.
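
To make the tertile approach described above concrete, here is a minimal sketch using only the Python standard library. The durations are synthetic log-normal draws standing in for the real viewing times (an assumption, since the actual data is not shown), and the names `durations`, `cuts` and `discretize` are purely illustrative.

```python
import bisect
import random
import statistics

random.seed(0)
# Synthetic, right-skewed stand-in for viewing durations in seconds.
# This is NOT the asker's data; the log-normal parameters are arbitrary.
durations = [random.lognormvariate(3.0, 1.0) for _ in range(3000)]

# Tertile cut points: two values that split the sample into
# three groups of (approximately) equal size.
cuts = statistics.quantiles(durations, n=3)

LABELS = ["short", "medium", "long"]

def discretize(seconds, cut_points=cuts):
    """Map a duration in seconds to a label using the given cut points."""
    return LABELS[bisect.bisect_right(cut_points, seconds)]

# Count how many observations land in each category.
counts = {label: 0 for label in LABELS}
for d in durations:
    counts[discretize(d)] += 1

print("cut points (s):", cuts)
print("counts:", counts)
print("width of 'medium' bin (s):", cuts[1] - cuts[0])
print("width of 'long' bin (s):", max(durations) - cuts[1])
```

With data skewed like this, the three groups have equal counts by construction, but the "long" group spans a far wider range of seconds than the other two, which may be why the tertile result looks off.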

Have you considered transforming the data first, maybe taking the log of the data to make it more 'normal'? — Eric Peterson, Jun 08 '13 at 02:09
I tried taking the log of the data and it does look more "normal", but how do I go about getting the groupings? — Paul, Jun 08 '13 at 02:14
Why would you want groupings in the first place? It rarely makes sense to categorize continuous data; it may help you to read my answer here: [how-to-choose-between-anova-and-ancova-in-a-designed-experiment](http://stats.stackexchange.com/questions/24077//24080#24080), especially below "update". I readily acknowledge that the context differs from yours, but the idea is that discretizing data isn't generally a good thing to do. — gung - Reinstate Monica, Jun 08 '13 at 02:49
I agree with you @gung. It's my fault for not including it in the description, but the reason I want to discretize the data is that I'd like to use it as input for a reinforcement learning algorithm. Using continuous or numerical data would produce a very large state space, which could mean the algorithm needs more time and more examples to converge. — Paul, Jun 08 '13 at 03:12
Please add extra information as an edit to the post, not only in comments. Not everybody reads comments! — kjetil b halvorsen, Mar 22 '21 at 15:27
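
One possible way to get groupings after the log transform suggested in the comments is to cut the log-scale range into three equal-width intervals and map the cut points back to seconds. The sketch below is only an illustration under stated assumptions: the data is synthetic log-normal, equal-width binning is one choice among many and was not proposed by anyone in the thread, and the names `log_cuts`, `cuts_seconds` and `discretize` are hypothetical.

```python
import bisect
import math
import random

random.seed(0)
# Synthetic, right-skewed stand-in for viewing durations in seconds.
# This is NOT the asker's data; the log-normal parameters are arbitrary.
durations = [random.lognormvariate(3.0, 1.0) for _ in range(3000)]

# Work on the log scale, where the skew is much reduced.
log_durations = [math.log(d) for d in durations]
lo, hi = min(log_durations), max(log_durations)

# Two cut points that split the log-scale range into three equal-width bins.
width = (hi - lo) / 3
log_cuts = [lo + width, lo + 2 * width]

# The same cut points expressed in seconds, for interpretation.
cuts_seconds = [math.exp(c) for c in log_cuts]

LABELS = ["short", "medium", "long"]

def discretize(seconds):
    """Map a positive duration in seconds to a label via its logarithm."""
    return LABELS[bisect.bisect_right(log_cuts, math.log(seconds))]

# Count how many observations land in each category.
counts = {label: 0 for label in LABELS}
for d in durations:
    counts[discretize(d)] += 1

print("cut points (s):", cuts_seconds)
print("counts:", counts)
```

Unlike tertiles, this does not force equal membership, so group sizes follow the shape of the data. Because the cut points depend on the minimum and maximum, a single extreme duration can shift them, so clipping or percentile-based endpoints may be worth considering.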
