I'm a Statistics student, and I'm thinking of writing my master's thesis on clickstream data analysis.
For my analysis I have a fairly large dataset (80 million rows), where each row is a single click "impression". The dataset comes from a news website and includes information such as the following (a hypothetical mock-up of a few rows is sketched after the list):
- User ID when the user is logged in to the website
- User ID when the user is NOT logged in (essentially a cookie ID)
- Date and time of the visit
- URL of the visited page
- "Section" of the page (for example Sport, News, ...; there are many categories)
- Number of clicks that led the user to land on that page
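Just to make the structure concrete, here is a mock-up of a few rows. The column names and values are entirely invented for illustration; the real dataset uses different ones:

```python
import pandas as pd

# Hypothetical illustration of the row structure (columns and values invented).
impressions = pd.DataFrame({
    "user_id":   [123, 123, None, 456],            # ID when logged in
    "cookie_id": ["abc", "abc", "xyz", "def"],     # ID when not logged in
    "timestamp": pd.to_datetime([
        "2023-05-01 08:15", "2023-05-01 08:20",
        "2023-05-01 09:00", "2023-05-02 18:45"]),
    "url":       ["/sport/match-report", "/news/elections",
                  "/sport/transfer-news", "/news/economy"],
    "section":   ["Sport", "News", "Sport", "News"],
    "clicks_to_page": [1, 2, 1, 3],                # clicks that led to the page
})
print(impressions)
```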
What I'd like to do with this data is estimate the probability that a new user would click on a given new article, in order to recommend what they should read next. I have in mind something like a score.
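For example, one way I imagine framing it (just a sketch with made-up placeholder features, not a worked-out method) would be to build one row per (user, article) pair with a 0/1 "clicked" label, fit a logistic regression, and use the predicted probability as the score:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch only: X would hold engineered (user, article) features, e.g. the
# user's share of past clicks in the article's section, recency, and so on.
# Here random numbers stand in for real features and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))          # placeholder features
y = rng.integers(0, 2, size=1000)       # placeholder 0/1 "clicked" labels

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]   # predicted click probability = "score"
```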
From my research I found that common ways to tackle this type of problem are association rules, path analysis, and collaborative filtering.
What I'd like to know is: is it possible to approach the problem with "classic" data mining/machine learning techniques? I'm talking about GLMs, decision trees, neural networks, ... and other similar algorithms for supervised learning.
I ask because each row is a single impression, so what I really have is a click "path" for each user, and I'm not sure whether it would be statistically correct to apply one of the models I mentioned to data structured this way.
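To illustrate what I mean, here is a rough sketch (reusing the hypothetical `impressions` frame from the mock-up above) of how I imagine I would first have to collapse each user's path into a single feature row before any supervised model could be applied:

```python
# Rough sketch: collapse each user's path of impressions into one row of
# per-user features (column names come from the hypothetical mock-up above).
user_features = (
    impressions
    .assign(uid=impressions["user_id"].fillna(impressions["cookie_id"]).astype(str))
    .groupby("uid")
    .agg(
        n_impressions=("url", "size"),
        n_sections=("section", "nunique"),
        avg_clicks_to_page=("clicks_to_page", "mean"),
    )
    .reset_index()
)
```

Whether this kind of aggregation is the right way to deal with the dependence between impressions from the same user is exactly the part I'm unsure about.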