Beginner here. I have a spreadsheet with data from 70 videos. Most of the data is binary (e.g., is this mentioned or not), with one outcome variable (conversion rate). What would be a general process for analyzing it, and how do I measure significance?

My approach:

  1. Look at the videos for more context.
  2. Visualize the data using box plots to find interesting relationships (I've been told I should figure out which predictor variables are most important, but I'm not sure how).
  3. Run two-way t-tests to assess significance (I've been told to consider logistic regression or one-way ANOVA as well, but I'm not too sure).
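
As a rough illustration of step 3 for a single binary predictor, here is a minimal Python sketch that compares the conversion rate between videos where something is mentioned and videos where it is not. The numbers are made up for illustration and are not from the spreadsheet. It computes Welch's t statistic but, as an assumption made so that only the standard library is needed, gets the p-value from a permutation test rather than from the t distribution.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / len(a) + vb / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided p-value: share of label shuffles with |t| >= observed |t|."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical conversion rates (%) for one binary predictor.
mentioned = [4.1, 3.8, 5.2, 4.7, 4.9, 5.5, 4.4, 5.0]
not_mentioned = [3.2, 3.9, 2.8, 3.5, 4.0, 3.1, 3.6, 3.3]

t_stat = welch_t(mentioned, not_mentioned)
p_value = permutation_p_value(mentioned, not_mentioned)
print(f"Welch t = {t_stat:.2f}, permutation p = {p_value:.4f}")
```

Repeating this for many predictors inflates the chance of a spurious "significant" result, so any single p-value from such a screen should be read cautiously.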

Any help would be appreciated.

kjetil b halvorsen
John
  • What does "conversion rate" look like? You're saying it's continuous. If so, logistic regression is out of the question unless it is a continuous proportion or percent bounded [0, 1] or [0, 100]. – Nick Cox Sep 16 '19 at 18:25
  • It's a percentage. View-to-conversion ratio, more specifically. – John Sep 16 '19 at 18:26
  • How many predictors? With just 70 observations you should be wary of any model including more than a few. – Nick Cox Sep 16 '19 at 18:26
  • Thanks for clarifying. There are about 30 predictors. – John Sep 16 '19 at 18:27
  • So far, I've looked at all the data (using box plots mostly) and found 5-6 predictors that seem to have more impact on conversion rate than the rest. I'm not sure if that's a smart way to find the most important predictor variables, versus running a specific test, but I haven't figured out how to do that. – John Sep 16 '19 at 18:29
  • Advice and practice vary. I wouldn't want to use more than about 7 predictors myself for that sample size and would prefer fewer. My own view is that such exploratory data analysis is entirely sensible, but you can find statistically-minded people saying the opposite. – Nick Cox Sep 16 '19 at 18:30
  • You don't have much data, and you have a lot of predictors. I'd be wary of the p-values you extracted from this procedure. It would be better to avoid trying to make strong claims based on small sample sizes, even if you find statistically significant results. Focus on exploring the patterns, but beware of according special status to cases where p < 0.05. – mkt Sep 16 '19 at 18:33
  • This sounds like a good place to use [regression trees](https://en.wikipedia.org/wiki/Decision_tree_learning); CART would be a good place to start. Being an R guy, I would do this in R using `rpart` (which implements CART). – G5W Sep 16 '19 at 18:34
  • Thanks everyone. To summarize, are there any tests I should run (or avoid)? I'll take a look at Regression Trees, although I'm not very familiar with R. I have Tableau and Minitab since they've been easy to play with for a beginner. Or, what visuals should I share at the end? There seems to be so much I could do and I want to focus on the 1-2 that would make the biggest impact. – John Sep 16 '19 at 19:03
  • I'm thinking of visualizing box plots for independent binary variables that seemed to show a large difference in mean outcome, then run a t-test to see if the differences are significant. – John Sep 16 '19 at 19:11
  • Box plots don’t customarily show means: depending on your software, adding them may or may not be easy. – Nick Cox Sep 16 '19 at 21:08
  • As an alternative to modelling conversion *rates*, try Poisson rate regression, with the numerator (the count of conversions) as the response and the (log) denominator as offset. See https://stats.stackexchange.com/questions/264071/how-is-a-poisson-rate-regression-equal-to-a-poisson-regression-with-correspondin – kjetil b halvorsen Sep 17 '19 at 12:26
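
To make the Poisson rate regression suggestion concrete, here is a minimal Python sketch for one binary predictor. It fits log(E[conversions]) = log(views) + b0 + b1·x by Newton's method, so log(views) acts as the offset. The counts, the view numbers, and the predictor values are hypothetical, and the hand-written fitting loop is only an illustration. In practice a packaged GLM routine would do this and would also report standard errors.

```python
import math


def fit_poisson_rate(conversions, views, x, n_iter=25):
    """Fit log(E[conversions]) = log(views) + b0 + b1 * x by Newton's method."""
    b0 = math.log(sum(conversions) / sum(views))  # start at the pooled rate
    b1 = 0.0
    for _ in range(n_iter):
        # Expected counts; log(views) enters as an offset with coefficient 1.
        mu = [v * math.exp(b0 + b1 * xi) for v, xi in zip(views, x)]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(y - m for y, m in zip(conversions, mu))
        g1 = sum((y - m) * xi for y, m, xi in zip(conversions, mu, x))
        # Fisher information matrix [[h00, h01], [h01, h11]].
        h00 = sum(mu)
        h01 = sum(m * xi for m, xi in zip(mu, x))
        h11 = sum(m * xi * xi for m, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: conversions and views per video, plus one binary predictor.
conversions = [12, 30, 9, 22, 15, 40]
views = [400, 650, 380, 500, 520, 800]
mentioned = [0, 1, 0, 1, 0, 1]

b0, b1 = fit_poisson_rate(conversions, views, mentioned)
print(f"baseline rate = {math.exp(b0):.4f}, rate ratio = {math.exp(b1):.3f}")
```

With a single binary predictor the fit reproduces the pooled group rates: exp(b0) is total conversions over total views among videos with x = 0, and exp(b1) is the ratio of the two group rates. The offset formulation matters once several predictors are included or when views differ a lot between videos.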
