Apologies for the rudimentary question. I'm taking on a project at work that's a bit out of my wheelhouse and I want to bounce my ideas off of those more experienced than myself.

We use Salesforce.com at the software company where I work, and I want to identify which lead behaviors (whitepaper downloads, demo views, webinar attendance, etc.) predict whether a lead turns into a qualified sales opportunity. The idea is to fit a model to this data and base our lead scoring on it going forward. Based on my research, binary logistic regression with stepwise selection looks like the best choice.

Essentially, my thinking is that the dependent variable (opportunity status) is binary (Opportunity = 0, Not an Opportunity = 1), which would indicate that logistic regression would be the best approach. Also, I'm not sure which behaviors and data points will ultimately be predictive of the lead becoming an opportunity, so stepwise selection seems like a good approach.

Can anyone think of a more appropriate analysis technique, or am I on the right track?

ksfowler
  • Most now advise against stepwise selection [[1](http://stats.stackexchange.com/a/20856/4485), [2](http://www.stata.com/support/faqs/statistics/stepwise-regression-problems/)]. LASSO, or its generalization the elastic net (for example, `glmnet` in R), is the usual recommendation when building the model from domain knowledge is difficult. Both techniques can fit logistic models. – Affine Aug 11 '13 at 21:20
  • I think logistic regression sounds fine. – appleLover Aug 11 '13 at 21:37
  • Side note, I'd reverse the coding of your opportunity variable to match human intuition (unless you had some strong reason not to do so) -- so opportunity = 1, no_opportunity = 0. It'll make use of your model and output measures more intuitive, and avoid hard-coding that variable flip into reporting systems using your model's output down the road. – thomas Aug 12 '13 at 16:57
  • @Affine Thanks for the suggestion. I've decided to use R for this project. – ksfowler Oct 08 '13 at 18:40
  • @tabSF Great suggestion. And now that I've pulled the data from Salesforce, I've discovered it's already coded as you suggested. – ksfowler Oct 08 '13 at 18:41
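
As a rough illustration of the LASSO idea raised in the comments above, here is a minimal pure-Python sketch of L1-penalized logistic regression fitted by proximal gradient descent. It is not how `glmnet` is implemented, and everything in it is an assumption made for illustration: the toy data, the feature names `viewed_demo` and `attended_webinar`, and the penalty `lam=0.02`. In R you would more likely call something like `cv.glmnet(x, y, family = "binomial")` and let cross-validation pick the penalty.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(z, t):
    """Shrink z toward zero by t; anything inside [-t, t] becomes exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def fit_l1_logistic(X, y, lam, lr=0.5, n_iter=2000):
    """Proximal gradient descent for L1-penalized logistic regression.

    Minimizes mean log-loss + lam * sum(|w_j|). The intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus outcome) at the current fit.
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n
                  for j in range(p)]
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding (the L1 part).
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
    return b, w


def make_toy_leads():
    """Made-up leads: columns are (viewed_demo, attended_webinar).

    Demo viewers convert 32 times out of 40, everyone else 8 out of 40,
    and webinar attendance is unrelated to conversion by construction.
    """
    X, y = [], []
    for demo, n_converted in ((1, 32), (0, 8)):
        for webinar in (0, 1):
            for i in range(40):
                X.append([demo, webinar])
                y.append(1 if i < n_converted else 0)
    return X, y


X, y = make_toy_leads()  # opportunity = 1, not an opportunity = 0
intercept, weights = fit_l1_logistic(X, y, lam=0.02)
print(dict(zip(["viewed_demo", "attended_webinar"], weights)))
```

The point of the penalty is that selection and estimation happen in one fit: the coefficient on the uninformative behavior is shrunk to zero while the informative one keeps a (somewhat shrunken) positive weight, without the repeated significance testing that stepwise selection relies on.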

3 Answers

If the outcome variable $Y$ is truly all-or-nothing, like falling off a cliff, then a binary logistic model is likely to be appropriate. But stepwise variable selection is an invalid method.

Frank Harrell
  • Thanks everyone for your responses. Concerning several comments, whether a lead becomes an opportunity is a definite yes/no outcome. Leads that don't convert to an opportunity are eventually marked as dead. Also, I've done a little more research and see the error of my ways with the use of stepwise selection. Seriously, thanks everyone for taking the time to help out a newbie. – ksfowler Oct 08 '13 at 18:37

First, I agree with earlier answers and comments about stepwise.

Second, I am not so sure that binary logistic regression is the best choice; it may or may not be. Is "qualified sales opportunity" really a yes/no variable? Might some sales be larger than others? Might some opportunities fail? Might others become long-term? Any of these would argue against binary logistic regression.

Classification trees are another method you might consider, especially if your N is reasonably large. In R, I like the party package, but other tools are also good. There are also elaborations on trees, such as bagging and boosting, that may work well since your goal is prediction.
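
To give a feel for what a classification tree does at each node, here is a minimal sketch that picks the single best binary split by Gini impurity. This is only the first step of a tree, not the conditional-inference procedure the party package uses, and the eight toy leads and the column meanings (`viewed_demo`, `downloaded_whitepaper`) are made up for illustration.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (0 means the node is pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def split_impurity(rows, labels, j):
    """Size-weighted Gini impurity after splitting on binary feature j."""
    left = [y for r, y in zip(rows, labels) if r[j] == 0]
    right = [y for r, y in zip(rows, labels) if r[j] == 1]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def best_split(rows, labels):
    """Index of the feature whose split leaves the lowest impurity."""
    return min(range(len(rows[0])),
               key=lambda j: split_impurity(rows, labels, j))


# Hypothetical leads: columns are (viewed_demo, downloaded_whitepaper).
rows = [[1, 0], [1, 1], [1, 0], [1, 1], [0, 0], [0, 1], [0, 0], [0, 1]]
# Outcome: 1 = became an opportunity, 0 = did not.
labels = [1, 1, 1, 0, 0, 0, 0, 1]

print(best_split(rows, labels))         # 0, i.e. split on viewed_demo
print(split_impurity(rows, labels, 0))  # 0.375
print(split_impurity(rows, labels, 1))  # 0.5, no better than not splitting
```

A full tree repeats this choice recursively inside each resulting group, and bagging or boosting combine many such trees.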

Peter Flom

I would say that logistic regression can be used.

But you could also consider some alternatives:

- Ordinal logistic regression, if opportunities fall into rank-ordered categories (0 = not good, 1 = might be good, 2 = almost certainly good, 3 = exceptionally good opportunity, etc.)

- Assign each opportunity a customer-specific value metric and use a regression method with a non-binary dependent variable

Analyst