I am not a professional statistician and not a native English speaker, but I use simple statistics as a journalist. I am looking for a method to find a relationship between an ordinal variable with multiple choices and a binary one. I thought about Spearman’s ρ first, but I don’t know how to combine this ρ with multiple choices.

In more detail: I am investigating a dataset about some companies (namely, real estate agents). For each company, there is data about the typical budget(s) of its sales, given as an ordinal variable with four levels, like this:

  1. Under €1,000,000
  2. €1,000,000 to €5,000,000
  3. €5,000,000 to €10,000,000
  4. More than €10,000,000

The peculiarity is that a company can work in one or several segments, and in the second case, the segments are always adjacent. I.e., an agent may make deals in categories 1 and 2, or 2, 3, and 4, but not in 1 and 3 only. Some companies perform in all four segments.

Second, for each company there are some attributes given as binary values, for instance:

  • Does this company use a particular business software?
  • Did the sales of this company increase last year?

I am looking for a method to find a relationship between these binary data and the companies’ budget segments. I hope to draw conclusions like “Agents in higher price segments tend to use X software more than those in lower ones”.

If Spearman’s ρ can be applied, then how? If not, then which method is suitable?

1 Answer

Normally, in these cases, what I do is a regular exploratory analysis, especially using tables of counts for the categorical variables ("confusion matrices"), as a correlation coefficient might be a little tricky to interpret.

Anyway, your question is along the lines of this one: Correlations with categorical variables
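The table of counts mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `companies` data is invented, and counting a company once in every segment it works in is one possible way to handle multi-segment agents, not something the dataset dictates.

```python
from collections import Counter

# Hypothetical data: (set of segments the company works in, uses software X?)
companies = [
    ({1, 2}, False),
    ({1}, False),
    ({2, 3}, True),
    ({2, 3, 4}, True),
    ({3, 4}, True),
    ({1, 2, 3, 4}, False),
]

# Assumption: a company is counted once in every segment it operates in.
counts = Counter()
for segments, uses_x in companies:
    for seg in segments:
        counts[(seg, uses_x)] += 1

# table[segment] = (number using X, number not using X)
table = {seg: (counts[(seg, True)], counts[(seg, False)]) for seg in range(1, 5)}

for seg, (yes, no) in table.items():
    print(f"segment {seg}: uses X = {yes}, does not = {no}, share = {yes / (yes + no):.2f}")
```

Reading the share column from segment 1 to segment 4 gives a first impression of whether usage rises with price. Because a multi-segment company appears in several rows here, the rows are not independent, so this table is best treated as descriptive only.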

Bruna w
  • Thank you for your answer, but I am afraid I didn’t fully get the point. Your link leads to a discussion that primarily targets R programming, which is a bit Greek to me. I calculate statistics in Excel, though that may sound outrageous to true experts. Also, you mentioned confusion tables, but I fail to see how to apply them in my case, since I have no predicted values. And I suppose that with confusion tables the ordinal variable would be treated as categorical, which is what I am trying to avoid. –  Nov 21 '17 at 11:42
  • OK, sorry, I didn't really mean confusion tables. I was trying to say that you could just build a table of one categorical variable against the other and look at the counts in each combination of categories. Does that make sense? – Bruna w Nov 21 '17 at 12:20
  • About the linked question: you can see that there is a discussion of the chi-squared test, which tells you whether there is some association between two non-numerical variables. I don't really know how to perform it in Excel, nor do I recommend it, but I'm sure you can find some formulas for that on the internet. – Bruna w Nov 21 '17 at 12:24
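For reference, the chi-squared statistic mentioned in the last comment can be computed by hand from a table of counts. The sketch below is a minimal illustration, not a recommended workflow: the `observed` counts are invented, and it assumes each company is counted in exactly one row (for example, by its highest segment), since the test expects independent observations.

```python
# Hypothetical counts: rows = price segments 1..4,
# columns = (uses software X, does not use software X).
# Assumption: every company appears in exactly one row.
observed = [
    [5, 25],
    [10, 20],
    [15, 15],
    [20, 10],
]


def chi_squared(observed):
    """Return the Pearson chi-squared statistic and its degrees of freedom."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Count expected in this cell if segment and software use were unrelated
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, dof


stat, dof = chi_squared(observed)
print(f"chi-squared = {stat:.3f}, degrees of freedom = {dof}")
```

With 3 degrees of freedom, a statistic above about 7.815 corresponds to significance at the 5% level. Note that this test treats the segments as unordered categories, so it can indicate that an association exists but not whether usage rises or falls with price.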