I'm fairly certain this will be quickly answered but I couldn't find anything that addressed my specific use case.

I'm running a multiple linear regression using scikit-learn. I created dummies for two variables (is_listed and paid_vs_public) and have one numeric variable. I am trying to find how these relate to the target y, called fill-rate.

Here's the output I get:

[('publish_event_start_delta',0.017666929441105143),
('is_listed', -0.36775097784982064),
('event_paid_type', -2.9127009412292346)]

How do I interpret the event_paid_type coefficient when I've encoded it as:

final_df['event_paid_type'] = final_df.event_paid_type.map({'free event':0, 'paid event':1})
kkk
Kevin

1 Answer

The coefficients in a multiple regression are interpreted as follows:

Given: $y = 1 + 10x_{1} + 2x_{2}$

Interpretation: If $x_{2}$ is fixed, then for each change of 1 unit in $x_{1}$, $y$ changes by 10 units.
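As a quick check of that reading, here is a minimal sketch that evaluates the toy equation above at two values of $x_{1}$ one unit apart while holding $x_{2}$ fixed. The function name `predict` and the chosen input values are only illustrative.

```python
def predict(x1, x2):
    """Toy model from the text: y = 1 + 10*x1 + 2*x2."""
    return 1 + 10 * x1 + 2 * x2

# Hold x2 at an arbitrary fixed value and move x1 by exactly one unit.
x2_fixed = 3.0
effect_of_x1 = predict(5.0, x2_fixed) - predict(4.0, x2_fixed)

print(effect_of_x1)  # 10.0, the coefficient on x1
```

The intercept and the $x_{2}$ term are the same in both evaluations, so they cancel and only the coefficient on $x_{1}$ remains.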

Let's say that publish_event_start_delta and is_listed are fixed and we vary only event_paid_type, which can take two values: 0 or 1.

y = publish_event_start_delta * 0.0176 + is_listed * -0.3677 + event_paid_type * -2.912

If event_paid_type = 0 your equation will be:

y = publish_event_start_delta * 0.0176 + is_listed * -0.3677

If event_paid_type = 1 you get

y = publish_event_start_delta * 0.0176 + is_listed * -0.3677 + (1 * -2.912)

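That means that, with the other variables held fixed, the predicted $y$ for a paid event is about 2.91 units lower than for a free event. Any intercept the model fitted appears in both equations, so it does not affect this difference.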
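The same reading can be verified numerically. The sketch below is not the asker's data: it fits scikit-learn's `LinearRegression` on synthetic data whose generating coefficients were made up to loosely resemble the ones in the question, then compares the predictions for a paid and a free event that are otherwise identical. All variable names and numbers here are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins for the three predictors (assumed, not the real data).
rng = np.random.default_rng(0)
n = 200
delta = rng.uniform(0, 60, size=n)       # plays the role of publish_event_start_delta
is_listed = rng.integers(0, 2, size=n)   # 0/1 dummy
paid = rng.integers(0, 2, size=n)        # 0 = free event, 1 = paid event

# Made-up generating process with some noise.
y = 5.0 + 0.02 * delta - 0.4 * is_listed - 2.9 * paid + rng.normal(0, 0.5, size=n)

X = np.column_stack([delta, is_listed, paid])
model = LinearRegression().fit(X, y)

# Two events that differ only in the paid/free dummy.
free_event = np.array([[30.0, 1, 0]])
paid_event = np.array([[30.0, 1, 1]])
gap = model.predict(paid_event)[0] - model.predict(free_event)[0]

# The prediction gap matches the fitted coefficient on the dummy.
print(gap, model.coef_[2])
```

Whatever values the other two columns are fixed at, the gap between the two predictions equals the coefficient on the dummy, which is why it can be read as the paid-versus-free difference.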

kkk