Sum of individual parts not adding up to the whole

Question

I am analyzing the week-over-week (WoW) change in conversion rate (visitors who booked / visitors).

Let's say the WoW change from week 1 to week 2 was -10% (the rate dropped 10%). Now I want to know where this drop in conversion rate came from. From business knowledge, I have a hunch that the drop is coming from the northeastern (NE) states in the US. So I remove all NE states from the data and check the WoW change again: now the drop is just 1%. That would mean 90% of the WoW drop is coming from the NE states. Does this make sense? Is it mathematically correct?
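
A minimal sketch of the removal calculation described above, in Python. The counts are hypothetical, made up so that the pooled rate drops 10% and the rate without the northeast drops 1%; `rate` and `wow_change` are illustrative helper names. The "share explained" is 1 minus the ratio of the two drops, which is how the 90% above is obtained.

```python
# Hypothetical counts (not from the question): (bookings, visitors) per region and week.
DATA = {
    "NE":   {"week1": (2000, 20000), "week2": (1080, 20000)},
    "Rest": {"week1": (8000, 80000), "week2": (7920, 80000)},
}

def rate(data, week, exclude=()):
    """Conversion rate (bookings / visitors) for one week, pooling the regions not excluded."""
    bookings = sum(d[week][0] for name, d in data.items() if name not in exclude)
    visitors = sum(d[week][1] for name, d in data.items() if name not in exclude)
    return bookings / visitors

def wow_change(data, exclude=()):
    """Relative week-over-week change in the pooled rate, e.g. -0.10 for a 10% drop."""
    return rate(data, "week2", exclude) / rate(data, "week1", exclude) - 1

total = wow_change(DATA)                        # about -0.10
without_ne = wow_change(DATA, exclude={"NE"})   # about -0.01
share_explained = 1 - without_ne / total        # about 0.90
print(f"{total:.3f} {without_ne:.3f} {share_explained:.2f}")
```
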

The second part of my question: I don't get the same answer if I repeat the process above one state at a time, and I don't understand why.

For example, let's say I remove NY first and the drop becomes 5% (so 50% of the drop comes from NY). Second, I remove New Jersey and get x%; third, I remove Massachusetts and get x%; and so on. If I repeat this for all NE states and add up the effects, I get a total of 75%, lower than the 90% I got from the first method, where I removed all NE states at once.

Why is this the case? Any help understanding this is much appreciated. Note that visitors in one state are not in any other state. Thanks!
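
The one-state-at-a-time puzzle can be reproduced with a small hypothetical example. The counts below are invented (every state starts at a 10% booking rate and visitors are constant), and the helper names are illustrative. Removing a single state still leaves the other northeastern drops in the pooled rate, and each pooled WoW change is a ratio of sums, so the single-state shares are not additive: with these numbers they sum to about 75%, while removing the whole northeast at once explains about 92%.

```python
# Hypothetical counts (not from the question): (bookings, visitors) per state and week.
DATA = {
    "NY":   {"week1": (200, 2000),   "week2": (100, 2000)},
    "NJ":   {"week1": (100, 1000),   "week2": (60, 1000)},
    "MA":   {"week1": (100, 1000),   "week2": (70, 1000)},
    "Rest": {"week1": (1000, 10000), "week2": (990, 10000)},
}
NORTHEAST = ["NY", "NJ", "MA"]

def wow_change(data, exclude=()):
    """Relative WoW change of the pooled rate after dropping the excluded states."""
    def pooled_rate(week):
        kept = [d[week] for name, d in data.items() if name not in exclude]
        return sum(b for b, _ in kept) / sum(v for _, v in kept)
    return pooled_rate("week2") / pooled_rate("week1") - 1

def share_explained(data, exclude):
    """Fraction of the overall drop that disappears when `exclude` is removed."""
    return 1 - wow_change(data, exclude) / wow_change(data)

one_at_a_time = {s: share_explained(DATA, {s}) for s in NORTHEAST}
all_at_once = share_explained(DATA, set(NORTHEAST))
print({s: round(x, 3) for s, x in one_at_a_time.items()})
print(round(sum(one_at_a_time.values()), 3), round(all_at_once, 3))
```
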

Hi: I think (I didn't check this, but a simple example would probably show it) your one-at-a-time approach isn't working because each state might have a different number of visitors booked, so when you calculate the resulting "overall percent" drop, the denominator is different each time. You'd have to weight each state's drop by the number of people in the state if you wanted the one-state-at-a-time approach to give the same results as the all-at-once approach. — mlofton, Feb 25 '21 at 00:52
Thanks. That makes sense. So, is removing all states at once the right approach? If yes, how can I get the effect of individual states? Can I remove one state, then two, three, etc., and see the incremental impact of each state? — bp0308, Feb 25 '21 at 02:04
When you measure the effect of each state *additively* rather than multiplicatively, everything will add up correctly. — whuber, Feb 25 '21 at 13:46
@bpo308: I think what whuber is saying is not to use percentages. That's one good suggestion. Removing the states all at once (in the percentage framework) is misleading because of possible differences in the numbers of bookings. Think of the extreme case where NY has 1 million bookings and New Jersey has 10. Then the effect might be due to NY and not the Northeast. So, if you accept the lack of additivity, looking at individual state percentage effects gives better information than adding the states up and then looking at percentages. I would stick with the individual-state approach. — mlofton, Feb 25 '21 at 14:27
Thanks! @whuber, how do I do this additively vs. multiplicatively? Does additive mean the following: I remove NY first and, say, 50% of the WoW drop is from NY. Then I remove NY & NJ and see that 60% of the drop is from NY & NJ together. That means NY contributes 50% and NJ contributes 10% of the drop. Next I remove NY, NJ, and MA, and so on. I can continue like this until I reach close to 100% of the WoW drop. Is this reasonable? — bp0308, Feb 25 '21 at 18:33
"Additively" means you don't measure differences as percents: you just look at the numbers. — whuber, Feb 25 '21 at 18:52
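
A sketch of the cumulative-removal scheme proposed on Feb 25 '21 at 18:33 (remove NY, then NY & NJ, and so on, crediting each state with the increase in the share of the drop explained). The counts are hypothetical and the helper names are illustrative. By construction the credits telescope to the all-at-once share, but each state's credit depends on the removal order: with these numbers NY gets about 48% of the drop when removed first and about 64% when removed last.

```python
# Hypothetical counts (not from the thread): (bookings, visitors) per state and week.
DATA = {
    "NY":   {"week1": (200, 2000),   "week2": (100, 2000)},
    "NJ":   {"week1": (100, 1000),   "week2": (60, 1000)},
    "MA":   {"week1": (100, 1000),   "week2": (70, 1000)},
    "Rest": {"week1": (1000, 10000), "week2": (990, 10000)},
}

def wow_change(data, exclude=()):
    """Relative WoW change of the pooled rate after dropping the excluded states."""
    def pooled_rate(week):
        kept = [d[week] for name, d in data.items() if name not in exclude]
        return sum(b for b, _ in kept) / sum(v for _, v in kept)
    return pooled_rate("week2") / pooled_rate("week1") - 1

def cumulative_credits(data, order):
    """Remove states cumulatively in `order`; credit each state with the extra
    share of the overall drop explained once it joins the removed set."""
    total = wow_change(data)
    removed, explained_before, credits = set(), 0.0, {}
    for state in order:
        removed.add(state)
        explained = 1 - wow_change(data, removed) / total
        credits[state] = explained - explained_before
        explained_before = explained
    return credits

ny_first = cumulative_credits(DATA, ["NY", "NJ", "MA"])
ny_last = cumulative_credits(DATA, ["MA", "NJ", "NY"])
print({s: round(x, 3) for s, x in ny_first.items()})
print({s: round(x, 3) for s, x in ny_last.items()})
```

Averaging each state's increment over all removal orders (a Shapley-value attribution) is one way to remove the order dependence; that is a swapped-in technique, not something suggested in this thread.
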
@whuber Sorry, I'm not sure I understood. What I am analyzing is the weekly booking rate (visitors who booked / visitors). The goal is to identify where the WoW change is coming from: I want to say x% of the WoW change came from NY. If I just look at the raw numbers, how can I get to this? For example, in week 1 the numerator and denominator are 100 and 1000, and in week 2 they are 120 and 1500. I can break out the raw numerator and denominator by state, but how does that help? Can you please give me an example to illustrate how the additive method you are proposing will get me the result I want? Thanks! — bp0308, Feb 25 '21 at 23:23
Hi: Say that from the first week to the second week, the total went from 1000/10000 to 500/10000, i.e., from 10 percent to 5 percent. And say New York went from 200/2000 to 200/4000, so New York also went from 10 percent to 5 percent. Then I would claim that NY's effect is (10 - 5) percent multiplied by 3000/10000 = 1.5 percent, because the denominator needs to be taken into account. The 3000/10000 comes from averaging the NY denominator over the two periods and dividing by the average of the total denominator over the two periods. There may be other ways to do it, but that seems reasonable to me. — mlofton, Feb 26 '21 at 05:19
@mlofton Thanks for the explanation. Are you saying NY accounted for 1.5% of the overall drop (50%) in absolute or relative terms? Let me explain how I am doing this, using your example above. The rate went down 50% WoW (10% to 5%). To identify NY's contribution, I remove the NY data. The overall drop is still 50% after removing NY, so I would say NY had no impact on the overall drop. Sure, NY itself dropped 50% WoW, but even without NY the overall drop would have been the same. Does this make sense? — bp0308, Feb 26 '21 at 17:41
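
A quick check of this removal argument with the counts from the Feb 26 example (total 1000/10000 to 500/10000, NY 200/2000 to 200/4000). The rest-of-country counts are obtained by subtraction, and `relative_change` is an illustrative helper. All three relative changes come out at -50%, so removing NY leaves the relative drop unchanged, as stated.

```python
# Counts from the Feb 26 example: (bookings, visitors) per week.
TOTAL = {"week1": (1000, 10000), "week2": (500, 10000)}
NY = {"week1": (200, 2000), "week2": (200, 4000)}

def relative_change(series):
    """Relative WoW change of bookings / visitors, e.g. -0.5 for a 50% drop."""
    (b1, v1), (b2, v2) = series["week1"], series["week2"]
    return (b2 / v2) / (b1 / v1) - 1

# Everyone except NY, by subtracting NY's counts from the totals (800/8000, then 300/6000).
WITHOUT_NY = {
    week: (TOTAL[week][0] - NY[week][0], TOTAL[week][1] - NY[week][1])
    for week in ("week1", "week2")
}

for label, series in [("total", TOTAL), ("NY", NY), ("without NY", WITHOUT_NY)]:
    print(label, round(relative_change(series), 3))
```
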
@bpo308: To be honest, I have no experience with this sort of thing and was just trying to use whatever little common sense I have, so I'm not claiming that I know what I'm doing. But to me, saying that the rate had a 50 percent drop is misleading, because if it went from 40 to 20, that would also be a 50 percent drop. I see what you're doing, but I would stay away from taking percentages of percentages. In my original example, I would say that in the first week NY contributed 5 percent * 1/5 (1/5 comes from 2000/10000) = 1 percent. — mlofton, Feb 27 '21 at 17:54
Then, in the second week, NY contributed 5 percent * 4/10 (4/10 comes from 4000/10000) = 2 percent. So NY contributed 1 percent in the first week (overall rate 10 percent) and 2 percent in the second week (overall rate 5 percent). I don't think taking percentages of percentages (as in the case where you get 50 percent) is informative, because the number of bookings can be constantly changing and taking percentages discards that information. Maybe someone else can comment, since this is somewhat interesting, but I don't feel confident giving advice. — mlofton, Feb 27 '21 at 17:57
@mlofton Thanks for sharing your thoughts; I appreciate it. Part of my job is to monitor WoW changes in the booking rate. So when the booking rate changes from, say, 10% to 12%, we say it went up 20%. I know it's a percentage of percentages, but that's how it has been communicated. Booking rate improvement goals are also set in percentage terms, for example, improve the booking rate by 30% (from 10% to 13%). Similarly, there are goals for visitors (the denominator of the booking rate calculation). — bp0308, Feb 27 '21 at 21:17
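
The two ways of quoting a change in the booking rate, as a relative change ("up 20%") or as a change in percentage points, can be sketched as follows; the function names are illustrative. The relative figure hides the starting level: 10% to 12% and 40% to 48% are both "up 20%", but they are 2 and 8 points respectively.

```python
def relative_change(old_rate, new_rate):
    """Relative change: 0.10 -> 0.12 is +0.20, i.e. "up 20%"."""
    return new_rate / old_rate - 1

def point_change(old_rate, new_rate):
    """Change in percentage points, as a fraction: 0.10 -> 0.12 is +0.02 (2 points)."""
    return new_rate - old_rate

def rate_for_goal(old_rate, relative_goal):
    """Rate needed to hit a relative goal: 0.10 with a +30% goal needs 0.13."""
    return old_rate * (1 + relative_goal)

for old, new in [(0.10, 0.12), (0.40, 0.48)]:
    print(old, new, round(relative_change(old, new), 3), round(point_change(old, new), 3))
print(round(rate_for_goal(0.10, 0.30), 3))
```
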
Right. I think it's okay to quote it that way (10 to 12 is 20 percent). But when you want to look at the effects of states, it gets a little more obscure because you're not including the number of people. That's why I think weighting the effect by the number of people in the state gives a better picture of that state's effect. Notice that the NY percentages in my example went down when we multiplied by the fraction of people in NY relative to the total number of people. So percentages of percentages could be okay for motivation, but I don't think they are for measuring effects. — mlofton, Mar 01 '21 at 01:47
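
One way to make the additive idea concrete is a rate/mix (shift-share) decomposition of the change in the overall rate, measured in rate units rather than as a percentage of a percentage. This is a standard decomposition swapped in here for illustration, not necessarily what whuber had in mind, and `decompose` is a hypothetical helper. On the NY example from the Feb 26 comment, NY's rate effect is the -1.5 points computed there, but NY's share of visitors grew from 20% to 40% at an average rate of 7.5%, which adds back +1.5 points. NY's net contribution is therefore zero, matching the removal check from Feb 26 '21 at 17:41, and the per-state pieces add up exactly to the overall -5-point change.

```python
# Counts from the NY example in the comments: (bookings, visitors) per week.
# "Rest" is the total minus NY: 800/8000 in week 1 and 300/6000 in week 2.
DATA = {
    "NY":   {"week1": (200, 2000), "week2": (200, 4000)},
    "Rest": {"week1": (800, 8000), "week2": (300, 6000)},
}
WEEKS = ("week1", "week2")

def decompose(data):
    """Split the change in the overall rate (in rate units) into per-state pieces.

    Overall rate = sum over states of share * rate, with share = state visitors / all
    visitors. With midpoint weights, each state's change in share * rate splits exactly into
      rate effect = (change in state rate) * (average share)
      mix effect  = (change in share) * (average state rate)
    """
    totals = {w: sum(d[w][1] for d in data.values()) for w in WEEKS}
    out = {}
    for name, d in data.items():
        r1, r2 = (d[w][0] / d[w][1] for w in WEEKS)
        s1, s2 = (d[w][1] / totals[w] for w in WEEKS)
        out[name] = {
            "rate_effect": (r2 - r1) * (s1 + s2) / 2,
            "mix_effect": (s2 - s1) * (r1 + r2) / 2,
        }
    return out

parts = decompose(DATA)
for name, p in parts.items():
    print(name, round(p["rate_effect"], 4), round(p["mix_effect"], 4))
total_change = sum(p["rate_effect"] + p["mix_effect"] for p in parts.values())
print(round(total_change, 4))  # about -0.05, i.e. the drop from 10% to 5%
```

Dividing each state's net effect by the total change gives shares of the drop that sum to 100%, which the one-state-at-a-time percentages could not do. The midpoint (average-share) weights are one convention; using only week-1 or only week-2 weights leaves an interaction term that has to be allocated somehow.
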
