Economists have been working with decompositions like this for several decades now. Favorite applications include labor market discrimination (is the gap between the wages of males and females fully explained by differences in the occupations they take?) and poverty analysis (how much do changes in inequality and overall economic growth each contribute to changes in poverty?). I worked on this many years ago, and Google refers to Tony Shorrocks' paper on Shapley decomposition.
The mechanics of Shapley decomposition are as follows. Each contributing factor can be thought of as being "on" or "off", i.e., as having only two levels. E.g., in poverty analysis, I can have the Swedish income distribution and the US income distribution (two levels), the Swedish mean income and the US mean income (two levels), and two poverty line definitions (half of the median income, or a fixed figure like USD 10000 in purchasing power parity units). I consider all possible combinations of factors being "on" and "off" (with three factors, I will have 2^3 = 8 possible outcomes, such as poverty rates). I then collect, for each factor, the marginal effect of changing that factor while keeping everything else equal. Thus, for the inequality effect, I would consider the difference in poverty rates induced by changing inequality from the Swedish level to the US level, under four different baseline scenarios (= 2^2 combinations of the remaining two factors, income level and poverty definition). Then I define the contribution of inequality to the poverty rate as the average across those four scenarios (in the strict Shapley formula this is a weighted average, with weights that depend on how many of the other factors are switched "on"). The paper I cited shows the nice mathematical properties of the decomposition, and we had a couple of other papers where we applied what I described above to poverty in Russia (over time and across regions).
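To make the mechanics concrete, here is a minimal Python sketch of the on/off procedure described above. Everything in it is an assumption made for illustration: the function name `shapley_decomposition`, the factor labels, and above all the toy poverty rates, which are made-up numbers and not results from any paper. It uses the standard Shapley weights, so the scenarios with none or all of the other factors "on" count more than the mixed ones.

```python
from itertools import combinations
from math import factorial


def shapley_decomposition(outcome, factors):
    """Shapley contribution of each on/off factor.

    outcome: function taking a frozenset of the factors that are "on"
             and returning a number (e.g., a poverty rate).
    factors: list of factor labels.
    """
    n = len(factors)
    contributions = {}
    for factor in factors:
        others = [g for g in factors if g != factor]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a baseline with `size` factors on.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                baseline = frozenset(subset)
                # Marginal effect of switching `factor` on, others held fixed.
                marginal = outcome(baseline | {factor}) - outcome(baseline)
                total += weight * marginal
        contributions[factor] = total
    return contributions


def toy_poverty_rate(on):
    """Hypothetical poverty rate; all numbers are invented for illustration."""
    rate = 0.06                      # all factors at their "off" levels
    if "inequality" in on:
        rate += 0.05
    if "mean_income" in on:
        rate -= 0.02
    if "poverty_line" in on:
        rate += 0.01
    if "inequality" in on and "mean_income" in on:
        rate += 0.01                 # interaction between two factors
    return rate


factors = ["inequality", "mean_income", "poverty_line"]
contributions = shapley_decomposition(toy_poverty_rate, factors)
total_change = toy_poverty_rate(frozenset(factors)) - toy_poverty_rate(frozenset())
```

The contributions add up exactly to `total_change`, the difference between the "all on" and "all off" outcomes, which is the main practical appeal of the method. The interaction term is split evenly between the two factors involved.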
In your analysis, you would have to come up with meaningful ways to create the counterfactual results (what would the change in the number of immigration studies be if there were no change in the language? if there were no changes between disciplines? etc.) If you can come up with such counterfactuals, then it will be quite straightforward to apply this methodology.
Another similar decomposition, in the regression context, was proposed by Gary Fields. For a regression equation $y=\beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k + \epsilon$, it essentially boils down to the covariance between $y$ and $\hat \beta_j x_j$ for each regressor $j = 1, \ldots, k$, as far as I remember. I did not work with this decomposition that much, although I did have it implemented in Stata.
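A rough Python sketch of that covariance idea follows. It is not the Stata implementation mentioned above. It assumes the coefficients have already been estimated elsewhere, and it divides each covariance by the variance of $y$ so that the pieces read as shares, which is my understanding of how the Fields shares are usually reported. The function names and the data are made up for illustration.

```python
def covariance(a, b):
    """Population covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / n


def fields_shares(coefs, columns, y):
    """Share of var(y) attributed to each regressor: cov(b_j * x_j, y) / var(y).

    coefs:   slope estimates, one per regressor (intercept excluded).
    columns: list of regressor columns, in the same order as coefs.
    y:       dependent variable.
    """
    var_y = covariance(y, y)
    shares = []
    for beta_j, x_j in zip(coefs, columns):
        term = [beta_j * value for value in x_j]
        shares.append(covariance(term, y) / var_y)
    return shares


# Invented data with an exact linear relation, so the fit is perfect
# and the slopes (2, -1) are known without running a regression.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
y = [1.0 + 2.0 * a - 1.0 * b for a, b in zip(x1, x2)]

shares = fields_shares([2.0, -1.0], [x1, x2], y)
```

With least-squares coefficients and an intercept in the model, the shares sum to the regression $R^2$ (here 1, since the toy data have no residual). Individual shares can be negative or exceed one, as they do in this example.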