Summing two standardised variables

Question

A question related to this one:

How to sum two variables that are on different scales?

If I have two variables (rainfall and temperature) which have different means and different standard deviations, how do I need to transform them so that when I sum the two, the result is not "driven" by the more volatile one? The reason is that I want to create another index which is the sum of rainfall and temperature, and I want to study how this index changes.

I did this standardization:

 Rainfall.z = (actual value - minimum value)/(maximum value - minimum value)
 Temperature.z = (actual value - minimum value)/(maximum value - minimum value)

The above standardization brings these variables on a scale 0 - 1. I then added these variables to create an index called

 Planting.Index = Rainfall.z + Temperature.z

The reason I am doing this is because I am studying a location and trying to understand when that location will start planting crop. Planting of the crop will be driven by when rainfall starts + temperature reaches an optimum level.

So within a year, the point at which the planting index reaches its minimum and then starts increasing (i.e. rainfall and temperature reach a minimum and then start increasing) is when planting should start.

Can standardized variables be added together?

EDIT

Background of my data: I have 30 years of rainfall, temperature and soil moisture data for a location. I am trying to work out, for each of the 30 years, when people should have planted. I know the crop is planted from June onwards, when the rainfall kicks in and mean temperature (tmean) reaches an optimum level.

Wouldn't it be better to use data to figure out what a good planting index is? — Andreas Dzemski, Mar 13 '18 at 12:58
I didn't quite understand what you mean. Basically, planting can only begin when both rainfall and temperature cross a certain level. Quite often rainfall crosses that level while temperature is still below it. That is why I wanted to come up with a single index that I can investigate. — 89_Simple, Mar 13 '18 at 13:01
If there is an interaction of this sort then it seems that adding the two quantities doesn't give a good index (index is high if one of the two quantities is high, but you still might not be able to plant). — Andreas Dzemski, Mar 13 '18 at 13:11
I don't think standardizing and adding them is a good approach. Are your "actual" and "minimum" values those by year, month, or day? You may end up mixing up data from different "seasons". Growth is mostly moisture driven, so I would think an index that gives you an idea of how much moisture is in the system would be better... — Stefan, Mar 13 '18 at 13:16
...There are many indices that may be better, e.g. heat:moisture index (dividing daily temperature by precipitation), the [Hargreaves climatic moisture deficit (mm)](https://elibrary.asabe.org/abstract.asp?aid=36722), or a [climate moisture index](https://link.springer.com/article/10.1007/BF01182849), which is precipitation minus potential evapotranspiration. Also [plotting precipitation and temperature in one graph](http://www.zoolex.org/walter.html) (climate diagram by Walter) can help. — Stefan, Mar 13 '18 at 13:16

Bence Mélykúti · Accepted Answer · 2018-03-13T13:28:28.657

1

The normalisation you propose is fine. So is the suggestion on the other page, to centre by sample mean and divide by sample standard deviation (by the uncorrected or the corrected one).
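As a minimal sketch of the two normalisations mentioned here, the following computes both the min-max scaling from the question and the mean/standard-deviation scaling, then sums the scaled variables into an index. The function names and the data are made up for illustration, and the uncorrected (population) standard deviation is used.

```python
from statistics import mean, pstdev


def min_max(values):
    """Scale values to [0, 1] using the sample minimum and maximum."""
    lo, hi = min(values), max(values)
    # Assumes the values are not all equal (otherwise hi - lo is zero).
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Centre by the sample mean and divide by the uncorrected standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Made-up daily values, not real measurements.
rainfall = [0.0, 2.0, 10.0, 4.0, 4.0]
temperature = [20.0, 22.0, 24.0, 26.0, 28.0]

# Either scaling puts both variables on a comparable scale before summing.
index_minmax = [r + t for r, t in zip(min_max(rainfall), min_max(temperature))]
index_z = [r + t for r, t in zip(z_score(rainfall), z_score(temperature))]
```

With min-max scaling each variable contributes a value in $[0,1]$; with z-scores each variable has mean 0 and standard deviation 1, so neither dominates the sum purely because of its units.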

Additional clarifying comments by the OP made the following two points redundant for their particular task, which is finding an optimum in a time interval for which all the data are already available. The remarks remain relevant for on-line detection of an optimum in an ongoing process.

Be aware that just because your samples are currently all in $[0,1]$, they won't necessarily always be because a record (e.g. a temperature record) can be broken in the future.
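A small illustration of this point, with made-up numbers: if the minimum and maximum are fixed from historical data, a later record-breaking observation falls outside $[0,1]$. The helper name `scale_with_fixed_range` is hypothetical.

```python
def scale_with_fixed_range(value, lo, hi):
    """Min-max scale a value using a previously fixed minimum and maximum."""
    return (value - lo) / (hi - lo)


# Made-up historical temperatures used to fix the scaling range.
historical_temperature = [18.0, 21.0, 25.0, 30.0]
lo, hi = min(historical_temperature), max(historical_temperature)

# A later observation that breaks the historical record.
new_record = 33.0
scaled = scale_with_fixed_range(new_record, lo, hi)
```

Here `scaled` is greater than 1 because 33.0 exceeds the historical maximum of 30.0.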

I would also warn you that what you want to do with your normalised variables afterwards is not straightforward. Hitting the minimum or maximum value of a time series is in general not a stopping time; you never know what the future will be. E.g. if we knew that, then a stock trader could always buy stocks at their minimum price and sell them later at their maximum price. I would additionally compare the current value to historical data (and the time of year) to trigger that decision.
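To make the distinction concrete, here is a hedged sketch with a made-up index series. Processing the data day by day, one can only tell whether today is the lowest value so far; which of those lows is the overall minimum is known only once the whole period has been observed.

```python
from itertools import accumulate

# Made-up planting index values for consecutive days.
index = [1.2, 0.9, 1.0, 0.6, 0.8, 1.1]

# Lowest value seen up to and including each day (what is known on-line).
running_min = list(accumulate(index, min))

# Days that looked like "the minimum" at the time they were observed.
new_low_days = [day for day, (v, m) in enumerate(zip(index, running_min)) if v == m]

# The true minimum, identifiable only post hoc with the full series.
posthoc_min_day = index.index(min(index))
```

In this series, days 0, 1 and 3 each look like a minimum when they occur, but only day 3 is the minimum of the full period.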

edited Mar 13 '18 at 13:28

answered Mar 13 '18 at 13:02


Thank you. I plan to do this for each year separately, i.e. for each year, standardize the variables, sum them and look at the minima. In this case, does the problem you mention still persist? – 89_Simple Mar 13 '18 at 13:07
1) Do you want to detect when people start planting, 2) do you want to detect when people should have started planting, or 3) do you want to instruct people to start? For 1), statistics doesn't help. For 3), you have the issue I mentioned, that you'd need to be able to see into the future to find the absolute optimum. The data you can physically have is only the past and present. For 2), you can do a _post hoc_ analysis once all the data for a year is in. – Bence Mélykúti Mar 13 '18 at 13:14
I am looking to do (2) i.e. I want to detect when people should have started planting for historical years. To do this, I know the season when the crop is planted (June - September) and I have daily rainfall, soil water, temperature data for all those years. – 89_Simple Mar 13 '18 at 13:22

score 0 · Answer 2 · answered Mar 13 '18 at 13:45

Based on Stefan's comments, instead of summing variables to form an inequality condition, a more natural model would be to combine two inequalities with an AND, so that the soil moisture needs to be in some interval and the temperature needs to be in some other interval. (Probably you need it a little more complicated: the variables are in an optimum window and do not later exit a possibly larger window. E.g. even if the temperature is optimal, it could be important that it doesn't drop below freezing point later.)
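A minimal sketch of this AND condition, for the post hoc case where a full season of data is available. The function name, window bounds and data are all hypothetical; real thresholds would have to come from agronomic knowledge of the crop.

```python
def first_planting_day(soil_moisture, temperature,
                       moisture_window, temperature_window,
                       safe_temperature_min):
    """Return the first day on which both variables are inside their optimum
    windows AND the temperature never later drops below a safety threshold.
    Uses later days of the season, so it is a post hoc analysis."""
    lo_m, hi_m = moisture_window
    lo_t, hi_t = temperature_window
    for day, (m, t) in enumerate(zip(soil_moisture, temperature)):
        in_optimum = lo_m <= m <= hi_m and lo_t <= t <= hi_t
        stays_safe = all(x >= safe_temperature_min for x in temperature[day:])
        if in_optimum and stays_safe:
            return day
    return None  # no day in the season satisfies the condition


# Made-up daily values and thresholds, for illustration only.
soil_moisture = [0.10, 0.25, 0.30, 0.28, 0.27]
temperature = [24.0, 26.0, 15.0, 25.0, 27.0]

day = first_planting_day(soil_moisture, temperature,
                         moisture_window=(0.2, 0.4),
                         temperature_window=(22.0, 30.0),
                         safe_temperature_min=18.0)
```

Day 1 is inside both optimum windows but is rejected because the temperature drops to 15.0 the next day; the first day that passes both checks is day 3. A sum of scaled variables could not express this, since a high value of one variable would mask a low value of the other.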

An additional statistical problem for a real-world application that decides when to start planting could be to model soil moisture from temperature and rainfall, e.g. with linear regression. (I assume that the latter two are available online for free, whereas soil moisture is time-consuming for the farmers to measure.)
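As a rough sketch of such a regression, the following fits soil moisture as a linear function of temperature and rainfall by ordinary least squares, using only the standard library. The helper `fit_linear` is hypothetical, and the data are made up so that they follow an exact linear relationship; real soil moisture would be noisy and might need lagged or cumulative rainfall as predictors.

```python
def fit_linear(X, y):
    """Ordinary least squares with an intercept, solved through the normal
    equations by Gauss-Jordan elimination. Returns [intercept, b1, b2, ...].
    Assumes the predictors are not collinear."""
    rows = [[1.0] + list(x) for x in X]
    k = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


# Made-up (temperature, rainfall) pairs and soil moisture values, constructed
# as moisture = 0.5 - 0.01 * temperature + 0.02 * rainfall.
predictors = [(20.0, 0.0), (25.0, 5.0), (30.0, 2.0), (22.0, 10.0), (28.0, 1.0)]
moisture = [0.30, 0.35, 0.24, 0.48, 0.24]

intercept, b_temperature, b_rainfall = fit_linear(predictors, moisture)
```

The fitted coefficients could then be used to estimate soil moisture on days when only temperature and rainfall are known.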
