How can I standardize data as in the following example?

+---------------+-------+--------+
|   comments    | input | output |
+---------------+-------+--------+
| min           |     1 |     -1 |
| chosen middle |     2 |      0 |
|               |     3 |        |
|               |     4 |        |
| max           |     5 |      1 |
+---------------+-------+--------+

We know the min and max values of the input, and we arbitrarily choose a "middle" value. The middle becomes 0 in the output; min and max become -1 and +1 respectively.

I want to use this to visualize temperature on a map with three different colors.

Update after comments.

The most desirable solution would be a single uniform function over the whole range [min, max] with three control parameters:

f(min, middle, max)

The constraints are:

  1. the result of the function never goes beyond the [-1, +1] range,
  2. the function is monotonic, always increasing from min to max.

Something like a cubic spline curve (this is just an example).

https://www.desmos.com/calculator/mhsbhxh9hj

Przemyslaw Remin
  • Standardize the range $[1,2]$ to $[-1,0]$ and the range $[2,5]$ to $[0,1]$ in any way you like. Consider following each of those with some nonlinear one-to-one transformation of its interval in order to create a suitable color gamut. This is the most general solution. – whuber Nov 26 '19 at 15:02
  • As I stated, it's no workaround: it's a completely general solution. (For instance, it is the one supported in `ggplot2` for `R`.) If you need something more specific, then please edit your post to include your criteria and constraints. – whuber Nov 26 '19 at 15:15
  • @whuber thank you for calling for precision. I have updated the answer. I would be very grateful, if you might direct me further. Or maybe another call for precision would bring me further to solution. – Przemyslaw Remin Nov 27 '19 at 11:15
  • My first comment describes what many would consider a "single uniform function." I suspect you intend us to understand that the transformation in the $(1,2)$ range and the transformation in the $(2,5)$ range should have similar-looking *formulas.* However, even that is extremely broad. People usually select the transformations with specific graphical objectives in mind, such as optimizing some measure of contrast in the plot or otherwise controlling the visual perception of different colors to improve some aspect of how viewers interpret the plot. – whuber Nov 27 '19 at 17:03
  • Regarding terminology, this is probably closer to normalization than standardization; see e.g. [Statistics How To](https://www.statisticshowto.datasciencecentral.com/normalized/). – Richard Hardy Nov 29 '19 at 09:52
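
The piecewise approach described in the first comment can be sketched as follows. This is a minimal illustration, assuming the simplest choice of a linear map on each side of the middle; the name `piecewise_linear` and its parameters are hypothetical, not something given in the thread.

```python
def piecewise_linear(x, lo, mid, hi):
    """Map [lo, mid] linearly onto [-1, 0] and [mid, hi] linearly onto [0, 1].

    Illustrative helper (name and signature are assumptions).
    """
    if not lo < mid < hi:
        raise ValueError("expected lo < mid < hi")
    # Clamp the input so the result never leaves [-1, 1].
    x = min(max(x, lo), hi)
    if x <= mid:
        return (x - mid) / (mid - lo)
    return (x - mid) / (hi - mid)


# The example from the question: min = 1, middle = 2, max = 5.
values = [piecewise_linear(x, 1, 2, 5) for x in [1, 2, 3, 4, 5]]
print(values)
```

This map is continuous and increasing but has a kink at the middle value, which is one reason a smooth alternative may be preferred.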

1 Answer

There are infinitely many monotonic functions that meet your criteria, so I will offer one example which also has some nice smoothness properties. To generate a function with the specified properties, we can start by considering a bijective monotonically increasing mapping $h: [0,1] \rightarrow [-1,1]$ with some "middle" value $0 < m < 1$. The function I propose is:

$$h(z) = \frac{z(1-m) - m(1-z)}{z(1-m) + m(1-z)} \quad \quad \quad \text{for all } 0 \leqslant z \leqslant 1.$$

It is simple to show that this function is monotonically increasing, since $h'(z) = 2m(1-m)/[z(1-m) + m(1-z)]^2 > 0$, with $h(0)=-1$, $h(m)= 0$ and $h(1)=1$, so it is a bijective monotonically increasing mapping with the appropriate "middle" point. Now, we will rescale this function to meet the requirements specified in your question. If we let $x_0 < x_* < x_1$ denote the minimum, middle, and maximum, respectively, then we can use the conversion:

$$z = \frac{x-x_0}{x_1-x_0} \quad \quad \quad \quad \quad m = \frac{x_*-x_0}{x_1-x_0}.$$

Substituting these values and simplifying gives us the function:

$$\begin{equation} \begin{aligned} f(x) &= \frac{(x-x_0)(x_1-x_*) - (x_*-x_0)(x_1-x)}{(x-x_0)(x_1-x_*) + (x_*-x_0)(x_1-x)} \\[6pt] &= \frac{(x-x_*)(x_1-x_0)}{(x+x_*) (x_1 + x_0) - 2 (x_0 x_1 + x_* x)}. \\[6pt] \end{aligned} \end{equation}$$
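
As a quick check, the formula can be evaluated on the example from the question. The sketch below is only an illustration: the function name `rational_scale` and the decision to reject inputs outside $[x_0, x_1]$ are assumptions, not part of the answer.

```python
def rational_scale(x, x0, x_mid, x1):
    """Evaluate f(x) for minimum x0, middle x_mid and maximum x1.

    Illustrative helper (name and error handling are assumptions).
    """
    if not x0 < x_mid < x1:
        raise ValueError("expected x0 < x_mid < x1")
    if not x0 <= x <= x1:
        raise ValueError("x must lie in [x0, x1]")
    # Simplified numerator from the second line of the derivation.
    numerator = (x - x_mid) * (x1 - x0)
    # Denominator from the first line; it is strictly positive on [x0, x1].
    denominator = (x - x0) * (x1 - x_mid) + (x_mid - x0) * (x1 - x)
    return numerator / denominator


# The table from the question: min = 1, middle = 2, max = 5.
outputs = [rational_scale(x, 1, 2, 5) for x in [1, 2, 3, 4, 5]]
print(outputs)  # [-1.0, 0.0, 0.5, 0.8, 1.0]
```

With these inputs the two blank cells of the table would be filled with 0.5 and 0.8.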

This is a nice smooth function on $[x_0, x_1]$ that has the properties you have specified in your question. The underlying function is formed analogously to the updating mechanism in Bayes' theorem, where the input $z$ is the prior probability and $\tfrac{1}{2} (1 + h(z))$ is the posterior probability (with the likelihood ratio $L = \tfrac{1-m}{m}$).
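
The Bayesian reading can be checked numerically. Writing $p$ for the posterior obtained from prior $z$ and likelihood ratio $L = \tfrac{1-m}{m}$, the relation is $h(z) = 2p - 1$. The sketch below is an illustration only; the helper names `h` and `posterior` and the test value $m = 0.25$ are assumptions chosen for the check.

```python
def h(z, m):
    """The base mapping from [0, 1] to [-1, 1] with h(m) = 0."""
    return (z * (1 - m) - m * (1 - z)) / (z * (1 - m) + m * (1 - z))


def posterior(prior, likelihood_ratio):
    """Posterior probability from a prior and a likelihood ratio (Bayes' theorem)."""
    weighted = likelihood_ratio * prior
    return weighted / (weighted + (1 - prior))


m = 0.25                      # arbitrary middle value for the check
L = (1 - m) / m               # likelihood ratio named in the text
grid = [i / 10 for i in range(11)]

# Largest deviation between the posterior and (1 + h(z)) / 2 over the grid.
max_gap = max(abs(posterior(z, L) - (1 + h(z, m)) / 2) for z in grid)
print(max_gap)
```

The printed gap should be at the level of floating-point rounding error.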

Ben