Data dimension in machine learning

Question

I am working on an ML project and I have a 4-D dataset. I wanted to use a dimensionality reduction algorithm, and suddenly a question made me stop.

Here is my dilemma

Is there a difference between the definition of dimension in mathematics and in the machine learning world?

For example, suppose I have a variable of shape 5×60000×900×300.

In mathematics, I would say I have 4-D (four-dimensional) data, with a different size along each dimension. For example, the first dimension has size 5.

What do we say in machine learning? Is the dimension of our data 4?

If yes, does a dimensionality reduction algorithm convert this data into new 2-D data? Or not?

As I understand it, a dimensionality reduction algorithm tries to reduce the size, i.e. something like reducing 9000 to 50 in data of size 9000×60.

So how can we explain this for a 4-D matrix like the previous example, 5×60000×900×300?

You wouldn’t call that four-dimensional. You would call that a four-tensor. Consider a vector that is $3\times 1$. That is a one-tensor, but what is the dimension, three or one? — Dave, Jan 17 '21 at 15:01
@Dave, I would have said a 3×1 vector is a 2-D vector with size 3 in the 1st dimension and size 1 in the 2nd dimension, but it looks like I am wrong. — maia, Jan 18 '21 at 06:47
Maybe this clears it up: a $3 \times 2$ matrix is a $3 \cdot 2 \times 1$ vector, the dimension is $6$. There is an isomorphism between $\mathbb{R}^3 \otimes \mathbb{R}^2$ (tensor product) and $\mathbb{R}^6$. — displayname, Jan 18 '21 at 14:39

score 2 · Answer 1 · answered Jan 17 '21 at 14:58

From a linear algebra perspective, we are dealing here with vector spaces.

For example, consider the linear map $T : \mathbb{R}^4 \to \mathbb{R}^2$ with $T(x) = Ax$, where $A$ is the transformation matrix of size $2 \times 4$. You put in a 4-d coordinate and get a 2-d coordinate out: your input has four features and you transform it into two features. If you have more than one input, e.g. 400 inputs, you compute $AX$, where $X$ is a $4 \times 400$ matrix with one input per column. Transposed, this is $(AX)^T = X^T A^T$, where $X^T$ is $400 \times 4$ (400 inputs, 4 features) and $A^T$ is $4 \times 2$ (4 input dimensions, 2 output dimensions).
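The shapes in this paragraph can be checked numerically. The following is a minimal NumPy sketch; the random matrices and the names `A`, `X`, `Y_cols`, `Y_rows` are purely illustrative and stand in for a real transformation and dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative random data: A maps R^4 -> R^2, X holds 400 inputs as columns.
A = rng.standard_normal((2, 4))
X = rng.standard_normal((4, 400))

# Column convention: each column of X is one input with 4 features.
Y_cols = A @ X          # shape (2, 400)

# Row convention: each row of X.T is one input with 4 features.
Y_rows = X.T @ A.T      # shape (400, 2)

print(Y_cols.shape, Y_rows.shape)
# The two results are transposes of each other.
print(np.allclose(Y_cols.T, Y_rows))
```

In both conventions the "dimension" being reduced is the number of features (4 to 2), not the number of axes of the array, which stays at two.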

When you write $5\times 60000\times 900 \times 300$, this corresponds to the tensor product $\mathbb{R}^5 \otimes \mathbb{R}^{60000} \otimes \mathbb{R}^{900} \otimes \mathbb{R}^{300} \cong \mathbb{R}^{5 \cdot 60000 \cdot 900 \cdot 300} = \mathbb{R}^{81000000000}$, i.e. an 81000000000-dimensional vector space over the field $\mathbb{R}$. (The Cartesian product of these spaces would only have dimension $5 + 60000 + 900 + 300 = 61205$.)
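To make the distinction between the number of axes and the vector-space dimension concrete, here is a small sketch. It only does arithmetic on the shape from the question, since an array of that size would not fit in memory; the scaled-down array `small` with shape `(5, 6, 9, 3)` is a made-up stand-in for illustration.

```python
import math
import numpy as np

shape = (5, 60000, 900, 300)

order = len(shape)             # number of axes: 4
n_entries = math.prod(shape)   # vector-space dimension: 81_000_000_000

print(order, n_entries)

# A scaled-down stand-in with the same number of axes (hypothetical sizes).
small = np.zeros((5, 6, 9, 3))
flat = small.reshape(-1)       # flatten into one long vector

print(small.ndim, flat.shape)  # 4 axes before, a single axis of length 810 after
```

Flattening does not change the number of entries, only how they are indexed, which is the isomorphism mentioned in the comments.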

Besides the regular matrices that one uses for linear regression or simple feed-forward neural networks, there are also tensors. In ML, a tensor is simply a multi-dimensional array. In mathematics and physics, tensors have additional properties, but in ML we are normally not interested in transformation laws, etc.

So your $5\times 60000\times 900 \times 300$ data would also correspond to a tensor, here of order 4 (a tensor of order two is a matrix). PyTorch and TensorFlow call the order the number of "dimensions" or "axes". In deep learning, tensors are useful for performing fast batched matrix multiplication. For example, consider an input of shape $10 \times 300 \times 2$: 10 inputs, 300 time steps, 2 features. We can either perform 10 separate multiplications, one per $300 \times 2$ matrix, or a single multiplication on the whole input.
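As a rough illustration of the batched multiplication, the sketch below uses NumPy rather than PyTorch or TensorFlow. The weight matrix `W` and its output size of 3 are assumptions made for the example; they do not come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10 inputs, 300 time steps, 2 features (random values for illustration).
X = rng.standard_normal((10, 300, 2))

# Hypothetical weight matrix mapping 2 features to 3 features.
W = rng.standard_normal((2, 3))

# Option 1: ten separate (300 x 2) @ (2 x 3) multiplications.
looped = np.stack([X[i] @ W for i in range(X.shape[0])])

# Option 2: one batched multiplication; W is broadcast over the first axis.
batched = X @ W

print(looped.shape, batched.shape)
print(np.allclose(looped, batched))
```

Both options give the same order-3 result; only the feature axis changes size, from 2 to 3.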
