Data compression reduces the number of bits needed to store a "message". Compression can be lossless or lossy. Lossy compression is an option for audio and visual data, whereas many other applications require lossless compression.
Questions tagged [compression]
47 questions
26
votes
4 answers
Kullback-Leibler divergence WITHOUT information theory
After much trawling of Cross Validated, I still don't feel like I'm any closer to understanding KL divergence outside of the realm of information theory. It's rather odd as somebody with a Math background to find it much easier to understand the…

gazza89
8
votes
1 answer
Comparison of entropy and distribution of bytes in compressed/encrypted data
I have a question that has been occupying me for a while.
The entropy test is often used to identify encrypted data. The entropy reaches its maximum when the bytes of the analyzed data are distributed uniformly. The entropy test identifies encrypted…
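
To make the entropy test in this excerpt concrete, here is a minimal sketch of a byte-level Shannon entropy estimate. The function name `byte_entropy` and the sample inputs are assumptions for illustration, and random bytes from `os.urandom` are only a stand-in for real ciphertext. Short inputs cannot reach the 8 bits/byte maximum, so the figures are only comparable across blobs of similar size.

```python
import math
import os
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # frequency of each byte value 0..255
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical sample data: repetitive text, its zlib-compressed form,
    # and random bytes used here as a stand-in for encrypted data.
    text = b"the quick brown fox jumps over the lazy dog " * 2000
    samples = {
        "plain text": text,
        "zlib-compressed": zlib.compress(text, 9),
        "random bytes": os.urandom(len(text)),
    }
    for name, blob in samples.items():
        print(f"{name}: {byte_entropy(blob):.3f} bits/byte over {len(blob)} bytes")
```

The estimate reaches its maximum of 8 bits/byte only when all 256 byte values occur equally often, which is the uniform case the excerpt describes.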

tommynogger
7
votes
2 answers
Compression theory, practice, for time series with values in a space of distributions (say of a real random variable)
Example of problem: Part of our research team is working on providing operational wind power forecasts. Usually, since forecast users are interested in different time scales, a forecast is issued every 15 min (it has even happened that 5…

robin girard
6
votes
2 answers
When and why do we use sparse coding?
Sparse coding is described as "given an input $X$, finding a latent representation $h$ such that h is sparse and the input can be reconstructed as well as possible." (source: https://www.youtube.com/watch?v=7a0_iEruGoM)
My question is why do we want…

Sofia693
5
votes
1 answer
Why can low-rank expansions exploit the redundancy that exists between different feature channels and filters?
I read the paper by Jaderberg et al. (2014), "Speeding up Convolutional Neural Networks with Low Rank Expansions". In the introduction, it is written in bold font:
Our key insight is to exploit the redundancy that exists between feature channels and…

Kalkaneus
4
votes
1 answer
Weights of random sets of random 32-bit strings
I have random sets of $N$ random 32-bit strings,
where all bits are i.i.d. with $\mathbb{P}(0) = \mathbb{P}(1) = 1/2$.
Define
$\ \ \ \ $weight( 32-bit x ) = number of 1 bits in x, i.e. Hamming distance to 0
$\ \ \ \ $minweight( set $X$ ) =…
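
As a small illustration of the `weight` definition in this excerpt, the sketch below counts the 1 bits of a 32-bit value and draws random 32-bit strings with fair i.i.d. bits. The helper `random_set` and the sample size are assumptions for illustration. `minweight` is left out because its definition is cut off in the excerpt.

```python
import random


def weight(x: int) -> int:
    """Number of 1 bits in a 32-bit value, i.e. Hamming distance to 0."""
    return bin(x & 0xFFFFFFFF).count("1")


def random_set(n: int, rng: random.Random) -> list[int]:
    """Draw n 32-bit strings with i.i.d. fair bits (hypothetical helper)."""
    return [rng.getrandbits(32) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs draw the same set
    xs = random_set(1000, rng)
    weights = [weight(x) for x in xs]
    # Each weight is Binomial(32, 1/2), so the theoretical mean is 16.
    print("sample mean weight:", sum(weights) / len(weights))
```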

denis
4
votes
4 answers
Ultimate compression algorithm
I was not sure where to put this question, so I put it here. Moderators, feel free to move it to another Stack Exchange site.
Let's say I have 10 gigs of pictures (or for that matter any type of data, please don't answer the question specifically…

SamB
3
votes
0 answers
From a deep learning point of view, is there a lower limit on the number of hours of speech needed to train a neural net
From a deep learning practitioner's point of view, is there a lower limit on the number of hours of speech needed to train a neural net to translate speech to text? An estimate from CMU is 3000-5000 hours for 90% accuracy
commercial quality speech…

Lars Ericson
3
votes
3 answers
How does SVD save space?
We start with an $m \times n$ matrix before SVD. After SVD, we have three matrices of sizes $m \times m$, $n \times n$ and $m \times n$. How do we save space then if now we have three matrices instead of one and more numbers to store? Why are we…
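
The storage arithmetic behind this question can be sketched as follows. The usual resolution, which is an assumption beyond what the excerpt states, is that savings come from a rank-$k$ truncation: keep only $k$ left singular vectors, $k$ singular values, and $k$ right singular vectors. The function names below are hypothetical, and the counts are numbers of stored values, not bytes.

```python
def dense_storage(m: int, n: int) -> int:
    """Values stored for the original m x n matrix."""
    return m * n


def full_svd_storage(m: int, n: int) -> int:
    """Values stored for the three full factors as counted in the question:
    m x m, m x n and n x n."""
    return m * m + m * n + n * n


def truncated_svd_storage(m: int, n: int, k: int) -> int:
    """Values stored for a rank-k truncation: k columns of length m,
    k singular values, and k rows of length n."""
    return k * (m + n + 1)


def break_even_rank(m: int, n: int) -> int:
    """Largest k for which the truncated factors are strictly smaller
    than the dense matrix."""
    return (m * n - 1) // (m + n + 1)


if __name__ == "__main__":
    m, n, k = 1000, 1000, 10
    print("dense:", dense_storage(m, n))
    print("full SVD factors:", full_svd_storage(m, n))
    print(f"rank-{k} truncation:", truncated_svd_storage(m, n, k))
    print("break-even rank:", break_even_rank(m, n))
```

Under these assumptions the full factorization does store more numbers than the original matrix, as the question observes. Only the truncated form is smaller, and only while $k$ stays below the break-even rank.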

cerebrou
3
votes
1 answer
Compressed Sensing: Missing Fourier Coefficients?
This question concerns the problem of reconstructing a signal when only a subset of the Fourier coefficients is observed:
$$\min_x \|x\|_1 \text{ subject to } y = Ax$$
where $x = (x_1,x_2,\dots,x_t)$ is a time-domain representation of our…

Mustafa Eisa
3
votes
2 answers
How to compress sets of integer series?
I have a set of integer series $S_1$, $S_2$, ... $S_n$. Each series has 3600 data points. Each data point is a positive integer. Each data point is stored as an unsigned int requiring 4 bytes. So, storing the entire series requires 4 * 3600 bytes.…

Nikhil
2
votes
0 answers
Analyzing 3D data: What can be done?
I am new to this kind of analysis, and I want to know what values I can look at in 3D data.
The data itself is a 3D volume $(x,y,z)$ with a floating point value in every coordinate.
It is a hyperspectral image, meaning: the $z$-space is the same…

reBourne
2
votes
1 answer
How to compute theoretical compression limit?
Assume we have a sensor field with dimension M*M. In order to apply any data compression technique, first I want to know what is the compression limit or minimum entropy of the entire sensor field. How could I compute the minimum entropy or…

user2384
2
votes
0 answers
Optimal compressibility and PCA
I have a population $\mathcal{X}$ of $N$ samples extracted from a multivariate Gaussian random variable $\mathbf{x} \in \mathbb{R}^d$. Let us define a transformation $f_{d\rightarrow r} (\mathbf{x}) = \mathbf{x'}$ which performs a dimensionality…

David Shor
2
votes
0 answers
Data compression for graph plotting
I am using Google Charts to plot a large data set. The database contains one record for every two seconds; five minutes' worth of data yields 150 records (data points) and the result is acceptable. However, my client wants to be able to visualize a…

developer1405