Accidentally asked this question in the general area and was told to ask here, so...
I've been trying to develop a lightweight, relatively fast-to-decode sound compression format for use in my gaming projects (perfect reproduction isn't needed, so I only work with 16-bit data).
The idea is to split the sound data into 14-sample frames and use linear prediction so that only the residuals need to be stored. To make it even lighter, the residuals are quantized to 4 bits per sample: each one is stored as a scaled residual, with the scale dictated by the frame header. To keep the result from getting too noisy, 16 linear prediction models are generated that best suit the signal in the file.
Each frame ends up being 64 bits: 4 bits for the LP model index, 4 bits for the residual scale (the scale is 2^n, so only n is stored), and 14 × 4 bits of residual data.
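In simplified C, the per-frame encode I have in mind looks something like this (the field order, the names and the truncating quantizer are illustrative, not necessarily exactly what my code does):

    #include <stdint.h>

    #define FRAME_LEN 14

    /* Hypothetical 64-bit frame layout, matching the description above:
     *   bits 63..60  LP model index (0..15)
     *   bits 59..56  residual scale exponent n (the scale itself is 2^n)
     *   bits 55..0   14 residuals, 4 bits each, as signed values in [-8, 7]
     */
    uint64_t pack_frame(int model_index, const int32_t residual[FRAME_LEN])
    {
        /* Pick the smallest n so that every scaled residual fits in 4 signed
         * bits; for 16-bit source data n stays comfortably inside 4 bits. */
        int n = 0;
        for (int i = 0; i < FRAME_LEN; ++i)
            while (residual[i] / (1 << n) > 7 || residual[i] / (1 << n) < -8)
                ++n;

        uint64_t frame = ((uint64_t)(model_index & 0xF) << 60)
                       | ((uint64_t)(n & 0xF) << 56);

        for (int i = 0; i < FRAME_LEN; ++i) {
            int32_t q = residual[i] / (1 << n);   /* truncating quantizer */
            frame |= (uint64_t)(q & 0xF) << (4 * (FRAME_LEN - 1 - i));
        }
        return frame;
    }

Decoding just reverses this: unpack the fields, multiply each 4-bit residual by 2^n, and add it to the prediction from the selected LP model.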
What I've been doing is simply solving the Yule-Walker equations for each frame to get its coefficients and saving them in an array. Once I have coefficients for every frame, I try to reduce them to 16 models by grouping the per-frame models by Euclidean distance and averaging each group. That is to say:
Zero out the target buffer (the final 16-model array)
For i = 1; i <= 16; i++
    Create and zero out a temporary buffer for i LP models
    For each saved per-frame LP model
        Find the entry in the target buffer [up to i] with the smallest Euclidean distance to this model
        Accumulate this model's coefficients into the matching slot of the temporary buffer
    Average out the temporary buffer and store it back into the final LP model array [again up to i]
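In C-like terms, the procedure looks roughly like this (paraphrased, so the LPC order, names and buffer handling are illustrative rather than my exact code); frame_lpc is the array of per-frame coefficients from the Yule-Walker solve mentioned above:

    #include <string.h>

    #define ORDER      8    /* LPC order; illustrative, the real one may differ */
    #define NUM_MODELS 16   /* size of the final model set */

    /* Squared Euclidean distance between two coefficient vectors. */
    static double dist2(const double *a, const double *b)
    {
        double d = 0.0;
        for (int k = 0; k < ORDER; ++k) {
            double diff = a[k] - b[k];
            d += diff * diff;
        }
        return d;
    }

    /* Build the 16-entry model set from the per-frame LPC models, as in the
     * pseudocode above: for each set size i, assign every saved model to its
     * nearest current entry, then replace each entry with its cluster mean. */
    void build_models(const double frame_lpc[][ORDER], int num_frames,
                      double models[NUM_MODELS][ORDER])
    {
        memset(models, 0, NUM_MODELS * ORDER * sizeof(double));

        for (int i = 1; i <= NUM_MODELS; ++i) {
            double sum[NUM_MODELS][ORDER] = {{0}};
            int    count[NUM_MODELS]      = {0};

            for (int f = 0; f < num_frames; ++f) {
                /* nearest of the first i entries */
                int    best   = 0;
                double best_d = dist2(frame_lpc[f], models[0]);
                for (int c = 1; c < i; ++c) {
                    double d = dist2(frame_lpc[f], models[c]);
                    if (d < best_d) { best_d = d; best = c; }
                }
                for (int k = 0; k < ORDER; ++k)
                    sum[best][k] += frame_lpc[f][k];
                count[best]++;
            }

            for (int c = 0; c < i; ++c)
                if (count[c] > 0)
                    for (int k = 0; k < ORDER; ++k)
                        models[c][k] = sum[c][k] / count[c];
        }
    }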
As expected, this doesn't yield very good results at all, presumably because it's similar to uniform colour quantization: the 16 models end up having very little relation to the actual data.
After Googling for a few months, the only thing I've come up with is 'sparse convolution', but I'm not entirely sure what that means in the context of LPC, or whether it's even what I'm after.
Given these parameters (that is: generate 16 LPC models that minimize the residual error for 14-sample frames), how would you go about it?