Linear prediction question
Posted: Fri Mar 23, 2012 8:26 am
Hey guys. Newbie here.
Sorry that my first post is a request for help but I'm almost literally going insane over this.
I've been programming a sound format that achieves 32:9 compression by the use of linear prediction.
It's basically a challenge to myself to see whether a format that achieves such compression can be done with little loss in signal.
The general idea for the format is: [beware really verbose descriptions]
While I've read a bit of 'Numerical Recipes in C', in particular the linear prediction section, it never covers what I'm trying to do.
What it 'talks' about is how to find the prediction coefficients for a given set of data, but it never mentions how to reduce these predictors to, say, 16 or so in such a way that the error is minimized.
Any ideas on the latter?
Sorry that my first post is a request for help but I'm almost literally going insane over this.
I've been programming a sound format that achieves 32:9 compression by the use of linear prediction.
It's basically a challenge to myself to see whether a format that achieves such compression can be done with little loss in signal.
The general idea for the format is: [beware really verbose descriptions]
Code: Select all
//! These types just make the code easier to understand
//! ... for me, anyway =P
typedef unsigned char u8; //! Unsigned 8-bit
typedef unsigned short u16; //! Unsigned 16-bit
typedef unsigned int u32; //! Unsigned 32-bit
typedef unsigned long long u64; //! Unsigned 64-bit
//! Sound type
typedef struct {
u32 numBlk; //! Number of sample blocks
float pCoef[16][2]; //! Global prediction coefficients [16 pairs]
//! Each sample block is composed of 32/8 [4] frames of sample data.
//! Each frame, in turn, is composed of 64/4 [16] sample nibbles and
//! an 8-bit header:
//! Bit0-3: Coefficient selection
//! Bit4-7: Shift scale
//! Basically, the 'coefficient selection' chooses the pair of
//! coefficients [from the header] that best predicts the data in
//! the frame, while the 'shift scale' provides error correction
//! scaling. [the data is in the range of -32768..+32767]
//! Since there's only 16 coefficient pairs, prediction will never
//! be exact, so there's error correction.
//! Each nibble is a value [-8..+7] that is multiplied by the shift
//! scale [which is really 2^n] in order to more closely match the
//! original sound data.
//! The data is packed in a nifty way so that it is properly aligned
//! [if you remember, a frame has an 8-bit header] to a DWORD.
struct {
u8 frameHeader[32/8]; //! 32 = sizeof(DWORD). Four frames.
u64 frameData [32/8]; //! Ditto.
} block[];
} sound_t;
//! Decoding the sample data
void Decode(sound_t *snd, float *dest) {
//! storage for the last two samples [for prediction]
float old[2] = {0.f,0.f};
//! decode each block
for(int blk = 0 ; blk < snd->numBlk ; blk++) {
//! decode each frame
for(int frm = 0 ; frm < 32/8; frm++) {
//! read frame header
u32 frameHead = snd->block[blk].frameHeader[frm];
u32 coefSelect = frameHead & 0xF;
float errorScale = pow(2.f, (float)(frameHead>>4));
//! read frame data
u64 frameData = snd->block[blk].frameData[frm];
//! make a shortcut to the coefficients
//! [makes code slightly cleaner]
float *coef = snd->pCoef[coefSelect];
//! decode each sample
for(int i = 0 ; i < 64/4; i++) {
//! find predicted data
float predict = coef[0]*old[0] + coef[1]*old[1];
//! find the sample correction data
float corNib = (float)(frameData & 0xF) * errorScale;
//! find the corrected sample
float sample = predict + corNib;
//! output this sample
*dest++ = sample;
//! shuffle the data
old[1] = old[0],
old[0] = sample;
//! shift out the old nibble to get the next sample
frameData >>= 4ULL;
}
}
}
}What it 'talks' about is how to find the prediction coefficients for a given set of data, but it never mentions how to reduce these predictors to, say, 16 or so in such a way that the error is minimized.
Any ideas on the latter?