Sorry that my first post is a request for help but I'm almost literally going insane over this.
I've been programming a sound format that achieves 32:9 compression by the use of linear prediction.
It's basically a challenge to myself to see whether a format that achieves such compression can be done with little loss in signal.
The general idea for the format is: [beware really verbose descriptions]
Code: Select all
//! These types just make the code easier to understand
//! ... for me, anyway =P
typedef unsigned char u8; //! Unsigned 8-bit
typedef unsigned short u16; //! Unsigned 16-bit
typedef unsigned int u32; //! Unsigned 32-bit
typedef unsigned long long u64; //! Unsigned 64-bit
//! Sound type
typedef struct {
u32 numBlk; //! Number of sample blocks
float pCoef[16][2]; //! Global prediction coefficients [16 pairs]
//! Each sample block is composed of 32/8 [4] frames of sample data.
//! Each frame, in turn, is composed of 64/4 [16] sample nibbles and
//! an 8-bit header:
//! Bit0-3: Coefficient selection
//! Bit4-7: Shift scale
//! Basically, the 'coefficient selection' chooses the pair of
//! coefficients [from the header] that best predicts the data in
//! the frame, while the 'shift scale' provides error correction
//! scaling. [the data is in the range of -32768..+32767]
//! Since there's only 16 coefficient pairs, prediction will never
//! be exact, so there's error correction.
//! Each nibble is a value [-8..+7] that is multiplied by the shift
//! scale [which is really 2^n] in order to more closely match the
//! original sound data.
//! The data is packed in a nifty way so that it is properly aligned
//! [if you remember, a frame has an 8-bit header] to a DWORD.
struct {
u8 frameHeader[32/8]; //! 32 = sizeof(DWORD). Four frames.
u64 frameData [32/8]; //! Ditto.
} block[];
} sound_t;
//! Decoding the sample data
void Decode(sound_t *snd, float *dest) {
//! storage for the last two samples [for prediction]
float old[2] = {0.f,0.f};
//! decode each block
for(int blk = 0 ; blk < snd->numBlk ; blk++) {
//! decode each frame
for(int frm = 0 ; frm < 32/8; frm++) {
//! read frame header
u32 frameHead = snd->block[blk].frameHeader[frm];
u32 coefSelect = frameHead & 0xF;
float errorScale = pow(2.f, (float)(frameHead>>4));
//! read frame data
u64 frameData = snd->block[blk].frameData[frm];
//! make a shortcut to the coefficients
//! [makes code slightly cleaner]
float *coef = snd->pCoef[coefSelect];
//! decode each sample
for(int i = 0 ; i < 64/4; i++) {
//! find predicted data
float predict = coef[0]*old[0] + coef[1]*old[1];
//! find the sample correction data
float corNib = (float)(frameData & 0xF) * errorScale;
//! find the corrected sample
float sample = predict + corNib;
//! output this sample
*dest++ = sample;
//! shuffle the data
old[1] = old[0],
old[0] = sample;
//! shift out the old nibble to get the next sample
frameData >>= 4ULL;
}
}
}
}What it 'talks' about is how to find the prediction coefficients for a given set of data, but it never mentions how to reduce these predictors to, say, 16 or so in such a way that the error is minimized.
Any ideas on the latter?