

Speex narrowband mode

This section describes how Speex operates in narrowband mode ($8\:\mathrm{kHz}$ sampling rate). The frame size for this mode is $20\:\mathrm{ms}$, corresponding to 160 samples. Each frame is further divided into 4 sub-frames of 40 samples each.

Many design decisions were also based on the original goals and assumptions.

LPC Analysis

An LPC analysis is first performed on a (Hamming) window that spans the whole current frame and half a frame in advance. The LPC coefficients are then converted to Line Spectral Pairs (LSPs), a representation that is more robust to quantization. The LSPs are quantized using 30 bits for the higher quality modes and 18 bits for the lower quality modes. The quantized LSPs are associated with the $4^{th}$ sub-frame, and the LSPs for the first 3 sub-frames are linearly interpolated between the previous and current LSPs.
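The sub-frame interpolation described above can be sketched as follows. This is only an illustration: the function name `interpolate_lsp` and the exact weights (sub-frame k out of 4 uses a weight of k/4 on the current LSPs) are assumptions chosen so that the 4th sub-frame receives the quantized LSPs unchanged, as the text states.

```python
def interpolate_lsp(prev_lsp, cur_lsp, num_subframes=4):
    """Return one LSP vector per sub-frame, linearly interpolated.

    Assumption: sub-frame k (1-based) puts weight k/num_subframes on the
    current LSPs, so the last sub-frame gets the current LSPs exactly.
    """
    if len(prev_lsp) != len(cur_lsp):
        raise ValueError("LSP vectors must have the same order")
    per_subframe = []
    for k in range(1, num_subframes + 1):
        w = k / num_subframes
        per_subframe.append(
            [(1.0 - w) * p + w * c for p, c in zip(prev_lsp, cur_lsp)]
        )
    return per_subframe
```

For example, `interpolate_lsp([0.0, 1.0], [1.0, 2.0])` yields four vectors that move in equal steps from the previous LSPs toward the current ones.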

The perceptual weighting filter $ W(z)$ used by Speex corresponds to the one described by eq. 1 with $ \gamma _{1}=0.9$ and $ \gamma _{2}=0.6$. We can use the unquantized $ A(z)$ filter since the weighting filter is only used in the encoder.
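As a rough sketch of how such a weighting filter can be built from the LPC coefficients: eq. 1 is not reproduced in this section, so the code below assumes the usual CELP form $W(z)=A(z/\gamma_{1})/A(z/\gamma_{2})$. Under that assumption, evaluating $A(z/\gamma)$ amounts to scaling the k-th coefficient of $A(z)$ by $\gamma^{k}$. The function names are hypothetical.

```python
def bandwidth_expand(a, gamma):
    """Coefficients of A(z/gamma), given a = [1, a_1, ..., a_p] for A(z)."""
    return [coef * gamma ** k for k, coef in enumerate(a)]


def weighting_filter(a, gamma1=0.9, gamma2=0.6):
    """Numerator and denominator of W(z), assuming W(z) = A(z/g1) / A(z/g2)."""
    numerator = bandwidth_expand(a, gamma1)
    denominator = bandwidth_expand(a, gamma2)
    return numerator, denominator
```

The returned coefficient lists can be passed to any generic IIR filtering routine.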

Pitch Prediction (adaptive codebook)

Speex uses a 3-tap prediction for pitch. That is,

$\displaystyle e(n)=\beta _{0}e(n-T-1)+\beta _{1}e(n-T)+\beta _{2}e(n-T+1)+c(n)$

where $ T$ is the pitch period and the $ \beta _{i}$ are the prediction (filter) taps. The period and quantized gains are determined in closed loop.
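The recursion above can be written out directly. The sketch below is illustrative only: it takes $c(n)$ to be the innovation codebook contribution (an assumption, since $c(n)$ is not defined in this section), and the function name and buffer layout are hypothetical. It does not perform the closed-loop search for $T$ and the gains; it only applies the predictor once they are known.

```python
def pitch_predict(past_exc, innovation, T, betas):
    """Apply e(n) = b0*e(n-T-1) + b1*e(n-T) + b2*e(n-T+1) + c(n).

    past_exc   : excitation samples before the current sub-frame
    innovation : c(n) for the current sub-frame (assumed meaning of c)
    T          : pitch period in samples (T >= 2 so e(n-T+1) is in the past)
    betas      : the three taps (b0, b1, b2)
    """
    b0, b1, b2 = betas
    if T < 2:
        raise ValueError("T must be at least 2")
    if len(past_exc) < T + 1:
        raise ValueError("need at least T + 1 past excitation samples")
    e = list(past_exc)
    start = len(e)
    for c_n in innovation:
        n = len(e)  # index of the sample being produced
        e.append(b0 * e[n - T - 1] + b1 * e[n - T] + b2 * e[n - T + 1] + c_n)
    return e[start:]
```

Because each new sample is appended before the next one is computed, the recursion also works when $T$ is shorter than the sub-frame.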

Innovation Codebook

In Speex, the innovation signal is quantized using shape-only vector quantization (VQ). That means that the codebooks represent both the shape and the gain at the same time. This saves many bits that would otherwise be allocated to a separate gain, at the price of a slight increase in complexity. Except for the absence of a (backward-adaptive) gain, the technique used in Speex is similar to G.728 (LD-CELP). However, since there is no low-delay constraint, the search can be made more "global" and use all the information available for a sub-frame.
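To make the idea of a codebook without a separate gain concrete, here is a minimal sketch of a plain exhaustive VQ search. It is not the Speex search, which this section does not detail: it simply picks the codeword closest to a target vector in squared error, with each codeword carrying its own amplitude. The function name and the toy codebook are hypothetical.

```python
def vq_search(target, codebook):
    """Return (index, squared_error) of the codeword nearest to target.

    No gain is searched separately: each codeword is compared as-is,
    so its amplitude is part of the codebook entry.
    """
    best_index, best_err = -1, float("inf")
    for i, codeword in enumerate(codebook):
        err = sum((t - c) ** 2 for t, c in zip(target, codeword))
        if err < best_err:
            best_index, best_err = i, err
    return best_index, best_err


# Toy 2-dimensional codebook, for illustration only
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, -2.0]]
index, error = vq_search([0.9, 1.2], toy_codebook)
```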

Bit allocation

There are 8 different narrowband bit-rates defined for Speex, ranging from 250 bps to 24.6 kbps, although the modes below 5.95 kbps (modes 0 and 1) should not be used for speech. The bit allocation for each mode is detailed in table 1. Each frame starts with the mode ID encoded with 4 bits, which allows a range from 0 to 15, though only the first 8 values are used for bit-rates (the others are reserved or have special meanings; see table 2). The parameters are listed in the table in the order they are packed in the bit-stream. All frame-based parameters are packed before sub-frame parameters. The parameters for a given sub-frame are all packed before the following sub-frame is packed. Note that "OL" in a parameter description means that the parameter is an open-loop estimate based on the whole frame.


Table 1: Bit allocation for narrowband modes (columns 0 to 7 are mode IDs; entries are bits per update)
Parameter         Update rate    0    1    2    3    4    5    6    7
Wideband bit      frame          1    1    1    1    1    1    1    1
Mode ID           frame          4    4    4    4    4    4    4    4
LSP               frame          0   18   18   18   18   30   30   30
OL pitch          frame          0    7    7    0    0    0    0    0
OL pitch gain     frame          0    4    0    0    0    0    0    0
OL Exc gain       frame          0    5    5    5    5    5    5    5
Fine pitch        sub-frame      0    0    0    7    7    7    7    7
Pitch gain        sub-frame      0    0    5    5    5    7    7    7
Innovation gain   sub-frame      0    1    0    1    1    3    3    3
Innovation VQ     sub-frame      0    0   16   20   35   48   64   96
Total             frame          5   43  119  160  220  300  364  492
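The totals in table 1 can be checked with a short script: frame-rate parameters are counted once and sub-frame parameters four times per $20\:\mathrm{ms}$ frame. The numbers are copied from the table; the names `TABLE`, `bits_per_frame` and `bitrate_bps` are only illustrative.

```python
FRAME_MS = 20   # frame duration in milliseconds
SUBFRAMES = 4   # sub-frames per frame

# (parameter, update rate, bits for modes 0..7), copied from table 1
TABLE = [
    ("Wideband bit",    "frame",     [1, 1, 1, 1, 1, 1, 1, 1]),
    ("Mode ID",         "frame",     [4, 4, 4, 4, 4, 4, 4, 4]),
    ("LSP",             "frame",     [0, 18, 18, 18, 18, 30, 30, 30]),
    ("OL pitch",        "frame",     [0, 7, 7, 0, 0, 0, 0, 0]),
    ("OL pitch gain",   "frame",     [0, 4, 0, 0, 0, 0, 0, 0]),
    ("OL Exc gain",     "frame",     [0, 5, 5, 5, 5, 5, 5, 5]),
    ("Fine pitch",      "sub-frame", [0, 0, 0, 7, 7, 7, 7, 7]),
    ("Pitch gain",      "sub-frame", [0, 0, 5, 5, 5, 7, 7, 7]),
    ("Innovation gain", "sub-frame", [0, 1, 0, 1, 1, 3, 3, 3]),
    ("Innovation VQ",   "sub-frame", [0, 0, 16, 20, 35, 48, 64, 96]),
]


def bits_per_frame(mode):
    """Sum the bits of one mode, repeating sub-frame parameters 4 times."""
    total = 0
    for _name, rate, bits in TABLE:
        repeat = SUBFRAMES if rate == "sub-frame" else 1
        total += bits[mode] * repeat
    return total


def bitrate_bps(mode):
    """Bit-rate in bits per second for one 20 ms frame per mode."""
    return bits_per_frame(mode) * 1000 // FRAME_MS
```

For mode 3, for instance, the frame-rate parameters contribute 28 bits and the sub-frame parameters 4 x 33 = 132 bits, giving 160 bits per frame, or 8,000 bps.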


So far, no MOS (mean opinion score) subjective evaluation has been performed for Speex. To give an idea of the quality achievable with it, table 2 presents my own subjective opinion. It should be noted that different people perceive quality differently and that the person who designed a codec often has a bias (one way or another) when it comes to subjective evaluation. Finally, for most codecs (including Speex), encoding quality sometimes varies depending on the input.


Table 2: Quality versus bit-rate
Mode   Bit-rate (bps)   Quality/description
0      250              No sound (VBR only)
1      2,150            Comfort noise only (VBR only)
2      5,950            Very noticeable artifacts/noise, good intelligibility
3      8,000            Artifacts/noise sometimes noticeable
4      11,000           Artifacts usually noticeable only with headphones
5      15,000           Need good headphones to tell the difference
6      18,200           Hard to tell the difference even with good headphones
7      24,600           Completely transparent for voice, good quality music
8      N/A              Reserved
9      N/A              Reserved
10     N/A              Reserved
11     N/A              Reserved
12     N/A              Reserved
13     N/A              Application-defined, Speex should never see it
14     N/A              In-band signaling, Speex will skip it
15     N/A              Terminator code


Perceptual enhancement

This part of the codec only applies to the decoder and can even be changed without affecting inter-operability. For that reason, the implementation provided and described here should only be considered a reference implementation. The enhancement system is divided into two parts. First, the synthesis filter $S(z)=1/A(z)$ is replaced by an enhanced filter

$\displaystyle S'(z)=\frac{A\left(z/a_{2}\right)A\left(z/a_{3}\right)}{A\left(z\right)A\left(z/a_{1}\right)}$

where $a_{1}$ and $a_{2}$ depend on the mode in use and $a_{3}=\frac{1}{r}\left(1-\frac{1-ra_{1}}{1-ra_{2}}\right)$ with $r=0.9$. The second part of the enhancement consists of using a comb filter to enhance the pitch in the excitation domain.
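The first part of the enhancement can be sketched numerically as follows. The mode-dependent values of $a_{1}$ and $a_{2}$ are not given in this section, so any values passed in are placeholders. The helper names are hypothetical, and the code assumes that $A(z/a)$ is obtained by scaling the k-th coefficient of $A(z)$ by $a^{k}$. The comb filter of the second part is not covered.

```python
def enhancement_a3(a1, a2, r=0.9):
    """a3 = (1/r) * (1 - (1 - r*a1) / (1 - r*a2)), as defined in the text."""
    return (1.0 - (1.0 - r * a1) / (1.0 - r * a2)) / r


def expand(a, g):
    """Coefficients of A(z/g), given a = [1, a_1, ..., a_p] for A(z)."""
    return [coef * g ** k for k, coef in enumerate(a)]


def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def enhanced_filter(a, a1, a2, r=0.9):
    """Numerator and denominator coefficients of S'(z)."""
    a3 = enhancement_a3(a1, a2, r)
    numerator = poly_mul(expand(a, a2), expand(a, a3))
    denominator = poly_mul(a, expand(a, a1))
    return numerator, denominator
```

Note that when $a_{1}=a_{2}$ the formula gives $a_{3}=0$, so the $A(z/a_{3})$ factor reduces to 1 and the remaining factors $A(z/a_{2})$ and $A(z/a_{1})$ cancel, leaving the original synthesis filter $1/A(z)$.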


Jean-Marc Valin 2002-08-27