Speex narrowband mode

This section looks at how Speex works for narrowband ($8\: \mathrm{kHz}$ sampling rate) operation. The frame size for this mode is $20\: \mathrm{ms}$, corresponding to 160 samples. Each frame is further subdivided into 4 sub-frames of 40 samples each.
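The frame arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch; the names `frame_size`, `subframe_size` and `split_into_subframes` are made up for this example and are not part of the Speex API.

```python
SAMPLE_RATE_HZ = 8000      # narrowband sampling rate
FRAME_MS = 20              # frame duration in milliseconds
SUBFRAMES_PER_FRAME = 4    # sub-frames per frame

# 8000 samples/s * 0.020 s = 160 samples per frame
frame_size = SAMPLE_RATE_HZ * FRAME_MS // 1000
# 160 / 4 = 40 samples per sub-frame
subframe_size = frame_size // SUBFRAMES_PER_FRAME


def split_into_subframes(frame):
    """Split one frame (a sequence of samples) into equal sub-frames."""
    if len(frame) != frame_size:
        raise ValueError("expected a frame of %d samples" % frame_size)
    return [frame[i:i + subframe_size]
            for i in range(0, frame_size, subframe_size)]
```

For example, `split_into_subframes(list(range(160)))` yields 4 lists of 40 samples each.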

LPC Analysis

An LPC analysis is first performed on a (Hamming) window that spans the whole current frame and half a frame in advance. The LPC coefficients are then converted to Line Spectral Pairs (LSPs), a representation that is more robust to quantization. The LSPs are quantized using 30 bits for the higher-quality modes and 18 bits for the lower-quality modes. The quantized LSPs are associated with the $4^{th}$ sub-frame, and the LSPs for the first 3 sub-frames are linearly interpolated between the previous frame's LSPs and the current ones.
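As a minimal sketch of the sub-frame interpolation described above, the following Python function blends the previous and current LSP vectors. The function name and the exact weights, $(i+1)/4$ for sub-frame $i=0,\ldots ,3$, are assumptions made for illustration; the text only states that the interpolation is linear and that the 4th sub-frame uses the current LSPs.

```python
def interpolate_lsp(prev_lsp, curr_lsp, subframe, num_subframes=4):
    """Linearly interpolate LSPs for one sub-frame (0-based index).

    Assumed weighting: sub-frame i uses weight (i + 1) / num_subframes
    on the current LSPs, so the last sub-frame gets the current LSPs
    unchanged.
    """
    if len(prev_lsp) != len(curr_lsp):
        raise ValueError("LSP vectors must have the same length")
    if not 0 <= subframe < num_subframes:
        raise ValueError("sub-frame index out of range")
    w = (subframe + 1) / num_subframes
    return [(1.0 - w) * p + w * c for p, c in zip(prev_lsp, curr_lsp)]
```

With these weights, the first sub-frame is a 75%/25% mix of the previous and current LSPs, and the fourth is the current LSP vector itself.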

The perceptual weighting filter $W(z)$ used by Speex corresponds to the one described by eq. 1 with $\gamma _{1}=0.9$ and $\gamma _{2}=0.6$. We can use the unquantized $A(z)$ filter since the weighting filter is only used in the encoder.
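The sketch below shows how such a weighting filter could be built from LPC coefficients. It assumes that eq. 1 has the form commonly used in CELP coders, $W(z)=A(z/\gamma _{1})/A(z/\gamma _{2})$, and that the coefficients are given as the list $[1, a_{1}, \ldots , a_{p}]$ of the polynomial $A(z)$; both are assumptions of this example, as are the function names.

```python
def bandwidth_expand(lpc, gamma):
    """Coefficients of A(z/gamma): the k-th coefficient is scaled by gamma**k."""
    return [a * gamma ** k for k, a in enumerate(lpc)]


def weighting_filter(lpc, gamma1=0.9, gamma2=0.6):
    """Numerator and denominator of W(z) = A(z/gamma1) / A(z/gamma2) (assumed form)."""
    return bandwidth_expand(lpc, gamma1), bandwidth_expand(lpc, gamma2)


def apply_pole_zero(num, den, x):
    """Filter x through num(z)/den(z) with zero initial state (direct form)."""
    y = []
    for n in range(len(x)):
        acc = sum(num[k] * x[n - k] for k in range(len(num)) if n - k >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc / den[0])
    return y
```

A signal would then be weighted with `num, den = weighting_filter(lpc)` followed by `apply_pole_zero(num, den, signal)`.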

Pitch Prediction (adaptive codebook)

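Speex uses a 3-tap pitch predictor. The pitch prediction signal is computed from the past excitation $e[n]$ as

$e_{p}[n]=\beta _{0}e[n-T-1]+\beta _{1}e[n-T]+\beta _{2}e[n-T+1]$

where $T$ is the pitch period and the $\beta _{i}$ are the prediction (filter) taps. The period and quantized gains are determined in closed loop.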
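To make the role of the period and the taps concrete, here is a minimal sketch of a multi-tap pitch predictor. It assumes a 3-tap predictor of the form $e_{p}[n]=\sum _{i=0}^{2}\beta _{i}e[n-T-1+i]$ and, to keep the example short, only handles periods longer than the block being predicted. The function name and buffer layout are hypothetical, and the closed-loop search itself is not shown.

```python
def pitch_prediction(past_exc, period, taps, length):
    """Predict `length` samples from past excitation with a multi-tap filter.

    past_exc: past excitation samples, oldest first (past_exc[-1] is e[-1]).
    period:   pitch period T in samples.
    taps:     prediction taps beta_0..beta_2 (assumed 3-tap form).
    Simplification: requires period > length, so only past samples are used.
    """
    if period <= length:
        raise ValueError("this sketch only handles period > length")
    if len(past_exc) < period + 1:
        raise ValueError("not enough past excitation for this period")
    hist = len(past_exc)
    out = []
    for n in range(length):
        base = hist + n - period - 1  # index of e[n - T - 1] in past_exc
        out.append(sum(b * past_exc[base + i] for i, b in enumerate(taps)))
    return out
```

With taps `[0, 1, 0]` the predictor reduces to a plain delay by `period` samples; the two outer taps let the filter approximate non-integer periods.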

Innovation Codebook

In Speex, the innovation signal is quantized using shape-only vector quantization (VQ). This means that the codebooks represent both the shape and the gain at the same time. This saves many bits that would otherwise be allocated to a separate gain, at the price of a slight increase in complexity. Except for the absence of a (backward-adaptive) gain, the technique used in Speex is similar to that of G.728 (LD-CELP). However, since Speex has no low-delay constraint, the search can be made more "global" and can make use of all the information available for a sub-frame.

Bit allocation

There are 8 narrowband modes (0 to 7) defined for Speex, with bit-rates ranging from 250 bps to 24.6 kbps, although the modes below 5.9 kbps should not be used for speech. The bit allocation for each mode is detailed in table 1. After the wideband bit, each frame carries the mode ID, encoded with 4 bits, which allows a range from 0 to 15; only values 0 to 7 select narrowband modes, and the remaining values are reserved or have a special meaning (see table 2). The parameters are listed in the table in the order in which they are packed in the bit-stream. All frame-based parameters are packed before sub-frame parameters, and all the parameters for a given sub-frame are packed before those of the following sub-frame. Note that "OL" in a parameter description means that the parameter is an open-loop estimate based on the whole frame.

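Table 1: Bit allocation for narrowband modes (bits per parameter; columns 0 to 7 are mode IDs; sub-frame parameters are sent once per sub-frame, i.e. 4 times per frame)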

Parameter	Update rate	0	1	2	3	4	5	6	7
Wideband bit	frame	1	1	1	1	1	1	1	1
Mode ID	frame	4	4	4	4	4	4	4	4
LSP	frame	0	18	18	18	18	30	30	30
OL pitch	frame	0	7	7	0	0	0	0	0
OL pitch gain	frame	0	4	0	0	0	0	0	0
OL Exc gain	frame	0	5	5	5	5	5	5	5
Fine pitch	sub-frame	0	0	0	7	7	7	7	7
Pitch gain	sub-frame	0	0	5	5	5	7	7	7
Innovation gain	sub-frame	0	1	0	1	1	3	3	3
Innovation VQ	sub-frame	0	0	16	20	35	48	64	96
Total	frame	5	43	119	160	220	300	364	492
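The totals in table 1 can be reproduced from the per-parameter allocations: frame parameters count once, sub-frame parameters count 4 times, and a 20 ms frame means 50 frames per second. The following sketch does that arithmetic; the dictionary and function names are only illustrative.

```python
# Bits per parameter for modes 0..7, copied from table 1.
FRAME_PARAMS = {
    "Wideband bit":  [1, 1, 1, 1, 1, 1, 1, 1],
    "Mode ID":       [4, 4, 4, 4, 4, 4, 4, 4],
    "LSP":           [0, 18, 18, 18, 18, 30, 30, 30],
    "OL pitch":      [0, 7, 7, 0, 0, 0, 0, 0],
    "OL pitch gain": [0, 4, 0, 0, 0, 0, 0, 0],
    "OL Exc gain":   [0, 5, 5, 5, 5, 5, 5, 5],
}
SUBFRAME_PARAMS = {
    "Fine pitch":      [0, 0, 0, 7, 7, 7, 7, 7],
    "Pitch gain":      [0, 0, 5, 5, 5, 7, 7, 7],
    "Innovation gain": [0, 1, 0, 1, 1, 3, 3, 3],
    "Innovation VQ":   [0, 0, 16, 20, 35, 48, 64, 96],
}
SUBFRAMES_PER_FRAME = 4
FRAMES_PER_SECOND = 50  # one frame every 20 ms


def frame_bits(mode):
    """Total number of bits in one frame for the given mode (0..7)."""
    per_frame = sum(bits[mode] for bits in FRAME_PARAMS.values())
    per_subframe = sum(bits[mode] for bits in SUBFRAME_PARAMS.values())
    return per_frame + SUBFRAMES_PER_FRAME * per_subframe


def bitrate_bps(mode):
    """Bit-rate in bits per second for the given mode."""
    return frame_bits(mode) * FRAMES_PER_SECOND
```

For instance, mode 3 uses 28 frame-level bits plus 4 × 33 sub-frame bits, i.e. 160 bits per frame, which gives 8,000 bps.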

So far, no MOS (mean opinion score) subjective evaluation has been performed for Speex. In order to give an idea of the quality achievable with it, table 2 presents my own subjective opinion on it. It should be noted that different people will perceive the quality differently and that the person who designed the codec often has a bias (one way or another) when it comes to subjective evaluation. Finally, it should be noted that for most codecs (including Speex) the encoding quality sometimes varies depending on the input.

Table 2: Quality versus bit-rate

Mode	Bitrate (bps)	Quality/description
0	250	No sound (VBR only)
1	2,150	Comfort noise only (VBR only)
2	5,950	Very noticeable artifacts/noise, good intelligibility
3	8,000	Artifacts/noise sometimes noticeable
4	11,000	Artifacts usually noticeable only with headphones
5	15,000	Need good headphones to tell the difference
6	18,200	Hard to tell the difference even with good headphones
7	24,600	Completely transparent for voice, good quality music
8	N/A	reserved
9	N/A	reserved
10	N/A	reserved
11	N/A	reserved
12	N/A	reserved
13	N/A	Application-defined, Speex should never see it
14	N/A	In-band signaling, Speex will skip it
15	N/A	Terminator code
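The meaning of the 16 possible mode ID values in table 2 can be summarized as a small lookup. This is a sketch only: the function name and the returned strings are invented for this example and do not correspond to any Speex API.

```python
def describe_mode_id(mode_id):
    """Classify a 4-bit mode ID according to table 2."""
    if not 0 <= mode_id <= 15:
        raise ValueError("mode ID must fit in 4 bits (0..15)")
    if mode_id <= 7:
        return "narrowband mode"
    if mode_id <= 12:
        return "reserved"
    special = {
        13: "application-defined",  # Speex should never see it
        14: "in-band signaling",    # Speex will skip it
        15: "terminator",
    }
    return special[mode_id]
```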

Perceptual enhancement

This part of the codec only applies to the decoder and can even be changed without affecting inter-operability. For that reason, the implementation provided and described here should only be considered as a reference implementation. The enhancement system is divided into two parts. First, the synthesis filter $S(z)=1/A(z)$ is replaced by an enhanced filter