Speex can encode speech in both narrowband and wideband and provides different bit-rates. However, not every feature needs to be supported by a given implementation or device. To be called ``Speex compatible'', an implementation must implement at least a basic set of features.
At a minimum, all narrowband and wideband modes of operation MUST be supported at the decoder. For wideband, a decoder MAY either decode all modes completely or decode only the embedded narrowband part of all modes (ignoring the high-band bits).
For encoders, at least one narrowband and one wideband mode MUST be supported. Note that the wideband mode MAY be the ``null highband'' mode. Encoders are not required to support every mode mainly because some platforms may not be able to handle the computational complexity of encoding in some modes.
Since Speex-encoded frames already contain mode information, they can be sent in an RTP packet without any additional information. If more than one frame is transmitted, the frames are concatenated bit by bit with no byte padding between them; only the last frame is padded to a byte boundary. The number of frames contained in each packet MUST be transmitted out-of-band.
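To illustrate the packing rule, the following sketch appends frames of arbitrary bit length back to back (MSB first) and pads only the end of the payload. The function names are hypothetical, and padding with zero bits is an assumption made for this sketch; the reference library may use a different padding pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append nbits bits from src (MSB first) to dst, starting at bit offset *pos. */
static void append_bits(uint8_t *dst, size_t *pos, const uint8_t *src, size_t nbits)
{
    for (size_t i = 0; i < nbits; i++) {
        int bit = (src[i / 8] >> (7 - i % 8)) & 1;
        if (bit)
            dst[*pos / 8] |= (uint8_t)(0x80 >> (*pos % 8));
        (*pos)++;
    }
}

/* Pack nframes frames contiguously, with no alignment between frames.
 * Returns the payload size in bytes, or 0 if out is too small. */
size_t pack_frames(uint8_t *out, size_t out_cap,
                   const uint8_t *const frames[], const size_t frame_bits[],
                   size_t nframes)
{
    assert(out != NULL);
    size_t total_bits = 0;
    for (size_t f = 0; f < nframes; f++)
        total_bits += frame_bits[f];
    size_t nbytes = (total_bits + 7) / 8; /* only the final byte may be padded */
    if (nbytes > out_cap)
        return 0;
    memset(out, 0, nbytes);               /* zero padding: an assumption */
    size_t pos = 0;
    for (size_t f = 0; f < nframes; f++)
        append_bits(out, &pos, frames[f], frame_bits[f]);
    return nbytes;
}
```

For example, a 3-bit frame `101` followed by a 3-bit frame `111` packs into the single byte `0xBC` (`101111` plus two padding bits), whereas byte-aligning each frame would have taken two bytes. Because frame boundaries are not marked, the receiver needs the out-of-band frame count to split the payload.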
Speex bit-streams can be stored in Ogg files. In this case, the first packet of the Ogg file contains the Speex header described in table 4. All integer fields in the header are stored as little-endian. The speex_string field must contain the string ``Speex   '' (``Speex'' followed by 3 trailing spaces), which identifies the bit-stream. The next field, speex_version, contains the version of Speex that encoded the file. For now, refer to speex_header.[ch] for more information. The beginning-of-stream (b_o_s) flag is set to 1 for the header. The header packet has packetno=0 and granulepos=0.
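A reader can identify the header packet by its 8-byte speex_string and then decode the little-endian integer fields. The sketch below assumes only what is stated above (the magic string at the start of the packet and little-endian integers); the helper names are hypothetical, and the exact offsets of the remaining fields should be taken from table 4 or speex_header.h rather than from this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* "Speex" followed by three trailing spaces: 8 bytes, no terminating null. */
static const char SPEEX_MAGIC[8] = {'S', 'p', 'e', 'e', 'x', ' ', ' ', ' '};

/* Returns nonzero if the packet starts with the speex_string magic. */
int is_speex_header(const uint8_t *pkt, size_t len)
{
    return len >= sizeof SPEEX_MAGIC &&
           memcmp(pkt, SPEEX_MAGIC, sizeof SPEEX_MAGIC) == 0;
}

/* Decode a 32-bit little-endian integer, independent of host byte order. */
uint32_t read_le32(const uint8_t *p)
{
    assert(p != NULL);
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Assembling the value byte by byte, rather than copying it into a `uint32_t`, keeps the parser correct on big-endian hosts.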
The second packet contains a user-comment string, without a terminating null. The content and format of the comment string are not defined. This packet has packetno=1 and granulepos=0.
The third and subsequent packets each contain one or more Speex frames (the number is given in the header). These packets are numbered with packetno starting from 2, and their granulepos is the number of the first sample encoded in that packet.
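Under the convention stated above (granulepos is the index of the first sample in the packet), the granulepos of a data packet follows directly from its packetno. This sketch assumes every data packet carries exactly frames_per_packet frames of frame_size samples each; `speex_granulepos()` is a hypothetical helper, not a libspeex function.

```c
#include <assert.h>
#include <stdint.h>

/* granulepos of data packet `packetno` = index of its first sample.
 * Assumes a fixed number of frames per packet and a fixed frame size. */
int64_t speex_granulepos(int64_t packetno, int frames_per_packet, int frame_size)
{
    assert(packetno >= 2); /* packets 0 and 1 are the header and comment */
    int64_t samples_per_packet = (int64_t)frames_per_packet * frame_size;
    return (packetno - 2) * samples_per_packet;
}
```

For instance, with 2 frames of 320 samples per packet, packet 4 (the third data packet) starts at sample 1280.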
The stream is terminated by a packet containing the string ``END OF STREAM'' (without a terminating null). The end-of-stream (e_o_s) flag is set to 1 for this packet. The decoder should rely on the e_o_s flag and not on the content of the packet.
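Taken together, the packet layout can be summarized as a small classifier that a demuxer might apply to each Ogg packet. The enum and function names are hypothetical. Following the recommendation above, the end of stream is detected from the e_o_s flag alone, never by comparing the packet payload with ``END OF STREAM''.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum speex_ogg_packet { SPX_HEADER, SPX_COMMENT, SPX_AUDIO, SPX_END };

/* Classify an Ogg packet of a Speex stream by packetno and the e_o_s flag. */
enum speex_ogg_packet classify_packet(int64_t packetno, bool e_o_s)
{
    assert(packetno >= 0);
    if (e_o_s)
        return SPX_END;     /* trust the flag, not the packet contents */
    if (packetno == 0)
        return SPX_HEADER;  /* Speex header, b_o_s set */
    if (packetno == 1)
        return SPX_COMMENT; /* user-comment string */
    return SPX_AUDIO;       /* one or more Speex frames */
}
```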