DefineSound

Tag Info
Tag Number: 14
Tag Type: Define
Tag Flash Version: 2
Unknown SWF Tag: This tag is defined by the Flash documentation by Adobe
Brief Description: 

Declare a sound effect. This tag defines sound samples that can later be played back using either a StartSound or a DefineButtonSound. Note that the same DefineSound block can actually include multiple sound files and only part of the entire sound can be played back as required.

Tag Structure: 
struct swf_definesound {
	swf_tag			f_tag;		/* 14 */
	unsigned short		f_sound_id;
	unsigned		f_sound_format : 4;
	unsigned		f_sound_rate : 2;
	unsigned		f_sound_is_16bits : 1;
	unsigned		f_sound_is_stereo : 1;
	unsigned long		f_sound_samples_count;
	unsigned char		f_sound_data[<variable size>];
};

A DefineSound tag declares a set of samples of a sound effect or music.

The sound samples can be compressed or not, stereo or not and 8 or 16 bits. The different modes are not all available in version 2, although the same tag is used in newer versions with additional capabilities.

The f_sound_is_16bits flag is always set to 1 (16 bits samples) if the samples are compressed (neither Raw nor Uncompressed).

The f_sound_rate represents the rate at which the samples are defined. The rate at which it will be played on the target computers may differ. The following equation can be used to determine the rate:

	rate = 5512.5 * 2 ** f_sound_rate

It yields the following values (the rate of 5512.5 is rounded down to 5512):

f_sound_rate    Rate in samples per second
0               5512
1               11025
2               22050
3               44100
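
The equation and the table above can be sketched in C as follows. The function name swf_sound_rate_hz is made up for this example. Since 5512.5 * 2 ** n is the same as 44100 / 2 ** (3 - n), a right shift of 44100 gives the rounded down value directly.

```c
/* Return the sampling rate, in samples per second, for the 2 bits
 * f_sound_rate field (0 to 3). 5512.5 * 2 ** n is computed as
 * 44100 >> (3 - n) so that 5512.5 is rounded down to 5512, as in the
 * table. Returns 0 when the value is out of range. */
unsigned swf_sound_rate_hz(unsigned f_sound_rate)
{
	if (f_sound_rate > 3) {
		return 0;
	}
	return 44100u >> (3 - f_sound_rate);
}
```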

The f_sound_samples_count value is the exact number of samples, not the size of the data in bytes. Thus, in stereo, it represents the number of pairs. To know the byte size, use the total size of the tag minus the header (9 or 13 bytes depending on whether the size of the tag is larger than 62; it is more than likely that it will be 13).
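
As an illustration, here is a minimal sketch which reads the fixed fields of the tag and computes the size of f_sound_data. It assumes that body points to the bytes which follow the tag header (f_tag) and that body_size is their count, so the 7 bytes of fixed fields (2 for the identifier, 1 for the flags and 4 for the samples count) are the only ones left to subtract. It also assumes the usual SWF conventions: multi-byte integers are little endian and bit fields start at the most significant bit. The structure and function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure holding the decoded fixed fields. */
struct sound_info {
	uint16_t id;
	unsigned format;	/* f_sound_format, 4 bits */
	unsigned rate;		/* f_sound_rate, 2 bits */
	unsigned is_16bits;	/* f_sound_is_16bits, 1 bit */
	unsigned is_stereo;	/* f_sound_is_stereo, 1 bit */
	uint32_t samples_count;
	size_t   data_size;	/* size of f_sound_data in bytes */
};

/* Decode the fixed fields found right after the tag header.
 * Returns 0 on success, -1 when the body is too small. */
int swf_parse_definesound(const uint8_t *body, size_t body_size,
			  struct sound_info *out)
{
	if (body_size < 7) {
		return -1;
	}
	out->id = (uint16_t) (body[0] | (body[1] << 8));
	out->format = body[2] >> 4;
	out->rate = (body[2] >> 2) & 3;
	out->is_16bits = (body[2] >> 1) & 1;
	out->is_stereo = body[2] & 1;
	out->samples_count = (uint32_t) body[3]
			   | ((uint32_t) body[4] << 8)
			   | ((uint32_t) body[5] << 16)
			   | ((uint32_t) body[6] << 24);
	out->data_size = body_size - 7;
	return 0;
}
```

For Raw and Uncompressed data, data_size should match f_sound_samples_count multiplied by the number of channels and the number of bytes per sample.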

The f_sound_format can be one of the following values:

Value  Name          Version  Comment
0      Raw           2        16 bits uncompressed samples are not specified as being saved in little or big endian. The endianness of the processor on which the movie is being played will be used. Thus you should never use this format with 16 bits samples.
1      ADPCM         2        Audio differential pulse code modulation compression scheme.
2      MP3           4        High ratio of compression with very good quality sound. Use MP3 if you can save a V4.x or better movie.
3      Uncompressed  4        Uncompressed samples which are always saved in little endian. This is similar to the format 0 except you can be sure the data will be properly played on any system.
6      Nellymoser    6        Good quality sound compression for voices. Use Nellymoser if you can save a V6.x or better movie and the sound is actually a voice or animal roar, squeak, etc. This is a single channel compression.

The f_sound_data depends on the sound format. The following describes the different formats as used in the DefineSound and the SoundStreamBlock tags.

  • 8 bits

8 bits data is saved in an array of signed char. The value 0 represents silence. The samples can otherwise have values between -128 and +127.

  • 16 bits

16 bits data is saved in an array of signed short. The value 0 represents silence. The samples can otherwise have values between -32768 and +32767. By default, the data will be encoded in little endian. However, the Raw format doesn't specify the endianness of the data saved in that case. You should avoid using Raw 16 bits data. Use Uncompressed data instead, or compress it in one of the available compression formats (including Raw 8 bits data). A player may wish to avoid playing any sound saved in Raw 16 bits to avoid any problem.
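
To read little endian 16 bits samples the same way on any processor, the bytes can be assembled by hand instead of casting the buffer to a short pointer. This is only a sketch; the function name swf_read_s16le is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode count 16 bits little endian samples from src into dst.
 * The result does not depend on the endianness of the host. */
void swf_read_s16le(const uint8_t *src, int16_t *dst, size_t count)
{
	for (size_t i = 0; i < count; ++i) {
		/* low byte first, then high byte */
		int u = src[2 * i] | (src[2 * i + 1] << 8);
		/* map 0..65535 to -32768..32767 without relying on an
		 * implementation defined conversion */
		dst[i] = (int16_t) ((u ^ 0x8000) - 0x8000);
	}
}
```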

  • Mono

Mono sound saves only one channel of sound. It will be played back on both output (left and right) channels. This is often enough for most sound effects and voice.

  • Stereo

For better quality music and sound effects, you can save the data in stereo. In this case, the samples for each channel (left and right) are interleaved, with the data for the left channel first. Thus, you will have: LRLRLRLRLR... In 8 bit, you get one byte for the left channel, then one byte for the right, one for the left, one for the right, etc. In 16 bit, you get two bytes for the left then two for the right channel, etc.
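
The interleaving can be undone with a simple loop. The following sketch splits 16 bits LRLRLR... data into two separate buffers; the function name is made up for this example and pairs is the number of left and right pairs (the value found in f_sound_samples_count).

```c
#include <stddef.h>
#include <stdint.h>

/* Split interleaved stereo samples (left first) into one buffer per
 * channel. left and right must each have room for pairs samples. */
void swf_deinterleave_s16(const int16_t *interleaved, size_t pairs,
			  int16_t *left, int16_t *right)
{
	for (size_t i = 0; i < pairs; ++i) {
		left[i] = interleaved[2 * i];
		right[i] = interleaved[2 * i + 1];
	}
}
```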

  • Raw

The Raw encoding is an uncompressed encoding with unspecified endianness. You can use this format to safely save small sound effects with 8 bits samples. For 16 bits sound effects, some systems may not swap the data before playing it, although it is likely that the buffer is expected to be in little endian.

  • ADPCM

Audio differential pulse code modulation compression scheme. This is pretty good compression for sound effects.

The ADPCM tables used by the SWF players are as follows:

int swf_adpcm_2bits[ 2] = { -1,  2 };

int swf_adpcm_3bits[ 4] = { -1, -1,  2,  4 };

int swf_adpcm_4bits[ 8] = { -1, -1, -1, -1,  2,  4,  6,  8 };

int swf_adpcm_5bits[16] = { -1, -1, -1, -1, -1, -1, -1, -1,
                             1,  2,  4,  6,  8, 10, 13, 16 };

The ADPCM data is composed of a 2 bits field giving the encoding size (2 to 5 bits per sample) and an array of 4096 left (mono) or left and right (stereo) samples.

	struct swf_adpcm_header {
		unsigned		f_encoding : 2;
	};

The number of bits for the compression is f_encoding + 2.

	struct swf_adpcm_mono {
		unsigned short		f_first_sample;
		unsigned		f_first_index : 6;
		unsigned		f_data[4096] : f_encoding + 2;
	};
	struct swf_adpcm_stereo {
		unsigned short		f_first_sample_left;
		unsigned		f_first_index_left : 6;
		unsigned short		f_first_sample_right;
		unsigned		f_first_index_right : 6;
		unsigned		f_data[8192] : f_encoding + 2;
	};
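
Since none of these fields are byte aligned, a bit reader is needed to decode them. The following is a minimal sketch, assuming that bits are read starting at the most significant bit of each byte (the usual SWF convention). It reads the encoding size and the start of a mono block, and shows how the tables above are used to update the index. All the names are made up for this example. The tables are indexed with the code minus its sign bit, and the index is clamped to 0 to 88, which is the range used by the usual IMA ADPCM step table; that range is an assumption since this page doesn't state it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper reading bits, most significant bit first. */
struct bit_reader {
	const uint8_t *data;
	size_t         size;	/* size of data in bytes */
	size_t         pos;	/* number of bits consumed so far */
};

/* Read count bits (1 to 32); bits past the end are read as 0. */
uint32_t br_read(struct bit_reader *br, unsigned count)
{
	uint32_t value = 0;
	while (count-- > 0) {
		size_t byte = br->pos >> 3;
		unsigned bit = 0;
		if (byte < br->size) {
			bit = (br->data[byte] >> (7 - (br->pos & 7))) & 1u;
		}
		value = (value << 1) | bit;
		br->pos++;
	}
	return value;
}

struct adpcm_mono_start {
	unsigned bits;		/* f_encoding + 2 */
	uint16_t first_sample;	/* f_first_sample */
	unsigned first_index;	/* f_first_index */
};

/* Read swf_adpcm_header followed by the start of swf_adpcm_mono. */
void swf_adpcm_read_mono_start(struct bit_reader *br,
			       struct adpcm_mono_start *out)
{
	out->bits = (unsigned) br_read(br, 2) + 2;
	out->first_sample = (uint16_t) br_read(br, 16);
	out->first_index = (unsigned) br_read(br, 6);
}

/* Compute the next index from one code of bits bits (2 to 5). */
int swf_adpcm_next_index(int index, unsigned code, unsigned bits)
{
	static const int t2[2] = { -1, 2 };
	static const int t3[4] = { -1, -1, 2, 4 };
	static const int t4[8] = { -1, -1, -1, -1, 2, 4, 6, 8 };
	static const int t5[16] = { -1, -1, -1, -1, -1, -1, -1, -1,
				     1,  2,  4,  6,  8, 10, 13, 16 };
	const int *table;

	switch (bits) {
	case 2: table = t2; break;
	case 3: table = t3; break;
	case 4: table = t4; break;
	case 5: table = t5; break;
	default: return index;	/* invalid size, leave index as is */
	}
	/* drop the sign bit, the rest selects the table entry */
	index += table[code & ((1u << (bits - 1)) - 1u)];
	if (index < 0) {
		index = 0;
	}
	if (index > 88) {	/* assumed IMA ADPCM upper limit */
		index = 88;
	}
	return index;
}
```

A full decoder also needs the step size table and the computation of each sample from the code, which are not described here.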
  • MP3

IMPORTANT LICENSING NOTES: please see the entire SSWF project license above for information about the Audio MPEG licensing rights.

The SWF players which support movie v4.x and better will also support MPEG1 audio compression. This is a good quality high compression scheme. The players need to support constant and variable bit rates, and MPEG1 Layer 3, v2 and v2.5. For more information about MPEG you probably want to check out this web site: http://www.mp3-tech.org/.

In SWF movies, you need to save a seeking point (position of the data to play in a given frame) before the MP3 frames themselves. It is also called the initial latency. I will make this clearer once I understand better what it means.

An MP3 frame is described below. This is exactly what you will find in any music file.

	struct swf_mp3_header {
		unsigned		f_sync_word : 11;
		unsigned		f_version : 2;
		unsigned		f_layer : 2;
		unsigned		f_no_protection : 1;
		unsigned		f_bit_rate : 4;
		unsigned		f_sample_rate : 2;
		unsigned		f_padding : 1;
		unsigned		f_reserved : 1;
		unsigned		f_channel_mode : 2;
		unsigned		f_mode_extension : 2;
		unsigned		f_copyright : 1;
		unsigned		f_original : 1;
		unsigned		f_emphasis : 2;
		if(f_no_protection == 0) {
			unsigned short	f_check_sum;
		}
		unsigned char		f_data[variable size];
	};
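
The first 32 bits of this structure can be unpacked with shifts and masks, as in the following sketch. It assumes the 4 bytes are read in file order with the first byte as the most significant one, so that f_sync_word ends up in the top 11 bits. The structure and function names are made up for this example.

```c
#include <stdint.h>

/* Hypothetical structure holding the unpacked header fields. */
struct mp3_header {
	unsigned sync_word;	/* 11 bits */
	unsigned version;	/* 2 bits */
	unsigned layer;		/* 2 bits */
	unsigned no_protection;	/* 1 bit */
	unsigned bit_rate;	/* 4 bits */
	unsigned sample_rate;	/* 2 bits */
	unsigned padding;	/* 1 bit */
	unsigned reserved;	/* 1 bit */
	unsigned channel_mode;	/* 2 bits */
	unsigned mode_extension;	/* 2 bits */
	unsigned copyright;	/* 1 bit */
	unsigned original;	/* 1 bit */
	unsigned emphasis;	/* 2 bits */
};

/* Unpack the 4 bytes found at the start of an MP3 frame. */
void swf_mp3_unpack_header(const uint8_t bytes[4], struct mp3_header *h)
{
	uint32_t v = ((uint32_t) bytes[0] << 24)
		   | ((uint32_t) bytes[1] << 16)
		   | ((uint32_t) bytes[2] << 8)
		   |  (uint32_t) bytes[3];

	h->sync_word      = (v >> 21) & 0x7FF;
	h->version        = (v >> 19) & 3;
	h->layer          = (v >> 17) & 3;
	h->no_protection  = (v >> 16) & 1;
	h->bit_rate       = (v >> 12) & 15;
	h->sample_rate    = (v >> 10) & 3;
	h->padding        = (v >> 9) & 1;
	h->reserved       = (v >> 8) & 1;
	h->channel_mode   = (v >> 6) & 3;
	h->mode_extension = (v >> 4) & 3;
	h->copyright      = (v >> 3) & 1;
	h->original       = (v >> 2) & 1;
	h->emphasis       = v & 3;
}
```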

The f_sync_word is 11 bits all set to 1. This can be used to synchronize to the next frame without knowing the exact size of the previous frame.

The f_version can be one of the following:

  • 0 - MPEG version 2.5 (extension to MPEG 2)
  • 1 - reserved
  • 2 - MPEG version 2 (ISO/IEC 13818-3)
  • 3 - MPEG version 1 (ISO/IEC 11172-3)

Note: if the MPEG version 2.5 isn't used, then the f_sync_word can be viewed as 12 bits and the f_version as 1 bit.
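
Searching for the synchronization word can be sketched as follows. The function name is made up for this example. A match only means that 11 consecutive bits are set starting on a byte boundary; the same pattern may also appear inside the compressed data, so a real player should validate the other header fields before trusting the result.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first position at or after start where the
 * 11 bits of f_sync_word are all set, or -1 when there is none. */
long swf_mp3_find_sync(const uint8_t *data, size_t size, size_t start)
{
	for (size_t i = start; i + 1 < size; ++i) {
		/* 8 bits in the first byte, 3 more in the second */
		if (data[i] == 0xFF && (data[i + 1] & 0xE0) == 0xE0) {
			return (long) i;
		}
	}
	return -1;
}
```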

In SWF movies, the f_layer must be set to III (which is 1). The valid MPEG layers are as follows:

  • 0 - reserved
  • 1 - Layer III
  • 2 - Layer II
  • 3 - Layer I

The f_no_protection determines whether a checksum is defined right after the 32 bits header. If there is a checksum, it is a 16 bit value which represents the total of all the words in the frame data.

The f_bit_rate determines the bit rate of the data which follows. The version and layer also have an effect on determining what the rate is from this f_bit_rate value. Since SWF only accepts Layer III data, only the following set of rates is accepted. MP3 players (and thus SWF players) must support variable bit rates. Thus, each frame may use a different value for the f_bit_rate field.

f_bit_rate   MPEG version 1   MPEG version 2
0            free(1)          free(1)
1            32 kbps          8 kbps
2            40 kbps          16 kbps
3            48 kbps          24 kbps
4            56 kbps          32 kbps
5            64 kbps          40 kbps
6            80 kbps          48 kbps
7            96 kbps          56 kbps
8            112 kbps         64 kbps
9            128 kbps         80 kbps
10           160 kbps         96 kbps
11           192 kbps         112 kbps
12           224 kbps         128 kbps
13           256 kbps         144 kbps
14           320 kbps         160 kbps
15           bad(2)           bad(2)
(1) free: means any (variable) bit rate
(2) bad: means you can't properly use this value
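
The table translates directly into a lookup. In this sketch the function name is made up and the caller selects the column with the mpeg1 parameter (non-zero for the MPEG version 1 column, zero for the MPEG version 2 column).

```c
/* Return the bit rate in kbps for a Layer III frame.
 * Returns 0 for "free" and -1 for "bad" or out of range values. */
int swf_mp3_bit_rate_kbps(int mpeg1, unsigned f_bit_rate)
{
	static const int v1[16] = {
		0, 32, 40, 48, 56, 64, 80, 96,
		112, 128, 160, 192, 224, 256, 320, -1
	};
	static const int v2[16] = {
		0, 8, 16, 24, 32, 40, 48, 56,
		64, 80, 96, 112, 128, 144, 160, -1
	};

	if (f_bit_rate > 15) {
		return -1;
	}
	return mpeg1 ? v1[f_bit_rate] : v2[f_bit_rate];
}
```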

The f_sample_rate defines the rate at which the encoded samples will be played. This rate may vary and be equal to or smaller than the rate indicated in the DefineSound header. The rate definition depends on the MPEG version as follows:

f_sample_rate   MPEG version 1   MPEG version 2   MPEG version 2.5
0               44100 Hz         22050 Hz         11025 Hz
1               48000 Hz         24000 Hz         12000 Hz
2               32000 Hz         16000 Hz         8000 Hz
3               reserved
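
As a sketch, the sampling rate can be found from the f_version and f_sample_rate fields with a two dimensional table. The function name is made up for this example. The MPEG version 2 entry for index 2 is taken as 16000 Hz, which is the value defined by the MPEG 2 audio standard.

```c
/* Return the sampling rate in Hz of an MP3 frame, or 0 when either
 * field holds a reserved or out of range value. */
unsigned swf_mp3_sample_rate_hz(unsigned f_version, unsigned f_sample_rate)
{
	static const unsigned rates[4][3] = {
		{ 11025, 12000,  8000 },	/* 0 - MPEG version 2.5 */
		{     0,     0,     0 },	/* 1 - reserved */
		{ 22050, 24000, 16000 },	/* 2 - MPEG version 2 */
		{ 44100, 48000, 32000 }		/* 3 - MPEG version 1 */
	};

	if (f_version > 3 || f_sample_rate > 2) {
		return 0;
	}
	return rates[f_version][f_sample_rate];
}
```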

The f_padding will be set to 1 if the frame includes padding (one extra slot, which is 8 bits of data). This is used to ensure that the sound is exactly the right size. It is useful only if your sound is very long and synchronized with the images.
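
This page doesn't give the size of a frame. The following sketch uses the formula commonly used for Layer III frames (144 * bit rate / sampling rate for MPEG version 1, 72 instead of 144 for versions 2 and 2.5, plus one byte when f_padding is set); take it as an assumption to check against the MPEG documentation. The function name is made up for this example.

```c
/* Return the size in bytes of a Layer III frame, header included, or 0
 * when it cannot be computed (free bit rate or invalid sampling rate).
 * mpeg1 is non-zero for MPEG version 1, zero for versions 2 and 2.5. */
unsigned swf_mp3_frame_bytes(int mpeg1, unsigned kbps,
			     unsigned sample_rate_hz, unsigned f_padding)
{
	unsigned long coeff = mpeg1 ? 144ul : 72ul;

	if (kbps == 0 || sample_rate_hz == 0) {
		return 0;
	}
	/* integer division rounds down; padding adds one 8 bits slot */
	return (unsigned) (coeff * kbps * 1000ul / sample_rate_hz)
	     + (f_padding ? 1u : 0u);
}
```

For instance, a 128 kbps frame at 44100 Hz is 417 bytes long, or 418 bytes when f_padding is set.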

The f_reserved isn't used and must be set to zero in SWF files.

The f_channel_mode determines the mode used to compress stereophonic audio. Note that the Dual Channel mode is viewed as a stereo stream by SWF. It can be one of the following:

  • 0 - stereo (standard LRLRLR...)
  • 1 - joint stereo (L+R and L-R)
  • 2 - dual channels (LLLLL... and then RRRRR...)
  • 3 - single channel (monophonic audio)

The f_mode_extension determines whether the intensity stereo (bit 4) and middle side stereo (L+R and L-R, bit 5) are used (set bit to 1) or not (set bit to 0) in joint stereo. f_mode_extension is usually set to 3.

The f_copyright field is a boolean value which specifies whether the corresponding audio is copyrighted or not. The default is to set it to 1 (copyrighted).

The f_original field is a boolean value which specifies whether the corresponding audio is a copy or the actual original sound track. It's usually set to 0 (a copy) in SWF movies.

The f_emphasis field can be one of the following values. It is rarely used. It tells the decoder to re-equalize the sounds.

  • 0 - no emphasis
  • 1 - 50/15 µs
  • 2 - reserved
  • 3 - CCITT J.17

  • Nellymoser

This is a newly supported scheme to encode speech (and audio) of either better quality or smaller bit rate. Thus you can either put more sound in your files resulting in a similar file size or make the entire file smaller so it downloads faster.

Somehow, the Nellymoser encoding and decoding patents used by Flash have been released. You may want to look at the mpeg project for information about the format. Feel free to check out the http://www.nellymoser.com web site for more info about this compression scheme.