
Understanding Digital Audio Encoding

KZcheese edited this page Dec 4, 2014 · 1 revision

Understanding the process behind storing audio in a digital format is a crucial first step to becoming a digital musician or sound designer.
You can view a supplementary video on sampling rate, bit depth, and interpolation here.

How Audio is Stored

In real life, sound is a continuous wave. A computer cannot store that wave perfectly, because it stores information as a finite series of binary numbers (0s and 1s). In order for a computer to store playable audio, the sound wave must be measured at regular intervals and turned into a set of data points called samples.
Figure: Digital Wave. Notice how the blue samples occur at regular intervals and line up with the grid.
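To make sampling concrete, here is a minimal Python sketch that evaluates a continuous sine wave only at evenly spaced moments in time. The function name `sample_sine` is hypothetical. The tiny rate of 8 samples per second is an assumption chosen so the output is easy to read; real audio uses far higher rates.

```python
import math


def sample_sine(frequency_hz: float, sample_rate_hz: int, duration_s: float) -> list[float]:
    """Take evenly spaced samples of a continuous sine wave.

    The continuous wave is sin(2*pi*f*t); sampling means we only evaluate it
    at t = n / sample_rate, which turns it into a list of discrete values.
    """
    num_samples = int(sample_rate_hz * duration_s)
    return [
        math.sin(2 * math.pi * frequency_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]


# A 1 Hz wave sampled 8 times per second for one second.
samples = sample_sine(frequency_hz=1, sample_rate_hz=8, duration_s=1.0)
print([round(s, 3) for s in samples])
# → [0.0, 0.707, 1.0, 0.707, 0.0, -0.707, -1.0, -0.707]
```

Everything between those eight points is simply not stored. The playback system has to reconstruct it.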

Pulse-Code Modulation (PCM)

PCM is the raw, uncompressed form of digital audio, and audio files are decoded into PCM when they are processed or played. A PCM stream is described by its number of audio channels, sampling rate, bit depth, and bit rate.
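As a rough illustration of what raw PCM looks like as data, the Python sketch below turns sample values into 16-bit signed little-endian integers. This is how 16-bit samples are laid out in a WAV file. The helper `to_pcm16` is hypothetical. It maps the range -1.0 to 1.0 onto ±32767 and clips anything outside it; this is one common convention, assumed here, not the only one.

```python
import struct


def to_pcm16(samples: list[float]) -> bytes:
    """Convert samples in the range [-1.0, 1.0] to 16-bit signed little-endian PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))          # clip anything outside the valid range
        ints.append(int(round(s * 32767)))  # scale to the 16-bit integer range
    # '<' = little-endian, 'h' = signed 16-bit integer, one per sample
    return struct.pack(f"<{len(ints)}h", *ints)


pcm = to_pcm16([0.0, 0.5, 1.0, -1.0])
print(len(pcm))  # 4 samples * 2 bytes each = 8
```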

Audio Channels

Each channel in an audio file contains a separate audio signal that usually represents sound coming from a specific direction. Audio with a single channel is known as mono and is usually used to record single instruments or sound effects. Audio with separate left and right channels is known as stereo, and most music available today is in stereo. Games and movies often support surround sound, which usually comes in the form of 5.1 or 7.1. The .1 in 5.1 and 7.1 stands for the low-frequency effects (LFE) channel. This channel is sent to a subwoofer, a speaker dedicated to bass, in setups that include one.
Figure: 7.1 Layout. The standard layout for a set of 7.1 surround sound speakers.
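Multichannel PCM is commonly stored interleaved: one sample from each channel in turn forms a frame, and then the stream moves on to the next moment in time. WAV files use this layout, for example. The hypothetical helpers below sketch the idea in Python, with small integer lists standing in for real samples.

```python
def interleave(left: list[int], right: list[int]) -> list[int]:
    """Interleave two mono channels into one stereo stream: L0, R0, L1, R1, ..."""
    if len(left) != len(right):
        raise ValueError("both channels must have the same number of samples")
    out: list[int] = []
    for l_sample, r_sample in zip(left, right):
        out.extend((l_sample, r_sample))  # one frame = one sample per channel
    return out


def deinterleave(stream: list[int], channels: int) -> list[list[int]]:
    """Split an interleaved stream back into one list per channel (works for 2, 6, 8, ...)."""
    return [stream[c::channels] for c in range(channels)]


stereo = interleave([1, 2, 3], [10, 20, 30])
print(stereo)                   # [1, 10, 2, 20, 3, 30]
print(deinterleave(stereo, 2))  # [[1, 2, 3], [10, 20, 30]]
```

The same `deinterleave` call works with `channels=6` or `channels=8` for 5.1 or 7.1 audio. The order of channels within each frame depends on the file format.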

Sampling Rate

Sampling rate is the number of samples taken per second in a PCM track. It is measured in hertz (Hz), meaning times per second. A higher sampling rate can capture higher frequencies, because a given rate can only represent frequencies up to half its value (the Nyquist frequency). However, the difference between audio with a standard sampling rate and audio with a very high sampling rate is negligible in practice. Standard rates already cover the range of human hearing, which tops out at around 20,000 Hz. Common audio files have sampling rates of 44,100 Hz or 48,000 Hz, and more rarely 96,000 Hz or 192,000 Hz.

![Sampling Rate](http://www.jazzpoparkisto.net/audio/webpictures/samplingrate.jpg)
As the sampling rate becomes higher, the resulting wave is smoother and more accurate.
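The practical limit on detail comes from the Nyquist frequency: a sampling rate can only represent frequencies up to half its own value. This minimal Python sketch does two things. First, it computes that limit for the common rates listed above. Second, it shows aliasing, where a tone above the limit folds back to a lower, wrong frequency. The helper names are hypothetical. `alias_hz` assumes a pure tone sampled with no anti-aliasing filter; real converters filter such frequencies out before sampling.

```python
def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sampling rate can represent: half the rate."""
    return sample_rate_hz / 2


def alias_hz(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency a pure tone appears at after sampling (assuming no anti-aliasing filter)."""
    folded = tone_hz % sample_rate_hz            # samples cannot tell f apart from f + k*fs
    return min(folded, sample_rate_hz - folded)  # anything above fs/2 mirrors back down


for rate in (44_100, 48_000, 96_000, 192_000):
    print(rate, "Hz ->", nyquist_hz(rate), "Hz")

print(alias_hz(30_000, 44_100))  # a 30 kHz tone shows up at 14,100 Hz
```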

Bit Depth

Bit depth is the number of bits in each sample in a PCM track. The value stored in these bits represents the amplitude of the sound wave at the moment the sample was taken. Each sample can only take one of a fixed set of values (2 to the power of the bit depth). Because of this, a higher bit depth increases the precision and dynamic range of an audio track. Common bit depths include 16 bit, 24 bit, and 32 bit, and most audio is encoded at 16 bit. Keep in mind that, like sampling rate, the difference between a standard 16 bit audio track and a 32 bit audio track is negligible during normal playback. However, audio is often processed at 32 bit or 32 bit float (floating point, which can store fractional values) in order to minimize rounding errors caused by processing algorithms.

Figure: Bit Depth. A higher bit depth allows for more precise and accurate samples.
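The effect of bit depth can be sketched by rounding a sample to the nearest value a given number of bits can hold, then measuring the error. The Python below does this. It also computes the theoretical dynamic range of linear PCM, which is about 6 dB per bit (roughly 96 dB for 16 bit). The helper names are hypothetical. The scaling to ±(2^(bits-1) - 1) and the simple 20·log10(2^bits) range formula are assumptions made for illustration.

```python
import math


def quantize(sample: float, bits: int) -> float:
    """Round a sample in [-1.0, 1.0] to the nearest level a signed `bits`-bit integer can hold."""
    max_level = 2 ** (bits - 1) - 1  # e.g. 32767 for 16 bit
    clipped = max(-1.0, min(1.0, sample))
    return round(clipped * max_level) / max_level


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM: 20*log10(2**bits), about 6.02 dB per bit."""
    return 20 * math.log10(2 ** bits)


x = 0.3
for bits in (4, 8, 16):
    q = quantize(x, bits)
    print(f"{bits:2d}-bit: {q:.6f} (error {abs(x - q):.6f})")

print(round(dynamic_range_db(16), 1))  # 96.3
print(round(dynamic_range_db(24), 1))  # 144.5
```

The rounding error shrinks as the bit depth grows. That is the precision gain described above.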

Bit Rate

Bit rate is the average amount of data per second in a file. It is usually expressed in kilobits per second (kbps) and is often used as a measure of how much space an encoded audio file takes up. The term applies to any file format that contains a continuous stream of information, but it is especially important for audio. In audio, bit rate counts the data for all channels in a file, so a stereo file has roughly twice the bit rate of an otherwise identical mono file. For uncompressed PCM, the bit rate is simply the sampling rate multiplied by the bit depth and the number of channels.
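For uncompressed PCM, bit rate follows directly from the other settings: sampling rate multiplied by bit depth multiplied by the number of channels. The small Python function below is a hypothetical helper. It reproduces the CD figure of 1,411.2 kbps and shows that switching to mono halves it.

```python
def pcm_bit_rate_kbps(sample_rate_hz: int, bit_depth: int, channels: int) -> float:
    """Uncompressed PCM bit rate in kilobits per second (1 kbps = 1,000 bits per second)."""
    bits_per_second = sample_rate_hz * bit_depth * channels
    return bits_per_second / 1000


print(pcm_bit_rate_kbps(44_100, 16, 2))  # CD audio, stereo: 1411.2
print(pcm_bit_rate_kbps(44_100, 16, 1))  # same settings in mono: 705.6
```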

Common Audio File Formats

PCM is raw, uncompressed audio. Because of this, audio is rarely distributed as bare PCM data. Instead, it is encoded into other file types, which are decoded back into PCM during playback or processing. There are two types of encoded audio files: lossless and lossy.

Lossless

Lossless audio files encode audio in a way that preserves all of the data from the original source. With lossless audio, no data is lost. This means that lossless audio files are of higher quality than lossy files, but they are also much bigger. Lossless audio is generally not as well supported as lossy. In music, stereo lossless files can range in bit rate from under 700 kbps to over 2,000 kbps. The exact figure depends on the source's sampling rate and bit depth and on the level of compression. CDs store audio at a bit rate of 1,411.2 kbps in stereo. Bit rate is often used in lossless audio to measure how efficiently a file has been compressed.
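Since bit rate is used to judge lossless compression, one simple measure is the encoded bit rate divided by the uncompressed PCM bit rate. The Python sketch below computes that ratio against the CD rate. `compression_ratio` is a hypothetical helper. The three sample bit rates are illustrative values within the range mentioned above, not measurements of real files.

```python
CD_KBPS = 44_100 * 16 * 2 / 1000  # 1,411.2 kbps: uncompressed stereo CD audio


def compression_ratio(encoded_kbps: float, source_kbps: float = CD_KBPS) -> float:
    """Fraction of the uncompressed PCM bit rate the encoded file uses (lower = smaller file)."""
    return encoded_kbps / source_kbps


for kbps in (700, 900, 1100):
    print(f"{kbps} kbps -> {compression_ratio(kbps):.0%} of the CD bit rate")
```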

Free Lossless Audio Codec (FLAC)

FLAC is a compressed, lossless audio format. This means that FLAC files are compressed in order to save space, but retain all of the information of the source. FLAC is one of the most widely used and supported lossless audio codecs today. It decodes quickly, and its files are typically much smaller than those of uncompressed formats such as WAV.

Waveform Audio File Format (WAV)

WAV is a lossless audio format most commonly used by Windows applications that store audio losslessly, especially basic recording software. WAV files usually contain uncompressed PCM, so they take up significantly more space than FLAC files.
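To see why WAV files are large, the sketch below uses Python's standard-library `wave` module to wrap raw 16-bit PCM in a WAV container. The 440 Hz test tone, the mono channel, and the one-second length are arbitrary assumptions. With these settings, the file is just a short 44-byte header followed by the PCM data. Its size therefore grows in direct proportion to the PCM bit rate.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100
SECONDS = 1

# One second of a 440 Hz sine wave at half volume, as 16-bit mono PCM (an assumed test signal).
frames = b"".join(
    struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE)))
    for n in range(SAMPLE_RATE * SECONDS)
)

buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)      # mono
    wav.setsampwidth(2)      # 2 bytes = 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes(frames)  # the PCM bytes are stored as-is, with no compression

size = len(buffer.getvalue())
print(size, "bytes for", SECONDS, "second(s);", len(frames), "of them are PCM data")
```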

Lossy

Of the two major types of audio formats, lossy is much more widely used and supported. Lossy formats store audio in much less space than lossless formats, but they sacrifice audio quality to do so. They discard the parts of the source's data that listeners are least likely to notice, so the decoder can only reconstruct an approximation of the original. Stereo lossy files are commonly found at bit rates of 128 kbps, 192 kbps, 256 kbps, and 320 kbps. Because lossy files can only store a limited amount of information, bit rate is used as a rough measure of a file's quality.
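Bit rate also makes file sizes easy to estimate: size in bits is bit rate multiplied by duration. The Python sketch below uses a hypothetical helper and an assumed four-minute track to compare the common lossy bit rates with CD-quality PCM. It ignores container overhead and metadata such as album art, so real files will differ slightly.

```python
def file_size_mb(bit_rate_kbps: float, duration_s: float) -> float:
    """Approximate audio data size in megabytes (1 MB = 1,000,000 bytes) for a given bit rate."""
    bits = bit_rate_kbps * 1000 * duration_s
    return bits / 8 / 1_000_000


track_seconds = 4 * 60  # an assumed four-minute song
for kbps in (128, 192, 256, 320, 1411.2):
    print(f"{kbps:>7} kbps: {file_size_mb(kbps, track_seconds):.2f} MB")
```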

MPEG-1 or MPEG-2 Audio Layer III (MP3)

MP3 is currently the most widely used and supported digital audio format available. It is a standard lossy audio format which can have a bit rate of up to 320 kbps.

Advanced Audio Coding (AAC/M4A)

Commonly known as M4A, AAC is a more modern successor to MP3 and is the standard file format used in iTunes, which sells music encoded at 256 kbps. AAC usually sounds better than an MP3 file of the same bit rate. It is currently well supported, but not as well as MP3. AAC files are usually labeled .m4a because AAC audio is normally stored in the MP4 container format, which typically also holds video. The .m4a extension marks an audio-only MP4 file.

Ogg Vorbis

Ogg Vorbis is an open source lossy format that stores Vorbis-encoded audio in an Ogg container. It supposedly has higher audio quality than both MP3 and AAC at the same bit rate, but it is less widely known and supported.
