Theory of Digitized Audio Information – Digital Audio Fundamentals

A sound is a continuous signal, a sound wave of varying frequency and amplitude. The higher the amplitude of the signal, the louder the sound; the higher the frequency, the higher its pitch. A sound wave's frequency is the number of oscillations per second, measured in hertz (Hz, kHz).

The human ear can perceive sound in the range from 20 Hz to 20 kHz. The sound coding depth is the number of bits allocated to one audio sample. Modern sound cards support 16-, 32-, or 64-bit coding depths.
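The coding depth determines how many distinct volume levels a sample can take: each extra bit doubles the count. A minimal sketch (the depths are those named above):

```python
def amplitude_levels(bits):
    """Return how many discrete amplitude levels a given coding depth encodes."""
    return 2 ** bits

# Depths supported by modern sound cards, per the text above
for bits in (16, 32, 64):
    print(bits, "bits ->", amplitude_levels(bits), "levels")
```

A 16-bit depth already distinguishes 65,536 levels, which is why deeper coding yields finer, higher-quality sound.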

When audio information is encoded, the continuous signal is replaced by a discrete one: a sequence of electrical pulses (binary zeros and ones).

Digitization, then, is the process of translating a sound signal from a continuous representation into a discrete digital form.

1. Sampling rate

One of the most important characteristics of sound encoding is the sampling rate:

  • 1 (one) measurement per second corresponds to a frequency of 1 Hz;
  • 1000 measurements per second correspond to a frequency of 1 kHz.

The sampling rate of the sound is the number of measurements of the sound's volume taken in one second. It typically ranges from 8 kHz to 48 kHz (from telephone-broadcast quality up to the sound quality of musical media).

The greater the depth and frequency of sound sampling, the better the digitized sound's quality. The lowest-quality digitized sound corresponds to telephone communication: a sampling frequency of 8,000 measurements per second, a sampling depth of 8 bits, and a single audio track (mono mode). The highest quality corresponds to audio CDs: a sampling frequency of 48,000 measurements per second, a sampling depth of 16 bits, and two audio tracks (stereo mode).

Remember, however, that the higher the quality of digital sound, the greater the information volume of the sound file.
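The quality/size trade-off can be made concrete by comparing the two extremes named above for one minute of sound. A minimal sketch (the sizes are uncompressed, following the V = duration × rate × depth × channels rule developed in the next section):

```python
def audio_size_bytes(duration_s, sample_rate_hz, bits, channels):
    """Uncompressed audio size in bytes: duration * rate * depth * channels / 8."""
    return duration_s * sample_rate_hz * bits * channels // 8

phone = audio_size_bytes(60, 8000, 8, 1)     # lowest quality: telephone, mono
cd    = audio_size_bytes(60, 48000, 16, 2)   # highest quality: audio CD, stereo
print(phone, "bytes vs", cd, "bytes ->", cd // phone, "x larger")
```

One minute of CD-quality stereo occupies 24 times as many bytes as one minute of telephone-quality mono.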

2. How to calculate audio file size

You can estimate the information volume of a mono audio file as V = N ⋅ f ⋅ k, where N is the total sound duration in seconds, f is the sampling frequency in Hz, and k is the coding depth in bits.
For example, for a sound lasting 1 minute at average quality (16 bits, 24,000 measurements per second):

V = 60 ⋅ 24000 ⋅ 16 bits = 23,040,000 bits = 2,880,000 bytes = 2812.5 KB ≈ 2.75 MB
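The calculation above, carried out step by step (1024 bytes per kilobyte, 1024 kilobytes per megabyte):

```python
# V = N * f * k for a mono file: 1 minute, 24 kHz sampling, 16-bit depth
N, f, k = 60, 24000, 16

bits = N * f * k          # total bits
bytes_ = bits // 8        # 8 bits per byte
kbytes = bytes_ / 1024    # 1024 bytes per KB
mbytes = kbytes / 1024    # 1024 KB per MB
print(bits, bytes_, kbytes, round(mbytes, 2))
```

This reproduces the figures in the worked example: 23,040,000 bits, 2,880,000 bytes, 2812.5 KB, about 2.75 MB.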

When stereo sound is encoded, sampling is performed separately and independently for the left and right channels, which doubles the sound file's volume compared to mono.
For example, to estimate the information volume of a digital stereo file lasting 1 second at average quality (16 bits, 24,000 measurements per second), multiply the coding depth by the number of measurements per second and then by 2 (for stereo):

V = 16 bits ⋅ 24000 ⋅ 2 = 768,000 bits = 96,000 bytes = 93.75 KB
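The same stereo calculation in code form:

```python
# Stereo doubles the mono volume: 1 second, 24 kHz, 16-bit depth, 2 channels
depth_bits, rate_hz, channels = 16, 24000, 2

total_bits = depth_bits * rate_hz * channels
total_bytes = total_bits // 8
print(total_bits, "bits =", total_bytes, "bytes =", total_bytes / 1024, "KB")
```

As in the example, one second of average-quality stereo takes 768,000 bits, i.e. 93.75 KB.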

Several methods can be used to encode audio data in binary code. They fall into two main categories: the FM method and the Wave-Table method.
The FM (Frequency Modulation) method relies on the theoretical fact that any complex sound can be decomposed into a series of simple harmonic signals of different frequencies. Each of these is a standard sinusoid, so it can be described by a code. Special devices known as analog-to-digital converters (ADCs) decompose the sound signal into this harmonic series and represent it as discrete digital signals.
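The decomposition idea can be illustrated by running it in reverse: summing a few simple sinusoids reconstructs a complex tone. The particular harmonic series below (a 440 Hz fundamental with two overtones at decreasing amplitudes) is a hypothetical example, not one from the text:

```python
import math

def complex_tone(t, harmonics):
    """A complex sound as a sum of simple harmonic signals,
    given as (frequency_hz, amplitude) pairs."""
    return sum(a * math.sin(2 * math.pi * f * t) for f, a in harmonics)

# Hypothetical harmonic series: fundamental plus two overtones
series = [(440.0, 1.0), (880.0, 0.5), (1320.0, 0.25)]
print(complex_tone(0.0005, series))
```

Each term of the sum is a standard sinusoid fully described by two numbers, which is what makes the compact FM code possible.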

3. Converting a continuous audio signal into a discrete signal

  1. An audio signal at the ADC's input.
  2. A discrete signal at the ADC's output.

The inverse conversion, which reproduces sound encoded as a numerical code, is performed with the aid of digital-to-analog converters (DACs). Although this encoding method does not provide high sound quality, it produces a compact code.
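The DAC's role, mapping integer codes back to amplitudes, can be sketched as the inverse of the quantization step (the linear code-to-amplitude mapping here is an illustrative assumption):

```python
def dac(samples, bits):
    """Map integer codes (0..2**bits - 1) back to amplitudes in [-1.0, 1.0],
    the way a DAC reconstructs a signal from a numerical code."""
    levels = 2 ** bits
    return [q / (levels - 1) * 2.0 - 1.0 for q in samples]

codes = [0, 127, 255]              # 8-bit codes: minimum, near mid-scale, maximum
reconstructed = dac(codes, 8)
print(reconstructed)
```

The minimum code comes back as -1.0, the maximum as 1.0, and mid-scale codes as values near zero.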

4. Converting a discrete signal into an audio signal

  1. A discrete signal at the DAC's input.
  2. An audio signal at the DAC's output.

The Wave-Table method, by contrast, uses "real" sounds as its samples, so the sound quality produced by the synthesis is very high and approaches that of real musical instruments.

Sound files are available in several formats, among which the most popular ones are MP3, WAV, and MIDI.


The MIDI (Musical Instrument Digital Interface) format was designed to control musical instruments and is employed in computer synthesis modules and the electronic musical instrument industry.

The WAV (waveform) audio format represents an arbitrary sound as a digital representation of the original sound wave. All standard Windows sounds use the .wav extension.

The MP3 format (MPEG-1 Audio Layer 3) is another digital format for audio data storage; it provides much more efficient (compressed) encoding.
