
Extract MFCC, log energy, delta, and delta-delta of audio signal

`coeffs = mfcc(___,Name,Value)` specifies options using one or more `Name,Value` pair arguments.

For example, `coeffs = mfcc(audioIn,fs,'LogEnergy','Replace')` returns the mel frequency cepstral coefficients for the audio input signal sampled at `fs` Hz, with the first coefficient in the `coeffs` vector replaced by the log energy value.

`[coeffs,delta,deltaDelta,loc] = mfcc(___)` also returns the delta, delta-delta, and location of samples corresponding to each window of data.
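The delta and delta-delta outputs describe how each coefficient changes from one analysis window to the next. The following Python sketch uses a common regression formula, $d_t = \sum_{n=1}^{M} n\,(c_{t+n} - c_{t-n}) / (2\sum_{n=1}^{M} n^2)$. It is not necessarily the formula `mfcc` uses. The function name `deltas`, the half-width `M = 2`, and the edge handling (repeating the first and last frames) are all assumptions made for illustration. Delta-delta is obtained by applying the same operation to the deltas.

```python
def deltas(frames, half_width=2):
    """Regression-based deltas over a sequence of feature vectors (hypothetical helper).

    frames: list of equal-length lists, one feature vector per analysis window.
    Frames beyond either end are replaced by the first/last frame (an assumption).
    """
    num_frames = len(frames)
    denom = 2 * sum(n * n for n in range(1, half_width + 1))
    out = []
    for t in range(num_frames):
        row = []
        for i in range(len(frames[t])):
            acc = 0.0
            for n in range(1, half_width + 1):
                ahead = frames[min(t + n, num_frames - 1)][i]  # clamp at the last frame
                behind = frames[max(t - n, 0)][i]              # clamp at the first frame
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


# Coefficient 0 ramps up by 1 per window; coefficient 1 is constant
coeffs = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0], [4.0, 1.0]]
delta = deltas(coeffs)
delta_delta = deltas(delta)
```

In the interior of the ramp the delta equals the slope (1 per window), and a constant coefficient has zero delta. Near the edges the clamping pulls the estimate toward zero.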

Mel frequency cepstrum coefficients are popular features extracted from speech signals for use in recognition tasks. In the source-filter model of speech, cepstral coefficients are understood to represent the filter (vocal tract). The vocal tract frequency response is relatively smooth, whereas the source of voiced speech can be modeled as an impulse train. As a result, the vocal tract can be estimated by the spectral envelope of a speech segment.

The motivating idea of mel frequency cepstral coefficients is to compress information about the vocal tract (smoothed spectrum) into a small number of coefficients based on an understanding of the cochlea. Although there is no hard standard for calculating the coefficients, the basic steps are outlined by the diagram.
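A typical version of these steps for a single frame is: window the frame, take its magnitude spectrum, sum the spectrum through a bank of triangular mel filters, take the logarithm, and apply a discrete cosine transform. The following pure-Python sketch follows those steps under several assumptions. It uses a symmetric Hamming window, a naive DFT, the magnitude rather than the power spectrum, and the natural logarithm. The filter bank is swapped in: it spaces filters equally on the HTK-style mel scale $2595\log_{10}(1 + f/700)$, which differs from the default `mfcc` filter bank. All function names are hypothetical.

```python
import cmath
import math


def hamming(n_len):
    # Symmetric Hamming window of length n_len (n_len >= 2)
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_len - 1)) for n in range(n_len)]


def magnitude_spectrum(frame):
    # Naive O(N^2) DFT; returns |X[k]| for the one-sided bins k = 0..N/2
    n_len = len(frame)
    return [
        abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len)))
        for k in range(n_len // 2 + 1)
    ]


def hz_to_mel(f):
    # HTK-style mel scale (an assumption, not the mfcc default filter bank)
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(num_filters, n_fft, fs):
    # Triangular filters whose edges are equally spaced in mel from 0 Hz to fs/2
    top = hz_to_mel(fs / 2.0)
    edges = [mel_to_hz(i * top / (num_filters + 1)) for i in range(num_filters + 2)]
    bin_hz = [k * fs / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, num_filters + 1):
        lo, center, hi = edges[j - 1], edges[j], edges[j + 1]
        weights = []
        for f in bin_hz:
            if lo < f <= center:
                weights.append((f - lo) / (center - lo))  # rising edge
            elif center < f < hi:
                weights.append((hi - f) / (hi - center))  # falling edge
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def dct_ii(values, num_coeffs):
    # Unnormalized DCT-II; coefficient 0 is the plain sum of the inputs
    n_len = len(values)
    return [
        sum(v * math.cos(math.pi * k * (n + 0.5) / n_len) for n, v in enumerate(values))
        for k in range(num_coeffs)
    ]


def mfcc_frame(frame, fs, num_filters=20, num_coeffs=13):
    # 1) window, 2) magnitude spectrum, 3) mel filter bank, 4) log, 5) DCT
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    mags = magnitude_spectrum(windowed)
    bank = mel_filter_bank(num_filters, len(frame), fs)
    band_outputs = [sum(w * m for w, m in zip(filt, mags)) for filt in bank]
    log_outputs = [math.log(b + 1e-12) for b in band_outputs]  # small floor avoids log(0)
    return dct_ii(log_outputs, num_coeffs)


fs = 8000
frame = [math.sin(2 * math.pi * 440 * n / fs) for n in range(256)]
coeffs = mfcc_frame(frame, fs)
```

The DCT decorrelates the log filter-bank outputs and concentrates the smooth spectral envelope in the first few coefficients. Keeping only `num_coeffs` of them is what compresses the envelope into a small number of values.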

The default mel filter bank linearly spaces the first 10 triangular filters and logarithmically spaces the remaining filters.
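To make "linearly spaced, then logarithmically spaced" concrete, the following Python sketch generates filter center frequencies. The first 10 centers are a fixed step apart, and each remaining center is a fixed ratio above the previous one. Only the count of 10 linear filters comes from the text above. The start frequency (133.33 Hz), the linear step (66.67 Hz), and the ratio (1.09) are hypothetical values chosen for illustration, and the helper name is also hypothetical.

```python
def linear_then_log_centers(num_filters, num_linear=10, start_hz=133.33,
                            linear_step_hz=66.67, log_ratio=1.09):
    """Hypothetical filter centers: num_linear linearly spaced, the rest geometric.

    All defaults except num_linear=10 are illustrative assumptions.
    """
    centers = [start_hz + i * linear_step_hz for i in range(num_linear)]
    # Each log-spaced center sits a fixed ratio above the previous one
    while len(centers) < num_filters:
        centers.append(centers[-1] * log_ratio)
    return centers


centers = linear_then_log_centers(20)
linear_gaps = [b - a for a, b in zip(centers[:10], centers[1:10])]
log_ratios = [b / a for a, b in zip(centers[9:], centers[10:])]
```

With these assumed values, the first logarithmic gap (about 66 Hz above the last linear center near 733 Hz) roughly continues the linear spacing. The gaps then widen with frequency, which is coarser resolution at high frequencies.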

The information contained in the zeroth mel frequency cepstral coefficient is often augmented with or replaced by the log energy. The log energy calculation depends on the input domain.

If the input (*audioIn*) is a time-domain signal, the log energy is
computed using the following equation:

$$\mathrm{log}E=\mathrm{log}(\text{sum}({x}^{2}))$$

If the input (*audioIn*) is a frequency-domain signal, the log energy
is computed using the following equation:

$$\mathrm{log}E=\mathrm{log}\left(\text{sum}\left({\left|x\right|}^{2}\right)/FFTLength\right)$$
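Suppose `x` is one analysis frame and the frequency-domain input is its full two-sided DFT, with `FFTLength` equal to the number of DFT bins. Under that assumption, the two formulas give the same value by Parseval's theorem, $\sum_k |X[k]|^2 = N \sum_n x[n]^2$. The following Python sketch checks this numerically with a naive DFT. The function names are hypothetical, and the natural logarithm is assumed.

```python
import cmath
import math


def log_energy_time(x):
    # log E = log(sum(x^2)) for a time-domain frame
    return math.log(sum(s * s for s in x))


def log_energy_freq(spectrum, fft_length):
    # log E = log(sum(|X|^2) / FFTLength) for a frequency-domain frame
    return math.log(sum(abs(v) ** 2 for v in spectrum) / fft_length)


def dft(x):
    # Naive full (two-sided) N-point DFT
    n_len = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_len) for n in range(n_len))
        for k in range(n_len)
    ]


frame = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(64)]
log_e_time = log_energy_time(frame)
log_e_freq = log_energy_freq(dft(frame), len(frame))
print(log_e_time, log_e_freq)  # equal up to floating-point rounding
```

The division by `FFTLength` in the frequency-domain formula is what undoes the factor of $N$ introduced by the unnormalized DFT. If only the one-sided half of the spectrum were summed, the two values would not match in general.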


See also: Cepstral Feature Extractor | `audioFeatureExtractor` | `audioDelta` | `cepstralCoefficients` | `detectSpeech`