Digital Dynamic Sampling Frequency Modulation
using Machine Learning

Working with Professor Islam in the BASH lab, I programmed a method for producing aliasing and non-aliasing audio in Python. I then trained and tested several machine learning models to distinguish aliasing from non-aliasing audio files.

Audio recording systems currently sample at an excessively high rate to avoid aliasing, an audio effect that occurs when audio is sampled below the Nyquist rate, which is twice the highest frequency present in the audio. Essentially, sampling too slowly does not capture enough data to faithfully reproduce or analyze a signal. This research yielded an ML algorithm that determines whether the audio sampling rate is too low.
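As a small worked illustration of the Nyquist criterion (not taken from the paper), the sketch below computes the minimum safe sample rate and the frequency at which a pure tone would appear after sampling. The function names are hypothetical and chosen only for this example.

```python
def nyquist_rate(highest_freq_hz: float) -> float:
    """Minimum sample rate that avoids aliasing: twice the highest frequency."""
    return 2.0 * highest_freq_hz


def apparent_frequency(tone_hz: float, sample_rate_hz: float) -> float:
    """Frequency at which a pure tone appears after sampling.

    Tones below sample_rate_hz / 2 map to themselves; higher tones fold
    back into the range [0, sample_rate_hz / 2].
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


print(nyquist_rate(4000.0))                # 8000.0
print(apparent_frequency(3000.0, 8000.0))  # 3000.0, sampled fast enough
print(apparent_frequency(4000.0, 5000.0))  # 1000.0, aliased
```

In the last call, a 4kHz tone sampled at 5kHz is indistinguishable from a 1kHz tone, which is why aliased recordings contain frequencies that were never in the original sound.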

Audio_Aliasing_Paper.pdf

Dynamic Sampling Feedback Modulation Block Diagram

First, an input from a microphone is sampled. The ML algorithm determines whether there is aliasing, answering the question: is the sample rate too low? If the sample rate is too low and aliasing occurs, the sample rate is increased. Otherwise, a Fast Fourier Transform can determine the highest frequency present, and the sample rate can then be sustained or lowered. This process can reduce the sampling rate of a microphone, lowering the rate at which the embedded system collects data and the power used by the whole system. Typical embedded systems that could reduce their power use this way are Amazon Alexa or Google Home devices, which constantly sample audio.
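One pass of this feedback loop could be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation from the paper: `is_aliasing` stands in for the trained classifier, and `step_hz`, `margin`, and `rel_threshold` are hypothetical tuning parameters.

```python
import numpy as np


def highest_frequency(frame, sample_rate_hz, rel_threshold=0.01):
    """Highest frequency whose FFT magnitude exceeds a fraction of the peak."""
    magnitudes = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate_hz)
    significant = np.nonzero(magnitudes >= rel_threshold * magnitudes.max())[0]
    return float(freqs[significant[-1]])


def next_sample_rate(frame, sample_rate_hz, is_aliasing, step_hz=2000.0, margin=1.1):
    """One pass of the loop: raise the rate on aliasing, otherwise fit it to the signal."""
    if is_aliasing(frame, sample_rate_hz):
        return sample_rate_hz + step_hz
    # No aliasing: the rate only needs to exceed twice the highest frequency present.
    target = 2.0 * highest_frequency(frame, sample_rate_hz) * margin
    return min(sample_rate_hz, target)


# Example: a 1 kHz tone sampled at 16 kHz, with stand-in classifiers.
fs = 16000.0
t = np.arange(1600) / fs
frame = np.sin(2 * np.pi * 1000.0 * t)

lowered = next_sample_rate(frame, fs, is_aliasing=lambda f, r: False)
raised = next_sample_rate(frame, fs, is_aliasing=lambda f, r: True)
print(lowered)  # roughly 2200 Hz: twice the 1 kHz tone plus a 10% margin
print(raised)   # 18000.0
```

The margin keeps the new rate slightly above the Nyquist rate so that small changes in the signal do not immediately trigger aliasing again.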

What is an STFT?

A Short-Time Fourier Transform (STFT) is a method for visualizing how the frequency content of audio changes over time. The audio is divided into equal-length segments and a Fourier Transform is run on each segment. The magnitude of each FFT component is shown by color, as indicated by the color bar. The recording to the right is a drilling sound; this makes sense, as there is consistent higher-frequency content around 4kHz.
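A minimal sketch of computing an STFT with `scipy.signal.stft` is shown below. The synthetic 4kHz tone, the 16kHz sample rate, and the segment length of 512 samples are assumptions for the example, not values from the project.

```python
import numpy as np
from scipy import signal

fs = 16000                            # sample rate in Hz (assumed)
t = np.arange(fs) / fs                # one second of audio
audio = np.sin(2 * np.pi * 4000 * t)  # synthetic 4 kHz tone

# Split the audio into equal-length segments and take an FFT of each one.
freqs, times, Zxx = signal.stft(audio, fs=fs, nperseg=512)
magnitude = np.abs(Zxx)  # rows are frequency bins, columns are time segments

# The strongest bin, averaged over time, should sit at the tone frequency.
peak_hz = freqs[magnitude.mean(axis=1).argmax()]
print(peak_hz)  # close to 4000
```

Plotting `magnitude` against `times` and `freqs` as a color map gives the kind of image described above.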

Butterworth Filter Results

The recording above was low-pass filtered with a 1kHz cutoff using a 6-pole Butterworth filter from the Python SciPy library. This STFT shows no significant frequency content above 1kHz - the filter worked properly!
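A filter of this kind can be sketched with `scipy.signal.butter`. The two-tone test signal and the 16kHz sample rate are assumptions for the example; the second-order-sections output and `sosfilt` are one reasonable way to apply the filter, not necessarily the calls used in the project.

```python
import numpy as np
from scipy import signal

fs = 16000
t = np.arange(fs) / fs
# Test signal: one tone below the cutoff (500 Hz) and one above it (4 kHz).
audio = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 4000 * t)

# Order 6 gives a 6-pole low-pass filter; second-order sections are numerically stable.
sos = signal.butter(6, 1000, btype="lowpass", fs=fs, output="sos")
filtered = signal.sosfilt(sos, audio)

# Compare how much of each tone survives the filter.
spectrum = np.abs(np.fft.rfft(filtered))
freqs = np.fft.rfftfreq(len(filtered), d=1 / fs)
low = spectrum[np.argmin(np.abs(freqs - 500))]
high = spectrum[np.argmin(np.abs(freqs - 4000))]
print(high / low)  # a very small ratio: the 4 kHz tone is strongly attenuated
```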

Aliasing Audio!

What does aliasing audio look like? The STFT on the right is the drilling audio from above sampled at 2kHz. A 2kHz sample rate can only represent frequencies up to 1kHz, and the drilling audio contains frequencies up to 4kHz, so this audio is aliasing. Playback of this audio would have 'sonic elements' (audible artifacts) that reduce the quality of the sound.

Non-Aliasing Audio

The audio presented in this STFT was filtered at 1kHz prior to sampling at 2kHz. As a result, the audio in this STFT is non-aliasing. Using the ESC-50, Urban Sound 8K, and LJ-Speech audio corpora, I produced aliasing and non-aliasing audio at 6 different filter cutoff frequencies: 22kHz, 16kHz, 11.5kHz, 8kHz, 4kHz, and 2kHz.
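The pairing of aliasing and non-aliasing audio can be sketched as below. This is an illustration under stated assumptions rather than the project's code: the new sample rate is taken to be twice the cutoff (as in the 1kHz / 2kHz example), the original rate must be an integer multiple of it, and `make_pair` and `magnitude_at` are hypothetical helper names.

```python
import numpy as np
from scipy import signal


def make_pair(audio, fs, cutoff_hz, order=6):
    """Return (aliasing, non_aliasing, new_fs) with new_fs = 2 * cutoff_hz."""
    new_fs = 2 * cutoff_hz
    step, remainder = divmod(fs, new_fs)
    if remainder != 0:
        raise ValueError("fs must be an integer multiple of 2 * cutoff_hz")
    # Aliasing: keep every step-th sample with no filtering beforehand.
    aliasing = audio[::step]
    # Non-aliasing: low-pass at the cutoff first, then keep every step-th sample.
    sos = signal.butter(order, cutoff_hz, btype="lowpass", fs=fs, output="sos")
    non_aliasing = signal.sosfilt(sos, audio)[::step]
    return aliasing, non_aliasing, new_fs


def magnitude_at(x, fs, freq_hz):
    """FFT magnitude of x at the bin nearest freq_hz."""
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq_hz))]


fs = 16000
t = np.arange(fs) / fs
audio = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 3300 * t)
aliasing, non_aliasing, new_fs = make_pair(audio, fs, cutoff_hz=1000)

# At 2 kHz, the unfiltered 3.3 kHz tone folds down to 700 Hz.
alias_energy = magnitude_at(aliasing, new_fs, 700)
clean_energy = magnitude_at(non_aliasing, new_fs, 700)
print(alias_energy, clean_energy)  # large for the aliasing copy, small for the filtered one
```

For target rates that are not an integer fraction of the original rate, a proper resampler would be needed instead of simple sample dropping.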

Classifier Results 1

Four classifiers were trained and tested on the Urban Sound 8K corpus. The best model was the XGBoost classifier, which achieved an F1 score of up to 70%.
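For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a binary aliasing / non-aliasing labelling; the labels are made-up illustration data, not results from this project, and the function name is hypothetical.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustration only: 1 = aliasing, 0 = non-aliasing.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
score = f1_score_binary(y_true, y_pred)
print(round(score, 3))  # 0.667
```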

Classifier Results 2

The performance of the four classifiers was confirmed on the ESC-50 corpus. The best model continued to be the XGBoost classifier, which achieved an F1 score of up to 65%.

Inference Time and Feasibility

I ran inference with the XGBoost model on a Raspberry Pi 3.0 to test the feasibility of deploying this system on an embedded device. The average inference time was less than 21ms.
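An average inference time like this can be measured with a simple timing harness. The sketch below is an assumption about how such a measurement might be set up, not the project's benchmark: `dummy_predict` is a stand-in for the trained model's prediction call, and the run counts are arbitrary.

```python
import time


def average_inference_ms(predict, sample, runs=100, warmup=5):
    """Average wall-clock time of predict(sample) in milliseconds."""
    for _ in range(warmup):
        predict(sample)  # warm-up calls are not timed
    start = time.perf_counter()
    for _ in range(runs):
        predict(sample)
    elapsed = time.perf_counter() - start
    return 1000.0 * elapsed / runs


# Stand-in for a trained model's prediction call (hypothetical).
def dummy_predict(features):
    return sum(features) > 0


avg_ms = average_inference_ms(dummy_predict, [0.1] * 64)
print(avg_ms)
```

Timing should be run on the target device itself, since inference time on a desktop machine says little about an embedded board.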
