
EfficientMic: Adaptive Acoustic Sensing with a Single Microphone

EfficientMic addresses the growing power and bandwidth demands of microphone-equipped IoT devices by reducing the microphone's sample rate without inducing aliasing, the distortion that occurs when the sample rate falls below the Nyquist rate (twice the highest frequency present). Professor Islam and I developed a methodology to synthesize both aliasing and non-aliasing audio across multiple frequencies, creating a corpus of environmental and context-specific sounds for audio anomaly detection. Using this corpus, I evaluated several machine learning algorithms and neural networks to identify the most effective model for distinguishing anomalous from normal audio. Collaborating with Professor Islam at WPI and Professor Wei at Augusta University, I published and presented a workshop paper on the dataset and a technical paper on the model analysis at BuildSys 2025, held at the Colorado School of Mines.

01

Dynamic Sampling Feedback Modulation Block Diagram

AliasingFlowchart.png

First, input from the microphone is sampled. The ML algorithm determines whether the signal is aliasing, answering the question: is the sample rate too low? If the sample rate is too low and aliasing occurs, the sample rate is increased. Otherwise, a Fast Fourier Transform determines the highest frequency present, and the sample rate is either sustained or lowered accordingly. This process reduces the microphone's sampling rate, which lowers how often the embedded system collects data and cuts the power used by the whole system. Always-on listening devices such as Amazon Alexa or Google Home, which sample audio constantly, are typical embedded systems that could reduce their power use this way.
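Sketched below is one way this feedback loop could look in Python. It assumes a trained aliasing classifier is available as a callable is_aliasing(frame); the list of supported sample rates and the noise-floor threshold are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical set of sample rates the microphone/ADC supports, lowest first.
SUPPORTED_RATES_HZ = [2_000, 4_000, 8_000, 16_000, 22_050, 44_100]


def choose_next_rate(frame, current_rate_hz, is_aliasing):
    """One pass of the feedback loop: raise the rate if the classifier flags
    aliasing, otherwise use an FFT to find the highest frequency present and
    drop to the lowest rate that still satisfies Nyquist."""
    if is_aliasing(frame):
        # Sample rate is too low -> step up to the next supported rate.
        higher = [r for r in SUPPORTED_RATES_HZ if r > current_rate_hz]
        return higher[0] if higher else current_rate_hz

    # No aliasing: estimate the highest significant frequency with an FFT.
    spectrum = np.abs(np.fft.rfft(frame))
    if spectrum.max() == 0:
        return SUPPORTED_RATES_HZ[0]          # silence: drop to the lowest rate
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / current_rate_hz)
    threshold = 0.01 * spectrum.max()         # ignore the noise floor
    f_max = freqs[spectrum > threshold].max()

    # Sustain or lower the rate: pick the smallest rate above 2 * f_max.
    candidates = [r for r in SUPPORTED_RATES_HZ if r >= 2 * f_max]
    return min(candidates) if candidates else current_rate_hz
```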

02

What is an STFT?

A Short-Time Fourier Transform (STFT) is a method for visualizing audio. The audio is divided into equal-length segments, and a Fourier Transform is run on each segment. The color bar represents the magnitude of each FFT component. The recording to the right is a drilling sound, which matches what the STFT shows: consistent energy at higher frequencies, around 4 kHz.
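As a rough illustration, here is how a spectrogram like these can be computed and plotted with SciPy and Matplotlib. The file name and window length are placeholders, not the actual parameters used for the figures.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft

# Placeholder file name; any mono WAV recording works here.
rate, audio = wavfile.read("drilling.wav")

# Divide the audio into equal-length segments and run an FFT on each one.
freqs, times, Zxx = stft(audio, fs=rate, nperseg=1024)

# Plot the magnitude of each FFT component; the color bar encodes magnitude.
plt.pcolormesh(times, freqs, 20 * np.log10(np.abs(Zxx) + 1e-10), shading="gouraud")
plt.xlabel("Time (s)")
plt.ylabel("Frequency (Hz)")
plt.colorbar(label="Magnitude (dB)")
plt.show()
```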

STFT1.png
STFT2.png

03

Butterworth Filter Results

The recording above was filtered at 1 kHz using a 6-pole Butterworth filter from the Python SciPy library. The resulting STFT shows no energy above 1 kHz - the filter worked properly!
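A rough sketch of that filtering step, assuming the recording is already loaded as a NumPy array; the cutoff and order match the description above.

```python
from scipy.signal import butter, sosfilt


def lowpass(audio, sample_rate_hz, cutoff_hz=1_000, order=6):
    """Apply a 6-pole Butterworth low-pass filter, using second-order
    sections for numerical stability."""
    sos = butter(order, cutoff_hz, btype="low", fs=sample_rate_hz, output="sos")
    return sosfilt(sos, audio)
```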

04

Aliasing Audio!

What does aliased audio look like? The STFT on the right shows the same drilling audio from above, sampled at 2kHz. Because the drilling audio contains frequencies up to about 4 kHz - well above the 1 kHz Nyquist limit for a 2 kHz sample rate - this recording is aliasing. Played back, it would contain audible artifacts that reduce the quality of the sound.
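A minimal sketch of how an aliased clip like this one can be produced: the original recording is decimated to 2 kHz with no anti-aliasing filter, so content above the 1 kHz Nyquist limit folds back into the band. The file names are placeholders, and the snippet assumes the original sample rate is an integer multiple of 2 kHz.

```python
from scipy.io import wavfile

rate, audio = wavfile.read("drilling.wav")   # placeholder file name
target_rate = 2_000                          # deliberately too-low sample rate

# Keep every Nth sample with no low-pass filtering first; anything above
# target_rate / 2 (1 kHz) folds back into the band, i.e. it aliases.
step = rate // target_rate
aliased = audio[::step]

wavfile.write("drilling_aliased_2khz.wav", target_rate, aliased)
```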

STFT1A.png

05

Non-Aliasing Audio

STFT2A.png

The audio presented in this STFT was filtered at 1 kHz prior to sampling at 2 kHz; as a result, it is non-aliasing. Using the ESC-50, Urban Sound 8K, and LJ-Speech audio corpora, I produced aliasing and non-aliasing audio at six different filter cutoff frequencies: 22 kHz, 16 kHz, 11.5 kHz, 8 kHz, 4 kHz, and 2 kHz.
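A sketch of how each clip can be turned into a non-aliasing version at one of these cutoffs: low-pass filter first, then resample. Following the 1 kHz/2 kHz example above, the snippet assumes each cutoff is paired with a sample rate of twice that cutoff and that the source clips are sampled fast enough for the 22 kHz case; the file naming is illustrative.

```python
from scipy.io import wavfile
from scipy.signal import butter, resample_poly, sosfilt

CUTOFFS_HZ = [22_000, 16_000, 11_500, 8_000, 4_000, 2_000]


def make_non_aliasing_versions(path):
    rate, audio = wavfile.read(path)
    for cutoff_hz in CUTOFFS_HZ:
        # Remove everything above the cutoff, then sample at twice the
        # cutoff so the resampled clip cannot alias.
        sos = butter(6, cutoff_hz, btype="low", fs=rate, output="sos")
        filtered = sosfilt(sos, audio)
        target_rate = 2 * cutoff_hz
        resampled = resample_poly(filtered, up=target_rate, down=rate)
        out_name = path.replace(".wav", f"_clean_{cutoff_hz}hz.wav")
        wavfile.write(out_name, target_rate, resampled.astype("float32"))
```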

06

Classifier Results 1

Four classifiers were trained and tested on the Urban Sound 8K corpus. The best performer was the XGBoost classifier, which achieved an F1 score of up to 70%.
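As a minimal sketch of how one of these classifiers can be trained, here is XGBoost fit on stand-in data. In the real pipeline the feature matrix comes from the STFT-based corpus above, and the hyperparameters shown are illustrative rather than the ones used in the paper.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Stand-in data: in practice X holds one feature vector per clip (derived
# from its STFT) and y marks whether the clip is aliasing (1) or not (0).
rng = np.random.default_rng(0)
X = rng.normal(size=(1_000, 128))
y = rng.integers(0, 2, size=1_000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = XGBClassifier(n_estimators=200, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)

print("F1 score:", f1_score(y_test, model.predict(X_test)))
```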

U8KF1s.jpg
ESCF1s.jpg

07

Classifier Results 2

The performance of the four classifiers was confirmed on the ESC-50 corpus. XGBoost remained the best model, achieving an F1 score of up to 65%.

08

image.png

Inference Time and Feasibility

To test the feasibility of deploying this system on an embedded device, I ran inference with the XGBoost model on a Raspberry Pi 3. The average inference time was under 21 ms per sample.
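A sketch of how the per-inference time can be measured on the Pi, assuming a trained model saved with XGBoost's save_model; the file name and feature-vector size are placeholders and must match the trained model.

```python
import time

import numpy as np
from xgboost import XGBClassifier

model = XGBClassifier()
model.load_model("efficientmic_xgb.json")    # placeholder file name

rng = np.random.default_rng(0)
features = rng.normal(size=(1, 128))         # stand-in for one clip's features

model.predict(features)                      # warm-up call
runs = 100
start = time.perf_counter()
for _ in range(runs):
    model.predict(features)
elapsed_ms = (time.perf_counter() - start) / runs * 1_000
print(f"Average inference time: {elapsed_ms:.2f} ms")
```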

LJRPTimeToInference.png