Comparison of ideal mask-based speech enhancement algorithms for white noise and low mixture signal-to-noise ratios



Graetzer, S and Hopkins, C ORCID: 0000-0002-9716-0793
(2019) Comparison of ideal mask-based speech enhancement algorithms for white noise and low mixture signal-to-noise ratios. In: ICA, 2019-9-9 - 2019-9-13, Aachen.

Graetzer & Hopkins Speech intelligibility - mask based enhancement - white noise - low SNRs ICA 2019.pdf - Published version


Abstract

The intelligibility of noisy speech can be improved by applying an ideal binary or soft gain mask in the time-frequency domain at signal-to-noise ratios (SNRs) that typically lie between -10 and +10 dB. In this study, two mask-based algorithms are compared when applied to speech mixed with white Gaussian noise (WGN) at low SNRs (from -29 to -5 dB): an Ideal Binary Mask (IBM) with a local criterion (LC) of 0 dB, and an Ideal Ratio Mask (IRM). The performance of Short-Time Objective Intelligibility (STOI) and a STOI variant (termed STOI+) is compared with that of other monaural intelligibility metrics that can be applied before and after mask-based processing. The results show that IRMs can achieve near-maximal speech intelligibility (> 90% for sentence material) even at very low mixture SNRs, whereas IBMs with LC = 0 dB provide limited intelligibility gains for SNRs below -14 dB. STOI+ is also shown to be a suitable metric for speech mixed with WGN at low SNRs and processed by IBMs with LC = 0 dB, even when the speech is high-pass filtered to flatten the spectral tilt.
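The two masks compared in the abstract can be illustrated with a short NumPy sketch. It mixes a signal with WGN at a chosen global SNR, then builds an IBM (1 where the local SNR in a time-frequency unit exceeds the LC, else 0) and an IRM (a soft gain computed from speech and noise energies). This is a sketch under stated assumptions, not the authors' implementation: the abstract does not specify the time-frequency representation, so a plain STFT with 512-sample Hann frames, a 256-sample hop and a 16 kHz sampling rate is assumed; the IRM exponent `beta = 0.5` is a common choice in the literature, not necessarily the paper's; the helper names (`mix_with_wgn`, `stft`, `ideal_binary_mask`, `ideal_ratio_mask`) are hypothetical; and the three-tone test signal is only a stand-in for speech. No intelligibility metric (STOI or otherwise) is computed.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in silent T-F units


def mix_with_wgn(speech, snr_db, rng):
    """Add white Gaussian noise scaled so the global mixture SNR equals snr_db."""
    noise = rng.standard_normal(len(speech))
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Choose the gain so that 10*log10(p_speech / p_noise_scaled) == snr_db
    noise *= np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + noise, noise


def stft(x, frame_len=512, hop=256):
    """Minimal STFT (assumed parameters): Hann-windowed frames -> one-sided spectra."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (frames, frequency bins)


def ideal_binary_mask(S, N, lc_db=0.0):
    """IBM: 1 where the local SNR exceeds the local criterion lc_db, else 0."""
    p_s, p_n = np.abs(S) ** 2, np.abs(N) ** 2
    # local SNR > LC  <=>  p_s > p_n * 10^(LC/10); avoids taking the log of zero
    return (p_s > p_n * 10 ** (lc_db / 10)).astype(float)


def ideal_ratio_mask(S, N, beta=0.5):
    """IRM: (speech energy / total energy) ** beta, a soft gain in [0, 1]."""
    p_s, p_n = np.abs(S) ** 2, np.abs(N) ** 2
    return (p_s / np.maximum(p_s + p_n, EPS)) ** beta


# Demo with a synthetic stand-in for speech (three steady tones, NOT real speech)
fs = 16000
t = np.arange(fs) / fs
speech = sum(np.sin(2 * np.pi * f * t) for f in (500, 1000, 2000))
rng = np.random.default_rng(0)
mixture, noise = mix_with_wgn(speech, snr_db=-20.0, rng=rng)

S, N, Y = stft(speech), stft(noise), stft(mixture)
ibm = ideal_binary_mask(S, N, lc_db=0.0)
irm = ideal_ratio_mask(S, N)
ibm_output = ibm * Y  # masked mixture spectra; resynthesis (inverse STFT) omitted
irm_output = irm * Y
print(ibm.shape, f"IBM keeps {ibm.mean():.1%} of T-F units")
```

Both masks are "ideal" because they are computed from the separately known speech and noise signals, then applied to the mixture spectrum. With LC = 0 dB, every unit the IBM keeps has speech energy above noise energy, so the IRM gain in those units exceeds sqrt(0.5); elsewhere the IBM discards the unit entirely, while the IRM still passes an attenuated version.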

Item Type: Conference or Workshop Item (Unspecified)
Depositing User: Symplectic Admin
Date Deposited: 13 Sep 2019 07:19
Last Modified: 27 Apr 2024 12:18
DOI: 10.18154/RWTH-CONV-239141
URI: https://livrepository.liverpool.ac.uk/id/eprint/3054456