Comparison of ideal mask-based speech enhancement algorithms for white noise and low mixture signal-to-noise ratios

Graetzer, S and Hopkins, C (ORCID: 0000-0002-9716-0793) (2019) Comparison of ideal mask-based speech enhancement algorithms for white noise and low mixture signal-to-noise ratios. In: ICA 2019, 9-13 September 2019, Aachen.


Abstract

The intelligibility of noisy speech can be improved by applying an ideal binary or soft gain mask in the time-frequency domain, typically at signal-to-noise ratios (SNRs) between -10 and +10 dB. In this study, two mask-based algorithms are compared when applied to speech mixed with white Gaussian noise (WGN) at low SNRs (from -29 to -5 dB): an Ideal Binary Mask (IBM) with a local criterion (LC) of 0 dB and an Ideal Ratio Mask (IRM). The performance of Short-Time Objective Intelligibility (STOI) and a STOI variant (termed STOI+) is compared with that of other monaural intelligibility metrics that can be applied before and after mask-based processing. The results show that IRMs can achieve near-maximal speech intelligibility (> 90% for sentence material) even at very low mixture SNRs, whereas IBMs with LC = 0 dB provide limited intelligibility gains for SNRs below -14 dB. STOI+ is also shown to be a suitable metric for speech mixed with WGN at low SNRs and processed by IBMs with LC = 0 dB, even when the speech is high-pass filtered to flatten its spectral tilt.
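As a rough illustration of the setup described in the abstract, the sketch below scales WGN to a target mixture SNR and computes an IBM and an IRM from oracle per-bin speech and noise power. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical; the time-frequency grids are assumed to hold squared magnitudes (for example from an STFT, which is omitted here along with resynthesis); the IBM keeps a bin when its local SNR strictly exceeds LC; and the IRM uses the common square-root form (exponent beta = 0.5) from the literature, which may differ from the exact definition used in the paper.

```python
import math
import random
from typing import List, Tuple

# Time-frequency grid: rows are frames, columns are frequency bins.
# Entries are assumed to be power values (squared magnitudes).
Matrix = List[List[float]]
EPS = 1e-12  # guards against division by zero and log of zero


def noise_gain_for_snr(speech: List[float], noise: List[float], snr_db: float) -> float:
    """Gain g such that 10*log10(P_speech / P(g * noise)) equals snr_db."""
    p_s = sum(x * x for x in speech) / len(speech)
    p_n = sum(x * x for x in noise) / len(noise)
    return math.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))


def mix_with_wgn(speech: List[float], snr_db: float, seed: int = 0) -> Tuple[List[float], List[float]]:
    """Add white Gaussian noise scaled to a target mixture SNR (e.g. -29 to -5 dB)."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in speech]
    g = noise_gain_for_snr(speech, noise, snr_db)
    scaled = [g * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled)]
    return mixture, scaled


def ideal_binary_mask(speech_pow: Matrix, noise_pow: Matrix, lc_db: float = 0.0) -> Matrix:
    """IBM: 1 where the local SNR exceeds the local criterion LC (in dB), else 0."""
    return [
        [1.0 if 10 * math.log10((s + EPS) / (n + EPS)) > lc_db else 0.0
         for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_pow, noise_pow)
    ]


def ideal_ratio_mask(speech_pow: Matrix, noise_pow: Matrix, beta: float = 0.5) -> Matrix:
    """IRM: (S^2 / (S^2 + N^2))^beta, assuming uncorrelated speech and noise."""
    return [
        [(s / (s + n + EPS)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_pow, noise_pow)
    ]


def apply_mask(mixture_mag: Matrix, mask: Matrix) -> Matrix:
    """Weight each time-frequency bin of the mixture magnitude by the mask."""
    return [[m * x for m, x in zip(m_row, x_row)] for m_row, x_row in zip(mask, mixture_mag)]


if __name__ == "__main__":
    speech_pow = [[4.0, 1.0, 0.25]]  # one frame, three bins
    noise_pow = [[1.0, 1.0, 1.0]]
    print(ideal_binary_mask(speech_pow, noise_pow))  # [[1.0, 0.0, 0.0]]
    print(ideal_ratio_mask(speech_pow, noise_pow))
```

The contrast the abstract draws shows up directly here: the IBM discards every bin whose local SNR is at or below 0 dB, which at very low mixture SNRs can be most of the bins, while the IRM still passes an attenuated, graded share of each bin.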

Item Type:	Conference or Workshop Item (Unspecified)
Date Deposited:	13 Sep 2019 07:19
Last Modified:	27 Apr 2024 12:18
DOI:	10.18154/RWTH-CONV-239141
URI:	https://livrepository.liverpool.ac.uk/id/eprint/3054456