Graetzer, S and Hopkins, C ORCID: 0000-0002-9716-0793
(2019)
Comparison of ideal mask-based speech enhancement algorithms for white noise and low mixture signal-to-noise ratios.
In: ICA, 2019-9-9 - 2019-9-13, Aachen.
Text: Graetzer & Hopkins Speech intelligibility - mask based enhancement - white noise - low SNRs ICA 2019.pdf - Published version (1MB)
Abstract
The intelligibility of noisy speech can be improved by applying an ideal binary or soft gain mask in the time-frequency domain for signal-to-noise ratios (SNRs) that are typically between -10 and +10 dB. In this study, two mask-based algorithms are compared when applied to speech mixed with white Gaussian noise (WGN) at low SNRs (from -29 to -5 dB). These comprise an Ideal Binary Mask (IBM) with a local criterion set to 0 dB and an Ideal Ratio Mask (IRM). The performance of Short-Time Objective Intelligibility (STOI), and a STOI variant (termed STOI+), is compared with that of other monaural intelligibility metrics that can be used before and after mask-based processing. The results show that IRMs can be used to obtain near maximal speech intelligibility (> 90% for sentence material) even at very low mixture SNRs, while IBMs with LC = 0 provide limited intelligibility gains for SNR < -14 dB. It is also shown that STOI+ is a suitable metric for speech mixed with WGN at low SNRs and processed by IBMs with LC = 0, even when the speech is high-pass filtered to flatten the spectral tilt.
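As an illustration of the two masks named in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the speech and noise power in each time-frequency unit are available separately, which is what makes the masks "ideal". The function names, the `eps` guard and the IRM exponent `beta = 0.5` are assumptions chosen for illustration; the paper may define the IRM differently.

```python
import numpy as np

def ideal_binary_mask(speech_power, noise_power, lc_db=0.0, eps=1e-12):
    """Hypothetical helper: 1 where the local SNR exceeds the local criterion (LC), else 0."""
    # Local SNR in dB for each time-frequency unit; eps avoids division by zero.
    local_snr_db = 10.0 * np.log10((speech_power + eps) / (noise_power + eps))
    return (local_snr_db > lc_db).astype(float)

def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Hypothetical helper: soft gain in [0, 1]; beta = 0.5 is an assumed, commonly used exponent."""
    return (speech_power / (speech_power + noise_power + eps)) ** beta

# Toy time-frequency power values (1 frame x 2 bands), for illustration only.
speech_power = np.array([[4.0, 0.25]])
noise_power = np.ones((1, 2))

ibm = ideal_binary_mask(speech_power, noise_power)  # keeps only units with local SNR > 0 dB
irm = ideal_ratio_mask(speech_power, noise_power)   # attenuates units in proportion to local SNR
```

In practice either mask would be multiplied element-wise with the time-frequency representation of the noisy mixture before resynthesis. The binary mask discards every unit below the criterion, whereas the ratio mask retains an attenuated version of each unit.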
Item Type: | Conference or Workshop Item (Unspecified) |
---|---|
Depositing User: | Symplectic Admin |
Date Deposited: | 13 Sep 2019 07:19 |
Last Modified: | 27 Apr 2024 12:18 |
DOI: | 10.18154/RWTH-CONV-239141 |
URI: | https://livrepository.liverpool.ac.uk/id/eprint/3054456 |