Graetzer, Simone and Hopkins, Carl ORCID: 0000-0002-9716-0793 (2022) Comparison of ideal mask-based speech enhancement algorithms for speech mixed with white noise at low mixture signal-to-noise ratios. The Journal of the Acoustical Society of America, 152 (6). pp. 3458-3470.
Abstract
The literature shows that the intelligibility of noisy speech can be improved by applying an ideal binary or soft gain mask in the time-frequency domain for signal-to-noise ratios (SNRs) between −10 and +10 dB. In this study, two mask-based algorithms are compared when applied to speech mixed with white Gaussian noise (WGN) at lower SNRs, that is, SNRs from −29 to −5 dB. These comprise an Ideal Binary Mask (IBM) with a Local Criterion (LC) set to 0 dB and an Ideal Ratio Mask (IRM). The performance of three intrusive Short-Time Objective Intelligibility (STOI) variants, namely STOI, STOI+, and Extended Short-Time Objective Intelligibility (ESTOI), is compared with that of other monaural intelligibility metrics that can be used before and after mask-based processing. The results show that IRMs can be used to obtain near-maximal speech intelligibility (>90% for sentence material) even at very low mixture SNRs, while IBMs with LC = 0 dB provide limited intelligibility gains for SNR < −14 dB. It is also shown that, unlike STOI, STOI+ and ESTOI are suitable metrics for speech mixed with WGN at low SNRs and processed by IBMs with LC = 0 dB even when speech is high-pass filtered to flatten the spectral tilt before masking.
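The two masks named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the frame length, hop size, Hann window, the square-root exponent in the IRM, and the helper names (`stft_power`, `ideal_binary_mask`, `ideal_ratio_mask`) are all assumptions chosen for illustration, and a sine tone stands in for speech. The IBM keeps a time-frequency unit when its local SNR exceeds the Local Criterion, while the IRM applies a soft gain between 0 and 1.

```python
import numpy as np


def stft_power(x, frame_len=512, hop=256):
    """Power spectrogram (frames x bins) from a Hann-windowed short-time FFT."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def ideal_binary_mask(speech_pow, noise_pow, lc_db=0.0):
    """1 where the local SNR (dB) exceeds the Local Criterion, else 0."""
    eps = np.finfo(float).tiny  # guards against division by zero and log(0)
    local_snr_db = 10.0 * np.log10((speech_pow + eps) / (noise_pow + eps))
    return (local_snr_db > lc_db).astype(float)


def ideal_ratio_mask(speech_pow, noise_pow, beta=0.5):
    """Soft gain in [0, 1]; beta = 0.5 is a common choice, assumed here."""
    eps = np.finfo(float).tiny
    return (speech_pow / (speech_pow + noise_pow + eps)) ** beta


# Illustrative mixture: a 440 Hz tone (stand-in for speech) in white Gaussian
# noise, with the noise scaled to give a mixture SNR of -5 dB.
rng = np.random.default_rng(0)
fs = 16000
t = np.arange(fs) / fs
speech = np.sin(2 * np.pi * 440 * t)
noise = rng.standard_normal(fs)
snr_db = -5.0
noise *= np.sqrt(np.mean(speech**2) / (np.mean(noise**2) * 10 ** (snr_db / 10)))

speech_pow = stft_power(speech)
noise_pow = stft_power(noise)
ibm = ideal_binary_mask(speech_pow, noise_pow, lc_db=0.0)
irm = ideal_ratio_mask(speech_pow, noise_pow)
```

Both masks are "ideal" in the sense that they require the clean speech and the noise separately, which is only possible in a controlled experiment. In an enhancement pipeline the mask would multiply the mixture spectrogram before resynthesis; that step is omitted here.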
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Speech Intelligibility, Perceptual Masking, Speech Perception, Algorithms, Signal-To-Noise Ratio |
Depositing User: | Symplectic Admin |
Date Deposited: | 20 Dec 2022 08:39 |
Last Modified: | 21 Aug 2023 02:25 |
DOI: | 10.1121/10.0016494 |
Open Access URL: | https://asa.scitation.org/doi/10.1121/10.0016494 |
URI: | https://livrepository.liverpool.ac.uk/id/eprint/3166721 |