Speech-Controlled Smart Speaker for Accurate, Real-Time Health and Care Record Management.
- J. Carrick, N. Dethlefs, L. Greaves, V. Gunturi, R. Kureshi and Y. Cheng.
- Link
To help alleviate the pressures felt by care workers, we have begun new research into improving the efficiency of care plan management by advancing recent developments in automatic speech recognition. Our novel approach adapts off-the-shelf tools in a purpose-built application for the speech domain, addressing challenges of accent adaptation, real-time processing and speech hallucinations. We augment the speech-recognition scope of OpenAI's Whisper model through fine-tuning, reducing word error rates (WERs) from 16.8 to 1.0 on a range of British dialects. We address the speech-hallucination side effect of adapting to real-time recognition by enforcing a signal-to-noise ratio threshold and audio stream checks, achieving a WER of 5.1, compared to 14.9 with Whisper's original model. These ongoing research efforts tackle challenges that are necessary to build the speech-control basis for a custom smart speaker system that is both accurate and timely.
- International Workshop on Spoken Dialogue System (IWSDS), Bilbao, Spain, 2025.
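The two quantities the abstract relies on, word error rate and a signal-to-noise gate, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `noise_power` estimate, and the 10 dB default threshold are hypothetical, and the sketch assumes the reported WERs are percentages. The SNR here is a crude estimate (chunk power over an assumed noise-floor power).

```python
import math
from typing import Sequence


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return 100.0 * prev[-1] / len(ref)


def snr_db(chunk: Sequence[float], noise_power: float) -> float:
    """Crude SNR estimate in dB; noise_power is an assumed noise-floor power > 0."""
    power = sum(x * x for x in chunk) / len(chunk)
    if power == 0:  # digital silence
        return float("-inf")
    return 10.0 * math.log10(power / noise_power)


def should_transcribe(chunk: Sequence[float], noise_power: float,
                      threshold_db: float = 10.0) -> bool:
    """Hypothetical gate: skip empty or low-SNR chunks before calling the recogniser."""
    return len(chunk) > 0 and snr_db(chunk, noise_power) >= threshold_db
```

Gating before recognition means silent or noise-only chunks never reach the model, which is one plausible way a threshold of this kind could reduce hallucinated text in a streaming setting.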