National Speech Corpus: Large-scale Singapore English Corpus of Open Speech Data
About
The National Speech Corpus (NSC) is the first large-scale Singapore English corpus spearheaded by the Info-communications and Media Development Authority (IMDA) of Singapore. It aims to become an important source of open speech data for automatic speech recognition (ASR) research and speech-related applications.
People increasingly use voice to interact with services, whether at home, at work, or in public spaces. However, the speech technologies behind these services can be inaccurate at recognising and transcribing locally accented English. To close this technology gap, IMDA introduced the National Speech Corpus (NSC).
The NSC helps improve speech engines’ accuracy in recognising and transcribing locally accented English. It can also contribute to speech synthesis technology, enabling AI voices that sound more familiar to Singaporeans and pronounce local terms more accurately.
Benefits
As speech technology improves and speech engines are tuned to the Singaporean English accent, Singapore will be able to keep pace with current and future advancements in speech interfaces.
For example, telco call centres may use automatic speech recognition (ASR) to transcribe calls for auditing and sentiment analysis. Chatbots can go beyond text, accurately understanding the Singaporean accent and replying in a familiar local accent with accurate pronunciations of street names and food.
Download the Corpus
Click here to download the Corpus.
Contact
For further enquiries on the National Speech Corpus, please contact nsc@imda.gov.sg.