About

There is a growing trend for people to use voice to interact with services, be it at home, at work, or in public spaces. However, the speech technologies that support these services can be inaccurate at recognising and transcribing locally accented English. To close this gap, Digital Services Lab introduced the National Speech Corpus (NSC) and Natural Speech and Transcription Technologies (NSTT).

The NSC improves speech engines’ accuracy of recognition and transcription for locally accented English. As a suite of complementary tools to the NSC, the NSTT aims to ease and expedite the development of local speech applications.

Enhanced with the NSC, IMDA-DSL technologies coupled with open-sourced and commercially available tools allow individuals and businesses to be speech-enabled within weeks through the following technologies:

  • Speech Activity Detection Engine (SADE)
  • Automated Speech Recognition Engine (ASR)
  • Speech Synthesis Mark-up Language (SSML) Module for Text-to-Speech Engine (TTS)

Benefits

Technology partners and developers, particularly in the Technology, Media and Telecom (TMT) sector, can augment and enhance their current technological offerings with speech technologies adapted for local use. Using the complimentary technologies as a baseline, interested parties from within and beyond the TMT sector are welcome to experiment with introducing speech interfaces into their products to enhance user experience.

For example, the ASR may be used by telco call centres to transcribe calls for auditing and sentiment analysis purposes.

Additionally, media companies can use Script Assisted Subtitling to add subtitles to videos. This solution has been successfully adopted and deployed within the TMT sector.

Download the Corpus

Click here to download the Corpus.

Demo

Commercial ASR Engine Trained on NSC

Enhanced with the NSC, this ASR Engine is able to more accurately transcribe locally accented English.

Click here to Demo

Speech Synthesis Mark-up Language (SSML) Module for Text-to-Speech Engine (TTS)

This demo displays the application possibilities of a Singapore English accented TTS engine. The engine uses data released under the NSC. Using the Lexicon provided in the NSC, pronunciations of local terms are more accurate.

The SSML module converts text input into a form that tells TTS engines how to pronounce local terms. Developers who want to provide an authentic, locally accented audio response for text input may use it.

Click here to Demo

Speech Activity Detection Engine (SADE)

This demo is able to remove non-speech audio inputs such as music and silence from the uploaded audio file, producing individual audio snippets of speech. Through detecting only speech in audio files, SADE reduces audio file size and improves speech recognition accuracy for speech application development.

Click here to Demo

Illustration of NSTT through Food Ordering Demo

This demo displays how the NSTT may replace the conversational dialogue between customers and customer service providers. It is capable of making, repeating, changing, deleting and confirming orders. The NSTT may be applied to various uses in banking, servicing and telemarketing.

Click here to Demo

NSTT Microservices

This NSTT Microservices Demo is meant to showcase loosely coupled services that may be developed, deployed and maintained independently by licensees. Each service is responsible for a discrete task and communicates with other services through APIs to solve large and complex business problems. The benefits include:

  • Ease of developing speech applications
  • Ease of adding new capabilities and scaling services

Click here to Demo

NSTT Components

Description: NSTT is a suite of technologies licensed under IMDA’s Technology licensing terms. Technology partners are welcome to contact Digital Services Lab to obtain a licence to access the source code.

Open Source Kaldi ASR Engine

This ASR Engine uses an acoustic model trained on data from the NSC to transcribe speech audio into text. Developers who are looking to provide an on-site ASR solution may adopt this engine to enhance security and privacy.

Click here (557.86KB) for more information

Speech Synthesis Mark-up Language (SSML) Module for Text-to-Speech Engine (TTS) 

This module deciphers local words by checking whether each word exists in the local database and, if it does, transcribing it into International Phonetic Alphabet (IPA) format. The module's output is then fed into the TTS engine for audio playback. Developers who are looking to produce custom speech playback solutions may adopt this module into their product.
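The lookup step described above can be sketched roughly as follows. This is an illustration only, not the NSTT module: the lexicon entries and their IPA strings are placeholders invented for the example (they are not taken from the NSC Lexicon), `to_ssml` is a hypothetical helper name, and tokenisation is a plain whitespace split. The `phoneme` element with `alphabet="ipa"` comes from the W3C SSML specification; a real TTS engine may also require version and namespace attributes on `speak`.

```python
from xml.sax.saxutils import escape, quoteattr

# Placeholder lexicon for illustration only; these IPA strings are
# assumptions and are NOT taken from the NSC Lexicon.
LOCAL_LEXICON = {
    "lah": "lɑ",
    "makan": "mɑkɑn",
}


def to_ssml(text, lexicon=LOCAL_LEXICON):
    """Wrap words found in the lexicon in SSML <phoneme> tags."""
    parts = []
    for word in text.split():
        ipa = lexicon.get(word.lower())
        if ipa is None:
            # Unknown word: pass through, escaping XML special characters.
            parts.append(escape(word))
        else:
            # Known local word: attach its IPA pronunciation.
            parts.append(
                f"<phoneme alphabet=\"ipa\" ph={quoteattr(ipa)}>"
                f"{escape(word)}</phoneme>"
            )
    return "<speak>" + " ".join(parts) + "</speak>"
```

Calling `to_ssml("Let us makan")` leaves "Let" and "us" untouched and wraps only "makan" in a `phoneme` tag. Words with attached punctuation (for example "makan,") would not match this simple split and would need proper tokenisation.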

Click here (244.18KB) for more information

Speech Activity Detection Engine (SADE)

This audio extraction tool distinguishes speech from non-speech audio inputs, e.g. music and silence, amongst others. It reduces audio file size and improves the accuracy of speech recognition during the development of speech applications.
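To make the idea of cutting an audio file down to its active portions concrete, here is a minimal sketch that uses a simple frame-energy threshold. This is a stand-in technique, not SADE's method, which this page does not describe: an energy threshold can drop silence but cannot tell music from speech. The function names, frame length and threshold are all assumptions chosen for the example, and samples are assumed to be floats in the range -1.0 to 1.0.

```python
import math


def frame_energies(samples, frame_len):
    """Return the RMS energy of each non-overlapping, full-length frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return energies


def detect_active_segments(samples, frame_len=160, threshold=0.1):
    """Return (start, end) sample indices for runs of frames above threshold."""
    segments = []
    seg_start = None
    for i, energy in enumerate(frame_energies(samples, frame_len)):
        if energy >= threshold and seg_start is None:
            seg_start = i * frame_len          # a new active run begins
        elif energy < threshold and seg_start is not None:
            segments.append((seg_start, i * frame_len))  # run ends here
            seg_start = None
    if seg_start is not None:
        # The audio ended while still active: close at the last full frame.
        segments.append((seg_start, (len(samples) // frame_len) * frame_len))
    return segments


def extract_snippets(samples, segments):
    """Slice the original samples into one snippet per detected segment."""
    return [samples[start:end] for start, end in segments]
```

The snippets returned by `extract_snippets` together are shorter than the input whenever any frames fall below the threshold, which is the file-size effect described above.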

Click here (230.76KB) for more information

Illustration of NSTT through Food Ordering Demo

This toolkit contains intents and taxonomy for implementing a basic food ordering use case. Developers may adopt the methodology and redesign use cases for other applications such as a speech-enabled robot or chatbots.
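As a rough illustration of how recognised intents might drive an order, the sketch below handles the five actions the food ordering demo supports: making, repeating, changing, deleting and confirming orders. The intent names, the `Order` class and the reply wording are all hypothetical; the toolkit's actual intents and taxonomy are not shown on this page, and a real application would receive the intent from the ASR and language-understanding steps.

```python
from dataclasses import dataclass, field


@dataclass
class Order:
    items: dict = field(default_factory=dict)  # item name -> quantity
    confirmed: bool = False


def handle_intent(order, intent, item=None, quantity=1):
    """Apply one recognised intent to the order and return a reply string."""
    if order.confirmed:
        return "Order already confirmed."
    if intent == "make":
        order.items[item] = order.items.get(item, 0) + quantity
        return f"Added {quantity} x {item}."
    if intent == "change":
        if item not in order.items:
            return f"{item} is not in the order."
        order.items[item] = quantity
        return f"Changed {item} to {quantity}."
    if intent == "delete":
        if order.items.pop(item, None) is None:
            return f"{item} is not in the order."
        return f"Removed {item}."
    if intent == "repeat":
        if not order.items:
            return "The order is empty."
        summary = ", ".join(f"{q} x {name}" for name, q in order.items.items())
        return f"Your order: {summary}."
    if intent == "confirm":
        if not order.items:
            return "The order is empty."
        order.confirmed = True
        return "Order confirmed."
    return "Sorry, I did not understand."
```

Keeping the order state separate from the intent handler means the same pattern could be reused for other use cases by swapping in a different state object and intent set.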

Click here (981.69KB) for more information

FAQ

1. What are the costs associated with using the NSC?

The NSC is made available via the Singapore Open Data Licence.

2. Do I need to have a Dropbox account to download the NSC?

Yes, users who are interested in downloading the NSC will require a Dropbox account.

3. How big is the NSC?

Currently, the NSC is 1 TB in size and is expected to grow over time.

4. Will there be future updates to the corpus?

The NSC will be continually updated over time. The current release version is V2.0.

5. Will there be future updates to the Natural Speech and Transcription Technologies (NSTT)?

NSTT will be updated on an ongoing basis, with updates released to the community when available.

6. What are the terms of use for the NSTT?

If you would like to obtain the source code for the NSTT, please email us at the address listed under Contact. The transfer of the source code will require a Technology Licensing Agreement with IMDA. For more information on terms & conditions, please refer to the licence agreement forms here (523.48KB).

Contact

For further enquiries on the National Speech Corpus or the Natural Speech and Transcription Technologies (i.e. NSTT licensing), please contact DSL_Tech@imda.gov.sg.

Last updated on: 18 Feb 2020