Can Your Doorbell Be Spoofed?

Palash Koutu
Customer Support Engineering Manager
Published: June 1, 2024

What is Voice Anti-Spoofing and why is it important?

Voice anti-spoofing is a set of techniques designed to prevent scam attempts that use mimicked voices and to improve the user experience of voice user interface (VUI) systems by preventing accidental triggers. These techniques are particularly important for countering:

  • Speech Synthesis (SS): This type of attack employs a computer-simulated voice
  • Voice Conversion (VC): In this attack, an impostor's voice is made to sound as close as possible to the voice of the targeted individual using filters and other tools
  • Replay Attack (RA): Fraudsters use a pre-recorded sample of the victim's voice
  • Impersonation: The attacker mimics the victim's voice tonality, prosodic features, and vocabulary among other characteristics
  • Nuisance triggers: This issue arises when an artificial voice accidentally triggers the system, inconveniencing the user

These attacks and issues can significantly disrupt the experience of using voice systems and therefore demand a robust solution.

How does voice anti-spoofing work?

Voice anti-spoofing works by detecting and preventing voice-spoofing attacks, which can involve recorded, computer-generated, or computer-modified voices. Here are some key components of how it works:

Figure 1. Anti-spoofing solution components
  • Keyword detection: The system needs to be trained to identify when someone is talking or triggering commands. For example, the wake word "Hi Renesas" triggers the system.
  • Feature extraction: The system extracts specific features from the input speech signal, such as timbre, articulation, intonation, and lexical behavior
  • Spoof speech detection (SSD): This set of measures is used to identify and prevent a voice spoof attack. For example, replay attacks create signal artifacts that are sometimes imperceptible to the human ear, but advanced algorithms can find such artifacts and use them to determine liveness.
  • Classification: After the features are extracted, a classifier labels the speech as genuine or recorded
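The flow through these components can be sketched in a few lines of Python. This is only an illustrative toy, not the Cyberon or Reality AI implementation: the function names, the two hand-picked features (a high-frequency energy ratio and a zero-crossing rate), the nearest-centroid classifier, and the synthetic "replayed" signal (a moving average that dulls high frequencies) are all assumptions made for the example.

```python
import math
import random

def extract_features(samples):
    """Toy features: (high-frequency energy ratio, zero-crossing rate)."""
    total = sum(s * s for s in samples) or 1e-12
    pairs = list(zip(samples, samples[1:]))
    diff_energy = sum((b - a) ** 2 for a, b in pairs)  # first difference acts as a crude high-pass
    crossings = sum(1 for a, b in pairs if (a < 0) != (b < 0))
    return (diff_energy / total, crossings / len(pairs))

def train_centroids(labelled):
    """labelled: list of (features, label). Returns {label: mean feature vector}."""
    sums, counts = {}, {}
    for feats, label in labelled:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def classify(feats, centroids):
    """Nearest-centroid decision: return the label whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))

def handle_audio(samples, wake_word_detected, centroids):
    """Only run the spoof check once the (hypothetical) wake-word stage has fired."""
    if not wake_word_detected:
        return "ignored"
    return classify(extract_features(samples), centroids)

def synth_utterance(rng, replayed, n=800):
    """Synthetic stand-in for audio: a slow tone plus noise.
    ASSUMPTION: replay is modelled as a 4-tap moving average (loss of high frequencies)."""
    x = [math.sin(2 * math.pi * 5 * i / n) + rng.gauss(0, 0.5) for i in range(n)]
    if replayed:
        x = [sum(x[i:i + 4]) / 4 for i in range(n - 3)]
    return x

rng = random.Random(0)
training = [(extract_features(synth_utterance(rng, replayed)), label)
            for replayed, label in [(False, "genuine"), (True, "recorded")]
            for _ in range(10)]
centroids = train_centroids(training)

print(handle_audio(synth_utterance(rng, False), True, centroids))
print(handle_audio(synth_utterance(rng, True), True, centroids))
```

A production system would learn its features and decision boundary from real microphone data rather than relying on a single hand-crafted cue; the sketch only shows how the wake-word gate, feature extraction, and classification stages hand off to one another.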

By combining these techniques, voice anti-spoofing systems can counter distinct types of voice-spoofing attacks and improve the overall user experience, while assuring smart doorbell users that it really is their neighbor at the front door.

Renesas Application Example

Renesas' Voice Anti-spoofing is engineered for speed and responsiveness while maintaining high accuracy, and it runs entirely at the edge. We combine hardware across the RA MCU family (RA6, RA4, RA2 series) and RX MCU family with the Cyberon voice stack to identify the trigger/wake word and then use Reality AI-generated models to check for real versus recorded voice in the signal.

Renesas' Reality AI model uses "Hi Renesas" as a wake word. Users may speak with any common spoken English accent and natural vocal tonal quality (male or female) to use this solution. In our testing, the model was 96% accurate against recorded voice played from a phone speaker (iPhone or Android) and ~99% accurate in K-fold validation during training.

Figure 2. e² studio solution workflow

How did we create the application example?

Using Renesas' IDE, e² studio, a user can collect data, integrate Cyberon's Voice Stack for wake-word detection ("Hi Renesas"), and finally integrate any AI models generated with the Reality AI Tools® module.

 

Figure 3. e² studio – Reality AI Tools integration workflow

We collected both real voice data (captured via the Renesas hardware microphone) and recorded voice data from a small set of people. This data was fed to Reality AI's feature extraction and training engine to develop and output a model. The model achieved ~99% K-fold accuracy in training, which prompted us to select it for live testing and benchmarking.
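For readers unfamiliar with K-fold validation, the idea can be sketched as follows: the data is split into k folds, the model is trained on k-1 of them and scored on the held-out fold, and the k scores are averaged. The code below is a minimal illustration and not how Reality AI Tools computes its figure; `k_fold_accuracy`, `train_means`, `predict_nearest`, and the one-dimensional synthetic feature values are all hypothetical.

```python
import random

def k_fold_accuracy(samples, labels, k, train, predict, seed=0):
    """Average held-out accuracy over k folds."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train_idx = [i for i in indices if i not in held]
        model = train([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict(model, samples[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k

def train_means(xs, ys):
    """Toy model: the mean feature value of each class."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {y: sum(values) / len(values) for y, values in groups.items()}

def predict_nearest(model, x):
    """Predict the class whose mean is closest to x."""
    return min(model, key=lambda y: abs(x - model[y]))

# Synthetic, well-separated feature values (placeholders, not measured data).
rng = random.Random(1)
samples = [rng.gauss(0.7, 0.05) for _ in range(20)] + [rng.gauss(0.1, 0.05) for _ in range(20)]
labels = ["genuine"] * 20 + ["recorded"] * 20

mean_accuracy = k_fold_accuracy(samples, labels, k=5, train=train_means, predict=predict_nearest)
print(f"mean K-fold accuracy: {mean_accuracy:.2f}")
```

Because every fold is drawn from the same pool of speakers and recordings, a K-fold score tends to be optimistic compared with testing on new people, which is consistent with the gap between the training figure and the live benchmark reported here.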

The model was then integrated back into the e² studio project and tested extensively in live office settings with people who were not in the training set. In this benchmark it achieved 96% accuracy.
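Testing with people outside the training set amounts to a speaker-disjoint split. A minimal sketch of that bookkeeping is shown below; the record layout, speaker IDs, and helper names are hypothetical and are not taken from the Renesas project.

```python
def split_by_speaker(utterances, held_out_speakers):
    """Split so that no held-out speaker contributes any training data."""
    train = [u for u in utterances if u["speaker"] not in held_out_speakers]
    test = [u for u in utterances if u["speaker"] in held_out_speakers]
    return train, test

def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Hypothetical utterance records; speaker IDs and labels are placeholders.
utterances = [
    {"speaker": "s1", "label": "genuine"},
    {"speaker": "s1", "label": "recorded"},
    {"speaker": "s2", "label": "genuine"},
    {"speaker": "s3", "label": "recorded"},
    {"speaker": "s3", "label": "genuine"},
]
train_set, test_set = split_by_speaker(utterances, held_out_speakers={"s3"})
print(len(train_set), len(test_set))
```

Holding out whole speakers, rather than individual utterances, keeps the benchmark from rewarding a model that has merely memorized the voices it was trained on.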

Figure 4. Reality AI Tools Training Results

Adapting this application example to your own VUI-based system will bring additional requirements, and the Voice Anti-spoofing application example serves as a reference that simplifies that work. For further information, see the development resources on the Reality AI Tools page or contact your local sales representative.

Conclusion

Renesas' anti-spoofing application example demonstrates the capability of Reality AI Tools to address real-world challenges, improve user experience, and enhance voice user interface (VUI) systems with additional features. Our AI models have a small footprint and can be extended through more extensive data collection.
