Speech Synthesis Markup Language (SSML)

What is SSML?

SSML is a markup language for text that is meant to be read aloud by a Voice User Interface (VUI). Just as you can make text bold in HTML by writing it <b>inside b tags</b>, you can use SSML tags to mark up numbers, questions, and more, so that the voice reads them the way you intend.

Examples of SSML markup tags

Amazon has added a few markup tags of its own, while others are standardized and may also be supported by other speech synthesis services. Here are some examples of both:

  • amazon:effect – Lets you whisper text, for example in a game skill where you want to reveal a clue or secret to the player
  • amazon:emotion – Can change Alexa's voice to sound more excited or disappointed
  • audio: Plays an MP3 file mid-sentence, for example, if you want to play an audio message
  • break: Creates pauses of various lengths
  • emphasis: Text within this tag can be made to sound stronger or softer
  • lang: Pronounces a foreign word or phrase using the pronunciation of the language you specify
  • say-as: Controls how text is read out. For example, a number can be read as an amount, a phone number, a series of individual digits, a figure, or a fraction.
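
To make the tags above more concrete, here is a minimal Python sketch that assembles an SSML string combining say-as, break, and amazon:effect. The helper names tag and build_clue are hypothetical and exist only for this illustration. The attribute values shown (interpret-as="digits", time="500ms", name="whispered") reflect Alexa's SSML as commonly documented, but treat them as assumptions and check them against the reference for the service you target.

```python
from xml.sax.saxutils import escape, quoteattr


def tag(name, text="", attrs=None):
    """Hypothetical helper: return one SSML element as a string.

    `text` is inserted as-is, so escape plain text before passing it in.
    """
    attr_str = "".join(
        f" {key}={quoteattr(value)}" for key, value in (attrs or {}).items()
    )
    if not text:
        # Elements without content, such as break, are self-closing.
        return f"<{name}{attr_str}/>"
    return f"<{name}{attr_str}>{text}</{name}>"


def build_clue(code, clue):
    """Hypothetical example: read a code digit by digit, pause, then whisper."""
    parts = [
        "Your code is ",
        tag("say-as", escape(code), {"interpret-as": "digits"}),
        ".",
        tag("break", attrs={"time": "500ms"}),
        tag("amazon:effect", escape(clue), {"name": "whispered"}),
    ]
    # Every SSML response is wrapped in a single speak element.
    return tag("speak", "".join(parts))


ssml = build_clue("1234", "Look under the rug & the mat.")
print(ssml)
```

Escaping the plain text matters because characters such as & and < would otherwise make the markup invalid. The resulting string is what you would hand to the speech service in place of plain text.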