Text to speech

The way that blind and low vision users interact with most technology is through synthesised speech reading out the content of the screen. Text to speech is also used by a secondary audience of users who have difficulty reading. There are two aspects to this: firstly the communication itself, and secondly personalisation of the speech.

The communication consists of four elements.

  1. Focus management, i.e. a way for someone with no vision to physically navigate screens, which is best achieved through digital controls – e.g. keyboard rather than mouse, d-pad rather than moving a cursor using an analogue stick. It should cycle between the elements on the screen. Nesting can be useful for complex interfaces, i.e. navigate between whole groups of interface elements, and when selected, navigating the individual elements within a group.
  2. Communicating label, type and state of interface elements, for example a button element with a label of ‘next’, an image with a label of ‘man holding a joypad’, a checkbox labelled ‘opt in’ with a state of ‘selected’
  3. Communicating any changes to onscreen elements that aren’t the direct result of a result of a user action, e.g. a system notification appearing.

The customisation covers such things as choice of voice and choice of speech rate. Many blind users choose a synthetic sounding voice with pronounced syllables to enable them to listen at a very high speed.

Text to speech should be available across markets / languages and cover all of the system UI, including settings menus. Developing a full text to speech system is not a small undertaking, but there are existing third party products – such as NVDA – for which licencing might be possible.

Recent updates to text-to-speech functionality on Windows

Next: Ability for text to speech to read in-game UI

Intro & contents