I can’t hear words in videos.
My hearing is a bit odd: I can hear the actual sound in the video quite well, and I can hear a speaker’s tone, pitch, and even emotion and cadence. But try as I may, I’m missing an aspect of my hearing that provides clarity, the critical piece that helps turn sounds into words that can be understood.
On November 19, 2009, YouTube released a new automatic captioning service for their videos. This remarkable tool takes an uploaded video and generates a caption track for it using voice recognition and machine learning. The captions can then be automatically synced and translated into any number of languages.
Speech-to-text is hard. Very, very hard. Understanding and accurately transcribing spoken language is a huge problem that many, many companies have dedicated a lot of time and money to solving. Our human brains are very good at taking what we hear (or rather, what we thought we heard) and filling in the blanks based on a wide range of environmental, linguistic, and interpersonal context. We have to navigate accents, gender differences, filler sounds, brain farts, and background noise. We’ve made remarkable advances in mimicking this through speech-to-text technology over the last few years, but we’re a long way from a perfect solution.
Back to YouTube: whenever I’m with a group of folks and someone turns on the automatic captions on a YouTube video, the room invariably fills with jokes and derision for the quality of the captions: “The captions are so bad,” or “More like craptions!”
Meanwhile, I’m sitting there — ten years ago, and today, ten years after the launch of YouTube’s automatic captions — thinking that it’s the most remarkable, amazing, wondrous thing. I’m delighted. For me, it’s the difference between something and nothing.
Yes, they’re often terribly inaccurate. Yes, they’re a poor substitute for the manually transcribed, reliable captions that should be an expectation. Yes, they can mislead folks into thinking that since they exist, there’s no need to take further action to create inclusive content and experiences.
But, something is almost always better than nothing.
This applies to both accessibility and product strategy.
In making products inclusive, doing what you can do may be the difference between someone being delighted or excluded.
In making products successful, creating a minimal product is better than no product. If you wait until it’s perfect before you ship it, nobody can use it. If your product can’t be delivered until it’s perfect, your product may be too complex.
What’s the minimum you can do to make someone included?
What’s the minimum you can do to take a meaningful step forward in what you’re doing?
What can you do that will make a remarkable impact for someone, somewhere, if it simply existed?
Do something. Then find ways to keep making it better.