Thursday, 26 April 2012

The IBM Speech-to-Text Experiment

Edited extract from Pretotype It (e-book PDF), 2nd Pretotype Edition, Alberto Savoia, Oct 2011

I probably got a few details wrong, but in this case the moral of the story is much more important than the details. A few decades ago IBM was best known for its mainframe computers and typewriters. In those days, typing was something that only a small minority of people were good at – mostly secretaries, writers and some computer programmers. Most people typed with one finger – slowly and inefficiently. IBM was ideally positioned to leverage its computer technology and typewriter business to develop a speech-to-text machine. This device would allow people to speak into a microphone and their words would “magically” appear on the screen with no need for typing. It had the potential to make a lot of money for IBM, and it made sense for the company to make a big bet on it.

However, there were a couple of major problems. Computers in those days were much less powerful and more expensive than today, and speech-to-text requires a lot of computing power. Furthermore, even with adequate processing power, speech-to-text translation was (and still is) a very difficult computer science problem. Tackling it would have required a massive investment – even for IBM – and many years of research. But everyone would have wanted such a device. It would be a sure-fire hit. Or would it?

Some folks at IBM were not convinced that all the people and companies who had said they “wanted and would definitely buy and use” speech-to-text machines would actually end up buying them. They feared the company would end up spending years in research and lots of money developing something that very few would actually buy: a business disaster. After all, people had never used a speech-to-text system, so how could they know for sure they would want one?  IBM wanted to test the business viability of such a device, but since even a basic prototype was years away, they devised an ingenious experiment instead.

They put potential customers of the speech-to-text system, people who said they’d definitely buy it, in a room with a computer box, a screen and a microphone – but no keyboard. They told them they had built a working speech-to-text machine and wanted to test it to see if people liked using it. When the test subjects started to speak into the microphone, their words appeared on the screen: almost immediately and with no mistakes! Actually, the computer box in the room was a dummy. In the room next door was a skilled typist listening to the user’s voice from the microphone and typing the spoken words and commands.

So, what did IBM learn from this experiment? Here’s what I’ve heard: After being initially impressed by the “technology”, most of the people who said they would buy and use a speech-to-text machine changed their minds after using the system for a few hours. Even with fast and near-perfect translation simulated by the human typist, using speech to enter more than a few lines of text into a computer had too many problems, among them: people’s throats would get sore by the end of the day, it created a noisy work environment, and it was not suitable for confidential material. Based on the results of this experiment, IBM continued to invest in speech-to-text technology but on a much smaller scale – they did not bet the company on it.

As it turned out, that was the right business decision. Keyboards are proving hard to beat for most text entry tasks. Thirty years ago most people could not type; but look at any office today and you’ll see people of all ages and professions typing away. In devices where a full-size keyboard is not possible, such as mobile phones, speech-to-text can be the right it, but otherwise the keyboard is still the device to beat.
