Speech Rate Conversion Technology

The world is moving at a faster pace. Transportation is faster, communication is faster, and so on. Some even feel that people are speaking faster, which makes it hard for some of us to keep up. The Science & Technology Research Laboratories (STRL) of Japan’s public broadcaster, NHK, have developed a technology that they believe will help, at least with how fast people seem to be talking. Their speech rate conversion technology is designed to be used in two ways: to speed speech up and to slow it down.

Have you ever listened to a recording played at a fast speed? This is commonly done in video or audio editing to search through a large volume of material in a short time. Normally, when speech is sped up, the pitch rises as well, causing the recorded voices to sound like the chipmunk characters that some of us listened to, or watched, as children. Even when the pitch is shifted back to its original level, it is often difficult to understand the individual words. The system designed by NHK’s STRL has changed that.
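The chipmunk effect follows from simple arithmetic: when a recording is merely played back faster, every frequency in it is multiplied by the speed factor. The short Python sketch below illustrates this. The function names and the 200 Hz example voice are illustrative assumptions, not part of the STRL system.

```python
import math


def naive_playback_pitch(original_hz, speed_factor):
    """Frequency heard when a recording is simply played back faster or slower.

    Plain speed-up scales every frequency by the speed factor.
    """
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return original_hz * speed_factor


def pitch_shift_semitones(speed_factor):
    """Express the same pitch change in musical semitones (12 per octave)."""
    if speed_factor <= 0:
        raise ValueError("speed_factor must be positive")
    return 12 * math.log2(speed_factor)


if __name__ == "__main__":
    voice_hz = 200.0  # assumed example pitch, chosen only for illustration
    for factor in (1.5, 2.0):
        heard = naive_playback_pitch(voice_hz, factor)
        shift = pitch_shift_semitones(factor)
        print(f"{factor}x playback: {heard:.0f} Hz, {shift:.1f} semitones higher")
```

At double speed the voice is heard a full octave higher, which is why plain fast playback sounds unnatural.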

Their new technology maintains the original timbre and pitch of the speech while playing it back at a faster rate. The playback rate is varied adaptively, which allows listeners to comprehend what is being said even at speeds up to five times the normal rate.

This system also benefits elderly listeners by slowing down radio broadcasts or the audio in television programs. It is common for older people to have some degree of hearing impairment, for various reasons. This might include difficulty in perceiving specific frequencies, a weaker cognitive capacity, or a problem understanding speech against a large amount of background noise. It has also been found that with age comes a decreased ability to comprehend rapid speech. At the same time, the population statistics of many countries forecast that twenty years from now, twenty-five percent of the radio and television audience will be sixty-five years of age or older. In order to reach this audience, STRL felt that something had to be done. Since it seemed unlikely that television announcers would slow down, they applied the speech rate conversion technology to the problem. Again, the pitch and timbre are maintained while the rate of speech is adjusted. Because the pauses between sentences are shortened, the slowed speech can still be presented in real time. In addition to Japanese, the technology can be used for English, German, Chinese, French, and other languages. It is used on NHK’s online news service and has also been incorporated into several consumer radio and television systems. For this work, the developers received an Asahi Shimbun Invention Award in 2008.
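The trade between slower speech and shorter pauses can be worked through as a simple time budget. The Python sketch below is only an illustration of that idea under assumed numbers. It is not STRL's algorithm, which adapts the rate within the speech itself, and the function names and durations are hypothetical.

```python
def remaining_pause_time(speech_seconds, pause_seconds, slowdown):
    """Pause time left over after slowing the speech, with total duration fixed.

    slowdown is the stretch applied to the spoken parts: 1.1 means the
    speech takes 10% longer. The total running time must not change, so
    the extra speech time has to be taken out of the pauses.
    """
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    total = speech_seconds + pause_seconds
    stretched_speech = speech_seconds * slowdown
    remaining = total - stretched_speech
    if remaining < 0:
        raise ValueError("not enough pause time to absorb this slowdown")
    return remaining


def max_slowdown(speech_seconds, pause_seconds):
    """Largest slowdown possible if every pause were removed entirely."""
    if speech_seconds <= 0:
        raise ValueError("speech_seconds must be positive")
    return (speech_seconds + pause_seconds) / speech_seconds


if __name__ == "__main__":
    # Assumed example: one minute of news with 50 s of speech and 10 s of pauses.
    print(remaining_pause_time(50.0, 10.0, 1.1))
    print(max_slowdown(50.0, 10.0))
```

With these assumed figures, slowing the speech by 10 percent leaves roughly half of the original pause time, and the speech could be stretched by at most 20 percent before it falls behind the live broadcast.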

The world continues to speed up, and we continue to age. Whether you’re working at a job where you need to search through large quantities of video or audio material, or you would just like to be able to understand what the news announcer is saying, this new speech rate conversion technology may make your life a little easier.