Programmers of Artificially Intelligent Speech Translation Software Are Completely Challenged

About a month ago, I took some old cassette tapes and some old videotapes I had saved of myself giving speeches, and tried to convert them into digitized text. I had a silly but rather clever idea: I would turn on the voice recognition software on my computer, put the headset microphone up against the TV, turn the TV up to a suitably high volume, and play the video. The theory was that the voice recognition software would transcribe everything that was said, and I could then convert the transcripts into e-books, articles, and Word files.

It didn’t work.

Next, I took some old tapes from a microcassette recorder, which I had often used to record my speeches at local community events and my talks during college presentations. I turned on the microcassette recorder and tried the same thing. It didn’t work very well either. When I looked at the Word file that my speech recognition software created, it was pretty much garbage, although what it wrote was rather funny.

Indeed, I had considered that perhaps my voice inflection was too great, or my voice too animated. I had also considered that the tapes were too old, or that they had not recorded quite well enough to be translated into ones and zeros. Interestingly enough, the other day I read an article that made me rethink my previous strategy.

In fact, there was a great article on this topic in the Wall Street Journal on May 4, 2011, by Nick Wingfield, titled “Say What? High-Tech Messages Can Get Lost in Translation – Devices Make Communicating Easier or Incomprehensible; Phones Doesn’t Swear.” The article is hilarious, and yet it’s not all that funny, as humans are relying on these things more and more, you see.

Now then, it is obvious to me that programmers of artificially intelligent speech translation software have more work to do, and I can see they are completely challenged. There is often too much background noise, and each person speaks with a slightly different dialect and accent, depending on their language of origin and the region where they lived as they developed their language.

You can imagine how difficult it is for folks to program software that recognizes speech and then translates it from one language to another. Obviously there are enough challenges just getting the transcription to come out right with various accents or poor audio systems. Then there is the issue of translation: some phrases and words simply don’t match up with other languages.

It is interesting that professional translators are able to take a speech in one language and translate it into another, adjusting the phrases so that they make sense in each language. The United Nations has some interesting translation software, and it does work pretty well, but even it is not good enough, and it therefore causes communication challenges and hurt feelings every so often.

Perhaps this will be one of the biggest challenges for programmers of artificially intelligent speech translation software in the future. It is my guess that it will be. Indeed, I hope you will please consider all this and think on it. If you come up with any new strategies, ideas, or concepts along this line, please contact me.