In an earlier post on 2001, I wrote:
Some say we will know we have developed intelligent machines not when they can speak, but when they can read our lips.
Not so fast! Today’s article in the NYTimes on Google’s translator programs raises the possibility that we may get lip reading machines before intelligent ones. Oh well, many people speak before they think already!
It seems that the translators, which are pretty darn good, I think, use models of language that are augmented with, among other things, huge amounts of multilingual transcripts from UN meetings. The translators there are among the best – human – ones around, so their work is the gold standard. The massive database of phrases and sentences is parsed and indexed a la Google, and that’s why the programs do a decent job with text that strays from textbook, factual propositions. What’s to stop the Google folks from feeding in massive amounts of video of people’s mouths speaking words which the machine can already process with its voice-recognition software? It would build a model of the relationship between mouth configurations and the actual phonemes, which it already knows – in other words, lip reading.
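The statistical idea behind that speculation can be sketched in a few lines. This is a toy illustration, not anything Google has described: the viseme labels, phoneme symbols, and paired observations below are all made up. Given paired observations of mouth configurations ("visemes") and the phonemes the speech-recognition software heard at the same moment, the machine could simply count co-occurrences and keep the most frequent phoneme for each mouth shape:

```python
from collections import Counter, defaultdict

# Hypothetical paired data: (mouth configuration seen, phoneme heard).
# In the imagined system, these pairs would come from aligning video of
# speakers' mouths with the output of voice-recognition software.
paired_observations = [
    ("lips_closed", "p"), ("lips_closed", "b"), ("lips_closed", "m"),
    ("lips_closed", "b"), ("lips_closed", "b"),
    ("lip_on_teeth", "f"), ("lip_on_teeth", "v"), ("lip_on_teeth", "f"),
    ("rounded", "w"), ("rounded", "oo"), ("rounded", "oo"),
]

# Count how often each phoneme co-occurs with each mouth configuration.
counts = defaultdict(Counter)
for viseme, phoneme in paired_observations:
    counts[viseme][phoneme] += 1

# Keep the single most frequent phoneme per mouth configuration --
# a bare-bones statistical model of the viseme-to-phoneme relationship.
model = {v: c.most_common(1)[0][0] for v, c in counts.items()}
print(model)  # {'lips_closed': 'b', 'lip_on_teeth': 'f', 'rounded': 'oo'}
```

A real system would have to cope with the fact that many phonemes share a mouth shape (p, b, and m all look alike), so it would lean on the language model to disambiguate, just as the translators lean on the UN corpus.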