28 Jan 2019 | World innovation news | Information and Communications Technologies
Robots Will Learn Language Like Children Do


If your smart personal assistant fails to understand a request, it is because it did not learn to speak the way humans do. A new technology aims to address this shortcoming by improving how robots learn and use natural language. Researchers at the Massachusetts Institute of Technology's Computer Science and Artificial Intelligence Laboratory (CSAIL) have created a system that allows robots to learn language the way children do, with the goal of improving verbal communication between robots and humans.
Syntactic and Semantic Parsers
The concept is inspired by interactionist theories of language learning in humans. Swiss psychologist Jean Piaget put forth one of the first such explanations of language acquisition: an interactionist cognitive theory which holds that children learn to speak by interacting with the people and the world they live in.
In computer science, natural language understanding is handled by syntactic and semantic parsers. However, these do not take into consideration the fact that language is inseparable from its material environment and its social context. How do these parsers work?
These systems are designed by humans to decode and understand natural language. They recognize previously annotated language units in order to grasp the meaning of words and the structure of a sentence. Because of these capabilities, they are used in web search, natural language database queries, and voice assistants like Alexa and Siri.
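To make the idea concrete, here is a minimal, hypothetical sketch of semantic parsing in Python: a sentence is mapped onto a structured meaning representation using a tiny hand-annotated lexicon. The lexicon entries and predicate names are invented for illustration; production parsers are statistical and vastly more sophisticated.

```python
# Toy semantic parser: map a sentence to a predicate-argument structure.
# Lexicon and predicate names are invented for this illustration.

# Hand-annotated lexicon: word -> (part of speech, semantic symbol)
LEXICON = {
    "the":    ("DET",  None),
    "person": ("NOUN", "person"),
    "ball":   ("NOUN", "ball"),
    "picks":  ("VERB", "pick_up"),
    "up":     ("PART", None),
}

def parse(sentence):
    """Map a 'SUBJ VERB (PART) OBJ' sentence to a predicate-argument structure."""
    words = sentence.lower().split()
    unknown = [w for w in words if w not in LEXICON]
    if unknown:
        raise ValueError(f"words not in the toy lexicon: {unknown}")
    nouns = [LEXICON[w][1] for w in words if LEXICON[w][0] == "NOUN"]
    verbs = [LEXICON[w][1] for w in words if LEXICON[w][0] == "VERB"]
    if len(nouns) == 2 and len(verbs) == 1:
        # Semantic structure: predicate(agent, patient)
        return {"predicate": verbs[0], "agent": nouns[0], "patient": nouns[1]}
    raise ValueError("sentence not covered by this toy grammar")

print(parse("The person picks up the ball"))
# -> {'predicate': 'pick_up', 'agent': 'person', 'patient': 'ball'}
```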
Given the multiplicity of oral and written practices in natural language, syntactic and semantic parsing is extremely complex. Moreover, these programs sometimes fail to understand ambiguous sentences or unconventional structures. Research co-author Andrei Barbu notes that people often communicate in fragmented sentences and personal turns of phrase.
Innovation Based on Video Learning
The technology designed by the team allows machines to learn natural language the way children do. The method improves language comprehension, whatever its formulation, by taking the context of utterance into account. To achieve this, the researchers used an approach that gives parsers less language-only training and adds visual learning: they built the first parser that can learn language by watching captioned videos of people in real-life situations. To this end, they equipped the machine with an artificial vision system trained to recognize objects and humans in action. The videos come into play when the parser detects an ambiguity in a sentence: if it is unsure of the meaning of an action or an object, it can refer to the video to resolve it. This improves its ability to grasp the subtleties of everyday language use.
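The sketch below illustrates, under stated assumptions, how visual grounding could resolve such an ambiguity: each candidate interpretation of a caption is scored against the objects and actions a vision system reports for the video, and the best-supported reading wins. The detection format and scoring rule are assumptions made for illustration, not the CSAIL system's actual mechanics.

```python
# Hypothetical grounding step: pick the caption reading best supported
# by what the (assumed) vision system detected in the video clip.

detections = {"objects": {"person", "ball", "table"}, "actions": {"pick_up"}}

# Two candidate meanings for an ambiguous caption
candidates = [
    {"predicate": "pick_up",  "agent": "person", "patient": "ball"},
    {"predicate": "put_down", "agent": "person", "patient": "ball"},
]

def grounding_score(candidate, detections):
    """Count how many parts of the candidate meaning are visible in the video."""
    score = 0
    if candidate["predicate"] in detections["actions"]:
        score += 1
    for role in ("agent", "patient"):
        if candidate[role] in detections["objects"]:
            score += 1
    return score

best = max(candidates, key=lambda c: grounding_score(c, detections))
print(best)  # the pick_up reading wins because that action was observed
```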
In order for the system to decipher visual data, the researchers prepared a database of 400 videos showing people performing several types of actions, such as moving towards an object, picking it up and putting it back. A web-based crowdsourcing platform subsequently provided 1,200 captions for these videos, of which 840 were used for training and fine-tuning, and 360 for testing.
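As a trivial illustration of that split (with invented caption IDs, not the real dataset), the 1,200 captions partition into 840 for training and 360 for testing:

```python
# Reproduce the 840/360 train/test split described above on placeholder IDs.
import random

captions = [f"caption_{i:04d}" for i in range(1200)]  # invented caption IDs
random.seed(0)                                        # reproducible shuffle
random.shuffle(captions)

train, test = captions[:840], captions[840:]
assert len(train) == 840 and len(test) == 360
```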
The computer vision algorithm examines each video frame to track objects and people over time, in order to capture the relationship between the actions on screen and the caption describing them. During training, the researchers asked the parser to determine whether a sentence accurately described a given video. Exposing the system to similar situations allows it to fine-tune its acquisition of words and sentences. From this training, the machine acquires its own syntactic and semantic grammar. Faced with a new sentence, the parser no longer needs videos; it uses its own grammatical and lexical knowledge to infer the structure and meaning.
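A schematic toy version of this weakly supervised setup is sketched below: the only training signal is whether a caption matches a video, and the parser's word weights are nudged until its yes/no judgments agree with those labels. The feature scheme and perceptron-style update are placeholders, not the authors' algorithm.

```python
# Weak supervision in miniature: learn only from match/no-match labels,
# never from gold parse trees. All details here are illustrative placeholders.

class ToyParser:
    def __init__(self):
        self.weights = {}  # word -> learned evidence that the caption fits the video

    def matches(self, caption):
        """Yes/no judgment: does this caption describe the video?"""
        return sum(self.weights.get(w, 0.0) for w in caption.split()) > 0

    def update(self, caption, target):
        # Perceptron-style nudge toward the label the video provides.
        step = 0.1 if target else -0.1
        for w in caption.split():
            self.weights[w] = self.weights.get(w, 0.0) + step

parser = ToyParser()
# (caption, label) pairs; the label stands in for the vision system's verdict
# on whether the caption described what happened in the video.
examples = [
    ("person picks up ball", True),
    ("person puts down ball", False),
]
for _ in range(5):  # a few passes over the tiny training set
    for caption, label in examples:
        if parser.matches(caption) != label:
            parser.update(caption, label)

print(parser.matches("person picks up ball"))   # True after training
print(parser.matches("person puts down ball"))  # False after training
```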
In the future, the team wants to explore the sensory and cognitive aspects of language acquisition in children. Because young children learn by interacting with their environment, the researchers want to design a system that uses perception to learn natural language.
The study, entitled "Grounding language acquisition by training semantic parsers using captioned videos" and co-written by Candace Ross, Andrei Barbu and Yevgeni Berzak, was presented at the Conference on Empirical Methods in Natural Language Processing (EMNLP).

Hanen Hattab
Hanen Hattab is a PhD student in Semiology at UQAM. Her research focuses on subversive and countercultural arts and design practices such as artistic vandalism, sabotage and cultural diversions in illustration, graphic arts and sculpture.
