News

Kids store 1.5 megabytes of information to master their native language

March 27, 2019
By: Yasmin Anwar
Image: baby against a binary-code backdrop. (Courtesy image iStock)

Learning one’s native language may seem effortless. One minute, we’re babbling babies. The next we’re in school reciting Martin Luther King Jr.’s “I Have a Dream” speech or Robert Frost’s poem “Fire and Ice.”

But new research from UC Berkeley suggests that language acquisition between birth and age 18 is a remarkable feat of cognition, rather than something humans are simply hardwired to do.

Researchers calculated that, from infancy to young adulthood, learners absorb approximately 12.5 million bits of information about language — about two bits per minute — to fully acquire linguistic knowledge. If converted into binary code, the data would fill a 1.5 MB floppy disk, the study found.

The findings, published today in the Royal Society Open Science journal, challenge assumptions that human language acquisition happens effortlessly, and that robots would have an easy time mastering it.

“Ours is the first study to put a number on the amount you have to learn to acquire language,” said study senior author Steven Piantadosi, an assistant professor of psychology at UC Berkeley. “It highlights that children and teens are remarkable learners, absorbing upwards of 1,000 bits of information each day.”

For example, when presented with the word “turkey,” a young learner typically gathers bits of information by asking questions like “Is a turkey a bird? Yes or no? Does a turkey fly? Yes or no?” and so on, until they grasp the full meaning of the word “turkey.”
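The yes/no framing above maps directly onto information theory: each binary answer carries at most one bit, so n questions can distinguish among up to 2ⁿ candidate meanings. A minimal sketch of that relationship (the function name and the example vocabulary size are illustrative, not from the study):

```python
import math

def questions_needed(candidates: int) -> int:
    """Minimum number of yes/no questions needed to single out
    one item from `candidates` equally likely possibilities."""
    return math.ceil(math.log2(candidates))

# Even a hypothetical million-meaning vocabulary needs only
# 20 well-chosen yes/no questions per word in the ideal case.
print(questions_needed(1_000_000))  # 20
```

In practice learners cannot ask ideal questions, which is one reason the study's per-word information estimates run far higher than this lower bound.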

A bit, or binary digit, is a basic unit of data in computing, and computers store information and calculate using only zeroes and ones. The study uses the standard definition of eight bits to a byte.
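The article's headline figures can be checked with simple arithmetic. A sketch, assuming an 18-year learning window and roughly 16 waking hours per day (the waking-hours figure is an assumption on my part, not stated in the article):

```python
BITS = 12_500_000   # total bits of linguistic information, per the study
YEARS = 18          # learning window from birth to age 18

days = YEARS * 365
bits_per_day = BITS / days            # ~1,900: "upwards of 1,000 bits" per day
waking_minutes = days * 16 * 60       # assumes ~16 waking hours per day
bits_per_min = BITS / waking_minutes  # ~2 bits per waking minute
megabytes = BITS / 8 / 1_000_000      # 8 bits per byte -> ~1.56 MB

print(round(bits_per_day), round(bits_per_min, 2), megabytes)
```

The last figure, about 1.56 MB, is why the researchers compare the total to a 1.5 MB floppy disk.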

“When you think about a child having to remember millions of zeroes and ones (in language), that says they must have really pretty impressive learning mechanisms,” Piantadosi said.

Piantadosi and study lead author Frank Mollica, a Ph.D. candidate in cognitive science at the University of Rochester, sought to gauge the amounts and different kinds of information that English speakers need to learn their native language.

They arrived at their results by running various calculations about language semantics and syntax through computational models. Notably, the study found that linguistic knowledge focuses mostly on the meaning of words, as opposed to the grammar of language.

“A lot of research on language learning focuses on syntax, like word order,” Piantadosi said. “But our study shows that syntax represents just a tiny piece of language learning, and that the main difficulty has got to be in learning what so many words mean.”

That focus on semantics versus syntax distinguishes humans from robots, including voice-controlled digital helpers such as Alexa, Siri and Google Assistant.

“This really highlights a difference between machine learners and human learners,” Piantadosi said. “Machines know what words go together and where they go in sentences, but know very little about the meaning of words.”

As for the question of whether bilingual people must store twice as many bits of information, Piantadosi said this is unlikely in the case of word meanings, many of which are shared across languages.

“The meanings of many common nouns like ‘mother’ will be similar across languages, and so you won’t need to learn all of the bits of information about their meanings twice,” he said.
