Inside Science

Information Theory Counts Size Of Animal Languages

SEP 13, 2013
Meanings behind whale, dolphin, and bird vocabularies remain opaque.
Amanda Alvarez
LaPrimaDonna via Flickr

New research employed mathematical techniques to estimate how many “words” animals such as birds and whales use in their vocalizations. Using a discipline called information theory, it is possible to illuminate the structure and complexity of animal “language,” though it can’t yet tell us what the animals are conveying.

Human languages have been deconstructed into bits since the idea of information theory came about in the 1940s. Using the vast printed texts of our species as a database, scientists can treat words and their combinations as a signal that can be analyzed. The frequency and repetition of symbols yield a measure of the information content in human languages. The symbols in English are the 26 letters of the alphabet, plus a space character. For animal modes of communication, however, figuring out the symbols can be a bit trickier, and researchers don’t have the benefit of huge animal language libraries that they can mine.
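As a rough illustration of how symbol frequencies translate into bits, the Python sketch below computes the Shannon entropy of single-character frequencies in a short string. The function name `symbol_entropy` and the sample sentence are illustrative assumptions, not taken from Smith's study, and a real estimate would need a far larger body of text.

```python
import math
from collections import Counter


def symbol_entropy(sequence):
    """Shannon entropy, in bits per symbol, of the symbol frequencies in `sequence`."""
    counts = Counter(sequence)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Upper bound for English text: 26 letters plus a space, all equally likely.
max_bits = math.log2(27)

# Hypothetical sample text; real estimates draw on very large corpora.
sample = "the quick brown fox jumps over the lazy dog"
print(f"upper bound: {max_bits:.2f} bits per symbol")
print(f"sample text: {symbol_entropy(sample):.2f} bits per symbol")
```

If all 27 symbols were equally likely, each would carry about 4.75 bits. Real text falls below that ceiling because some letters turn up far more often than others.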

“I would love to be able to translate dolphin speak,” said Reginald Smith. Since there is no translation app for clicks and whistles, he uses information theory to gain insights.

“Some animals use combinations of symbols or sounds with meaning, so I try to avoid calling those ‘words,’” said Smith.

Instead, he uses the term “N-gram.” As an independent investigator affiliated with the Citizen Scientists League, Smith has previously used statistical methods to probe complex linguistic systems like Meroitic, an extinct and undeciphered East African language. With human languages, studying how frequently words occur, and how the symbols within words are combined to make longer words, can tell us how much information is being transmitted. That quantity can be measured in bits, the same unit used to store the ones and zeros on a computer.

The same principle can be applied to animal communication, which is what Smith has done in a new study posted online on the scientific pre-print server arXiv. The way a letter in a word depends on those that come before it – a property known as the conditional entropy of the symbols in the sequence – can be used to estimate the number of words, or N-grams, in a language, via some complex calculations. Smith used data from previous studies that had recorded the whistles, cries, and songs of bottlenose dolphins, humpback whales, and four species of birds, including robins and European starlings.

“Dolphins had 27 whistles that they used a lot, though there were 125 different whistles used overall,” said Smith. They used these whistles in a uniform, repetitive fashion, whereas birds tended to use all the songs in their repertoire more liberally.

Starting with animal recordings, Smith first determined how much information is conveyed by a single symbol, and how this changed as a second, third, or fourth symbol, or letter, was added to the sequence. In English, for example, adding a second letter after a first conveys 4.14 bits of information, while a third has 3.56 bits, and a fourth 3.30 bits. These are called the first-, second-, and third-order entropies, and describe how long symbol combinations can become while still carrying information and not becoming redundant. All the bird songs he studied appeared to be limited to the first order, indicating a lower level of complexity.
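The idea that each added symbol carries fewer new bits once the preceding symbols are known can be sketched in Python. This is a naive frequency-counting estimate under stated assumptions, not Smith's actual procedure. The function names and the whistle sequence, which uses one letter per hypothetical whistle type, are invented for illustration.

```python
import math
from collections import Counter


def block_entropy(sequence, n):
    """Shannon entropy, in bits, of the n-symbol blocks (N-grams) in `sequence`."""
    if n == 0:
        return 0.0
    blocks = [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]
    counts = Counter(blocks)
    total = len(blocks)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_entropy(sequence, n):
    """Bits carried by the n-th symbol given the n - 1 symbols before it."""
    return block_entropy(sequence, n) - block_entropy(sequence, n - 1)


# Hypothetical whistle sequence, one letter per whistle type.
whistles = "ABABABCABABABCABABABC" * 5
for n in (1, 2, 3):
    print(n, round(conditional_entropy(whistles, n), 2))
```

In a highly repetitive sequence like this one, knowing the previous symbols makes the next one easier to predict, so the estimate shrinks as `n` grows. With short recordings, estimates of this kind become unreliable and can even dip slightly below zero.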

Smith then extrapolated from the estimate of a language’s complexity to its total vocabulary. For example, the dolphin’s vocabulary has approximately 36 “words,” while the figure for whales is about 23; the starling song repertoire is estimated at 119 to 202 songs. The precision of the size estimate goes down as the amount of original data Smith had to work with decreases; at each increasing order of entropy, more samples of the language are needed to create a good estimate of the number of one-, two-, and three-letter sequences, or N-grams. For whales, for example, not enough data existed to go beyond second-order entropy, so Smith can’t be sure how many longer sequences there might be. He also suspects animals in captivity, like the dolphins of SeaWorld whose whistle data he used, might have less complex communication, but this would require many more samples and comparative studies to verify.
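A simple counting argument shows why longer sequences demand more data: the number of possible N-grams grows exponentially with their length, while a recording of fixed length supplies only so many examples. The Python sketch below is a back-of-the-envelope illustration, not part of Smith's method. The function name `ngram_coverage` is made up, the alphabet of 27 symbols borrows the count of commonly used dolphin whistles quoted above, and the recording length of 1,000 whistles is hypothetical.

```python
def ngram_coverage(alphabet_size, n, sequence_length):
    """Return (possible N-grams, N-grams in the sample, ratio of the two)."""
    possible = alphabet_size ** n
    observed = max(sequence_length - n + 1, 0)
    return possible, observed, observed / possible


# Hypothetical recording of 1,000 whistles drawn from 27 whistle types.
for n in (1, 2, 3):
    possible, observed, ratio = ngram_coverage(27, n, 1000)
    print(f"n={n}: {possible} possible, {observed} in sample, ratio {ratio:.2f}")
```

With 27 symbols there are 729 possible pairs and 19,683 possible triples, so a recording that samples single whistles generously contains far fewer three-whistle sequences than could in principle occur.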

Going beyond just measuring the structure and complexity of animal languages would be a logical next step, which Smith says he will leave to animal researchers. And, he says, extracting a song or cry from a pattern is about as illuminating as plucking a word out of a sentence: “The takeaway is that we need more research into how valuable the second or third order complexity is to animals’ communication.”


Amanda Alvarez has written about science for the Milwaukee Journal Sentinel, Yale Medicine, and GigaOM. She received her PhD in Vision Science from the University of California, Berkeley, and tweets at @sci3a.
