Alien chatter: at the cosmic party, intelligence will be heard best

Applying information theory to the search for signals from extraterrestrial civilizations


If you are looking for signals from alien civilizations, why not practice first on some of the non-human communication systems that already exist on our own planet? Whales have had a global communication system for millions of years, longer than Homo sapiens has existed. Bees, which communicate with one another partly through dance, were holding democratic debates about the best places to swarm millions of years before humans came up with democracy as a political system. And there are plenty of other examples. No one I know who has studied another animal's communication system has ever come away concluding that the species was dumber than they had thought.

By studying animal communication, my colleagues and I have developed a new type of detector, a “communication intelligence” filter, that determines whether a signal from space comes from a technologically advanced civilization. Most previous SETI efforts have looked for transmissions confined to a narrow frequency range, or for rapidly flickering optical signals. Judging by what we know of astrophysics, such signals would be clearly artificial, and their discovery would indicate technology capable of transmitting a signal across interstellar distances. SETI searches usually discard broadband radio signals and slow optical pulses, whose origin is less obvious. Although such signals could well have been sent by intelligent beings, they could also come from natural sources of radio waves, such as interstellar gas clouds, and until now we have had no good way to tell the two apart.


Simply put, we may already have received a message from intelligent beings and overlooked it because it did not match our expectations of what such a signal should look like. This may be why 50 years of searching have not turned up any interstellar communications.

Over the past decade and a half, my colleagues and I have come up with a better way. We have applied information theory to the communication systems of humans and animals, and we can now tell with confidence when members of a species are conveying complex ideas to one another, even without knowing what they are saying. We use the term “communication system” so as not to prejudge whether other species have a language in the human sense of the word. Complex communication obeys general rules of syntax, which imply the existence of what might be called “intelligent content”. Given a large enough sample of a message, we can estimate its degree of complexity, that is, its rule structure. In the mathematics of information theory, this structure is called “conditional information entropy”, and it is made up of the mathematical relationships between elementary units of communication, such as letters and phonemes. In everyday speech we think of this structure as grammar and, at a more basic level, as the way sounds are assembled into words and sentences. And now, for the first time, we have begun looking for such structures in the SETI data collected at the SETI Institute in Mountain View, California.

My colleagues Brenda McCowan and Sean Hanser of the University of California, Davis, and I decided to study species that are socially complex, depend heavily on acoustic communication, and use sound signals that we could classify. Our first three subjects were therefore bottlenose dolphins (Tursiops truncatus), common squirrel monkeys (Saimiri sciureus), and humpback whales (Megaptera novaeangliae).

One feature of human linguistics that emerged from early studies of words, letters, and phonemes is known as Zipf's law, named after the Harvard linguist George Zipf. In English text, the letter “e” occurs more often than “t”, “t” more often than “a”, and so on down to the rarest letter, “q”. If you rank the letters from “e” to “q” in order of decreasing frequency and plot frequency against rank on a log-log graph, the points fall on a straight line inclined at 45 degrees, that is, a line with a slope of -1 [Simply put, the frequency of the n-th most common letter is inversely proportional to its rank n / translator's note]. Do the same with text written in Chinese characters and you get the same slope. The same holds for the letters, words, or phonemes of Japanese, German, Hindi, and dozens of other languages. Baby babble, however, does not obey Zipf's law. Its slope is shallower than -1, because the baby produces sounds almost at random. But as the child learns language, the slope gradually steepens and reaches -1 at about 24 months.
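The Zipf slope described above can be estimated with an ordinary least-squares fit on a log-log scale. The following is a minimal sketch, not the authors' actual analysis code: the function name `zipf_slope` and the two synthetic test sequences are assumptions made for illustration.

```python
import math
from collections import Counter

def zipf_slope(units):
    """Least-squares slope of log10(frequency) against log10(rank).

    `units` is any sequence of hashable signal units (letters, words,
    whistle types). The name and interface are illustrative assumptions.
    """
    counts = sorted(Counter(units).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct units to fit a slope")
    xs = [math.log10(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log10(count) for count in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic data: the unit of rank k occurs exactly 1200 / k times,
# which is an ideal Zipf distribution.
ideal = [k for k in range(1, 7) for _ in range(1200 // k)]

# Synthetic data: every unit is equally common, as in random babble.
uniform = [k for k in range(1, 7) for _ in range(100)]

print(zipf_slope(ideal))    # close to -1
print(zipf_slope(uniform))  # flat line, slope 0
```

On real recordings the fit depends heavily on sample size and on how the units are classified, so a slope near -1 should be read as a first screen rather than a result in itself.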

Mathematical linguists say that a slope of -1 indicates that a sequence of sounds or written symbols contains enough complexity to be a language. This is a necessary but not a sufficient condition: it is a first test for complexity, not proof of it. Zipf himself believed that the slope of -1 arises from a compromise he called the “principle of least effort”. It is a balance between the sender, who tries to spend as little energy as possible on the signal, and the receiver, who wants more redundancy to be sure the whole message has been received.

The key step in applying information theory is identifying the units of the signal. For example, if you plot the individual dots and dashes of Morse code, you get a Zipf slope of about -0.2. But if you take as elementary units the sequences of dots and dashes (dot-dot, dash-dash, dot-dash, and longer combinations), the slope changes to -1, reflecting how the letters of the alphabet are encoded in that system. In this way, the original units of meaning can be recovered by reverse engineering.
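To see why the choice of unit matters, the same Morse message can be counted at two levels. This is only a sketch under stated assumptions: the nine-letter code table is a subset of International Morse code, and the sample sentence is made up for the demonstration. It does not attempt to reproduce the slope figures quoted above.

```python
from collections import Counter

# Subset of International Morse code, enough for the demo sentence.
MORSE = {
    "e": ".", "t": "-", "a": ".-", "n": "-.", "i": "..",
    "s": "...", "o": "---", "h": "....", "r": ".-.",
}

def encode(text):
    """Return one Morse group per letter, skipping characters not in MORSE."""
    return [MORSE[ch] for ch in text.lower() if ch in MORSE]

# Hypothetical sample message that uses only the nine letters above.
text = "the rain is nearer to the shore than to the east"

groups = encode(text)                             # letter-level units
marks = [mark for group in groups for mark in group]  # raw dots and dashes

mark_counts = Counter(marks)
group_counts = Counter(groups)

print(len(mark_counts), "distinct units at the dot/dash level")
print(len(group_counts), "distinct units at the letter level")
print(group_counts.most_common(3))
```

At the dot/dash level there are only two units, so the rank-frequency plot has just two points and cannot reveal much. Grouping the marks into letter codes exposes a richer frequency distribution, and that is the level at which a Zipf slope becomes meaningful.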

Most linguists had assumed that Zipf's law was a feature of human languages only. So we were delighted to find, when we plotted the frequency of occurrence of bottlenose dolphin whistles, that they too obey Zipf's law! Later, when two baby bottlenose dolphins were born at “Sea World” in California, we recorded their infant whistles and found that their Zipf slope matched that of human babies' babble. It appears that baby dolphins babble their whistles, learning their communication system in much the same way that human babies learn language. By the time the dolphins were 12 months old, the frequency distribution of their whistles had reached a slope of -1.

Although we have yet to decipher what dolphins are saying, we have established that they and whales have communication systems whose internal complexity approaches that of human language. This complexity makes communication fault-tolerant. Any creature that exchanges information has to be able to do so despite ambient noise, obstacles, and other things that interfere with signal propagation. Human language is structured to provide redundancy. At the most basic level, this structure determines the probability that a given letter will appear. If I tell you that I am thinking of a word, you might guess that its first letter is “t”, since that is the most common first letter of English words. Your guess would be the most probable one, but not very informative. You could say you took the easy option. If instead you had picked the letter “q” and guessed right, you would have gained much more specific information about the word I had in mind.
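The trade-off between a likely guess and an informative one can be expressed as Shannon self-information: an outcome of probability p carries -log2(p) bits. The sketch below illustrates this; the first-letter probabilities in `p_first` are made-up values chosen only for the example, not measured English statistics.

```python
import math

def surprisal_bits(p):
    """Information, in bits, gained when an event of probability p occurs."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in the interval (0, 1]")
    return -math.log2(p)

# Hypothetical first-letter probabilities, for illustration only.
p_first = {"t": 0.16, "q": 0.002}

for letter, p in p_first.items():
    print(letter, round(surprisal_bits(p), 2), "bits")
```

A correct guess of a rare first letter narrows down the word far more than a correct guess of a common one, which is the sense in which the “t” guess is probable but uninformative.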

Let's go a step further. If I said I was thinking of the second letter of a word whose first letter is “q”, you would guess at once that the letter is “u”. Why? Because you [the English-speaking reader] know that in English these two letters occur together with almost 100% probability. To fill in the missing information, you used not only the probability of individual letters but also the conditional probability linking two letters: the probability that “u” comes next, given that “q” has already appeared. Our brains use conditional probabilities whenever they have to correct errors in transmission, such as faint text on a printout from a printer running out of ink, or words that are hard to make out in a noisy phone call.
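A conditional probability of this kind can be estimated directly from counts of adjacent letter pairs. The following is a minimal sketch; the function name and the short sample sentence are assumptions for illustration, and a real estimate would need a far larger body of text.

```python
from collections import Counter

def next_letter_probability(text, first, second):
    """Estimate P(second | first) from adjacent letter pairs within words."""
    pair_counts = Counter()
    first_counts = Counter()
    for word in text.lower().split():
        letters = [ch for ch in word if ch.isalpha()]
        for a, b in zip(letters, letters[1:]):
            pair_counts[(a, b)] += 1
            first_counts[a] += 1
    if first_counts[first] == 0:
        raise ValueError(f"{first!r} is never followed by another letter")
    return pair_counts[(first, second)] / first_counts[first]

# Hypothetical sample text, chosen to contain several words with "q".
sample = "the quick queen quietly questioned the quality of the quartz"

print(next_letter_probability(sample, "q", "u"))  # every q is followed by u
print(next_letter_probability(sample, "t", "h"))  # t has several successors
```

In this sample “q” is followed by “u” every time, while “t” is followed by “h” in only three of its seven occurrences, so knowing that the previous letter was “q” removes nearly all uncertainty about the next one.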


In English, conditional probabilities extend across as many as nine words in a row. If one word is missing, you can guess what it is. If two words in a row are missing, it is often still possible to recover them from context. A short example of a sentence with one word missing: “How do ___ feel today?” It is easy to guess that the missing word is “you”. Now consider the sentence with two words missing: “How ___ ___ feel today?” It could be “How does Innokentiy feel today?”, and there are other possibilities. Obviously, the more words are missing, the harder it is to work them out from context, and the lower their conditional probability. For most written human languages, the conditional probability vanishes when about nine words in a row are missing. If 10 words are missing, you will have no idea what they might be. In the language of information theory, this means that the entropy of human written language extends to about the ninth order.
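One common way to put a number on this kind of structure is the entropy of a unit given the units that precede it. The sketch below uses a simple plug-in estimate from n-gram counts. The function name, the convention that order n means conditioning on the previous n - 1 units, and the toy sequence are all assumptions; the article does not say which estimator the authors used.

```python
import math
from collections import Counter

def conditional_entropy(tokens, n):
    """Plug-in estimate, in bits, of the entropy of a token given the
    n - 1 tokens before it. With n = 1 this is the ordinary entropy."""
    if n < 1 or len(tokens) < n:
        raise ValueError("need n >= 1 and at least n tokens")
    ngrams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    contexts = Counter()
    for gram, count in ngrams.items():
        contexts[gram[:-1]] += count  # how often each (n-1)-token context occurs
    total = sum(ngrams.values())
    return sum(
        (count / total) * math.log2(contexts[gram[:-1]] / count)
        for gram, count in ngrams.items()
    )

# Toy sequence with a rigid rule: "a" and "b" strictly alternate.
tokens = list("ab" * 8)

print(conditional_entropy(tokens, 1))  # 1 bit: a and b are equally common
print(conditional_entropy(tokens, 2))  # 0 bits: the previous token decides the next
```

A drop in entropy from one order to the next shows that context is carrying information. With real recordings, higher orders need rapidly growing amounts of data, because short samples make the estimate look more structured than the signal really is.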

We found similar conditional probabilities in animal communication systems. For example, we recorded the social vocalizations of humpback whales in southeastern Alaska together with Fred Sharpe of the Alaska Whale Foundation. Humpback whales are famous for their songs, which are usually recorded when the whales come to Hawaii to breed. In Alaska their vocalizations are quite different: they include calls used to herd fish into bubble nets, along with social calls, rather than songs. We recorded these vocalizations both with and without boat noise present. We calculated how much the noisy ocean behaves like static on a telephone line. We then used information theory to quantify how much the whales would need to slow their vocalizations to ensure that messages were received without errors.

As expected, in the presence of boat noise the whales slowed their rate of vocalization, just as a person slows down on a noisy phone call. But they slowed by only three-fifths of the amount that theory said was needed to get their messages across without errors. How could they get away with so little slowing when the ambient noise clearly demanded more? We puzzled over this for some time, and then realized that the rule structure of their communication system was probably such that the remaining two-fifths of the signal could be recovered. The humpback whales were using the conditional probabilities of their acoustic equivalent of words. They did not need to receive the whole message to be able to fill in the gaps.


We found a similar internal structure in dolphin communication. The difference is that dolphins have about 50 basic signals, while humpback whales have hundreds. We are now collecting data to determine the maximum order of entropy in the humpback whales' communication system.

To test whether our approach can distinguish astrophysical sources from intelligent signals, we turned to examples from radio astronomy. When the astronomers Jocelyn Bell Burnell and Antony Hewish discovered pulsars in 1967, the sources were nicknamed “LGM”, for “little green men”. Because of the clear periodicity of these radio sources, some scientists speculated that they might be beacons of advanced extraterrestrial civilizations. With the help of Simon Johnston of the Australia Telescope National Facility, we analyzed the pulses of the Vela pulsar and obtained a Zipf slope of -0.3. This does not correspond to any known language. In addition, we found almost no conditional-probability structure in the pulsar signals. Indeed, pulsars are now known to be natural remnants of supernovae. So information theory can readily distinguish between a putative intelligent signal and a natural source.

We are now analyzing microwave data from the SETI Institute's Allen Telescope Array, which consists of 42 telescopes observing in the range from 1 to 10 GHz. In addition to the usual search for narrowband radio signals, we are beginning to apply measures from information theory. If, for example, we find signals that obey Zipf's law, that will encourage us to go further and look for syntactic structure in them, in an attempt to determine the complexity of a potential message.

To transmit knowledge, even a very advanced extraterrestrial civilization will have to obey the rules of information theory. Although we probably will not be able to decipher such a message, for lack of shared symbols (we have the same problem with humpback whales), we will be able to get an idea of the complexity of their communication system and, by extension, of their thought processes. If, for example, the conditional probabilities of a signal found by SETI turn out to be of the 20th order, that will indicate not only that the signal is artificial but also that the language is vastly more complex than any on Earth. We will have a quantitative measure of the complexity of the thought processes of the extraterrestrial beings transmitting the information.

Laurance Doyle is director of the Institute for the Metaphysics of Physics at Principia College in Illinois and organizer of the Quantum Astrophysics Group at the SETI Institute. He was a member of NASA's Kepler mission and led the team that discovered the first circumbinary planet, a planet orbiting two stars (nicknamed Tatooine).

