Since the days of Da Vinci's “Ornithopter,” the greatest minds of mankind have drawn inspiration from the natural world. Nothing has changed in the modern world: the latest achievements in machine learning and artificial intelligence are modeled on the most advanced computational organ we know, the human brain.
Imitating our gray matter is not just a good way to build more advanced AI. It is absolutely necessary for its further development. Neural networks based on deep learning, such as the one in AlphaGo, as well as the current generation of pattern recognition systems, are the best machine learning systems we have developed to date. They are capable of incredible things but still face significant technological difficulties. For example, they need direct access to large data sets to learn a particular skill. Moreover, if you want to retrain a neural network to perform a new task, you essentially need to erase its memory and start from scratch, because training on the new task overwrites what the network learned before, a problem known as “catastrophic forgetting.”
Compare this with the human brain, which learns gradually rather than springing fully formed from a heap of data. This is a fundamental difference: deep learning AI is generated from the top down, knowing everything it needs from the very beginning, while the human mind is built from the bottom up, with previous lessons that apply to new experiences being used to create new knowledge.
Moreover, the human mind is particularly good at relational reasoning based on logic, building connections between past experiences in order to understand new situations on the fly. Statistical AI (i.e., machine learning) can imitate the brain's pattern recognition skills but does not work well when logic must be applied. Symbolic AI, on the other hand, can use logic (assuming that it has been given the rules of the reasoning system) but is usually unable to apply this skill in real time.
But what if we could combine the computational flexibility of the human brain with the massive processing power of AI? This is exactly what a team from DeepMind tried to do. They created a neural network capable of applying relational reasoning to its tasks. It works in much the same way as the brain's network of neurons, which use their connections to one another to recognize patterns. “We are explicitly forcing the network to discover the relationships that exist between pairs of objects in a given scenario,” Timothy Lillicrap, a DeepMind scientist, told Science Magazine.
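The idea of looking at every pair of objects can be illustrated with a minimal sketch. It is not DeepMind's implementation: in a real relational network the pairwise function and the final aggregation would be small learned neural networks, whereas here `pair_relation` is a hand-written, hypothetical stand-in, and the scene format (dictionaries with `shape` and `x` keys) is an assumption made purely for illustration.

```python
from itertools import permutations


def pair_relation(a, b):
    """Hypothetical hand-written relation between two objects.

    In a learned system this would be a small neural network.
    """
    return (
        1.0 if a["shape"] == b["shape"] else 0.0,  # do a and b share a shape?
        1.0 if a["x"] < b["x"] else 0.0,           # is a to the left of b?
    )


def relational_summary(objects):
    """Sum the pairwise relation over every ordered pair of distinct objects."""
    total = [0.0, 0.0]
    for a, b in permutations(objects, 2):
        rel = pair_relation(a, b)
        total = [t + r for t, r in zip(total, rel)]
    return total


# Made-up scene: two cubes with a ball between them.
scene = [
    {"shape": "cube", "x": 0.0},
    {"shape": "ball", "x": 1.0},
    {"shape": "cube", "x": 2.0},
]
summary = relational_summary(scene)
print(summary)  # [same-shape pair count, left-of pair count]
```

Because the summary is a sum over all pairs, it does not depend on the order in which the objects are listed, which is one reason a pairwise design suits questions about relative position.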
When it was given difficult questions in June about the relative positions of geometric objects in an image, for example, “There is an object in front of the blue object; does it have the same shape as the tiny blue thing to the right of the gray metal ball?”, it correctly identified the object in 96% of cases. Conventional machine learning systems gave the right answer in 42–77% of cases. Even people passed the test in only 92% of cases. That's right, this hybrid AI did a better job than the people who built it.
The results were similar when the AI was presented with word problems. While conventional systems were able to keep up with DeepMind's network on simple questions such as “Sarah has the ball. Sarah enters her office. Where is the ball?”, the hybrid AI system left the competition behind on more complex questions like “Lily is a swan. Lily is white. Greg is a swan. What color is Greg?” On those, DeepMind's system answered correctly in 98% of cases, compared to about 45% for its competitors.
DeepMind is even working on a system that “remembers” important information and applies this accumulated knowledge to future queries. But IBM is taking this two steps further. In two research papers presented at the 2017 International Joint Conference on Artificial Intelligence, held in Melbourne, Australia, last week, IBM described two studies: one looks at how to give AI “attention,” and the other examines how to apply the biological process of neurogenesis, that is, the birth and death of neurons, to machine learning systems.
“Neural network training is usually engineered, and it takes a lot of work to actually come up with the specific architecture that works best. It is pretty much a trial-and-error approach,” Irina Rish, a researcher at IBM, told Engadget. “It would be nice if these networks could build themselves.”
IBM's attention algorithm informs the neural network which inputs provide the highest reward. The higher the reward, the more attention the network will pay to them. This is especially useful in situations where the data set is not static, that is, in real life. “Attention is a reward-driven mechanism; it is not just something disconnected from our decision making and our actions,” said Rish.
“We know that when we see an image, the human eye usually has a very narrow field of view,” Rish said. “So, depending on the resolution, you only see a few pixels of the image [sharply], and everything else looks blurry. The thing is, you quickly move your eye, and the mechanism that assembles the various parts of the image in the correct sequence lets you quickly recognize the image.”
The attention function will most likely be used first in pattern recognition, although it could be applied in a variety of fields. For example, if you train an AI using the Oxford dataset, which consists mainly of architectural images, it can easily identify cityscapes correctly. But if you show it a bunch of pictures from the countryside (fields, flowers, and so on), the AI will be confused because it does not know what flowers are. Run the same test on people or animals and you will trigger neurogenesis, as their brains try to adapt what they already know about cities to the new countryside landscapes.
The mechanism tells the system what it should focus on. Take your doctor, for example: she could run hundreds of possible tests on you to determine what is bothering you, but that is not feasible in terms of either time or cost. So which questions should she ask, and which tests should she run, to get the best diagnosis in the least amount of time? “This is what the algorithm is learning to figure out,” Rish explained. It not only determines which decision leads to the best result but also learns where to look for the data. Thus, the system not only makes better decisions but also makes them faster, because it does not query parts of the data set that are not relevant to the current problem. Just as your doctor does not tap your knees with that strange little hammer when you come in complaining of chest pain and shortness of breath.
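The reward-driven attention described above can be sketched as a simple bandit-style loop. This is only an illustration under stated assumptions, not IBM's algorithm: the class `RewardAttention`, its learning rate, and the `true_reward` payoffs are all hypothetical. Each input keeps a running estimate of the reward it has produced, and a softmax over those estimates decides how much attention each input receives.

```python
import math
import random


def softmax(scores):
    """Turn arbitrary scores into weights that are positive and sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class RewardAttention:
    """Hypothetical bandit-style attention over a fixed set of inputs."""

    def __init__(self, n_inputs, learning_rate=0.1):
        self.value = [0.0] * n_inputs  # running reward estimate per input
        self.learning_rate = learning_rate

    def weights(self):
        """Attention weights: higher estimated reward means more attention."""
        return softmax(self.value)

    def choose(self, rng):
        """Sample which input to look at, in proportion to its weight."""
        return rng.choices(range(len(self.value)), weights=self.weights())[0]

    def update(self, index, reward):
        """Move the estimate for one input toward the reward just observed."""
        self.value[index] += self.learning_rate * (reward - self.value[index])


rng = random.Random(0)
true_reward = [0.1, 0.9, 0.3]  # made-up payoffs, one per input
attention = RewardAttention(len(true_reward))
for _ in range(500):
    i = attention.choose(rng)
    attention.update(i, true_reward[i])

final_weights = attention.weights()
print(final_weights)  # the second input should end up with the largest weight
```

Inputs that rarely pay off are sampled less often, which mirrors the point that the system avoids querying data that is not relevant to the current problem.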
While the attention system is handy for keeping the network on task, IBM's work on neural plasticity (how well “memories” stick) is meant to give the network long-term memory. It is modeled on the same mechanisms of neuron birth and death that are observed in the human hippocampus.
In this system, “you don't have to model millions of parameters,” explained Rish. “You can start with a much smaller model, and then, depending on the data you see, it will adapt.”
When new data is presented to it, IBM's neurogenetic system begins to form new, better connections (neurons), while some of the older, less useful ones are pruned, as Rish put it. This does not mean that the system literally deletes the old data. It simply becomes less strongly attached to it, just as your old memories tend to become fuzzy over the years while those that carry significant emotional weight remain vivid for many years.
“Neurogenesis is a way of adapting deep networks,” said Rish. “The neural network is a model, and you can build this model from scratch, or you can change this model when needed, because you have several layers of hidden neurons, and you can decide how many [neurons] you want to have ... depending on the data.”
This is important because you do not want the neural network to expand infinitely. If that were to occur, the data set would become so large that it would be unmanageable even for the digital equivalent of hyperthymesia. “It also helps with normalization, so [the AI] does not ‘overthink’ the data,” Rish said.
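The birth-and-death idea, together with the limit on growth, can be sketched in a few lines. This is a toy illustration under stated assumptions rather than IBM's method: the function `adapt_hidden_layer`, the per-unit “usefulness” score, and every threshold below are hypothetical. Weak units are pruned, fresh units are added only when the model fits new data poorly, and the layer never grows past a fixed cap.

```python
def adapt_hidden_layer(usefulness, error, *, grow_if_error_above=0.5,
                       prune_below=0.05, new_units=2, max_units=8):
    """Return the usefulness scores of the units kept after one adaptation step.

    usefulness: one hypothetical score per hidden unit (for example, a weight norm).
    error: the model's error on the newest batch of data.
    """
    # "Death": drop units that contribute little.
    kept = [u for u in usefulness if u >= prune_below]

    # "Birth": add fresh units when the model fits the new data poorly,
    # but never grow beyond the cap.
    if error > grow_if_error_above:
        room = max_units - len(kept)
        # Newborn units start exactly at the pruning threshold.
        kept.extend([prune_below] * max(0, min(new_units, room)))
    return kept


layer = [0.9, 0.01, 0.4]
grown = adapt_hidden_layer(layer, error=0.8)    # poor fit: prune one, add two
settled = adapt_hidden_layer(layer, error=0.1)  # good fit: prune only
capped = adapt_hidden_layer([1.0] * 8, error=0.9)  # already at the cap
print(len(grown), len(settled), len(capped))
```

Starting small and growing only on demand reflects the remark about beginning with a much smaller model, while the cap stands in for the requirement that the network must not expand infinitely.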
Taken together, these advances could prove very useful for the AI research community. Rish's team next wants to work on what they call “internal attention.” You would choose not only which data you want the network to see but also which parts of the network you want to use in the calculations, based on the data set and inputs. In essence, the attention model will cover the short-term, active thought process, while the memory part will allow the network to optimize its function depending on the current situation.
But do not expect AI to rival human consciousness in the near future, Rish warns. “I would say at least several decades, but, again, this is only a guess. What we are doing now, in terms of highly accurate pattern recognition, is still very, very far from even a basic model of human emotions,” she said. “We have only just started.”