Despite its age, Pac-Man, in its various versions, remains wildly popular. Fans regularly compete to see who can score the most points in a single playthrough. AI developers study it too, using it to improve how well their systems play games.
A development team from the Canadian startup Maluuba, which had earlier been acquired by Microsoft, has created a software platform that achieved extremely good results in Ms. Pac-Man, a version of the game for the Atari 2600. The program broke all previously set records, scoring 999,990 points.
Incidentally, the Atari 2600 version of Pac-Man dates back to 1982, when Atari Inc. released a port of Namco's hit arcade game. As in the original, the player controls the protagonist with a joystick. The main character moves through a maze, collecting points while avoiding ghosts. The port was created by Tod Frye.
According to some experts, the fact that the software platform reached the maximum score is a significant achievement in the field of artificial intelligence (in its weak, or narrow, form). Other software platforms have shown far less successful results, because the game is quite difficult for AI. Of course, not every developer set out to break the Ms. Pac-Man record, but those who did work on it reported a large number of technical problems in getting any meaningful results.
To succeed, the Maluuba team decided to split the game into a set of small sub-problems and to look for a solution to each one separately. These sub-problems were then distributed among AI agents, each of which handled one specific task. Doina Precup, a professor at McGill University in Montreal, said that the idea proposed by the developers deserves attention. In her opinion, this is exactly how the human brain works in some cases: it breaks a problem down into several components and solves each one in turn.
The developers called their method the Hybrid Reward Architecture. It uses more than 150 AI agents, all of which work in parallel during the game. Each agent receives its own "reward" for successfully handling its part of the game.
In addition to the “small” agents, there is a top-level agent that consolidates the data received from all of its “subordinates” and decides where to move the hero. To do so, it analyzes a large amount of data. The main factor is the direction chosen by the largest number of elementary agents, but the top-level agent also takes into account how strongly each agent wants its move. For example, if 100 agents want to go right to collect a trophy, but 3 agents want to turn left because they have spotted a ghost, the main agent will “listen” to those three.
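The "100 agents versus 3 agents" example can be illustrated with a small sketch. This is not Maluuba's code: the function name `aggregate`, the action names, and the preference values are all made up for illustration, and the sketch assumes the top-level agent simply sums each agent's signed preference for every direction.

```python
from typing import Dict, List

ACTIONS = ["up", "down", "left", "right"]

def aggregate(preferences: List[Dict[str, float]]) -> str:
    """Sum every agent's signed preference per action and pick the largest total.

    Hypothetical aggregation rule: a positive value means "I want this move",
    a negative value means "I want to avoid this move".
    """
    totals = {action: 0.0 for action in ACTIONS}
    for agent_prefs in preferences:
        for action, value in agent_prefs.items():
            totals[action] += value
    return max(ACTIONS, key=lambda a: totals[a])

# 100 agents mildly prefer going right to collect a trophy (illustrative values).
pellet_agents = [{"right": 1.0} for _ in range(100)]
# 3 agents have spotted a ghost on the right and object strongly.
ghost_agents = [{"right": -50.0, "left": 5.0} for _ in range(3)]

choice = aggregate(pellet_agents + ghost_agents)
print(choice)
```

With these made-up numbers, the three strong objections outweigh the hundred mild votes, so a plain majority count and an intensity-weighted sum lead to different moves.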
According to the lead developer, the system began to show its best results once the agents were programmed to behave selfishly. That is, each of them makes its decision without regard for the others. The overall decision on where the hero moves, however, is left to the main agent, which weighs the various factors and issues the command.
“This is a balance between, on the one hand, the need for interaction, and, on the other hand, the need to make individual decisions,” said Harm van Seijen, head of the Microsoft research group.
But why choose Ms. Pac-Man at all? To a layperson the choice may look odd. Experts say there is nothing strange about it: games of this class are rather hard for machine intelligence, because many “abnormal” situations arise per unit of time and each of them needs its own scenario. To find a solution, the system has to “think” almost like a person, as mentioned above.
“Many companies are working on AI for games and developing their own projects, since playing them well takes many human qualities,” said Rahul Mehrotra, one of the Maluuba programmers.
The software platform developed by Microsoft is based on a machine learning method called reinforcement learning. In this approach, the system under test (an agent or several agents) is trained by interacting with an environment. This is, by definition, a kind of cybernetic experiment. Reinforcement learning is sometimes described as a form of learning with a teacher in which the environment, or a model of it, plays the teacher's role, although it is usually treated as a separate category from supervised learning. The agent acts on the environment, and the environment, in turn, acts on the agent, forming a feedback loop.
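The agent-environment feedback loop can be shown with a generic sketch. It uses tabular Q-learning, a standard textbook reinforcement learning algorithm, on a tiny made-up corridor; it is not the Hybrid Reward Architecture and has nothing to do with Ms. Pac-Man itself. The environment, the function names, and all parameter values are assumptions chosen for illustration.

```python
import random

N_STATES = 5          # corridor cells 0..4; the reward waits in cell 4
ACTIONS = [-1, +1]    # index 0 = step left, index 1 = step right

def step(state, action):
    """Toy environment: move the agent and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning: the agent improves its estimates from rewards alone."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Explore occasionally (or on ties); otherwise exploit current estimates.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)
            else:
                a = 0 if q[state][0] > q[state][1] else 1
            # The agent acts on the environment...
            next_state, reward, done = step(state, ACTIONS[a])
            # ...and the environment's feedback updates the agent's estimate.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q

q_table = train()
# Greedy policy for the non-terminal cells 0..3.
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

No one tells the agent which move is correct. It only receives a reward on reaching the last cell, and the learned values gradually propagate that signal back toward the start of the corridor.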
The project is not purely theoretical. According to the developers, the software platform can be used in many areas. For example, it could be put to work in a sales organization to predict customer inflow, product popularity, and other important indicators. The system can work with both general trends and individual factors, including individual customers.