AI has mastered some of the most complex games known to man, but while it often excels at competition, cooperation doesn’t come so naturally. Now an AI from Meta has mastered the game Diplomacy, in which you have to team up with other players to win.
Google’s mastery of the game Go was hailed as a major milestone for AI, but despite its undeniable complexity, the game is in many ways well suited to the cold, calculating logic of a machine. It’s a game of perfect information, where you have full visibility of your opponent’s moves, and winning simply requires you to outsmart one other player.
Diplomacy, on the other hand, is a much messier affair. The board game sees up to seven players take on the roles of European military powers and use their armies to take control of strategic cities. But players can negotiate with each other to form and break alliances in the pursuit of total domination.
In addition, all players’ moves are made simultaneously on every turn, so you can’t just react to what others are doing. This means that winning requires a complex combination of strategic thinking, the ability to collaborate with other players, and persuasive negotiation skills. While AI has already mastered pure strategy, those other skills have proven much trickier to replicate.
A new AI designed by researchers at Meta may have taken a big step in that direction. In a paper published last week in Science, they describe a system called Cicero that ranked in the top 10 percent of players in an online Diplomacy league and achieved more than double the average score of the human players.
“Cicero is resilient, relentless and patient,” said three-time World Diplomacy Champion Andrew Goff in a video produced by Meta. “It plays without much of the human emotion that sometimes makes you make bad decisions. It simply assesses the situation and makes the best decision, not only for itself, but also for the people it works with.”
To create Cicero, Meta researchers had to combine state-of-the-art AI methods from two different subfields: strategic reasoning and natural language processing. At its core, the system has a planning algorithm that predicts other players’ moves and uses those predictions to determine its own strategy. This algorithm was trained by having the AI play against itself over and over again, while also trying to mimic the way humans play the game.
The researchers had already shown that this planning module alone was capable of beating human professionals in a simplified version of the game. But in this latest study, the team combined it with a large language model that was trained on massive amounts of text from the internet and then refined using dialogue from 40,000 online Diplomacy games. This gave the upgraded Cicero the ability both to interpret messages from other players and to craft its own messages to persuade them to work together.
The combined system starts by using the current state of the board and previous dialogues to predict what each player is likely to do. It then devises a plan of action for itself and its partners before generating messages designed to outline its intent and ensure the cooperation of other players.
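The flow described above (predict, plan, then message) can be sketched roughly as follows. This is only an illustrative toy, not Meta's implementation: the names `GameState`, `predict_moves`, `make_plan`, `draft_messages` and `take_turn` are hypothetical, and the placeholder functions stand in for what would be learned models in a real system.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class GameState:
    """Hypothetical container for what the system sees each turn."""
    board: Dict[str, str]   # territory -> power that controls it
    dialogue: List[str]     # messages exchanged so far


def predict_moves(state: GameState, players: List[str]) -> Dict[str, str]:
    """Placeholder predictor (assumption): ignores the state and guesses
    that everyone holds. A real system would use a learned model here."""
    return {player: "hold" for player in players}


def make_plan(predictions: Dict[str, str], me: str) -> Dict[str, str]:
    """Placeholder planner (assumption): pick an action for `me` and a
    suggested action for every other player."""
    joint_plan = {me: "move"}
    for player, predicted in predictions.items():
        if player != me:
            # Ask players predicted to hold for support; otherwise
            # accept the action they were predicted to take.
            joint_plan[player] = "support" if predicted == "hold" else predicted
    return joint_plan


def draft_messages(joint_plan: Dict[str, str], me: str) -> Dict[str, str]:
    """Turn the plan into one message per partner stating our intent."""
    return {
        player: f"I intend to {joint_plan[me]}; could you {action}?"
        for player, action in joint_plan.items()
        if player != me
    }


def take_turn(
    state: GameState, players: List[str], me: str
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Run the three steps in order: predict, plan, then message."""
    predictions = predict_moves(state, players)
    joint_plan = make_plan(predictions, me)
    return joint_plan, draft_messages(joint_plan, me)
```

The point of the sketch is the ordering: the messages are generated from the plan, rather than the plan being read off whatever the language model happens to say.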
Over 40 games in the online league, Cicero communicated effectively with 82 other players to explain its intentions, coordinate actions, and forge alliances. Crucially, the researchers say they saw no evidence from in-game messages that human players suspected they were teaming up with an AI.
However, the model’s communication skills were not flawless. It is more than capable of spewing out nonsensical messages or messages that don’t align with its goals, so the researchers had to generate multiple candidate messages for each move and then use a series of filtering mechanisms to weed out the garbage. And even then, the researchers admit that illogical messages sometimes slipped through.
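The generate-then-filter idea can be illustrated with a minimal sketch. It assumes the candidate messages have already been produced by some language model; the two filters (`mentions_intent`, `not_too_short`) and the helper `pick_message` are hypothetical toy stand-ins, not the filters the researchers actually used.

```python
from typing import Callable, List, Optional

# Hypothetical filter signature: (message, intent) -> True if acceptable.
MessageFilter = Callable[[str, str], bool]


def mentions_intent(message: str, intent: str) -> bool:
    """Toy filter: keep only messages that mention the planned target."""
    return intent.lower() in message.lower()


def not_too_short(message: str, intent: str) -> bool:
    """Toy filter: drop fragments of fewer than four words."""
    return len(message.split()) >= 4


def pick_message(
    candidates: List[str], intent: str, filters: List[MessageFilter]
) -> Optional[str]:
    """Return the first candidate that passes every filter, or None
    if all candidates are rejected (in which case nothing is sent)."""
    for message in candidates:
        if all(check(message, intent) for check in filters):
            return message
    return None


candidates = [
    "Hello there",                                      # too short
    "Let us attack Paris together right now",           # wrong target
    "I will support your army into Munich this turn",   # passes both
]
chosen = pick_message(candidates, "Munich", [mentions_intent, not_too_short])
```

Simple checks like these can only reject messages, not repair them, which is consistent with the researchers’ admission that some illogical messages still got through.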
This suggests that the language model at the heart of Cicero still doesn’t really understand what’s going on, simply producing plausible-sounding messages that then need to be vetted to ensure they achieve the desired results.
Writing in The Conversation, AI researcher Toby Walsh of the University of New South Wales in Australia also notes that Cicero is unfailingly honest, unlike most human players. While this is a surprisingly effective strategy, it could be a major weakness if competitors figure out that their opponent will never try to deceive them.
The progress is significant nonetheless, and Meta hopes it can have applications far beyond board games. In a blog post, the researchers say the ability to use planning algorithms to control language generation could make it possible to have much longer and richer conversations with AI chatbots, or to create video game characters that can adapt to a player’s behavior.