Chinese PhD at the University of Tokyo has GPT-4 use "theory of mind" to play Texas Hold’em, beating traditional algorithms and crushing human novices

币小白_

2023-10-15 02:38:27

Author: Shin Zhiyuan, source: Heart of the Metaverse

Suspicion Agent from the University of Tokyo uses GPT-4 to demonstrate higher-order theory of mind (ToM) capabilities in incomplete information games.

In a complete information game, each player knows all the elements of information.

Incomplete information games are different: they simulate the complexity of making real-world decisions under uncertain or incomplete information.

GPT-4, as the most powerful model at present, has extraordinary knowledge retrieval and reasoning capabilities.

But can GPT-4 use what it has learned to play incomplete information games?

To this end, researchers at the University of Tokyo introduced Suspicion Agent, an innovative agent that uses GPT-4’s capabilities to play incomplete information games.

Paper Address:

In the study, the GPT-4-based Suspicion Agent was able to perform different functions through proper prompt engineering and demonstrated superior adaptability in a series of incomplete information games.

Most importantly, GPT-4 demonstrated strong higher-order theory of mind (ToM) capabilities during the game.

GPT-4 can use its understanding of human cognition to predict an adversary’s thought processes, susceptibility, and actions.

This means that GPT-4 has the ability to understand others and intentionally influence their behavior like humans.

Similarly, GPT-4-based agents also outperform traditional algorithms in incomplete information games, which may stimulate more applications of LLM in incomplete information games.

01 Training method

To enable an LLM to play various incomplete information games without specialized training, the researchers broke the entire task down into several modules, as shown in the figure below, such as the observation interpreter, the game pattern analysis module and the planning module.

To mitigate the problem that an LLM can be misled in incomplete information games, the researchers first developed structured prompts to help the LLM understand the rules of the game and the current state.

For each type of incomplete information game, the following structured rule description can be written:

General rules: introduction to the game, number of rounds and betting rules;

Action description: (Description of Action 1), (Description of Action 2)…;

Win-loss rules: conditions for a win, a loss or a draw;

Win-loss return rules: rewards or penalties for winning or losing a single game;

Whole game win and loss rules: number of games and overall win-loss conditions.
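The five-part rule description above can be sketched as a small template. This is a minimal illustration, not the prompt used in the study: the class name `GameRules`, its field names, and the placeholder strings are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class GameRules:
    """Structured rule description with the five parts listed above (hypothetical names)."""
    general_rules: str
    action_descriptions: dict[str, str]
    win_loss_rules: str
    payoff_rules: str
    whole_game_rules: str

    def to_prompt(self) -> str:
        # One indented bullet per action, in insertion order.
        actions = "\n".join(
            f"  - {name}: {desc}" for name, desc in self.action_descriptions.items()
        )
        return (
            f"General rules: {self.general_rules}\n"
            f"Action description:\n{actions}\n"
            f"Win-loss rules: {self.win_loss_rules}\n"
            f"Win-loss return rules: {self.payoff_rules}\n"
            f"Whole game win and loss rules: {self.whole_game_rules}"
        )


# Placeholder wording for illustration only; not the study's actual prompt text.
example_rules = GameRules(
    general_rules="<introduction to the game, number of rounds, betting rules>",
    action_descriptions={
        "raise": "<what raising does>",
        "call": "<what calling does>",
        "fold": "<what folding does>",
    },
    win_loss_rules="<conditions for a win, a loss or a draw>",
    payoff_rules="<rewards or penalties for a single game>",
    whole_game_rules="<number of games and overall win-loss conditions>",
)
print(example_rules.to_prompt())
```

Keeping the rules in named fields means a new game only needs new field values, while the prompt layout stays the same.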

In most incomplete information game environments, game states are usually represented as low-level numeric values, such as one-hot vectors, to facilitate machine learning.

With an LLM, however, low-level game states can be converted into natural language text, which helps the model understand them:

Input description: the type of input received, such as a dictionary, list, or other format, together with the number of elements in the game state and the name of each element;

Element description: (Description of element 1), (Description of element 2), …;

Conversion tips: further guidance on converting low-level game states to text.
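The conversion from a low-level state to text described above might look like the following sketch. The dictionary keys (`hand`, `public_card`, `my_chips`, `opponent_chips`, `legal_actions`) and the wording of the output are hypothetical; a real game environment defines its own state format.

```python
def describe_state(state: dict) -> str:
    """Turn a low-level game state into a short natural-language description."""
    # All keys below are hypothetical; a real environment defines its own.
    hand = state["hand"]
    public = state.get("public_card")
    my_chips = state["my_chips"]
    opp_chips = state["opponent_chips"]
    legal = ", ".join(state["legal_actions"])

    if public is not None:
        public_text = f"The public card is {public}."
    else:
        public_text = "The public card has not been revealed yet."

    return (
        f"You hold {hand}. {public_text} "
        f"You have put {my_chips} chips in the pot and your opponent has put {opp_chips}. "
        f"Your legal actions are: {legal}."
    )


# Made-up state for illustration.
example_state = {
    "hand": "Jack",
    "public_card": None,
    "my_chips": 2,
    "opponent_chips": 4,
    "legal_actions": ["call", "raise", "fold"],
}
print(describe_state(example_state))
```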

![beyfMqHmFbURoO6EQO5AoTFYhrYUnnA6gLdnZWWU.png](https://img.jinse.cn/7115940_watermarknone.png "7115940")

In incomplete information games, this formulation makes it easier for the model to understand its interaction with the game.

The researchers introduced a vanilla planning method with a Reflexion module, which automatically reviews the history of past games so that the LLM can learn from experience and improve its planning, and a separate planning module dedicated to making the corresponding decisions.

However, vanilla planning methods often struggle to cope with the uncertainty inherent in incomplete information games, especially when faced with opponents who are adept at exploiting the strategies of others.

Motivated by this need to adapt, the researchers devised a new planning approach that harnesses the ToM capabilities of the LLM to understand the behavior of opponents and adjust strategies accordingly.
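The modular decomposition described in this section (observation interpreter, pattern analysis over past games, planning) can be sketched as a chain of model calls. This is an illustrative outline under stated assumptions, not the researchers' implementation: the `LLM` callable interface, the function `play_turn`, the prompt wording, and the `fake_llm` stand-in are all invented for this example.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def play_turn(llm: LLM, rules: str, observation: str, history: list[str]) -> str:
    """Run one decision: interpret the observation, reflect on history, then plan."""
    # 1. Observation interpreter: low-level state -> readable text.
    readable = llm(f"{rules}\nDescribe this game state in plain language:\n{observation}")
    # 2. Pattern analysis: reflect on earlier games to summarize the opponent.
    past = "\n".join(history) if history else "(no previous games)"
    patterns = llm(f"{rules}\nSummarize the opponent's behavior patterns in these games:\n{past}")
    # 3. Planning: pick an action using both pieces of context.
    action = llm(
        f"{rules}\nCurrent state: {readable}\nOpponent patterns: {patterns}\n"
        "Choose one action: raise, call, or fold."
    )
    return action.strip().lower()


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model so the sketch runs offline."""
    if "Choose one action" in prompt:
        return "Raise"
    return "stub response"


print(play_turn(fake_llm, "<rules>", "<raw state>", []))  # prints: raise
```

Passing the model in as a plain callable keeps the sketch independent of any particular API client; a real client would be wrapped in a function with the same shape.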

02 Quantitative evaluation of experiments

As shown in Table 1, Suspicion Agent outperformed all baselines, and GPT-4-based Suspicion Agent obtained the highest average number of chips in the comparison.

These findings strongly demonstrate the advantages of using large language models in the field of incomplete information games, and also demonstrate the effectiveness of the proposed framework.

The graph below shows the percentage of actions taken by the Suspicion Agent and the baseline model.

It can be observed:

Suspicion Agent vs CFR: The CFR algorithm plays a conservative strategy and often folds when holding weak cards.

The Suspicion Agent successfully identified this pattern and strategically opted for more frequent raises, pressuring CFR to fold.

This allows the Suspicion Agent to accumulate more chips even if its cards are weak or comparable to those of CFR.

Suspicion Agent vs DMC: DMC is based on search algorithms and employs more diverse strategies, including bluffing. It often raises when its hand is either at its weakest or at its strongest.

In response, the Suspicion Agent reduced the frequency of its raises and, depending on its own hand and DMC's observed behavior, chose to call or fold more often.

Suspicion Agent vs DQN: The DQN algorithm takes a more aggressive stance, almost always raising with strong or intermediate cards, and never folding.

The Suspicion Agent discovered this and in turn minimized its own raises, choosing to call or fold more often based on the community card and DQN's actions.

Suspicion Agent vs NFSP: NFSP exhibits a calling strategy, choosing to always call and never fold.

The Suspicion Agent responds by reducing the frequency of its raises and choosing to fold based on the community card and NFSP's observed actions.

Based on the above analysis results, it can be seen that Suspicion Agent is highly adaptable and can exploit the weaknesses of the strategies adopted by various other algorithms.

This fully illustrates the reasoning and adaptability of large language models in imperfect information games.
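As a small illustration of the "percentage of actions" metric used in the comparison above, the sketch below counts how often each action appears in a log. The function name `action_percentages` and the sample log are made up for this example and are not data from the study.

```python
from collections import Counter


def action_percentages(actions: list[str]) -> dict[str, float]:
    """Return each action's share of all actions taken, in percent."""
    counts = Counter(actions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {action: 100.0 * n / total for action, n in counts.items()}


# Made-up log for illustration; not data from the study.
log = ["raise", "raise", "call", "fold", "raise", "call", "raise", "fold"]
print(action_percentages(log))  # {'raise': 50.0, 'call': 25.0, 'fold': 25.0}
```

Comparing these shares per opponent is one simple way to see the kind of shift described above, such as more raises against an opponent that folds often.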

03 Qualitative assessment

In the qualitative evaluation, the researchers evaluated Suspicion Agent in three incomplete information games (Coup, Texas Hold’em Limit, and Leduc Hold’em).

Coup is a card game in which players act as politicians trying to overthrow other players’ regimes. The goal is to survive in the game and accumulate power.

Texas Hold’em Limit, or Limit Texas Hold’em, is a variant of the very popular card game Texas Hold’em. “Limit” means that there is a fixed cap on each bet, so players can only bet fixed amounts.

Leduc Hold’em is a simplified version of Texas Hold’em for the study of game theory and artificial intelligence.

In each case, the Suspicion Agent holds a Jack, while the opponent holds either a Jack or a Queen.

The opponent initially chooses to call rather than raise, implying that it has a weaker hand. Under the vanilla planning strategy, the Suspicion Agent chooses to call in order to see the public card.

When the public card reveals a weak hand, the opponent quickly raises the bet, leaving the Suspicion Agent in a precarious position, as a Jack is the weakest hand.

Under the first-order theory of mind strategy, the Suspicion Agent chooses to fold in order to minimize losses. This decision is based on the observation that opponents usually call when they hold a Queen or a Jack.

However, these strategies fail to take full advantage of the presumed weakness of the opponent’s hand. This drawback stems from the fact that they do not consider how the Suspicion Agent’s actions might affect the opponent’s reaction.

In contrast, as shown in Figure 9, simple prompts allow the Suspicion Agent to understand how to influence the opponent’s actions. Intentionally choosing to raise puts pressure on the opponent to fold in order to minimize its losses.

Therefore, even if the strength of the hands is similar, the Suspicion Agent is able to win many games and thus win more chips than the baseline.

In addition, as shown in Figure 10, when the opponent calls or otherwise answers the Suspicion Agent’s raise (which indicates that the opponent’s hand is strong), the Suspicion Agent quickly adjusts its strategy and chooses to fold to prevent further losses.

This shows the excellent strategic flexibility of Suspicion Agent.
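The difference between the three planning styles discussed in this section (vanilla, first-order ToM, second-order ToM) can be sketched as three planning instructions of increasing depth. The instruction wording, the `PLANNING_INSTRUCTIONS` table, and `build_planning_prompt` are assumptions made for illustration; they are not the prompts used in the study.

```python
PLANNING_INSTRUCTIONS = {
    # Level 0: vanilla planning, no explicit model of the opponent.
    0: "Choose the action that is best given your own hand and the game history.",
    # Level 1: first-order ToM, infer the opponent's hand from its behavior.
    1: (
        "First infer what hand the opponent is likely to hold from its actions "
        "so far, then choose the action that is best against that hand."
    ),
    # Level 2: second-order ToM, also reason about how the opponent reads you.
    2: (
        "First infer what hand the opponent is likely to hold from its actions "
        "so far. Then consider what the opponent believes about your hand and "
        "how each of your actions would change its response. Choose the action "
        "that best exploits this."
    ),
}


def build_planning_prompt(state_text: str, tom_level: int) -> str:
    """Append the planning instruction for the requested ToM level to the state text."""
    if tom_level not in PLANNING_INSTRUCTIONS:
        raise ValueError(f"tom_level must be 0, 1, or 2, got {tom_level}")
    return f"{state_text}\n{PLANNING_INSTRUCTIONS[tom_level]}"


print(build_planning_prompt("You hold Jack. The opponent called.", 2))
```

In the Jack-versus-Jack-or-Queen example above, only the level 2 instruction asks the model to reason about the effect of its own raise on the opponent, which is the step that leads to the bluff.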

04 Ablation studies and component analysis

To explore how ToM-aware planning methods of different orders affect the behavior of large language models, the researchers ran comparison experiments on Leduc Hold’em, playing against CFR.

Figure 5 shows the percentage of actions of Suspicion Agents with different ToM level planning, and the chip yield results are shown in Table 3.

Table 3: Comparison results of Suspicion Agent against CFR in the Leduc Hold’em environment using different levels of ToM, with quantitative results after 100 games

It can be observed:

The vanilla plan, based on the Reflexion module, tends to call and check more during the game (it has the highest percentage of calls and checks against CFR and DMC), which puts no pressure on the opponent to fold and leads to many unnecessary losses.

As a result, as shown in Table 3, the vanilla plan has the lowest chip gains.

Using first-order ToM, the Suspicion Agent is able to make decisions based on its own hand strength and its estimate of the opponent’s hand strength.

As a result, it raises more often than the vanilla plan, but it also tends to fold more often than the other strategies in order to minimize unnecessary losses. However, this cautious approach can be exploited by savvy rival models.

For example, DMC often raises when holding the weakest hand, while CFR sometimes even raises when holding an intermediate hand to put pressure on the Suspicion Agent. In these cases, the Suspicion Agent’s tendency to fold can lead to losses.

In contrast, with second-order ToM the Suspicion Agent is better at identifying and exploiting patterns of behavior in rival models.

Specifically, when CFR checks (usually indicating a weak hand) or when DMC checks (indicating that its hand does not match the community card), the Suspicion Agent will bluff to induce the opponent to fold.

As a result, this variant showed the highest raise rate among the three planning methods.

This aggressive strategy allows the Suspicion Agent to accumulate more chips even with weak cards, thereby maximizing chip gains.

To assess the effect of hindsight observations, the researchers conducted an ablation study in which hindsight observations were not incorporated into the current game.

As shown in Tables 4 and 5, the Suspicion Agent maintains its performance advantage over the baseline methods even without hindsight observations.

Table 4: Comparison results illustrating the impact of incorporating opponent observations into the game history in the context of Leduc Hold’em

Table 5: Comparison results showing the impact of adding opponent observations to the game history when the Suspicion Agent plays against CFR in the Leduc Hold’em environment. The results are the chips won or lost after 100 games using different seeds, with the number of chips won or lost ranging from 1 to 14
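The ablation above toggles whether opponent observations appear in the game history. The sketch below shows one way such a toggle could work, assuming that the hindsight observation is the opponent's private card as revealed after a game ends; that reading, along with the `GameRecord` fields and the sample record, is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GameRecord:
    """One finished game; the fields are hypothetical, chosen for illustration."""
    my_card: str
    actions: list[str]
    chips_won: int
    opponent_card: Optional[str] = None  # assumed to be known only in hindsight


def format_history(records: list[GameRecord], with_hindsight: bool) -> str:
    """Render past games as text, optionally including the opponent's card."""
    lines = []
    for i, r in enumerate(records, start=1):
        line = (
            f"Game {i}: I held {r.my_card}; actions: {', '.join(r.actions)}; "
            f"result: {r.chips_won:+d} chips"
        )
        if with_hindsight and r.opponent_card is not None:
            line += f"; opponent held {r.opponent_card}"
        lines.append(line)
    return "\n".join(lines)


# Made-up record for illustration; not data from the study.
records = [GameRecord("Jack", ["call", "fold"], -2, "Queen")]
print(format_history(records, with_hindsight=True))
print(format_history(records, with_hindsight=False))
```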

05 Conclusion

Suspicion Agent does not have any specialized training, and only uses GPT-4’s prior knowledge and reasoning ability to defeat algorithms trained specifically for these games, such as CFR and NFSP, in different incomplete information games such as Leduc Hold’em.

This shows that large models have the potential to achieve strong performance in games with incomplete information.

By integrating first- and second-order theory of mind models, the Suspicion Agent can predict the behavior of its opponents and adjust its strategy accordingly. This allows it to adapt to different types of opponents.

Suspicion Agent also demonstrates the ability to generalize across different incomplete information games, allowing decisions to be made in games such as Coup and Texas Hold’em based solely on the rules of the game and the rules of observation.

But Suspicion Agent also has certain limitations. For example, the sample size of the evaluation of different algorithms is small due to computational cost constraints.

Inference is also expensive, costing nearly $1 per game; the output of the Suspicion Agent is highly sensitive to prompts; and it suffers from hallucinations.

At the same time, when it comes to complex reasoning and calculations, Suspicion Agent also performs unsatisfactorily.

Future work aims to improve the Suspicion Agent’s computational efficiency and reasoning robustness, and to support multimodal and multi-step reasoning, so that it adapts better to complex game environments.

At the same time, the use of Suspicion Agent in incomplete information games could later be extended to integrate multimodal information, simulate more realistic interactions, and cover multi-player game environments.

Source: Golden Finance
