Solving Gomoku with Monte Carlo Tree Search
Monte Carlo Tree Search (MCTS) is a simulation-based search algorithm commonly used for decision-making and optimization problems, especially those whose search spaces are too large for traditional methods such as dynamic programming. Starting from an initial state, MCTS simulates many possible game or decision trajectories, incrementally building a search tree and using random sampling (the Monte Carlo method) to estimate the value of each state or action, which in turn guides the search. Because it is simple and efficient, MCTS has been widely applied in game AI (for example, the Go program AlphaGo) and in other decision-optimization settings; it is particularly well suited to problems that are hard to describe with an exact mathematical model.
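To make the algorithm's four phases (selection, expansion, simulation, backpropagation) concrete before applying them to Gomoku, here is a minimal, self-contained sketch of MCTS with UCT selection on a toy Nim game (remove 1-3 stones per turn; whoever takes the last stone wins). The toy game, the Node class, the legal_moves helper, and the exploration constant c = 1.4 are illustrative assumptions for this sketch only, not part of the Gomoku code that follows.

import math
import random

def legal_moves(stones):
    # Toy Nim game: remove 1, 2 or 3 stones; taking the last stone wins.
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    """One search-tree node: a game state plus visit statistics."""
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones              # stones left (the game state)
        self.player = player              # player to move here (+1 / -1)
        self.parent, self.move = parent, move
        self.children = []
        self.untried = legal_moves(stones)
        self.visits, self.wins = 0, 0.0   # wins for the player who moved into this node

def uct_select(node, c=1.4):
    # Selection rule: pick the child maximizing the UCT score
    # (exploitation term + exploration bonus).
    return max(node.children, key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, player):
    # Simulation: play uniformly random moves to the end of the game.
    while stones > 0:
        stones -= random.choice(legal_moves(stones))
        player = -player
    return -player  # the player who just took the last stone wins

def mcts(stones, player, iters=3000):
    root = Node(stones, player)
    for _ in range(iters):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one child for an untried move.
        if node.untried:
            m = node.untried.pop()
            child = Node(node.stones - m, -node.player, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation from the new node (terminal nodes skip the rollout).
        winner = -node.player if node.stones == 0 else rollout(node.stones, node.player)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            if winner == -node.player:  # a win for the player who moved into node
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move  # most-visited move

print(mcts(10, 1))  # usually prints 2, leaving a multiple of 4 (the optimal Nim move)

The same four-phase loop drives the Gomoku agent later in this article; only the state, the move generator, and the terminal test change.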
Note: this article uses the PyTorch library and the gym reinforcement-learning environment library, both of which need to be installed in advance.
- gym installation: https://github.com/Farama-Foundation/Gymnasium
- gym environment documentation (MountainCar example): https://gymnasium.farama.org/environments/classic_control/mountain_car/
- PyTorch website: https://pytorch.org/
The Python version used in this article is 3.11.8.
step1: Build a Gomoku environment with gym
For how to build your own environment with gym, see the official tutorial (creating your own environment): https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/
Below we design a simple Gomoku environment:
import numpy as np
import gymnasium as gym
from gymnasium import spaces
import matplotlib.pyplot as plt


class GomokuEnv(gym.Env):
    '''
    Gomoku (five-in-a-row) reinforcement-learning environment
    '''
    metadata = {'render_modes': ['human']}

    def __init__(self, board_size=15, win_length=5, player=1):
        '''
        board_size: side length of the square board
        win_length: number of consecutive stones needed to win
        '''
        self.board_size = board_size  # board size
        self.win_length = win_length  # winning length
        self.board = np.zeros((board_size, board_size), dtype=np.int8)  # the board
        self.current_player = 1  # 1 for player 1, -1 for player 2
        self.done = False  # whether the game is over
        self.winner = None  # the winner
        self.player = player  # the player from whose perspective rewards are given
        # Define action and observation spaces; each cell holds -1, 0 or 1
        self.action_space = spaces.Discrete(board_size * board_size)
        self.observation_space = spaces.Box(low=-1, high=1, shape=(board_size, board_size), dtype=np.int8)

    def step(self, action):
        # Convert the flat action into 2D board coordinates
        x, y = divmod(action, self.board_size)
        # Place a stone for the current player (the caller is expected to pick an empty cell)
        self.board[x, y] = self.current_player
        # Check for a winning condition or a draw
        self.winner = self._check_winner()
        if self.winner is not None:  # game over (win or draw)
            self.done = True
            if self.winner == 0:  # draw
                reward = 0
            else:  # somebody won
                reward = 1 if self.winner == self.player else -1
        else:
            reward = 0
        # Switch player
        self.current_player *= -1
        # Gymnasium step API: (obs, reward, terminated, truncated, info)
        return self.board, reward, self.done, False, {}

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.board = np.zeros((self.board_size, self.board_size), dtype=np.int8)
        self.current_player = 1
        self.done = False
        self.winner = None
        # Gymnasium reset API: (obs, info)
        return self.board, {}

    def render(self, mode='human'):
        if mode == 'human':
            # Fix the color scale so -1/0/1 always map to the same shades
            plt.imshow(self.board, cmap='binary', vmin=-1, vmax=1)
            plt.show()

    def _check_winner(self):
        directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, diagonal \ and /
        for x in range(self.board_size):
            for y in range(self.board_size):
                if self.board[x, y] == 0:
                    continue
                for dx, dy in directions:
                    count = 1
                    for i in range(1, self.win_length):
                        nx, ny = x + dx * i, y + dy * i
                        if 0 <= nx < self.board_size and 0 <= ny < self.board_size and self.board[nx, ny] == self.board[x, y]:
                            count += 1
                        else:
                            break
                    if count == self.win_length:
                        return self.board[x, y]
        # Check for a draw: the board is full and nobody has won
        if np.all(self.board != 0):
            return 0
        # Game still in progress
        return None
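As a quick smoke test, the snippet below (a usage sketch, assuming the gymnasium-style reset/step signatures used above) plays random legal moves until the game ends, then prints the winner and renders the final board:

# Quick smoke test: play random legal moves until the game ends.
env = GomokuEnv(board_size=9, win_length=5)
obs, info = env.reset()
done = False
while not done:
    # Sample only from empty cells so we never overwrite a stone
    empty = np.flatnonzero(obs == 0)
    action = int(np.random.choice(empty))
    obs, reward, done, truncated, info = env.step(action)
print('winner:', env.winner)  # 1, -1, or 0 for a draw
env.render()

Because step uses divmod with the same row-major order as NumPy's flat indexing, a flat index sampled from the empty cells maps directly to a valid (x, y) coordinate.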