Solving Gomoku with Monte Carlo Tree Search
Monte Carlo Tree Search (MCTS) is a simulation-based search algorithm commonly used for decision-making and optimization problems, especially those whose search spaces are too large for traditional methods such as dynamic programming. Starting from an initial state, MCTS simulates many possible game or decision trajectories, incrementally building a search tree and using random sampling (the Monte Carlo method) to estimate the value of each state or action, which in turn guides the search. Because it is simple and efficient, MCTS has been widely applied in game AI (for example, the Go program AlphaGo) and in other decision-optimization settings; it is particularly well suited to problems that are hard to describe with an exact mathematical model.
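To make the algorithm's four phases (selection, expansion, simulation, backpropagation) concrete before applying them to Gomoku, here is a minimal, self-contained sketch of MCTS with UCT selection on a toy Nim game (remove 1-3 stones per turn; whoever takes the last stone wins). The toy game, the Node class, the legal_moves helper, and the exploration constant c = 1.4 are illustrative assumptions for this sketch only, not part of the Gomoku code that follows.

import math
import random

def legal_moves(stones):
    # Toy Nim game: remove 1, 2 or 3 stones; taking the last stone wins.
    return [m for m in (1, 2, 3) if m <= stones]

class Node:
    """One search-tree node: a game state plus visit statistics."""
    def __init__(self, stones, player, parent=None, move=None):
        self.stones = stones              # stones left (the game state)
        self.player = player              # player to move here (+1 / -1)
        self.parent, self.move = parent, move
        self.children = []
        self.untried = legal_moves(stones)
        self.visits, self.wins = 0, 0.0   # wins for the player who moved into this node

def uct_select(node, c=1.4):
    # Selection rule: pick the child maximizing the UCT score
    # (exploitation term + exploration bonus).
    return max(node.children, key=lambda ch: ch.wins / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def rollout(stones, player):
    # Simulation: play uniformly random moves to the end of the game.
    while stones > 0:
        stones -= random.choice(legal_moves(stones))
        player = -player
    return -player  # the player who just took the last stone wins

def mcts(stones, player, iters=3000):
    root = Node(stones, player)
    for _ in range(iters):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one child for an untried move.
        if node.untried:
            m = node.untried.pop()
            child = Node(node.stones - m, -node.player, parent=node, move=m)
            node.children.append(child)
            node = child
        # 3. Simulation from the new node (terminal nodes skip the rollout).
        winner = -node.player if node.stones == 0 else rollout(node.stones, node.player)
        # 4. Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            if winner == -node.player:  # a win for the player who moved into node
                node.wins += 1
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).move  # most-visited move

print(mcts(10, 1))  # usually prints 2, leaving a multiple of 4 (the optimal Nim move)

The same four-phase loop drives the Gomoku agent later in this article; only the state, the move generator, and the terminal test change.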
Note: this article uses the PyTorch library and the gym reinforcement-learning environment library, both of which need to be installed in advance.
- gym installation: https://github.com/Farama-Foundation/Gymnasium
- gym environment documentation (MountainCar example): https://gymnasium.farama.org/environments/classic_control/mountain_car/
- PyTorch website: https://pytorch.org/
The Python version used in this article is 3.11.8.
step1: Build a Gomoku environment with gym
For how to build your own environment with gym, see the official tutorial (creating your own environment): https://gymnasium.farama.org/tutorials/gymnasium_basics/environment_creation/
Below we design a simple Gomoku environment:
import numpy as np
import gymnasium as gym
from gymnasium import spaces
import matplotlib.pyplot as plt


class GomokuEnv(gym.Env):
    '''
    Gomoku (five-in-a-row) reinforcement-learning environment
    '''
    metadata = {'render_modes': ['human']}

    def __init__(self, board_size=15, win_length=5, player=1):
        '''
        board_size: side length of the square board
        win_length: number of consecutive stones needed to win
        '''
        self.board_size = board_size  # board size
        self.win_length = win_length  # winning length
        self.board = np.zeros((board_size, board_size), dtype=np.int8)  # the board
        self.current_player = 1  # 1 for player 1, -1 for player 2
        self.done = False  # whether the game is over
        self.winner = None  # the winner
        self.player = player  # the player from whose perspective rewards are given
        # Define action and observation spaces; each cell holds -1, 0 or 1
        self.action_space = spaces.Discrete(board_size * board_size)
        self.observation_space = spaces.Box(low=-1, high=1, shape=(board_size, board_size), dtype=np.int8)

    def step(self, action):
        # Convert the flat action into 2D board coordinates
        x, y = divmod(action, self.board_size)
        # Place a stone for the current player (the caller is expected to pick an empty cell)
        self.board[x, y] = self.current_player
        # Check for a winning condition or a draw
        self.winner = self._check_winner()
        if self.winner is not None:  # game over (win or draw)
            self.done = True
            if self.winner == 0:  # draw
                reward = 0
            else:  # somebody won
                reward = 1 if self.winner == self.player else -1
        else:
            reward = 0
        # Switch player
        self.current_player *= -1
        # Gymnasium step API: (obs, reward, terminated, truncated, info)
        return self.board, reward, self.done, False, {}

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.board = np.zeros((self.board_size, self.board_size), dtype=np.int8)
        self.current_player = 1
        self.done = False
        self.winner = None
        # Gymnasium reset API: (obs, info)
        return self.board, {}

    def render(self, mode='human'):
        if mode == 'human':
            # Fix the color scale so -1/0/1 always map to the same shades
            plt.imshow(self.board, cmap='binary', vmin=-1, vmax=1)
            plt.show()

    def _check_winner(self):
        directions = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, diagonal \ and /
        for x in range(self.board_size):
            for y in range(self.board_size):
                if self.board[x, y] == 0:
                    continue
                for dx, dy in directions:
                    count = 1
                    for i in range(1, self.win_length):
                        nx, ny = x + dx * i, y + dy * i
                        if 0 <= nx < self.board_size and 0 <= ny < self.board_size and self.board[nx, ny] == self.board[x, y]:
                            count += 1
                        else:
                            break
                    if count == self.win_length:
                        return self.board[x, y]
        # Check for a draw: the board is full and nobody has won
        if np.all(self.board != 0):
            return 0
        # Game still in progress
        return None
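As a quick smoke test, the snippet below (a usage sketch, assuming the gymnasium-style reset/step signatures used above) plays random legal moves until the game ends, then prints the winner and renders the final board:

# Quick smoke test: play random legal moves until the game ends.
env = GomokuEnv(board_size=9, win_length=5)
obs, info = env.reset()
done = False
while not done:
    # Sample only from empty cells so we never overwrite a stone
    empty = np.flatnonzero(obs == 0)
    action = int(np.random.choice(empty))
    obs, reward, done, truncated, info = env.step(action)
print('winner:', env.winner)  # 1, -1, or 0 for a draw
env.render()

Because step uses divmod with the same row-major order as NumPy's flat indexing, a flat index sampled from the empty cells maps directly to a valid (x, y) coordinate.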