当前位置：文江博客话题详情

Chomp 游戏的算法

发布于 2024-11-26 13:16:18 字数 488 浏览 4 评论 0原文

我正在为 Chomp 游戏编写一个程序。您可以在维基百科上阅读游戏的描述，但无论如何我都会简要描述它。

我们玩的是尺寸为 nxm 的巧克力棒，即巧克力棒被分成 nxm 个方块。在每个回合中，当前玩家选择一个方格并吃掉所选方格下方和右侧的所有内容。因此，例如，以下是有效的第一步：

在此处输入图像描述

目标是迫使对手吃东西最后一块巧克力（有毒）。

关于人工智能部分，我使用了带有深度截断的极小极大算法。但是我无法想出合适的位置评估函数。结果是，通过我的评估函数，人类玩家很容易战胜我的程序。

任何人都可以：

建议一个好的位置评估函数或
提供一些有用的参考或
建议替代算法？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

亂 2024-12-03 13:16:18

你的板子有多大？

如果你的棋盘相当小，那么你可以通过动态编程精确地解决游戏。在Python中：

n,m = 6,6
init = frozenset((x,y) for x in range(n) for y in range(m))

def moves(board):
    return [frozenset([(x,y) for (x,y) in board if x < px or y < py]) for (px,py) in board]

@memoize
def wins(board):
    if not board: return True
    return any(not wins(move) for move in moves(board))

函数wins(board)计算棋盘是否是获胜位置。棋盘表示是一组元组 (x,y)，指示棋子 (x,y) 是否仍在棋盘上。 move 函数计算一次移动中可到达的棋盘列表。

wins 函数背后的逻辑是这样的。如果我们的棋盘是空的，那么其他玩家一定已经吃掉了最后一块，所以我们赢了。如果棋盘不为空，那么如果我们可以进行任何移动，使得结果位置是失败位置（即没有获胜，即不获胜（移动）)，因为这样我们就让另一个玩家陷入了失败的境地。

您还需要 memoize 辅助函数来缓存结果：

def memoize(f):
    cache = dict()
    def memof(x):
        try: return cache[x]
        except:
            cache[x] = f(x)
            return cache[x]
    return memof

通过缓存，我们只计算一次给定位置的获胜者，即使该位置可以通过多种方式到达。例如，如果第一个玩家在他的第一步中吃掉所有剩余的行，则可以获得单行巧克力的位置，但也可以通过许多其他系列的移动来获得。一次又一次地计算单排棋盘上谁获胜是浪费的，所以我们缓存结果。这将渐近性能从 O((n*m)^(n+m)) 提高到 O((n+m)!/(n!m!))< /code>，这是一个巨大的改进，但对于大型主板来说仍然很慢。

为了方便起见，这里有一个调试打印功能：

def show(board):
    for x in range(n):
        print '|' + ''.join('x ' if (x,y) in board else '  ' for y in range(m))

这段代码仍然相当慢，因为代码没有以任何方式优化（这是Python......）。如果你用 C 或 Java 高效地编写它，你可能可以将性能提高 100 倍以上。您应该能够轻松处理 10x10 板，并且最多可以处理 15x15 板。您还应该使用不同的板表示，例如位板。如果您使用多个处理器，您甚至可以将其速度提高 1000 倍。

这是极小极大的推导

我们将从极小极大开始：

def minimax(board, depth):
  if depth > maxdepth: return heuristic(board)
  else:
    alpha = -1
    for move in moves(board):
      alpha = max(alpha, -minimax(move, depth-1))
    return alpha

我们可以删除深度检查来进行完整搜索：

def minimax(board):
  if game_ended(board): return heuristic(board)
  else:
    alpha = -1
    for move in moves(board):
      alpha = max(alpha, -minimax(move))
    return alpha

因为游戏结束，启发式将返回 -1 或 1，具体取决于哪个玩家获胜。如果我们将 -1 表示为 false，将 1 表示为 true，则 max(a,b) 变为 a 或 b，-a 变为 < code>not a：

def minimax(board):
  if game_ended(board): return heuristic(board)
  else:
    alpha = False
    for move in moves(board):
      alpha = alpha or not minimax(move)
    return alpha

您可以看到这相当于：

def minimax(board):
  if not board: return True
  return any([not minimax(move) for move in moves(board)])

如果我们从极小极大值开始进行 alpha-beta 剪枝：

def alphabeta(board, alpha, beta):
  if game_ended(board): return heuristic(board)
  else:
    for move in moves(board):
      alpha = max(alpha, -alphabeta(move, -beta, -alpha))
      if alpha >= beta: break
    return alpha

// start the search:
alphabeta(initial_board, -1, 1)

搜索从 alpha = -1 和 beta = 1 开始。一旦 alpha 变为 1 ，循环中断。因此我们可以假设在递归调用中 alpha 保持为 -1，beta 保持为 1。所以代码相当于这样：

def alphabeta(board, alpha, beta):
  if game_ended(board): return heuristic(board)
  else:
    for move in moves(board):
      alpha = max(alpha, -alphabeta(move, -1, 1))
      if alpha == 1: break
    return alpha

// start the search:
alphabeta(initial_board, -1, 1)

所以我们可以简单地删除参数，因为它们总是作为相同的值传入：

def alphabeta(board):
  if game_ended(board): return heuristic(board)
  else:
    alpha = -1
    for move in moves(board):
      alpha = max(alpha, -alphabeta(move))
      if alpha == 1: break
    return alpha

// start the search:
alphabeta(initial_board)

我们可以再次从 -1 和 1 切换到布尔值：

def alphabeta(board):
  if game_ended(board): return heuristic(board)
  else:
    alpha = False
    for move in moves(board):
      alpha = alpha or not alphabeta(move))
      if alpha: break
    return alpha

所以你可以看到这相当于使用any 带有一个生成器，一旦找到 True 值，它就会停止迭代，而不是总是计算整个子列表：

def alphabeta(board):
  if not board: return True
  return any(not alphabeta(move) for move in moves(board))

请注意，这里我们有 any(not Alphabeta(move) for move in moves(board)) 而不是 any([不是minimax(move) 用于 moves(board)])。这可以将合理大小的电路板的搜索速度加快大约 10 倍。不是因为第一种形式更快，而是因为它允许我们在找到 True 值后立即跳过循环的整个其余部分，包括递归调用。

现在你知道了，wins 函数只是变相的字母表搜索。我们用于获胜的下一个技巧是记住它。在游戏编程中，这被称为使用“换位表”。因此，wins 函数正在使用换位表进行字母表搜索。当然，直接写下这个算法而不是通过这个推导更简单;)

How big are your boards?

If your boards are fairly small then you can solve the game exactly with dynamic programming. In Python:

n,m = 6,6
init = frozenset((x,y) for x in range(n) for y in range(m))

def moves(board):
    return [frozenset([(x,y) for (x,y) in board if x < px or y < py]) for (px,py) in board]

@memoize
def wins(board):
    if not board: return True
    return any(not wins(move) for move in moves(board))

The function wins(board) computes whether the board is a winning position. The board representation is a set of tuples (x,y) indicating whether piece (x,y) is still on the board. The function moves computes list of boards reachable in one move.

The logic behind the wins function works like this. If the board is empty on our move the other player must have eaten the last piece, so we won. If the board is not empty then we can win if there is any move we can do such that the resulting position is a losing position (i.e. not winning i.e. not wins(move)), because then we got the other player into a losing position.

You'll also need the memoize helper function which caches the results:

def memoize(f):
    cache = dict()
    def memof(x):
        try: return cache[x]
        except:
            cache[x] = f(x)
            return cache[x]
    return memof

By caching we only compute who is the winner for a given position once, even if this position is reachable in multiple ways. For example the position of a single row of chocolate can be obtained if the first player eats all remaining rows in his first move, but it can also be obtained through many other series of moves. It would be wasteful to compute who wins on the single row board again and again, so we cache the result. This improves the asymptotic performance from something like O((n*m)^(n+m)) to O((n+m)!/(n!m!)), a vast improvement though still slow for large boards.

And here is a debugging printing function for convenience:

def show(board):
    for x in range(n):
        print '|' + ''.join('x ' if (x,y) in board else '  ' for y in range(m))

This code is still fairly slow because the code isn't optimized in any way (and this is Python...). If you write it in C or Java efficiently you can probably improve performance over 100 fold. You should easily be able to handle 10x10 boards, and you can probably handle up to 15x15 boards. You should also use a different board representation, for example a bit board. Perhaps you would even be able to speed it up 1000x if you make use of multiple processors.

Here is a derivation from minimax

We'll start with minimax:

def minimax(board, depth):
  if depth > maxdepth: return heuristic(board)
  else:
    alpha = -1
    for move in moves(board):
      alpha = max(alpha, -minimax(move, depth-1))
    return alpha

We can remove the depth checking to do a full search:

def minimax(board):
  if game_ended(board): return heuristic(board)
  else:
    alpha = -1
    for move in moves(board):
      alpha = max(alpha, -minimax(move))
    return alpha

Because the game ended, heuristic will return either -1 or 1, depending on which player won. If we represent -1 as false and 1 as true, then max(a,b) becomes a or b, and -a becomes not a:

def minimax(board):
  if game_ended(board): return heuristic(board)
  else:
    alpha = False
    for move in moves(board):
      alpha = alpha or not minimax(move)
    return alpha

You can see this is equivalent to:

def minimax(board):
  if not board: return True
  return any([not minimax(move) for move in moves(board)])

If we had instead started with minimax with alpha-beta pruning:

def alphabeta(board, alpha, beta):
  if game_ended(board): return heuristic(board)
  else:
    for move in moves(board):
      alpha = max(alpha, -alphabeta(move, -beta, -alpha))
      if alpha >= beta: break
    return alpha

// start the search:
alphabeta(initial_board, -1, 1)

The search starts out with alpha = -1 and beta = 1. As soon as alpha becomes 1, the loop breaks. So we can assume that alpha stays -1 and beta stays 1 in the recursive calls. So the code is equivalent to this:

def alphabeta(board, alpha, beta):
  if game_ended(board): return heuristic(board)
  else:
    for move in moves(board):
      alpha = max(alpha, -alphabeta(move, -1, 1))
      if alpha == 1: break
    return alpha

// start the search:
alphabeta(initial_board, -1, 1)

So we can simply remove the parameters, as they are always passed in as the same values:

def alphabeta(board):
  if game_ended(board): return heuristic(board)
  else:
    alpha = -1
    for move in moves(board):
      alpha = max(alpha, -alphabeta(move))
      if alpha == 1: break
    return alpha

// start the search:
alphabeta(initial_board)

We can again do the switch from -1 and 1 to booleans:

def alphabeta(board):
  if game_ended(board): return heuristic(board)
  else:
    alpha = False
    for move in moves(board):
      alpha = alpha or not alphabeta(move))
      if alpha: break
    return alpha

So you can see this is equivalent to using any with a generator which stops the iteration as soon as it has found a True value instead of always computing the whole list of children:

def alphabeta(board):
  if not board: return True
  return any(not alphabeta(move) for move in moves(board))

Note that here we have any(not alphabeta(move) for move in moves(board)) instead of any([not minimax(move) for move in moves(board)]). This speeds up the search by about a factor of 10 for reasonably sized boards. Not because the first form is faster, but because it allows us to skip entire rest of the loop including the recursive calls as soon as we have found a value that's True.

So there you have it, the wins function was just alphabeta search in disguise. The next trick we used for wins is to memoize it. In game programming this would be called using "transposition tables". So the wins function is doing alphabeta search with transposition tables. Of course it's simpler to write down this algorithm directly instead of going through this derivation ;)

回复收藏 0 原文