栀梦 2024-08-20 14:03:54

更新：我非常喜欢这个主题，所以我写了编程难题、国际象棋位置和霍夫曼编码。如果您通读本文，我已经确定存储完整游戏状态的唯一方法是存储完整的动作列表。请继续阅读以了解原因。因此，我使用问题的稍微简化版本来进行块布局。

问题

该图说明了国际象棋的起始位置。国际象棋在 8x8 的棋盘上进行，每个玩家从一组相同的 16 个棋子开始，其中包括 8 个兵、2 个车、2 个马、2 个象、1 个后和 1 个王，如下所示：

位置一般为记录为列的字母后跟行的数字，因此白后位于 d1。棋步通常存储在代数符号中，它是明确的，通常只指定最少的必要信息。考虑这个开局：

e4 e5
Nf3 Nc6
…

这意味着：

白方将王的棋子从 e2 移动到 e4（它是唯一可以到达 e4 的棋子，因此是“e4”）；
黑方将王的棋子从 e7 移至 e5；
白方将马 (N) 移至 f3；
黑将马移至 c6。
...

该板看起来像这样：

对于任何程序员来说，一项重要的能力就是能够正确且明确地指定问题。

那么有什么遗漏或不明确的地方呢？事实证明很多。

棋盘状态与游戏状态

您需要确定的第一件事是您是否要存储游戏状态或棋盘上棋子的位置。简单地编码棋子的位置是一回事，但问题是“所有后续的合法动作”。这个问题也没有提及了解到目前为止的动作。这实际上是一个问题，我将对此进行解释。

易位

游戏进行如下：

e4 e5
Nf3 Nc6
Bb5 a6
Ba4 Bc5

棋盘如下：

later opening

白方有王位转换选项。对此的部分要求是国王和相关的车永远不能移动，因此无论国王还是双方的任何一个车都移动都需要被存储。显然，如果它们不在起始位置，则它们已经移动，否则需要指定。

有多种策略可用于处理此问题。

首先，我们可以存储额外的 6 位信息（车和王各 1 位）来指示该棋子是否已移动。如果正确的棋子恰好在其中，我们可以通过只为这六个方块之一存储一点来简化这一过程。或者，我们可以将每个未移动的棋子视为另一种棋子类型，这样每边不再有 6 种棋子类型（兵、车、马、象、后和王），而是 8 种（加上未移动的车和未移动的王）。

En Passant

国际象棋中另一个奇特且经常被忽视的规则是En Passant。

游戏已经进行。

e4 e5
Nf3 Nc6
Bb5 a6
Ba4 Bc5
OO b5
Bb3 b4
c4

黑方在 b4 上的棋子现在可以选择将其在 b4 上的棋子移动到 c3，从而占领 c4 上的白方棋子。这种情况只发生在第一次机会时，这意味着如果黑方现在放弃该选择，他就无法采取下一步行动。所以我们需要存储这个。

如果我们知道之前的举动，我们肯定可以回答 En Passant 是否可能。或者，我们可以存储第 4 级的每个棋子是否刚刚以双步前进的方式移动到那里。或者我们可以查看棋盘上每个可能的 En Passant 位置，并用一个标志来指示其是否可能。

促销

这是白棋的棋步。如果白棋将 h7 上的棋子移动到 h8，它可以升级为任何其他棋子（但不能升级为王）。 99% 的情况下它都会被提升为女王，但有时却不会，通常是因为这可能会导致僵局，否则你会获胜。写为：

h8=Q

这对于我们的问题很重要，因为这意味着我们不能指望每边都有固定数量的棋子。如果所有 8 个棋子都晋级，那么一方完全有可能（但极其不可能）获得 9 个后、10 个车、10 个象或 10 个马。

僵局

当你处于无法获胜的境地时，最好的策略就是尝试僵局。最有可能的变体是你无法采取合法的行动（通常是因为任何行动都限制了你的国王）。在这种情况下，您可以要求平局。这个很容易满足。

第二种变体是三重重复。如果相同的棋盘位置在一场游戏中出现三次（或在下一步棋中出现第三次），则可以判定为平局。这些位置不需要以任何特定的顺序出现（这意味着它不必以相同的移动顺序重复三次）。这使得问题变得非常复杂，因为你必须记住之前的每一个棋盘位置。 如果这是问题的要求，则该问题的唯一可能的解决方案是存储以前的每一步。

最后，有五十步规则。如果在前 50 个连续移动中没有棋子移动且没有拿走棋子，则玩家可以要求平局，因此我们需要存储自移动棋子或拿走棋子（两者中最新的一个）以来的移动次数。这需要6 位（0-63）。

当然我们还需要知道现在轮到

谁了，这是一个单一的问题。

由于

僵局的情况，唯一可行或明智的方式是存储游戏状态是存储导致这个位置的所有动作，我将解决这个问题，棋盘状态问题将被简化为：存储棋盘上所有棋子的当前位置，忽略易位。，僵局条件以及轮到谁

棋子布局可以通过以下两种方式之一进行处理：通过存储每个方块的内容或通过存储每个棋子的位置

简单内容

有六种棋子类型（棋子。、车、马、主教、后和国王）。每颗棋子都可以是白色或黑色，因此一个方格可能包含 12 种可能的棋子之一，也可能是空的，因此有 13 种可能性。 13 可以存储在 4 位（0-15）中，因此最简单的解决方案是为每个方格存储 4 位乘以 64 个方格或 256 位信息。

这种方法的优点是操作极其简单且快速。这甚至可以通过在不增加存储需求的情况下添加 3 种可能性来扩展：一个棋子在最后一回合移动了 2 个空格，一个国王没有移动，一个车没有移动，这将满足很多需求前面提到的问题。

但我们可以做得更好。

Base 13 编码将

棋盘位置视为一个非常大的数字通常会有所帮助。这通常在计算机科学中完成。例如，停止问题将计算机程序（正确地）视为一个大数。

第一个解决方案将位置视为 64 位 16 进制数字，但正如所演示的，此信息中存在冗余（即每个“数字”有 3 个未使用的可能性），因此我们可以将数字空间减少到 64 位 13 进制数字。当然，这不能像基数 16 那样高效，但它将节省存储需求（最小化存储空间是我们的目标）。

在 10 进制中，数字 234 相当于 2 x 10² + 3 x 10¹ + 4 x 10⁰。

在基数 16 中，数字 0xA50 相当于 10 x 16² + 5 x 16¹ + 0 x 16⁰ = 2640（十进制）。

因此，我们可以将位置编码为 p₀ x 13⁶³ + p₁ x 13⁶² + ... + p₆₃ x 13⁰ 其中 p_i 表示方块 i 的内容。

2²⁵⁶ 大约等于 1.16e77。 13⁶⁴ 大约等于 1.96e71，需要 237 位存储空间。仅仅节省 7.5% 的代价就是操作成本显着增加。

可变基本编码

在合法棋盘中，某些棋子不能出现在某些方格中。例如，棋子不能出现在第一或第八排，从而将这些方格的可能性减少到 11。这将可能的棋盘减少到 11¹⁶ x 13⁴⁸ = 1.35 e70（大约），需要 233 位存储空间。

实际上，将这些值与十进制（或二进制）进行编码和解码有点复杂，但它可以可靠地完成，并留给读者作为练习。

可变宽度字母

前两种方法都可以描述为固定宽度字母编码。字母表的 11、13 或 16 个成员中的每一个都被替换为另一个值。每个“字符”的宽度相同，但当您考虑到每个字符的可能性并不相同时，效率可以提高。

莫尔斯电码

考虑摩尔斯电码（如上图所示）。消息中的字符被编码为一系列破折号和点。这些破折号和点通过无线电传输（通常），它们之间有停顿以界定它们。

请注意字母 E（英语中最常见的字母）是一个点，是最短的点可能的序列，而 Z（最不常见）是两个破折号和两声蜂鸣声。

这种方案可以显着减小预期消息的大小，但代价是增加随机字符序列的大小。

应该注意的是，莫尔斯电码还有另一个内置功能：破折号长达三个点，因此在创建上述代码时考虑到了这一点，以尽量减少破折号的使用。由于 1 和 0（我们的构建块）没有这个问题，因此它不是我们需要复制的功能。

最后，摩尔斯电码有两种休止符。短休止符（点的长度）用于区分点和破折号。较长的间隙（破折号的长度）用于分隔字符。

那么这如何应用于我们的问题呢？

霍夫曼编码

有一种处理可变长度代码的算法，称为霍夫曼编码。霍夫曼编码创建可变长度代码替换，通常使用符号的预期频率将较短的值分配给更常见的符号。

在上面的树中，字母 E 被编码为 000（或左左左），S 为 1011。应该清楚的是，这种编码方案是明确的。

这是与摩尔斯电码的一个重要区别。摩尔斯电码具有字符分隔符，因此它可以进行其他不明确的替换（例如，4 个点可以是 H 或 2 个 Is），但我们只有 1 和 0，因此我们选择明确的替换。

下面是一个简单的实现：

private static class Node {
  private final Node left;
  private final Node right;
  private final String label;
  private final int weight;

  private Node(String label, int weight) {
    this.left = null;
    this.right = null;
    this.label = label;
    this.weight = weight;
  }

  public Node(Node left, Node right) {
    this.left = left;
    this.right = right;
    label = "";
    weight = left.weight + right.weight;
  }

  public boolean isLeaf() { return left == null && right == null; }

  public Node getLeft() { return left; }

  public Node getRight() { return right; }

  public String getLabel() { return label; }

  public int getWeight() { return weight; }
}

使用静态数据：

private final static List<string> COLOURS;
private final static Map<string, integer> WEIGHTS;

static {
  List<string> list = new ArrayList<string>();
  list.add("White");
  list.add("Black");
  COLOURS = Collections.unmodifiableList(list);
  Map<string, integer> map = new HashMap<string, integer>();
  for (String colour : COLOURS) {
    map.put(colour + " " + "King", 1);
    map.put(colour + " " + "Queen";, 1);
    map.put(colour + " " + "Rook", 2);
    map.put(colour + " " + "Knight", 2);
    map.put(colour + " " + "Bishop";, 2);
    map.put(colour + " " + "Pawn", 8);
  }
  map.put("Empty", 32);
  WEIGHTS = Collections.unmodifiableMap(map);
}

和：

private static class WeightComparator implements Comparator<node> {
  @Override
  public int compare(Node o1, Node o2) {
    if (o1.getWeight() == o2.getWeight()) {
      return 0;
    } else {
      return o1.getWeight() < o2.getWeight() ? -1 : 1;
    }
  }
}

private static class PathComparator implements Comparator<string> {
  @Override
  public int compare(String o1, String o2) {
    if (o1 == null) {
      return o2 == null ? 0 : -1;
    } else if (o2 == null) {
      return 1;
    } else {
      int length1 = o1.length();
      int length2 = o2.length();
      if (length1 == length2) {
        return o1.compareTo(o2);
      } else {
        return length1 < length2 ? -1 : 1;
      }
    }
  }
}

public static void main(String args[]) {
  PriorityQueue<node> queue = new PriorityQueue<node>(WEIGHTS.size(),
      new WeightComparator());
  for (Map.Entry<string, integer> entry : WEIGHTS.entrySet()) {
    queue.add(new Node(entry.getKey(), entry.getValue()));
  }
  while (queue.size() > 1) {
    Node first = queue.poll();
    Node second = queue.poll();
    queue.add(new Node(first, second));
  }
  Map<string, node> nodes = new TreeMap<string, node>(new PathComparator());
  addLeaves(nodes, queue.peek(), "");
  for (Map.Entry<string, node> entry : nodes.entrySet()) {
    System.out.printf("%s %s%n", entry.getKey(), entry.getValue().getLabel());
  }
}

public static void addLeaves(Map<string, node> nodes, Node node, String prefix) {
  if (node != null) {
    addLeaves(nodes, node.getLeft(), prefix + "0");
    addLeaves(nodes, node.getRight(), prefix + "1");
    if (node.isLeaf()) {
      nodes.put(prefix, node);
    }
  }
}

一种可能的输出是：

         White    Black
Empty          0 
Pawn       110      100
Rook     11111    11110
Knight   10110    10101
Bishop   10100    11100
Queen   111010   111011
King    101110   101111

对于起始位置，这等于 32 x 1 + 16 x 3 + 12 x 5 + 4 x 6 = 164 位。

状态差异

另一种可能的方法是将第一种方法与霍夫曼编码相结合。这是基于这样的假设：大多数预期的棋盘（而不是随机生成的棋盘）更有可能（至少部分）类似于起始位置。

因此，您要做的就是将 256 位当前棋盘位置与 256 位起始位置进行异或，然后对其进行编码（使用霍夫曼编码，或者使用游程编码）。显然，一开始这会非常高效（64 个 0 可能对应于 64 位），但随着游戏的进行，所需的存储空间也会增加。

棋子位置

如前所述，解决此问题的另一种方法是存储玩家拥有的每个棋子的位置。这对于大多数方块都是空的残局位置特别有效（但在霍夫曼编码方法中，空方块无论如何只使用 1 位）。

每方都有一个国王和 0-15 个其他棋子。由于促销的原因，这些棋子的确切组成可能会有所不同，因此您不能假设基于起始位置的数字是最大值。

划分此部分的逻辑方法是存储由两个边（白色和黑色）组成的位置。每一方有：

国王：6 位用于位置；
拥有棋子：1（有），0（否）；
如果是，棋子数量：3位（0-7+1=1-8）；
如果是，则对每个棋子的位置进行编码：45位（见下文）；
非棋子数量：4 位（0-15）；
对于每个棋子：类型（后、车、马、象为 2 位）和位置（6 位）

至于棋子位置，棋子只能位于 48 个可能的方格上（而不是像其他方格那样为 64 个方格）。因此，最好不要浪费每个 pawn 使用 6 位所需要的额外 16 个值。因此，如果您有 8 个棋子，则有 48⁸ 种可能性，等于 28,179,280,429,056。您需要 45 位来编码这么多值。

每边 105 位，总共 210 位。然而，起始位置是这种方法最糟糕的情况，当你移除碎片时，它会变得更好。

需要指出的是，可能性少于 48⁸，因为棋子不可能都在同一个方格内。第一个有 48 种可能性，第二个有 47 种，依此类推。 48 x 47 x … x 41 = 1.52e13 = 44 位存储。

您可以通过消除被其他棋子（包括另一边）占据的方格来进一步改进这一点，这样您可以首先放置白色非棋子，然后放置黑色非棋子，然后放置白色棋子，最后放置黑色棋子。在起始位置，这将白色的存储要求减少到 44 位，黑色的存储要求减少到 42 位。

组合方法

另一种可能的优化是，每种方法都有其优点和缺点。例如，您可以选择最好的 4 个，然后在前两位中编码方案选择器，然后在后面编码特定于方案的存储。

由于开销很小，这将是迄今为止最好的方法。

游戏状态

我回到存储游戏而不是位置的问题。由于重复了三次，我们必须存储此时已发生的移动列表。

注释

您必须确定的一件事是您只是存储移动列表还是为游戏添加注释？国际象棋比赛经常会有注释，例如：

Bb5！！ NC4？

白方的棋步以两个感叹号标记为出色，而黑方的棋步则被视为错误。请参阅国际象棋标点符号。

此外，您可能还需要存储描述移动的自由文本。

我假设动作足够了，所以不会有注释。

代数表示法

我们可以简单地在此处存储移动的文本（“e4”、“Bxb5”等）。包括您正在查看的终止字节，每次移动大约 6 个字节（48 位）（最坏情况）。这不是特别有效。

要尝试的第二件事是存储起始位置（6 位）和结束位置（6 位），因此每次移动 12 位。那明显更好。

或者，我们可以以可预测和确定性的方式确定从当前位置开始的所有合法动作，并说明我们选择的动作。然后这又回到上面提到的可变基本编码。白棋和黑棋的第一步各有 20 种可能的棋步，第二步更多，依此类推。

结论

这个问题没有绝对正确的答案。有许多可能的方法，以上只是其中的几种。

我喜欢这个问题和类似问题的地方在于，它需要对任何程序员都很重要的能力，比如考虑使用模式、准确确定需求和思考极端情况。

国际象棋位置截图来自国际象棋位置训练器。

Update: I liked this topic so much I wrote Programming Puzzles, Chess Positions and Huffman Coding. If you read through this I've determined that the only way to store a complete game state is by storing a complete list of moves. Read on for why. So I use a slightly simplified version of the problem for piece layout.

The Problem

This image illustrates the starting Chess position. Chess occurs on an 8x8 board with each player starting with an identical set of 16 pieces consisting of 8 pawns, 2 rooks, 2 knights, 2 bishops, 1 queen and 1 king as illustrated here:

Positions are generally recorded as a letter for the column followed by the number for the row so White’s queen is at d1. Moves are most often stored in algebraic notation, which is unambiguous and generally only specifies the minimal information necessary. Consider this opening:

e4 e5
Nf3 Nc6
…

which translates to:

White moves king’s pawn from e2 to e4 (it is the only piece that can get to e4 hence “e4”);
Black moves the king’s pawn from e7 to e5;
White moves the knight (N) to f3;
Black moves the knight to c6.
…

The board looks like this:

An important ability for any programmer is to be able to correctly and unambiguously specify the problem.

So what’s missing or ambiguous? A lot as it turns out.

Board State vs Game State

The first thing you need to determine is whether you’re storing the state of a game or the position of pieces on the board. Encoding simply the positions of the pieces is one thing but the problem says “all subsequent legal moves”. The problem also says nothing about knowing the moves up to this point. That’s actually a problem as I’ll explain.

Castling

The game has proceeded as follows:

e4 e5
Nf3 Nc6
Bb5 a6
Ba4 Bc5

The board looks as follows:

later opening

White has the option of castling. Part of the requirements for this are that the king and the relevant rook can never have moved, so whether the king or either rook of each side has moved will need to be stored. Obviously if they aren’t on their starting positions, they have moved otherwise it needs to be specified.

There are several strategies that can be used for dealing with this problem.

Firstly, we could store an extra 6 bits of information (1 for each rook and king) to indicate whether that piece had moved. We could streamline this by only storing a bit for one of these six squares if the right piece happens to be in it. Alternatively we could treat each unmoved piece as another piece type so instead of 6 piece types on each side (pawn, rook, knight, bishop, queen and king) there are 8 (adding unmoved rook and unmoved king).

En Passant

Another peculiar and often-neglected rule in Chess is En Passant.

The game has progressed.

e4 e5
Nf3 Nc6
Bb5 a6
Ba4 Bc5
O-O b5
Bb3 b4
c4

Black’s pawn on b4 now has the option of moving his pawn on b4 to c3 taking the White pawn on c4. This only happens on the first opportunity meaning if Black passes on the option now he can’t take it next move. So we need to store this.

If we know the previous move we can definitely answer if En Passant is possible. Alternatively we can store whether each pawn on its 4th rank has just moved there with a double move forward. Or we can look at each possible En Passant position on the board and have a flag to indicate whether its possible or not.

Promotion

It is White’s move. If White moves his pawn on h7 to h8 it can be promoted to any other piece (but not the king). 99% of the time it is promoted to a Queen but sometimes it isn’t, typically because that may force a stalemate when otherwise you’d win. This is written as:

h8=Q

This is important in our problem because it means we can’t count on there being a fixed number of pieces on each side. It is entirely possible (but incredibly unlikely) for one side to end up with 9 queens, 10 rooks, 10 bishops or 10 knights if all 8 pawns get promoted.

Stalemate

When in a position from which you cannot win your best tactic is to try for a stalemate. The most likely variant is where you cannot make a legal move (usually because any move when put your king in check). In this case you can claim a draw. This one is easy to cater for.

The second variant is by threefold repetition. If the same board position occurs three times in a game (or will occur a third time on the next move), a draw can be claimed. The positions need not occur in any particular order (meaning it doesn’t have to the same sequence of moves repeated three times). This one greatly complicates the problem because you have to remember every previous board position. If this is a requirement of the problem the only possible solution to the problem is to store every previous move.

Lastly, there is the fifty move rule. A player can claim a draw if no pawn has moved and no piece has been taken in the previous fifty consecutive moves so we would need to store how many moves since a pawn was moved or a piece taken (the latest of the two. This requires 6 bits (0-63).

Whose Turn Is It?

Of course we also need to know whose turn it is and this is a single bit of information.

Two Problems

Because of the stalemate case, the only feasible or sensible way to store the game state is to store all the moves that led to this position. I’ll tackle that one problem. The board state problem will be simplified to this: store the current position of all pieces on the board ignoring castling, en passant, stalemate conditions and whose turn it is.

Piece layout can be broadly handled in one of two ways: by storing the contents of each square or by storing the position of each piece.

Simple Contents

There are six piece types (pawn, rook, knight, bishop, queen and king). Each piece can be White or Black so a square may contain one of 12 possible pieces or it may be empty so there are 13 possibilities. 13 can be stored in 4 bits (0-15) So the simplest solution is to store 4 bits for each square times 64 squares or 256 bits of information.

The advantage of this method is that manipulation is incredibly easy and fast. This could even be extended by adding 3 more possibilities without increasing the storage requirements: a pawn that has moved 2 spaces on the last turn, a king that hasn’t moved and a rook that hasn’t moved, which will cater for a lot of previously mentioned issues.

But we can do better.

Base 13 Encoding

It is often helpful to think of the board position as a very large number. This is often done in computer science. For example, the halting problem treats a computer program (rightly) as a large number.

The first solution treats the position as a 64 digit base 16 number but as demonstrated there is redundancy in this information (being the 3 unused possibilities per “digit”) so we can reduce the number space to 64 base 13 digits. Of course this can’t be done as efficiently as base 16 can but it will save on storage requirements (and minimizing storage space is our goal).

In base 10 the number 234 is equivalent to 2 x 10² + 3 x 10¹ + 4 x 10⁰.

In base 16 the number 0xA50 is equivalent to 10 x 16² + 5 x 16¹ + 0 x 16⁰ = 2640 (decimal).

So we can encode our position as p₀ x 13⁶³ + p₁ x 13⁶² + ... + p₆₃ x 13⁰ where p_i represents the contents of square i.

2²⁵⁶ equals approximately 1.16e77. 13⁶⁴ equals approximately 1.96e71, which requires 237 bits of storage space. That saving of a mere 7.5% comes at a cost of significantly increased manipulation costs.

Variable Base Encoding

In legal boards certain pieces can’t appear in certain squares. For example, pawns cannot occur at in the first or eighth ranks, reducing the possibilities for those squares to 11. That reduces the possible boards to 11¹⁶ x 13⁴⁸ = 1.35e70 (approximately), requiring 233 bits of storage space.

Actually encoding and decoding such values to and from decimal (or binary) is a little more convoluted but it can be done reliably and is left as an exercise to the reader.

Variable Width Alphabets

The previous two methods can both be described as fixed-width alphabetic encoding. Each of the 11, 13 or 16 members of the alphabet is substituted for another value. Each “character” is the same width but the efficiency can be improved when you consider that each character is not equally likely.

morse code

Consider Morse code (pictured above). Characters in a message are encoded as a sequence of dashes and dots. Those dashes and dots are transferred over radio (typically) with a pause between them to delimit them.

Notice how the letter E (the most common letter in English) is a single dot, the shortest possible sequence, whereas Z (the least frequent) is two dashes and two beeps.

Such a scheme can significantly reduce the size of an expected message but comes at the cost of increasing the size of a random character sequence.

It should be noted that Morse code has another inbuilt feature: dashes are as long as three dots so the above code is created with this in mind to minimize the use of dashes. Since 1s and 0s (our building blocks) don’t have this problem, it’s not a feature we need to replicate.

Lastly, there are two kinds of rests in Morse code. A short rest (the length of a dot) is used to distinguish between dots and dashes. A longer gap (the length of a dash) is used to delimit characters.

So how does this apply to our problem?

Huffman Coding

There is an algorithm for dealing with variable length codes called Huffman coding. Huffman coding creates a variable length code substitution, typically uses expected frequency of the symbols to assign shorter values to the more common symbols.

In the above tree, the letter E is encoded as 000 (or left-left-left) and S is 1011. It should be clear that this encoding scheme is unambiguous.

This is an important distinction from Morse code. Morse code has the character separator so it can do otherwise ambiguous substitution (eg 4 dots can be H or 2 Is) but we only have 1s and 0s so we choose an unambiguous substitution instead.

Below is a simple implementation:

private static class Node {
  private final Node left;
  private final Node right;
  private final String label;
  private final int weight;

  private Node(String label, int weight) {
    this.left = null;
    this.right = null;
    this.label = label;
    this.weight = weight;
  }

  public Node(Node left, Node right) {
    this.left = left;
    this.right = right;
    label = "";
    weight = left.weight + right.weight;
  }

  public boolean isLeaf() { return left == null && right == null; }

  public Node getLeft() { return left; }

  public Node getRight() { return right; }

  public String getLabel() { return label; }

  public int getWeight() { return weight; }
}

with static data:

private final static List<string> COLOURS;
private final static Map<string, integer> WEIGHTS;

static {
  List<string> list = new ArrayList<string>();
  list.add("White");
  list.add("Black");
  COLOURS = Collections.unmodifiableList(list);
  Map<string, integer> map = new HashMap<string, integer>();
  for (String colour : COLOURS) {
    map.put(colour + " " + "King", 1);
    map.put(colour + " " + "Queen";, 1);
    map.put(colour + " " + "Rook", 2);
    map.put(colour + " " + "Knight", 2);
    map.put(colour + " " + "Bishop";, 2);
    map.put(colour + " " + "Pawn", 8);
  }
  map.put("Empty", 32);
  WEIGHTS = Collections.unmodifiableMap(map);
}

and:

private static class WeightComparator implements Comparator<node> {
  @Override
  public int compare(Node o1, Node o2) {
    if (o1.getWeight() == o2.getWeight()) {
      return 0;
    } else {
      return o1.getWeight() < o2.getWeight() ? -1 : 1;
    }
  }
}

private static class PathComparator implements Comparator<string> {
  @Override
  public int compare(String o1, String o2) {
    if (o1 == null) {
      return o2 == null ? 0 : -1;
    } else if (o2 == null) {
      return 1;
    } else {
      int length1 = o1.length();
      int length2 = o2.length();
      if (length1 == length2) {
        return o1.compareTo(o2);
      } else {
        return length1 < length2 ? -1 : 1;
      }
    }
  }
}

public static void main(String args[]) {
  PriorityQueue<node> queue = new PriorityQueue<node>(WEIGHTS.size(),
      new WeightComparator());
  for (Map.Entry<string, integer> entry : WEIGHTS.entrySet()) {
    queue.add(new Node(entry.getKey(), entry.getValue()));
  }
  while (queue.size() > 1) {
    Node first = queue.poll();
    Node second = queue.poll();
    queue.add(new Node(first, second));
  }
  Map<string, node> nodes = new TreeMap<string, node>(new PathComparator());
  addLeaves(nodes, queue.peek(), "");
  for (Map.Entry<string, node> entry : nodes.entrySet()) {
    System.out.printf("%s %s%n", entry.getKey(), entry.getValue().getLabel());
  }
}

public static void addLeaves(Map<string, node> nodes, Node node, String prefix) {
  if (node != null) {
    addLeaves(nodes, node.getLeft(), prefix + "0");
    addLeaves(nodes, node.getRight(), prefix + "1");
    if (node.isLeaf()) {
      nodes.put(prefix, node);
    }
  }
}

One possible output is:

         White    Black
Empty          0 
Pawn       110      100
Rook     11111    11110
Knight   10110    10101
Bishop   10100    11100
Queen   111010   111011
King    101110   101111

For a starting position this equates to 32 x 1 + 16 x 3 + 12 x 5 + 4 x 6 = 164 bits.

State Difference

Another possible approach is to combine the very first approach with Huffman coding. This is based on the assumption that most expected Chess boards (rather than randomly generated ones) are more likely than not to, at least in part, resemble a starting position.

So what you do is XOR the 256 bit current board position with a 256 bit starting position and then encode that (using Huffman coding or, say, some method of run length encoding). Obviously this will be very efficient to start with (64 0s probably corresponding to 64 bits) but increase in storage required as the game progresses.

Piece Position

As mentioned, another way of attacking this problem is to instead store the position of each piece a player has. This works particularly well with endgame positions where most squares will be empty (but in the Huffman coding approach empty squares only use 1 bit anyway).

Each side will have a king and 0-15 other pieces. Because of promotion the exact make up of those pieces can vary enough that you can’t assume the numbers based on the starting positions are maxima.

The logical way to divide this up is store a Position consisting of two Sides (White and Black). Each Side has:

A king: 6 bits for the location;
Has pawns: 1 (yes), 0 (no);
If yes, number of pawns: 3 bits (0-7+1 = 1-8);
If yes, the location of each pawn is encoded: 45 bits (see below);
Number of non-pawns: 4 bits (0-15);
For each piece: type (2 bits for queen, rook, knight, bishop) and location (6 bits)

As for the pawn location, the pawns can only be on 48 possible squares (not 64 like the others). As such, it is better not to waste the extra 16 values that using 6 bits per pawn would use. So if you have 8 pawns there are 48⁸ possibilities, equalling 28,179,280,429,056. You need 45 bits to encode that many values.

That’s 105 bits per side or 210 bits total. The starting position is the worst case for this method however and it will get substantially better as you remove pieces.

It should be pointed out that there are less than 48⁸ possibilities because the pawns can’t all be in the same square The first has 48 possibilities, the second 47 and so on. 48 x 47 x … x 41 = 1.52e13 = 44 bits storage.

You can further improve this by eliminating the squares that are occupied by other pieces (including the other side) so you could first place the white non-pawns then the black non-pawns, then the white pawns and lastly the black pawns. On a starting position this reduces the storage requirements to 44 bits for White and 42 bits for Black.

Combined Approaches

Another possible optimization is that each of these approaches has its strength and weaknesses. You could, say, pick the best 4 and then encode a scheme selector in the first two bits and then the scheme-specific storage after that.

With the overhead that small, this will by far be the best approach.

Game State

I return to the problem of storing a game rather than a position. Because of the threefold repetition we have to store the list of moves that have occurred to this point.

Annotations

One thing you have to determine is are you simply storing a list of moves or are you annotating the game? Chess games are often annotated, for example:

Bb5!! Nc4?

White’s move is marked by two exclamation points as brilliant whereas Black’s is viewed as a mistake. See Chess punctuation.

Additionally you could also need to store free text as the moves are described.

I am assuming that the moves are sufficient so there will be no annotations.

Algebraic Notation

We could simply store the the text of the move here (“e4”, “Bxb5”, etc). Including a terminating byte you’re looking at about 6 bytes (48 bits) per move (worst case). That’s not particularly efficient.

The second thing to try is to store the starting location (6 bits) and end location (6 bits) so 12 bits per move. That is significantly better.

Alternatively we can determine all the legal moves from the current position in a predictable and deterministic way and state which we’ve chosen. This then goes back to the variable base encoding mentioned above. White and Black have 20 possible moves each on their first move, more on the second and so on.

Conclusion

There is no absolutely right answer to this question. There are many possible approaches of which the above are just a few.

What I like about this and similar problems is that it demands abilities important to any programmer like considering the usage pattern, accurately determining requirements and thinking about corner cases.

Chess positions taken as screenshots from Chess Position Trainer.

回复收藏 0 原文

守不住的情 2024-08-20 14:03:54

最好以人类可读的标准格式存储国际象棋游戏。

便携式游戏符号采用标准起始位置（尽管它不必），只需依次列出动作即可。紧凑、人类可读的标准格式。

例如，

[Event "F/S Return Match"]
[Site "Belgrade, Serbia Yugoslavia|JUG"]
[Date "1992.11.04"]
[Round "29"]
[White "Fischer, Robert J."]
[Black "Spassky, Boris V."]
[Result "1/2-1/2"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 {This opening is called the Ruy Lopez.} 3... a6
4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 d6 8. c3 O-O 9. h3 Nb8  10. d4 Nbd7
11. c4 c6 12. cxb5 axb5 13. Nc3 Bb7 14. Bg5 b4 15. Nb1 h6 16. Bh4 c5 17. dxe5
Nxe4 18. Bxe7 Qxe7 19. exd6 Qf6 20. Nbd2 Nxd6 21. Nc4 Nxc4 22. Bxc4 Nb6
23. Ne5 Rae8 24. Bxf7+ Rxf7 25. Nxf7 Rxe1+ 26. Qxe1 Kxf7 27. Qe3 Qg5 28. Qxg5
hxg5 29. b3 Ke6 30. a3 Kd6 31. axb4 cxb4 32. Ra5 Nd5 33. f3 Bc8 34. Kf2 Bf5
35. Ra7 g6 36. Ra6+ Kc5 37. Ke1 Nf4 38. g3 Nxh3 39. Kd2 Kb5 40. Rd6 Kc5 41. Ra6
Nf2 42. g4 Bd3 43. Re6 1/2-1/2

如果您想让它更小，那么只需压缩它。工作完成了！

It's best just to store chess games in a human-readable, standard format.

The Portable Game Notation assumes a standard starting position (although it doesn't have to) and just lists the moves, turn by turn. A compact, human-readable, standard format.

E.g.

[Event "F/S Return Match"]
[Site "Belgrade, Serbia Yugoslavia|JUG"]
[Date "1992.11.04"]
[Round "29"]
[White "Fischer, Robert J."]
[Black "Spassky, Boris V."]
[Result "1/2-1/2"]

1. e4 e5 2. Nf3 Nc6 3. Bb5 {This opening is called the Ruy Lopez.} 3... a6
4. Ba4 Nf6 5. O-O Be7 6. Re1 b5 7. Bb3 d6 8. c3 O-O 9. h3 Nb8  10. d4 Nbd7
11. c4 c6 12. cxb5 axb5 13. Nc3 Bb7 14. Bg5 b4 15. Nb1 h6 16. Bh4 c5 17. dxe5
Nxe4 18. Bxe7 Qxe7 19. exd6 Qf6 20. Nbd2 Nxd6 21. Nc4 Nxc4 22. Bxc4 Nb6
23. Ne5 Rae8 24. Bxf7+ Rxf7 25. Nxf7 Rxe1+ 26. Qxe1 Kxf7 27. Qe3 Qg5 28. Qxg5
hxg5 29. b3 Ke6 30. a3 Kd6 31. axb4 cxb4 32. Ra5 Nd5 33. f3 Bc8 34. Kf2 Bf5
35. Ra7 g6 36. Ra6+ Kc5 37. Ke1 Nf4 38. g3 Nxh3 39. Kd2 Kb5 40. Rd6 Kc5 41. Ra6
Nf2 42. g4 Bd3 43. Re6 1/2-1/2

If you want to make it smaller, then just zip it. Job done!

回复收藏 0 原文

温柔女人霸气范 2024-08-20 14:03:54

很棒的谜题！

我发现大多数人都在存储每件棋子的位置。采取更简单的方法，存储每个方块的内容怎么样？它会自动处理升级和捕获的棋子。

它允许霍夫曼编码。实际上，棋盘上棋子的初始频率几乎是完美的：一半的方格是空的，剩余的一半方格是棋子，等等。

考虑到每件作品的频率，我在纸上构建了一个霍夫曼树，我不会这样做'这里不再重复。结果，其中 c 代表颜色（白色 = 0，黑色 = 1）：

0 代表空方格
1c0
代表兵 1c100 代表车
1c101 代表马
1c110 代表主教
1c1110 代表皇后
1c1111 代表

国王整个棋盘在初始情况下，我们

有空方格： 32 * 1 位 = 32 位
兵： 16 * 3 位 = 48 位
车/骑士/象： 12 * 5 位 = 60 位
皇后/国王： 4 * 6 位 = 24 位

总计：164 位用于初始板状态。明显小于当前得票最高答案的 235 位。而且随着游戏的进行，它只会变得更小（升级后除外）。

我只看棋盘上棋子的位置；额外的状态（轮到谁、谁已经城堡、谁经过、重复移动等）必须单独编码。也许最多还有 16 位，所以整个游戏状态180 位。
可能的优化：

留下不太频繁的部分，并单独存储它们的位置。但这无济于事......用空方块替换国王和王后可以节省 5 位，这正是您以另一种方式编码其位置所需的 5 位。
“后排没有棋子”可以通过使用不同的霍夫曼表轻松地对后排进行编码，但我怀疑它有多大帮助。您可能仍然会得到相同的霍夫曼树。
“一白一黑主教”可以通过引入不具有c位的额外符号来编码，然后可以从主教所在的方格中推断出该符号。（棋子晋升为主教破坏了这个计划......）
空方块的重复可以通过引入额外的符号来进行游程编码，例如“连续 2 个空方块”和“连续 4 个空方块”。但要估计这些发生的频率并不容易，如果你弄错了，那只会带来伤害而不是帮助。

Great puzzle!

I see that most people are storing the position of each piece. How about taking a more simple-minded approach, and storing the contents of each square? That takes care of promotion and captured pieces automatically.

And it allows for Huffman encoding. Actually, the initial frequency of pieces on the board is nearly perfect for this: half of the squares are empty, half of the remaining squares are pawns, etcetera.

Considering the frequency of each piece, I constructed a Huffman tree on paper, which I won't repeat here. The result, where c stands for the colour (white = 0, black = 1):

0 for empty squares
1c0 for pawn
1c100 for rook
1c101 for knight
1c110 for bishop
1c1110 for queen
1c1111 for king

For the entire board in its initial situation, we have

empty squares: 32 * 1 bit = 32 bits
pawns: 16 * 3 bits = 48 bits
rooks/knights/bishops: 12 * 5 bits = 60 bits
queens/kings: 4 * 6 bits = 24 bits

Total: 164 bits for the initial board state. Significantly less than the 235 bits of the currently highest voted answer. And it's only going to get smaller as the game progresses (except after a promotion).

I looked only at the position of the pieces on the board; additional state (whose turn, who has castled, en passant, repeating moves, etc.) will have to be encoded separately. Maybe another 16 bits at most, so 180 bits for the entire game state.
Possible optimizations:

Leaving out the less frequent pieces, and storing their position separately. But that won't help... replacing king and queen by an empty square saves 5 bits, which are exactly the 5 bits you need to encode their position in another way.
"No pawns on the back row" could easily be encoded by using a different Huffman table for the back rows, but I doubt it helps much. You'd probably still end up with the same Huffman tree.
"One white, one black bishop" can be encoded by introducing extra symbols that don't have the c bit, which can then be deduced from the square that the bishop is on. (Pawns promoted to bishops disrupt this scheme...)
Repetitions of empty squares could be run-length encoded by introducing extra symbols for, say, "2 empty squares in a row" and "4 empty squares in a row". But it is not so easy to estimate the frequency of those, and if you get it wrong, it's going to hurt rather than help.

回复收藏 0 原文

我很坚强 2024-08-20 14:03:54

真正的大查找表方法

位置 - 18 字节
预计合法职位数量为 10⁴³
只需将它们全部枚举出来，位置就可以存储在 143 位中。还需要 1 位来指示接下来要进行哪一方。

枚举当然不切实际，但这表明至少需要 144 位。

移动 - 1 字节
每个位置通常有大约 30-40 步合法棋步，但数量可能高达 218 步
让我们枚举每个位置的所有合法动作。现在每一步都可以编码成一个字节。

我们仍然有足够的空间用于特殊动作，例如 0xFF 来代表辞职。

回复收藏 0 原文

懒的傷心 2024-08-20 14:03:54

对于人类玩的典型游戏，而不是最坏的情况，优化平均情况大小会增加兴趣。（问题陈述没有说明是哪个；大多数响应都假设最坏的情况。）

对于移动顺序，有一个好的国际象棋引擎从每个位置生成移动；它会生成一个包含 k 个可能的动作的列表，并按其质量排名排序。人们通常比随机走法更频繁地选择好的走法，因此我们需要学习从列表中的每个位置到人们选择“好”走法的概率的映射。使用这些概率（基于一些互联网国际象棋数据库中的游戏语料库），使用算术编码< /a>. （解码器必须使用相同的国际象棋引擎和映射。）

对于起始位置，ralu 的方法可行。如果我们有某种方法通过概率对选择进行加权，我们也可以用算术编码来改进它——例如，棋子经常出现在相互防御的配置中，而不是随机的。很难找到一种简单的方法来整合这些知识。一个想法：依靠上述移动编码，从标准开局位置开始，找到以所需棋盘结束的序列。（您可以尝试使用启发式距离进行 A* 搜索，该距离等于棋子与其最终位置的距离总和，或者类似的做法。）这会因过度指定移动顺序而导致效率低下，而会因利用国际象棋的优势而提高效率。知识。（您可以通过消除会导致 A* 搜索中先前探索的位置的移动选择来弥补一些低效率：这些移动选择可以在算术代码中获得权重 0。）

也很难估计这可以节省多少会给你带来平均情况的复杂性，而不需要从实际的语料库中收集一些统计数据。但我认为所有移动同样可能的起点已经击败了这里的大多数提案：算术编码不需要每次移动整数位数。

It'd add interest to optimize for average-case size for typical games played by humans, instead of the worst case. (The problem statement doesn't say which; most responses assume worst-case.)

For the move sequence, have a good chess engine generate moves from each position; it'll produce a list of k possible moves, ordered by its ranking of their quality. People generally pick good moves more often than random moves, so we need to learn a mapping from each position in the list to the probability that people pick a move that 'good'. Using these probabilities (based on a corpus of games from some internet chess database), encode the moves with arithmetic coding. (The decoder must use the same chess engine and mapping.)

For the starting position, ralu's approach would work. We could refine it with arithmetic coding there as well, if we had some way to weight the choices by probability — e.g. pieces often appear in configurations defending each other, not at random. It's harder to see an easy way to incorporate that knowledge. One idea: fall back on the above move encoding instead, starting from the standard opening position and finding a sequence that ends in the desired board. (You might try A* search with a heuristic distance equaling the sum of the distances of pieces from their final positions, or something along those lines.) This trades some inefficiency from overspecifying the move sequence vs. efficiency from taking advantage of chess-playing knowledge. (You can claw back some of the inefficiency by eliminating move choices that would lead to a previously-explored position in the A* search: these can get weight 0 in the arithmetic code.)

It's also kind of hard to estimate how much savings this would buy you in average-case complexity, without gathering some statistics from an actual corpus. But the starting point with all moves equally probable I think would already beat most of the proposals here: the arithmetic coding doesn't need an integer number of bits per move.

回复收藏 0 原文

故人爱我别走 2024-08-20 14:03:54

攻击初始位置编码后对步骤进行编码的子问题。该方法是创建步骤的“链接列表”。

游戏中的每个步骤都被编码为“旧位置->新位置”对。你知道棋局开始时的初始位置；通过遍历步链表，可以到达X步移动后的状态。

为了对每个步骤进行编码，您需要 64 个值来编码起始位置（棋盘上 64 个方格的 6 位 - 8x8 方格），以及结束位置的 6 位。每边 1 次移动 16 位。

编码给定游戏所需的空间量与移动数量成正比：

10 x（白色移动数量 + 黑色移动数量）位。

更新：升级典当的潜在并发症。需要能够说明 pawn 升级到什么 - 可能需要特殊位（为此使用格雷码以节省空间，因为 pawn 升级极为罕见）。

更新 2：您不必对结束位置的完整坐标进行编码。在大多数情况下，要移动的棋子只能移动到 X 个位置。例如，一个 pawn 在任何给定点最多可以有 3 个移动选项。通过实现每种棋子类型的最大移动次数，我们可以节省“目的地”编码的位数。

Pawn: 
   - 2 options for movement (e2e3 or e2e4) + 2 options for taking = 4 options to encode
   - 12 options for promotions - 4 promotions (knight, biship, rook, queen) times 3 squares (because you can take a piece on the last row and promote the pawn at the same time)
   - Total of 16 options, 4 bits
Knight: 8 options, 3 bits
Bishop: 4 bits
Rook: 4 bits
King: 3 bits
Queen: 5 bits

因此，黑色或白色的每次移动的空间复杂度变为

初始位置的 6 位 +（基于移动物体的类型的可变位数）。

Attacking a subproblem of encoding the steps after an initial position has been encoded. The approach is to create a "linked list" of steps.

Each step in the game is encoded as the "old position->new position" pair. You know the initial position in the beginning of the chess game; by traversing the linked list of steps, you can get to the state after X moves.

For encoding each step, you need 64 values to encode the starting position (6 bits for 64 squares on the board - 8x8 squares), and 6 bits for the end position. 16 bits for 1 move of each side.

Amount of space that encoding a given game would take is then proportionate to the number of moves:

10 x (number of white moves + number of black moves) bits.

UPDATE: potential complication with promoted pawns. Need to be able to state what the pawn is promoted to - may need special bits (would use gray code for this to save space, as pawn promotion is extremely rare).

UPDATE 2: You don't have to encode the end position's full coordinates. In most cases, the piece that's being moved can move to no more than X places. For example, a pawn can have a maximum of 3 move options at any given point. By realizing that maximum number of moves for each piece type, we can save bits on the encoding of the "destination".

Pawn: 
   - 2 options for movement (e2e3 or e2e4) + 2 options for taking = 4 options to encode
   - 12 options for promotions - 4 promotions (knight, biship, rook, queen) times 3 squares (because you can take a piece on the last row and promote the pawn at the same time)
   - Total of 16 options, 4 bits
Knight: 8 options, 3 bits
Bishop: 4 bits
Rook: 4 bits
King: 3 bits
Queen: 5 bits

So the spatial complexity per move of black or white becomes

6 bits for the initial position + (variable number of bits based upon the type of the thing that's moved).

回复收藏 0 原文

挥剑断情 2024-08-20 14:03:54

昨晚我看到这个问题，它引起了我的兴趣，所以我坐在床上思考解决方案。我的最终答案实际上与 int3 非常相似。

基本解决方案

假设有一个标准的国际象棋游戏，并且您不对规则进行编码（例如白方总是先走），那么您可以通过仅对每个棋子的移动进行编码来节省大量资金。

总共有 32 个棋子，但每次移动时您都知道正在移动的颜色，因此只需担心 16 个方格，即本轮移动的棋子的 4 位。

每个棋子只有有限的动作组，您可以通过某种方式进行枚举。

：4 个选项，2 位（向前 1 步，向前 2 步，每个对角线 1 个）
车：14 个选项，4 位（每个方向最多 7 个）
兵：13 个选项，4 位（如果一条对角线上有 7 个，则另一条对角线上只有 6 个）
骑士：8 个选项，3 位
皇后：27 个选项，< strong>5位（车+主教）
国王：9个选项，4位（8个一步棋，加上易位选项）

为了升级，有4个棋子可供选择（ Rook、Bishop、Knight、Queen），因此在该移动中我们将添加 2 位 来指定。我认为所有其他规则都会自动涵盖（例如 en passant）。

进一步优化

首先，在捕获一种颜色的 8 个片段后，您可以将片段编码减少到 3 位，然后将 4 个片段减少到 2 位，依此类推。

不过，主要的优化是仅枚举游戏中每个点的可能移动。假设我们将 Pawn 的移动存储为 {00, 01, 10, 11}，分别表示向前 1 步、向前 2 步、向左对角线和向右对角线。如果某些动作是不可能的，我们可以将它们从本回合的编码中删除。

我们知道每个阶段的游戏状态（通过跟踪所有的移动），因此在读取哪个棋子将要移动之后，我们总是可以确定需要读取多少位。如果我们意识到棋子此时唯一的动作是向右对角捕获或向前移动一个，我们就知道只读取 1 位。

简而言之，上面列出的每个部分的位存储只是一个最大值。几乎每一步的选择都会更少，而且通常也会更少。

I saw this question last night and it intrigued me so I sat in bed thinking up solutions. My final answer is pretty similar to int3's actually.

Basic solution

Assuming a standard chess game and that you don't encode the rules (like White always goes first), then you can save a lot by encoding just the moves each piece makes.

There are 32 pieces total but on each move you know what colour is moving so there's only 16 squares to worry about, which is 4 bits for which piece moves this turn.

Each piece only has a limited moveset, which you would enumerate in some way.

Pawn: 4 options, 2 bits (1 step forward, 2 steps forward, 1 each diagonal)
Rook: 14 options, 4 bits (max of 7 in each direction)
Bishop: 13 options, 4 bits (if you have 7 in one diagonal, you only have 6 in the other)
Knight: 8 options, 3 bits
Queen: 27 options, 5 bits (Rook+Bishop)
King: 9 options, 4 bits (8 one-step moves, plus the castling option)

For promotion, there are 4 pieces to choose from (Rook, Bishop, Knight, Queen) so on that move we would add 2 bits to specify that. I think all the other rules are covered automatically (e.g. en passant).

Further optimizations

First, after 8 pieces of one colour have been captured, you could reduce the piece encoding to 3 bits, then 2 bits for 4 pieces and so on.

The main optimization though is to enumerate only the possible moves at each point in the game. Assume we store a Pawn's moves as {00, 01, 10, 11} for 1 step forward, 2 steps forward, diagonal left and diagonal right respectively. If some moves are not possible we can remove them from the encoding for this turn.

We know the game state at every stage (from following all the moves), so after reading which piece is going to move, we can always determine how many bits we need to read. If we realize a pawn's only moves at this point are capture diagonally right or move forward one, we know to only read 1 bit.

In short, the bit storage listed above for each piece is a maximum only. Nearly every move will have fewer options and often fewer bits.

回复收藏 0 原文

清旖 2024-08-20 14:03:54

在每个位置获取所有可能的移动次数。

下一步的生成方式是

index_current_move =n % num_of_moves //this is best space efficiency
n=n/num_of_moves

可证明存储随机生成的游戏的最佳空间效率，并且平均每步需要大约 5 位，因为您有 30-40 种可能的移动。
组装存储只是以相反的顺序生成n。

由于冗余度大，存储位置更难破解。（一个地点的棋盘上最多可以有 9 个皇后，但在这种情况下，没有棋子，而象在棋盘上则位于相反颜色的方格上）但通常就像在剩余的方格上存储相同棋子的组合。）

编辑：

保存动作的要点是仅存储动作索引。我们不应该存储 Kc1-c2 并尝试减少此信息，而应该仅添加从确定性 movegenerator(position) 生成的移动索引。

在每次移动时，我们在池中添加大小信息

num_of_moves = get_number_of_possible_moves(postion) ;

，并且该数量不能减少，

生成的信息池是

n=n*num_of_moves+ index_current_move

额外的

如果最终位置只有一步可用，则保存为先前完成的强制步数。
示例：如果起始位置每侧有 1 次强制移动（2 次移动），并且我们希望将其保存为一次移动游戏，请将 1 存储在池 n 中。

存储到信息池的示例

假设我们已经知道起始位置并且我们进行 3 次移动。

在第一步中，有 5 个可用的移动，我们采用移动索引 4。在第二个移动中，有 6 个可用的移动，我们采用位置索引 3，在第三个移动中，该方有 7 个可用的移动，他选择选择移动索引2.

矢量形式；
索引=[4,3,2]
n_moves=[5,6,7]

我们向后编码这个信息，所以 n= 4+5*(3+6*(2))=79 （不需要乘以 7）

如何解开这个？
首先我们有位置，我们发现有 5 步可用。因此，

index=79%5=4
n=79/5=15; //no remainder

我们采用移动索引 4 并再次检查位置，从这一点我们发现有 6 种可能的移动。

index=15%6=3
n=15/6=2

我们采用移动索引 3，它使我们达到有 7 种可能移动的位置。

index=2%7=2
n=2/7=0

我们执行最后一步索引 2，到达最终位置。

正如你所看到的，时间复杂度是 O(n)，空间复杂度是 O(n)。
编辑：时间复杂度实际上是 O(n^2)，因为乘以的数字会增加，但存储最多 10,000 步的游戏应该没有问题。

保存位置

可以接近最佳状态。

当我们发现信息并存储信息时，让我详细讨论一下。总体思路是减少冗余（我稍后会谈到）。假设没有晋升，也没有夺取，因此每方有 8 个兵、2 个车、2 个马、2 个主教、1 个国王和 1 个后。

我们要保存什么：
1.各和平的位置
2. 易位的可能性
3. 过路的可能性
4. 可以移动的一侧

让我们假设每个棋子都可以站在任何地方，但不能有 2 个棋子在同一个地方。
棋盘上8个同色棋子的排列方式为C(64/8)（二项式），即32位，则2个车2R→ C(56/2)、2B→ C(54/2)、2N→C(52/2)、1Q→C(50/1)、1K→C(50/1)。 C(49/1) 与其他站点相同，但以 8P 开头 -> C(48/8)等。

将这两个站点的值相乘得到数字 4634726695587809641192045982323285670400000，大约为 142 位，我们必须为一个可能的 en-passant 添加 8（en-passant 棋子可以在 8 个位置之一），16（4 位）用于王位限制，一位用于已移动的站点。我们最终得到 142+3+4+1=150bits

但现在让我们继续在棋盘上寻找冗余，有 32 个棋子并且不采取任何行动。

黑色和白色棋子都在同一列上并且彼此相对。每个棋子都面对其他棋子，这意味着白色棋子最多只能排在第六位。这给我们带来 8*C(6/2)
而不是 C(64/8)*C(48/8)，它减少了 56 位信息。
易位的可能性也是多余的。如果车不在起始位置，则该车没有易车的可能性。因此，如果可以用这辆车进行易位，我们可以想象在船上添加 4 个方格以获得额外信息，并删除 4 个易位位。
，我们得到的是 C(58/2)*C(42/2)，而不是 C(56/2)*C(40/2)*16，并且丢失了 3.76 位（几乎全部 4 位）
en-路过：
当我们存储8个en-passant可能性之一时，我们知道黑棋子的位置并减少信息冗余（如果它是白棋并且有第3个棋子en-passant，这意味着黑棋子在c5上，而白棋子是c2，c3或c4) 因此，插入 C(6/2) 后，我们有 3 位，但丢失了 2.3 位。如果我们存储可以完成的过路人数量（3种可能性->左、右、两者），并且我们知道可以带走过路人的棋子的位置，那么我们就减少了一些冗余。（例如，从前面的 en passant 示例中，c5 上的黑色棋子可以在左、右或两者中。如果它在一个站点上，我们有 2*3（3 个用于存储 psisibilites，2 个可能的移动用于第 7 或第 6 级的黑色棋子））插入 C(6/2)，我们减少了 1.3 位，如果两边都减少了 4.2 位，这样我们就可以减少每个路过的人 2.3+1.3=3.6 位
：bisops 可以是。仅在对面方格上，这会为每个站点减少 1 位冗余。

如果我们总计需要 150-56-4。 -3.6-2=85位用于存储国际象棋位置（如果没有走棋）

如果考虑到走棋和晋级，可能不会更多（但如果有人发现这么长，我稍后会写到这一点）帖子有用）

At each position get the number of all possible moves.

next move is generated as

index_current_move =n % num_of_moves //this is best space efficiency
n=n/num_of_moves

provably best space efficiency for storing randomly generated game and need approx 5 bits/move on average since you have 30-40 possible moves.
Assembling storage is just generating n in reverse order.

Storing position is harder to crack, because of great redundancy. (There can be up to 9 queens on board for one site but in that case there are no pawns, and bishops if on the board are on opposite colored squares) but generally is like storing combination of same pieces over remaining squares.)

EDIT:

Point in saving moves is to store only the index of move. Instead of storing Kc1-c2 and trying to reduce this info we should add only index of move generated from deterministic movegenerator(position)

At each move we add information of size

num_of_moves = get_number_of_possible_moves(postion) ;

in pool and this number cannot be reduced

generating information pool is

n=n*num_of_moves+ index_current_move

extra

If there is only one move available in final position, save as number of previously done forced moves.
Example: if starting position has 1 forced moves for each side (2 moves) and we want to save this as one move game, store 1 in pool n.

example of storing into info pool

Lets suppose that we have known starting positions and we do 3 moves.

In first move there are 5 available moves, and we take move index 4. In second move there are 6 available moves and we take position index 3 and in 3th move there are 7 moves available for that side and he chose to pick the move index 2.

Vector form;
index=[4,3,2]
n_moves=[5,6,7]

We are encoding this info backwards, so n= 4+5*(3+6*(2))=79 (no multiplying by 7 needed)

How to unloop this?
First we have position and we find out that there are 5 moves available. So

index=79%5=4
n=79/5=15; //no remainder

We take move index 4 and examine position again and from this point we find out that there are 6 possible moves.

index=15%6=3
n=15/6=2

And we take move index 3 which gets us to a position with 7 possible moves.

index=2%7=2
n=2/7=0

We do last move index 2 and we reach final position.

As you can see the time complexity is O(n) ansd space complexity is O(n).
Edit: time complexity is actually O(n^2) because the number you multipy by increases, but there should be no problem storing games up to 10,000 moves.

saving position

Can be done close to optimum.

When we find out about information and storing informations let me talk more about it. General idea is to decrease redundancy (I will talk about that later). Lets presume that there were no promotions and no taking so there are 8 pawns, 2 rooks, 2 knights, 2 bishops 1 king and 1 queen per side.

What do we have to save:
1. position of each peace
2. posibilities of castling
3. possibilities of en-passant
4. side that has move avaliable

Let's suppose that every piece can stand anywhere but not 2 pieces at same place.
Number of ways 8 pawns of same color can be arranged on board is C(64/8) (binomial) which is 32 bits, then 2 rooks 2R-> C(56/2), 2B -> C(54/2), 2N->C(52/2), 1Q->C(50/1), 1K -> C(49/1) and same for other site but starting with 8P -> C(48/8) and so on.

Multiplying this together for both sites get us number 4634726695587809641192045982323285670400000 which is approx 142 bits, we have to add 8 for one possible en-passant (en-passant pawn can be in one of 8 places), 16 (4 bits) for castling limitations and one bit for site that has move. We end up with 142+3+4+1=150bits

But now let's go on the hunt for redundancy on the board with 32 pieces and no taking.

both black and white pawns are on same column and facing each other. Each pawn is facing other pawn that means that white pawn can be at most at 6th rank. This bring us 8*C(6/2)
instead of C(64/8)*C(48/8) which decrease information by 56 bits.
possibility of castling is also redundant. If rooks are not on starting place there is no castling possibility whit that rook. So we can imaginaly add 4 squares on board to get the extra info if castling whit this rook is possible and remove 4 castling bits.
So instead of C(56/2)*C(40/2)*16 we have C(58/2)*C(42/2) and we lost 3.76 bits (almost all of 4 bits)
en-passant:
When we store one of 8 en passant possibilites, we know position of black pawn and reduce informational redindancy (if it is white move and has 3th pawn en-passant that mean that black pawn is on c5 and white pawn is either c2,c3 or c4) so insted of C(6/2) we have 3 and we lost 2.3 bits. We decrease some redundancy if we store whit en-passant number also side from which can be done (3 possibilities-> left,right,both) and we know the possiton of pawn that can take en passant. (for instance from prevous en passant example whit black on c5 what can be in left, right or both. if it is on one site we have 2*3 (3 for storing psissibilites and 2 possible moves for black pawn on 7th or 6 rank) insted of C(6/2) and we reduce by 1.3 bits and if on boths sides we reduce by 4.2 bits. That way we can reduce by 2.3+1.3=3.6 bits per en passant.
bishops: bisops can be on opostite squares only, this reduce redundancy by 1 bit for each site.

If we sum up we need 150-56-4-3.6-2=85bits for storing chess position if there were no takings

And probably not much more if there are takings and promotions taken in account (but i will write about that later if somebody will find this long post usefull)

回复收藏 0 原文

枉心 2024-08-20 14:03:54

大多数人一直在对棋盘状态进行编码，但是关于动作本身......这是一个位编码描述。

每块的位数：

块 ID： 最多 4 位来识别每面 16 个块。可以推断出白色/黑色。在零件上定义顺序。当碎片的数量低于各自的 2 的幂时，使用更少的位来描述剩余的碎片。
典当：第一次移动有3种可能性，所以+2位（向前移动一个或两个方格，过路。）后续移动不允许向前移动两个，所以+1位就足够了。可以在解码过程中通过注意棋子何时到达最后一个等级来推断升级。如果知道该棋子将被提升，则解码器将期望另外 2 位指示它已被提升为 4 个主要棋子中的哪一个。
Bishop： +1 位用于使用对角线，最多 +4 位用于沿对角线的距离（16 种可能性）。解码器可以推断该块可以沿该对角线移动的最大可能距离，因此如果对角线较短，则使用较少的位。
骑士： 8 种可能的移动，+3 位
车： +1 位用于水平/垂直，+4 位用于沿线的距离。
King： 8 种可能的移动方式，+3 位。用“不可能”的移动来表示易位——因为只有当国王处于第一等级时才可能进行易位，因此用将国王“向后”移动的指令来编码此移动——即移出棋盘。
皇后： 8 个可能的方向，+3 位。沿线/对角线的距离最多 +4 位（如果对角线较短，则更少，如主教的情况）

假设所有棋子都在棋盘上，这些是每次移动的位： Pawn - 第一次移动 6 位， 5 随后。 7 如果晋升。主教：9 位（最多），马：7 位，车：9 位，国王：7 位，后：11 位（最多）。

回复收藏 0 原文

复古式 2024-08-20 14:03:54

问题是给出一种对典型国际象棋游戏最有效的编码，还是一种具有最短最坏情况编码的编码？

对于后者，最有效的方法也是最不透明的：创建所有可能对（初始棋盘，合法的移动顺序）的枚举，其中，绘制三次重复位置并且不超过-自最后一个棋子移动或捕获规则以来的五十次移动是递归的。然后，这个有限序列中的位置索引给出最短的最坏情况编码，而且对于典型情况也给出同样长的编码，并且我预计计算起来非常昂贵。最长可能的国际象棋游戏应该超过 5000 步，每个玩家通常在每个位置有 20-30 步可用（尽管当剩下的棋子很少时会更少） - 这给出了这种编码所需的大约 40000 位。

可以应用枚举的思想来提供更容易处理的解决方案，如上面 Henk Holterman 对编码动作的建议中所述。我的建议：不是最小的，但比我看过的上面的例子要短，并且合理易处理：

64位来表示哪些方块被占用（占用矩阵），加上每个占用的方块中包含哪些部分的列表（可以有 3 位用于棋子，4 位用于其他棋子）：这为起始位置提供了 190 位。由于船上的数量不能超过 32 个，因此占用矩阵的编码是多余的因此可以对公共棋盘位置之类的内容进行编码，例如 33 个设置位加上公共棋盘列表中的棋盘索引。
1 位表示谁先走
按照 Henk 的建议编写移动代码：通常每对白/黑移动 10 位，尽管有些移动将占用 0 位，当玩家没有替代动作时。

这表明需要 490 位来编码典型的 30 步游戏，并且对于典型的游戏来说是相当有效的表示。

关于编码三次重复位置和自上次棋子移动或捕获规则以来不超过五十次移动：如果您将过去的移动编码回到最后一个棋子移动或捕获，那么您有足够的信息来决定这些规则是否适用：不需要整个游戏历史。

Is the problem to give an encoding that is most efficient for typical chess games, or one that has the shortest worst case encoding?

For the latter, the most efficient way is also the most opaque: create an enumeration of all possible pairs (initial board, legal sequence of moves), which, with the draw-on-thrice-repeated-position and no-more-than-fifty-moves since last-pawn-move-or-capture rules, is recursive. Then the index of a position in this finite sequence gives the shortest worst-case encoding, but also and equally long encoding for typical cases, and is, I expect, very expensive to compute. The longest possible chess game is supposed to be over 5000 moves, with there typically being 20-30 moves available in each position to each player (though fewer when there are few pieces left) - this gives something like 40000 bits needed for this encoding.

The idea of enumeration can be applied to give a more tractable solution, as described in Henk Holterman's suggestion for encoding moves above. My suggestion: not minimal, but shorter than the examples above I've looked at, and reasonable tractable:

64 bits to represent which squares are occupied (occupancy matrix), plus list of which pieces are in each occupied square (can have 3 bits for pawns, and 4 bits for other pieces): this gives 190 bits for start position. Since there can't be more than 32 pieces on board, the encoding of the occupancy matrix is redundant & so something like common board positions can be encoded, say as 33 set bits plus index of board from common boards list.
1 bit to say who makes first move
Code moves per Henk's suggestion: typically 10 bits per pair of white/black move, though some moves will take 0 bits, when a player has no alternative moves.

This suggests 490 bits to code a typical 30-move game, and would be a reasonably efficient representation for typical games.

Abouth encoding the draw-on-thrice-repeated-position and no-more-than-fifty-moves since last-pawn-move-or-capture rules: if you encode thepast moves back to the last pawn move or capture, then you have enough information to decide whether these rules apply: no need for the whole game history.

回复收藏 0 原文

別甾虛僞 2024-08-20 14:03:54

棋盘上的位置可以用 7 位（0-63 和 1 值指定它不再在棋盘上）来定义。
因此，对于棋盘上的每个棋子，请指定它所在的位置。

32 块 * 7 位 = 224 位

编辑：
正如卡德里安指出的……我们还有“将典当提升为皇后”的案例。我建议我们在末尾添加额外的位来指示哪个棋子已被提升。

因此，对于每个已升级的 pawn，我们在 224 位后面加上 5 位，这 5 位表示已升级的 pawn 的索引，如果它是列表的末尾，则为 11111。

所以最小情况（无提升）是 224 位 + 5（无提升）。对于每个升级的棋子添加 5 位。

编辑：
正如毛茸茸的青蛙指出的那样，我们在末尾还需要另一个位来指示轮到谁了;^)

回复收藏 0 原文

帅哥哥的热头脑 2024-08-20 14:03:54

我会使用游程长度编码。有些片段是唯一的（或者只存在两次），所以我可以省略它们后面的长度。与 cletus 一样，我需要 13 个独特的状态，因此我可以使用半字节（4 位）来对片段进行编码。初始板将如下所示：

White Rook, W. Knight, W. Bishop, W. Queen, W. King, W. Bishop, W. Knight, W. Rook,
W. Pawn, 8,
Empty, 16, Empty, 16
B. Pawn, 8,
B. Rook, B. Knight, B. Bishop, B. Queen, B. King, B. Bishop, B. Knight, B. Rook

这使我有 8+2+4+2+8 半字节 = 24 半字节 = 96 位。我无法用半字节对 16 进行编码，但由于“空，0”没有意义，我可以将“0”视为“16”。

如果棋盘是空的，但对于左上角的单个棋子，我得到“Pawn，1，Empty，16，Empty，16，Empty 16，Empty，15”= 10 半字节= 40 位。

最糟糕的情况是每块之间有一个空的方块。但对于该片段的编码，我只需要 16 个值中的 13 个，所以也许我可以使用另一个值来表示“Empty1”。然后，我需要 64 个半字节 == 128 位。

对于运动，我需要 3 位用于该块（颜色是由白色总是先移动的事实给出的）加上 5 位（0..63）用于新位置 = 每个运动一个字节。大多数时候，我不需要旧的位置，因为只有一个棋子在范围内。对于奇怪的情况，我必须使用单个未使用的代码（我只需要 7 个代码来编码该片段），然后使用 5 位用于旧位置，5 位用于新位置。

这使我能够在 13 次咬合中对易位进行编码（我可以将国王移向车，这足以说明我的意图）。

[编辑] 如果您允许使用智能编码器，那么我需要 0 位用于初始设置（因为它不必以任何方式编码：它是静态的）加上每次移动一个字节。

[EDIT2] 这就留下了典当转换。如果一个棋子到达最后一行，我可以将它移动到适当的位置来表示“变换”，然后为它所替换的棋子添加 3 位（您不必使用皇后；您可以用任何东西替换棋子但国王）。

I'd use a run length encoding. Some pieces are unique (or exist only twice), so I can omit the length after them. Like cletus, I need 13 unique states, so I can use a nibble (4 bits) to encode the piece. The initial board would then look like this:

White Rook, W. Knight, W. Bishop, W. Queen, W. King, W. Bishop, W. Knight, W. Rook,
W. Pawn, 8,
Empty, 16, Empty, 16
B. Pawn, 8,
B. Rook, B. Knight, B. Bishop, B. Queen, B. King, B. Bishop, B. Knight, B. Rook

which leaves me with 8+2+4+2+8 nibbles = 24 nibbles = 96 bits. I can't encode 16 with a nibble but since "Empty, 0" doesn't make sense, I can treat "0" as "16".

If the board is empty but for a single pawn in the upper left corner, I get "Pawn, 1, Empty, 16, Empty, 16, Empty 16, Empty, 15" = 10 nibbles = 40 bits.

The worst case is when I have an empty square between each piece. But for the encoding of the piece, I just need 13 out of 16 values, so maybe I can use another one to say "Empty1". Then, I need 64 nibbles == 128bits.

For the movements, I need 3 bits for the piece (the color is given by the fact that white always moves first) plus 5 bits (0..63) for the new position = one byte per movement. Most of the time, I don't need the old position since only a single piece will be within range. For the odd case, I must use the single unused code (I just need 7 codes to encode the piece) and then 5 bits for the old and 5 bits for the new position.

This allows me to encode castling in 13 bites (I can move the King towards the Rook which is enough to say what I intend).

[EDIT] If you allow a smart encoder, then I need 0 bits for the initial setup (because it doesn't have to be encoded in any way: It's static) plus one byte per move.

[EDIT2] Which leaves the pawn transformation. If a pawn reaches the last row, I can move it in place to say "transforms" and then add the 3 bits for the piece it is replaced with (you don't have to use a queen; you can replace the pawn with anything but the King).

回复收藏 0 原文

忘东忘西忘不掉你 2024-08-20 14:03:54

就像他们在书籍和纸张上编码游戏一样：每件作品都有一个符号；因为这是一个“合法”游戏，所以白色首先移动 - 无需分别编码白色或黑色，只需计算移动次数即可确定谁移动了。此外，每个动作都被编码为（棋子，结束位置），其中“结束位置”被减少到允许辨别歧义的最少量符号（可以为零）。游戏的长度决定了移动的数量。人们还可以对每一步的时间进行编码（自上次移动以来）。

可以通过为每个棋子（总共 32 个）分配一个符号或为类分配一个符号来完成棋子的编码，并使用结束位置来了解移动了哪一个棋子。例如，一个棋子有 6 个可能的结束位置；但平均每次只有几个人可以参与。因此，从统计上看，按结束位置进行编码可能最适合这种情况。

类似的编码用于计算神经科学（AER）中的尖峰序列。

缺点：您需要重玩整个游戏才能获得当前状态并生成子集，就像遍历链表一样。

回复收藏 0 原文

吾性傲以野 2024-08-20 14:03:54

板上有 64 个可能的位置，因此每个位置需要 6 位。有 32 个初始块，因此到目前为止总共有 192 位，其中每 6 位表示给定块的位置。我们可以预先确定碎片出现的顺序，所以我们不必说哪个是哪个。

如果棋盘上有一块棋子掉了怎么办？好吧，我们可以将一块棋子与另一块棋子放在同一位置，以表明它不在棋盘上，否则这是非法的。但我们也不知道第一块是否会出现在棋盘上。因此，我们添加 5 位来指示哪个片段是第一个片段（32 种可能性 = 5 位来表示第一个片段）。然后我们可以使用该位置来处理棋盘外的后续棋子。这样总共就有 197 位。棋盘上必须至少有一个棋子，这样才能起作用。

然后我们需要一位轮到谁了 - 使我们达到198 位。

典当促销怎么样？我们可以通过为每个 pawn 添加 3 位，然后添加 42 位来做到这一点。但随后我们可以注意到，大多数时候，典当都不会升级。

因此，对于棋盘上的每个棋子，“0”位表示它没有升级。如果棋盘上没有棋子，那么我们根本不需要棋子。然后我们可以使用可变长度的位串来进行他的促销。最常见的是女王，因此“10”可以表示女王。那么“110”代表车，“1110”代表主教，“1111”代表马。

初始状态将需要 198 + 16 = 214 位，因为所有 16 个棋子都在棋盘上并且未升级。具有两个升级的棋子皇后的最终游戏可能需要类似于 198 + 4 + 4 的东西，这意味着 4 个活着的棋子但未升级，以及 2 个皇后棋子，总共 206 位。看起来相当健壮！

===

正如其他人指出的那样，霍夫曼编码将是下一步。如果您观察几百万场比赛，您会发现每个棋子更有可能出现在某些方格上。例如，大多数时候，棋子保持在一条直线上，或者一个在左边/一个在右边。国王通常会留在本垒附近。

因此，为每个单独的位置设计一个霍夫曼编码方案。棋子可能平均只占用 3-4 个比特，而不是 6 个。国王也应该占用很少的比特。

同样在此方案中，将“采取”作为可能的位置。这也可以非常稳健地处理易位 - 每个车和王都会有一个额外的“原始位置，移动”状态。您还可以通过这种方式对 pawn 中的 en passant 进行编码 - “原始位置，可以 en passant”。

有了足够的数据，这种方法应该会产生非常好的结果。

There are 64 possible board positions, so you need 6 bits per position. There are 32 initial pieces, so we have 192 bits total so far, where every 6 bits indicates the position of the given piece. We can pre-determine the order the pieces appear in, so we don't have to say which is which.

What if a piece is off the board? Well, we can place a piece on the same spot as another piece to indicate that it is off the board, since that would be illegal otherwise. But we also don't know whether the first piece will be on the board or not. So we add 5 bits indicating which piece is the first one (32 possibilities = 5 bits to represent the first piece). Then we can use that spot for subsequent pieces that are off the board. That brings us to 197 bits total. There has to be at least one piece on the board, so that will work.

Then we need one bit for whose turn it is - brings us to 198 bits.

What about pawn promotion? We can do it a bad way by adding 3 bits per pawn, adding on 42 bits. But then we can notice that most of the time, pawns aren't promoted.

So, for every pawn that is on the board, the bit '0' indicates it is not promoted. If a pawn is not on the board then we don't need a bit at all. Then we can use variable length bit strings for which promotion he has. The most often it will be a queen, so "10" can mean QUEEN. Then "110" means rook, "1110" means bishop, and "1111" means knight.

The initial state will take 198 + 16 = 214 bits, since all 16 pawns are on the board and unpromoted. An end-game with two promoted pawn-queens might take something like 198 + 4 + 4, meaning 4 pawns alive and not promoted and 2 queen pawns, for 206 bits total. Seems pretty robust!

===

Huffman encoding, as others have pointed out, would be the next step. If you observe a few million games, you'll notice each piece is much more likely to be on certain squares. For example, most of the time, the pawns stay in a straight line, or one to the left / one to the right. The king will usually stick around the home base.

Therefore, devise a Huffman encoding scheme for each separate position. Pawns will likely only take on average 3-4 bits instead of 6. The king should take few bits as well.

Also in this scheme, include "taken" as a possible position. This can very robustly handle castling as well - each rook and king will have an extra "original position, moved" state. You can also encode en passant in the pawns this way - "original position, can en passant".

With enough data this approach should yield really good results.

回复收藏 0 原文

颜 2024-08-20 14:03:54

我尝试使用霍夫曼编码。这背后的理论是——在每场国际象棋比赛中，都会有一些棋子移动很多，而另一些棋子移动不多或很早就被淘汰。如果起始位置已经移除了一些棋子 - 那就更好了。对于方块来说也是如此 - 有些方块可以看到所有动作，而有些则不会被太多触动。

因此，我有两张霍夫曼表 - 一张用于碎片，另一张用于正方形。它们将通过查看实际游戏来生成。我可以为每一对棋子准备一张大桌子，但我认为这效率相当低，因为同一棋子再次在同一个棋子上移动的情况并不多。

每件作品都有一个指定的 ID。由于有 32 个不同的块，因此我只需要 5 位作为块 ID。不同游戏的棋子 ID 不会发生变化。对于方形 ID 也是如此，我需要 6 位。

霍夫曼树将通过按顺序遍历每个节点来编码（即，首先输出节点，然后从左到右输出其子节点）。对于每个节点，都有一位指定它是叶节点还是分支节点。如果它是叶节点，则后面会跟有给出 ID 的位。

起始位置将简单地由一系列棋子位置对给出。此后，每一步都会有一对棋子位置。您只需找到提到两次的第一块即可找到起始位置描述符的结尾（以及移动描述符的开头）。如果棋子被升级，将会有 2 个额外的位指定它会变成什么，但棋子 ID 不会改变。

为了考虑棋子在游戏开始时升级的可能性，哈夫曼树和数据之间还会有一个“升级表”。首先有 4 位指定升级了多少个 pawn。然后，对于每个 pawn，都会有其霍夫曼编码的 ID 和 2 位来指定它已变成什么。

哈夫曼树将通过考虑所有数据（起始位置和移动）和晋级表来生成。尽管通常升级表是空的或者只有几个条目。

用图形术语总结一下：

<Game> := <Pieces huffman tree> <squares huffman tree> <promotion table> <initial position> (<moves> | <1 bit for next move - see Added 2 below>)

<Pieces huffman tree> := <pieces entry 1> <pieces entry 2> ... <pieces entry N>
<pieces entry> := "0" | "1" <5 bits with piece ID>

<squares huffman tree> := <squares entry 1> <squares entry 2> ... <squares entry N>
<Squares entry> := "0" | "1" <6 bits with square ID>

<promotion table> := <4 bits with count of promotions> <promotion 1> <promotion 2> ... <promotion N>
<promotion> := <huffman-encoded piece ID> <2 bits with what it becomes>

<initial position> := <position entry 1> <position entry 2> ... <position entry N>
<moves> := <position entry 1> <position entry 2> ... <position entry N>
<position entry> := <huffman-encoded piece ID> <huffman-encoded squre ID> (<2 bits specifying the upgrade - optional>)

已添加：这仍然可以优化。每个棋子只有几个合法动作。不是简单地对目标方格进行编码，而是可以为每个棋子的可能移动提供基于 0 的 ID。每个棋子都会重复使用相同的 ID，因此总共不会超过 21 个不同的 ID（皇后最多可以有 21 个不同的可能移动选项）。将其放入霍夫曼表而不是字段中。

然而，这会给表示原始状态带来困难。人们可能会产生一系列的动作来将每个棋子放在它的位置上。在这种情况下，有必要以某种方式标记初始状态的结束和移动的开始。

或者，可以使用未压缩的 6 位方形 ID 来放置它们。

我不知道这是否会导致整体尺寸减小。也许吧，但应该尝试一下。

添加2：又一个特殊情况。如果游戏状态没有移动，那么区分谁接下来移动就变得很重要。为此在最后添加一位。 :)

I'd try to use Huffman encoding. The theory behind this is - in every chess game there will be some pieces that will move around a lot, and some that don't move much or get eliminated early. If the starting position has some pieces already removed - all the better. The same goes for squares - some squares get to see all action, while some don't get touched much.

Thus I'd have two Huffman tables - one for pieces, other for squares. They will be generated by looking at the actual game. I could have one large table for every piece-square pair, but I think that would be pretty inefficient because there aren't many instances of the same piece moving on the same square again.

Every piece would have an assigned ID. Since there are 32 different pieces, I would need just 5 bits for the piece ID. The piece IDs are not changing from game to game. The same goes for square IDs, for which I would need 6 bits.

The Huffman trees would be encoded by writing each node down as they are traversed in inorder (that is, first the node is output, then its children from left to right). For every node there will be one bit specifiying whether it's a leaf node or a branch node. If it's a leaf node, it will be followed by the bits giving the ID.

The starting position will simply be given by a series of piece-location pairs. After that there will be one piece-location pair for every move. You can find the end of the starting position descriptor (and the start of the moves descriptor) simply by finding the first piece that is mentioned twice. In case a pawn gets promoted there will be 2 extra bits specifying what it becomes, but the piece ID won't change.

To account for the possibility that a pawn is promoted at the start of the game there will also be a "promotion table" between the huffman trees and the data. At first there will be 4 bits specifying how many pawns are upgraded. Then for each pawn there will be its huffman-encoded ID and 2 bits specifying what it has become.

The huffman trees will be generated by taking into account all the data (both the starting position and the moves) and the promotion table. Although normally the promotion table will be empty or have just a few entries.

To sum it up in graphical terms:

<Game> := <Pieces huffman tree> <squares huffman tree> <promotion table> <initial position> (<moves> | <1 bit for next move - see Added 2 below>)

<Pieces huffman tree> := <pieces entry 1> <pieces entry 2> ... <pieces entry N>
<pieces entry> := "0" | "1" <5 bits with piece ID>

<squares huffman tree> := <squares entry 1> <squares entry 2> ... <squares entry N>
<Squares entry> := "0" | "1" <6 bits with square ID>

<promotion table> := <4 bits with count of promotions> <promotion 1> <promotion 2> ... <promotion N>
<promotion> := <huffman-encoded piece ID> <2 bits with what it becomes>

<initial position> := <position entry 1> <position entry 2> ... <position entry N>
<moves> := <position entry 1> <position entry 2> ... <position entry N>
<position entry> := <huffman-encoded piece ID> <huffman-encoded squre ID> (<2 bits specifying the upgrade - optional>)

Added: This could still be optimized. Every piece only has a few legal moves. Instead of simply encoding the target square one could give 0-based IDs for the possible moves of every piece. The same IDs would get reused for every piece, so in total there would be no more than 21 different IDs (the queen can have at most 21 diffferent possible move options). Put this in a huffman table instead of the fields.

This would however present a difficulty in representing the original state. One might generate a series of moves to put each piece in its place. In this case it would be necessary to somehow mark the end of the initial state and start of the moves.

Alternatively they could be placed by using uncompressed 6-bit square IDs.

Whether this would present an overall decrease in size - I don't know. Probably, but should experiment a bit.

Added 2: One more special case. If the game state has NO moves it becomes important to distinguish who moves next. Add one more bit at the end for that. :)

回复收藏 0 原文

携君以终年 2024-08-20 14:03:54

[正确阅读问题后编辑]如果您假设每个合法位置都可以从初始位置（这是“合法”的可能定义）到达，那么任何位置都可以表示为从一开始的移动序列。从非标准位置开始的比赛片段可以表示为到达起点所需的移动序列、打开摄像机的开关以及随后的移动。

因此，我们将初始板状态称为单个位“0”。

可以通过对方格进行编号并按（开始、结束）对移动进行排序来枚举从任何位置开始的移动，传统的 2 方格跳跃表示易位。无需对非法走法进行编码，因为棋盘位置和规则始终是已知的。打开相机的标志可以表示为特殊的带内移动，或者更明智地表示为带外移动编号。

双方各有 24 个开局棋子，每方可容纳 5 个棋子。后续的移动可能需要更多或更少的位，但合法的移动总是可枚举的，因此每个移动的宽度可以愉快地增长或扩展。我没有计算过，但我想 7 位位置会很少见。

使用该系统，可以用大约 500 位对 100 步半步游戏进行编码。
然而，使用一本入门书可能是明智的。假设它包含一百万个序列。那么，初始的 0 表示从标准板开始，后面跟着 20 位数字的 1 表示从该打开序列开始。具有某种传统开局的游戏可能会缩短 20 个半步或 100 个比特。

这不是最大可能的压缩，但是（没有开卷）如果您已经拥有问题所假设的国际象棋模型，那么实现起来会非常容易。

为了进一步压缩，您需要根据可能性而不是任意顺序对移动进行排序，并用更少的位对可能的序列进行编码（例如使用人们提到的霍夫曼标记）。

[edited after reading the question properly] If you assume every legal position can be reached from the initial position (which is a possible definition of "legal"), then any position can be expressed as the sequence of moves from the start. A snippet of play starting from a non-standard position can be expressed as the sequence of moves needed to reach the start, a switch to turn the camera on, followed by subsequent moves.

So let's call the initial board state the single bit "0".

Moves from any position can be enumerated by numbering the squares and ordering the moves by (start, end), with the conventional 2 square jump indicating castling. There is no need to encode illegal moves, because the board position and the rules are always already known. The flag to turn the camera on could either be expressed as a special in-band move, or more sensibly as an out-of-band move number.

There are 24 opening moves for either side, which can fit in 5 bits each. Subsequent moves might require more or less bits, but the legal moves are always enumerable, so the width of each move can happily grow or expand. I have not calculated, but I imagine 7 bit positions would be rare.

Using this system, a 100 half-move game could be encoded in approximately 500 bits.
However, it might be wise to use an opening book. Suppose it contains a million sequences. Let then, an initial 0 indicate a start from the standard board, and a 1 followed by a 20 bit number indicate a start from that opening sequence. Games with somewhat conventional openings might be shortened by say 20 half-moves, or 100 bits.

This is not the greatest possible compression, but (without the opening book) it would be very easy to implement if you already have a chess model, which the question assumes.

To further compress, you'd want to order the moves according to likelihood rather than in an arbitrary order, and encode the likely sequences in fewer bits (using e.g. Huffman tokens as people have mentioned).

回复收藏 0 原文

白衬杉格子梦 2024-08-20 14:03:54

如果计算时间不是问题，那么您可以使用确定性的可能位置生成器为给定位置分配唯一的 id。

首先从给定位置以确定性方式生成多个可能位置，例如从左下角开始移动到右上角。这决定了下一步需要多少位，某些情况下可能只有一位。然后，当进行移动时，仅存储该移动的唯一 ID。

只要以确定性方式处理，升级和其他规则就简单地算作有效的棋步，例如后、车、象，每个棋步都算作单独的棋步。

初始位置是最难的，可以生成大约 2.5 亿个可能的位置（我认为），这需要大约 28 位加上一个额外的位来确定谁的着法。

假设我们知道轮到谁了（每个回合从白色翻转到黑色），确定性生成器将类似于：

for each row
    for each column
        add to list ( get list of possible moves( current piece, players turn) )

“获取可能的移动列表”将执行以下操作：

if current piece is not null 
    if current piece color is the same as the players turn
        switch( current piece type )
            king - return list of possible king moves( current piece )
            queen - return list of possible queen moves( current piece )
            rook - return list of possible rook moves( current piece )
            etc.

如果国王处于检查状态，则每个“可能的 xxx 移动列表” ' 只返回改变检查情况的有效动作。

If computational time is not an issue then you could use a deterministic possible position generator to assign unique ids to a given position.

From a given position first generate number of possible positions in a deterministic manor, e.g. starting bottom left moving to top right. That determines how many bits you'll need for the next move, some situations it could be as little as one. Then when the move is made store just the unique id for that move.

Promotion and other rules simply count as valid moves as long as they are handled in a deterministic way, e.g. to queen, to rook, to bishop each count as a separate move.

The initial position is hardest and could generate around 250 million possible positions (I think) which would require about 28 bits plus an extra bit to determine whose move it is.

Assuming we know who's turn it is (each turn flips from white to black) the deterministic generator would look something like:

for each row
    for each column
        add to list ( get list of possible moves( current piece, players turn) )

'get list of possible moves' would do something like:

if current piece is not null 
    if current piece color is the same as the players turn
        switch( current piece type )
            king - return list of possible king moves( current piece )
            queen - return list of possible queen moves( current piece )
            rook - return list of possible rook moves( current piece )
            etc.

If the king is in check then each 'list of possible xxx moves' only returns valid moves that change the check situation.

回复收藏 0 原文

静待花开 2024-08-20 14:03:54

大多数答案都忽略了 3 倍重复。不幸的是，对于 3 倍重复，您必须存储到目前为止所玩的所有位置...

这个问题要求我们以节省空间的方式存储，因此我们实际上不需要存储位置，只要我们可以从移动列表中构造它即可（前提是我们有标准的起始位置）。我们可以优化 PGN，就这样我们就完成了。波纹管是一个简单的方案。

棋盘上有 64 个方格，64 = 2^ 6。如果我们只存储每次移动的初始和最终方格，则需要 12 位（稍后将解决升级问题）。请注意，该方案已经涵盖了玩家移动、强调、吃子、易位等；这些可以通过重放移动列表来构建。

为了提升，我们可以保留一个单独的向量数组，表示“在移动 N 时提升到棋子 XYZ”。我们可以保留一个 (int, byte) 向量。

优化 (To,From) 向量也是很诱人的，因为许多 (To,From) 向量在国际象棋中是不可能的。例如。不会有从 e1 到 d8 等的移动。但我想不出任何方案。任何进一步的想法都是非常受欢迎的。

回复收藏 0 原文

冰葑 2024-08-20 14:03:54

我已经考虑这个问题很长时间了（+- 2 小时）。并且没有明显的答案。

假设：

忽略时间状态（玩家过去没有时间限制，因此可以通过不玩来强制平局）
游戏是什么时候玩的？！？这很重要，因为规则随着时间的推移而变化（因此将在接下来的一点中假设现代游戏是现代游戏......）例如，请参考死亡典当规则（维基百科有一个非常著名的问题显示它），如果您想要的话回到过去，祝你好运，主教过去只能缓慢移动，并且过去常常使用骰子。哈哈。

...所以它是最新的现代规则。首先不考虑重复次数和移动重复次数限制。

-C 25 字节四舍五入 ( 64b+32*4b+5b= 325b)

=64 位（有/无）
+32*4 位 [ 1 位=颜色{黑/带}
+3bit=棋子类型{King,Queen,Bishop,kNight,Rook,Pawn,MovedPawn} 注意：移动的棋子...例如，如果它是上一回合中最后移动的棋子，则表明“en passant”是可行的。
]
+5bit 表示实际状态（轮到谁、过路人、双方是否有可能翻车）

到目前为止还不错。可能可以增强，但是需要考虑可变长度和促销！？

现在，以下规则仅适用于玩家申请平局的情况，这不是自动的！因此，如果没有玩家要求平局，那么考虑一下这 90 步没有吃子或棋子的移动是可行的！这意味着所有动作都需要记录......并且可用。

-D 位置的重复...例如上面提到的董事会状态（参见C）或不...（参见以下有关国际棋联规则的内容）
-E 这就留下了 50 移动余量的复杂问题，没有捕获或典当移动，那里需要计数器......但是。

那么你该如何处理呢？...好吧，确实没有办法。因为两个玩家都可能不想抽牌，或者意识到抽牌已经发生。
现在，如果 E 计数器可能就足够了......但这是技巧，甚至阅读 FIDE 规则（http://www.fide.com/component/handbook/?id=124&view=article）我不能找到答案...失去栖息能力怎么办？这是重复吗？我认为不是，但这是一个模糊的主题，没有得到解决，没有得到澄清。

所以这里有两个规则，即使尝试编码也是两个复杂或未定义的规则......干杯。

因此，真正对游戏进行编码的唯一方法是从头开始记录所有内容……这与“棋盘状态”问题冲突（或不冲突？）。

希望这有帮助...不要太多数学:-)只是为了表明有些问题并不那么容易，对于解释或预知知识来说太开放而无法正确和有效。我不会考虑采访他，因为这会带来太多麻烦。

I've thought about that one for a long time (+- 2hours). And there is no obvious answers.

Assuming:

Ignoring time state (A player didn't use to have a time limit therefore could force a draw by not playing)
When was the game played?!? It's important because the rules have changed over time (so will assume a modern game in the subsequent point a modern game...) Please refer to dead pawn rule for example (wikipedia has a very famous problem showing it), and if you want to go back in time good luck bishop used to only move slowly and dice used to be used. lol.

... so up to date modern rules it is. First irrespective of repetition and move repitition limit.

-C 25 bytes rounded ( 64b+32*4b+5b= 325b)

=64 bits (something/nothing)
+32*4 bits [ 1bit=color{black/withe}
+3bit=type of piece{King,Queen,Bishop,kNight,Rook,Pawn,MovedPawn} NB:Moved pawn... e.g if it was the last moved pawn in the previous turn indicating that an 'en passant' is feasible.
]
+5bit for the actual state (who's turn, en passant, possibility of rooking or not on each sides)

So far so good. Probably can be enhanced but then there would be variable length and promotion to take in consideration!?

Now, following rules are only applicable WHEN a player apllies for a draw, IT IS NOT automatic! So consider this 90 moves without a capture or a a pawn move is feasible if no player calls for a draw! Meaning that all moves need to be recorded... and available.

-D repetion of position... e.g. state of board as mentioned above (see C) or not... (see following regarding the FIDE rules)
-E That leaves the complex problem of 50 move allowance without capture or pawn move there a counter is necessary... However.

So how do you deal with that?... Well really there is no way. Because neither player may want to draw, or realize that it has occurred.
Now in case E a counter might suffice... but here is the trick and even reading the FIDE rules (http://www.fide.com/component/handbook/?id=124&view=article) I can't find an answer... what about loss of ability of rooking. Is that a repetition? I think not but then this is a blurred subject not addressed, not clarified.

So here is two rules that are two complex or undefined even to try to encode... Cheers.

So the only way to truly encode a game is to record all from start... which then conflict (or not?) with the "board state" question.

Hope this help... not too much math :-) Just to show that some question are not as easy, too open for interpretation or pre-knowledge to be a correct and efficient. Not one I would consider for interviewing as it open too much of a can of worm.

回复收藏 0 原文

愁杀 2024-08-20 14:03:54

Yacoby 解决方案中起始位置的可能改进

没有任何合法位置的每种颜色超过 16 块。在 64 个方格上最多放置 16 个黑色棋子和 16 个白色棋子的方法数量约为 3.63e27。 Log2(3.63e27)=91.55。这意味着您可以用 92 位对所有块的位置和颜色进行编码。这小于 Yacoby 解决方案所需的 64 位位置 + 最多 32 位颜色。在最坏的情况下，您可以节省 4 位，但代价是编码相当复杂。

另一方面，它会增加缺少 5 个或更多棋子的位置的大小。这些位置仅占所有位置的 4% 以下，但它们可能是您想要记录与初始位置不同的起始位置的大多数情况。

这就是完整的解决方案

根据上述方法对棋子的位置和颜色进行编码。 92 位。
要指定每个部分的类型，请使用霍夫曼代码：
兵：'0'，车：'100'，马：'101'，主教：'110'，后：'1110'，国王：'1111'。
整套片段需要 (16*1 + 12*3 + 4*4) = 68 位。
整个板位置可以用 92 + 68 = 最大 160 位进行编码。
应该添加额外的游戏状态：
转：1 位，哪个易位是可能的：4 位，“en passant”可能：最多 4 位（1 位表示是这种情况，3 位表示是哪一种）。
起始位置编码为 = 160 + 9 = 169 位
对于移动列表，枚举给定位置的所有可能的移动并将移动的位置存储在列表中。行动列表包括所有特殊情况（易位、过路和辞职）。仅使用必要数量的位来存储最高位置。平均来说，每个棋步不应超过 7 位（平均每棋 16 个可能棋子和 8 个合法棋步）。在某些情况下，当强制移动时，只需要 1 位（移动或放弃）。

回复收藏 0 原文

沉溺在你眼里的海 2024-08-20 14:03:54

棋盘上有 32 个棋子。每块棋子都有一个位置（64 个方格中的一个）。
所以你只需要 32 个正整数。

我知道 64 个位置保存在 6 位中，但我不会这样做。我会保留最后的一些标志（掉落的棋子，皇后的棋子）

回复收藏 0 原文

夏至、离别 2024-08-20 14:03:54

cletus' 的答案很好，但他忘了编码轮到谁了。它是当前状态的一部分，如果您使用该状态来驱动搜索算法（如 alpha-beta 导数），则它是必需的。

我不是国际象棋棋手，但我相信还有一个特殊情况：重复了多少步棋。一旦每个玩家都做出相同的动作三次，游戏就平局了，不是吗？如果是这样，那么您需要将该信息保存在状态中，因为在第三次重复之后，状态现在已终止。

回复收藏 0 原文

撩起发的微风 2024-08-20 14:03:54

正如其他几个人提到的，对于 32 个棋子中的每一个，您都可以存储它们所在的方格以及它们是否在棋盘上，这给出了 32 * (log2(64) + 1) = 224 位。

然而，主教只能占据黑色或白色方块，因此对于这些方块，您只需要 log2(32) 位作为该位置，即 28 * 7 + 4 * 6 = 220 位。

并且由于 pawn 不是从后面开始只能向前移动，因此它们只能在 56 上，因此应该可以使用此限制来减少 pawn 所需的位数。

回复收藏 0 原文

烟燃烟灭 2024-08-20 14:03:54

棋盘有 64 个方格，可以用 64 位来表示，显示方格是否为空。如果一个方格有棋子，我们只需要棋子信息。由于玩家 + 棋子占用 4 位（如前所述），我们可以用 64 + 4 * 32 = 192 位获取当前状态。加上当前回合，您就有 193 位。

然而，我们还需要对每个棋子的合法动作进行编码。首先，我们计算每个棋子的合法移动次数，并将这些位数附加在整个正方形的棋子标识符后面。我计算如下：

Pawn：前进，先转二前进，en passant * 2，晋级=7位。您可以将第一次前转和升级合并为一个位，因为它们不能在同一位置发生，因此您有 6 个。
Rook：7 个垂直方格，7 个水平方格 = 14 位
马：8 个方格 = 8 位
Bishop：2 对角线 * 7 = 14 位
Queen：7 个垂直，7 个水平，7 个对角线，7 个对角线 = 28 位
King：8 个周围的方块

这仍然意味着您需要根据当前位置来映射目标方块，但这（应该）是一个简单的计算。

由于我们有 16 个棋子、4 个车/马/象和 2 个后/王，所以这是 16 * 6 + 4 * 14 + 4 * 8 + 4 * 14 + 2 * 28 + 2 * 8 = 312 个比特，带来总共 505 位。

至于每件可能的移动所需的位数，可以添加一些优化，并且位数可能会减少，我只是使用简单的数字来处理。例如，对于滑动件，您可以存储它们可以移动的距离，但这需要额外的计算。

长话短说：仅在一个方格被占据时才存储额外的数据（棋子等），并且仅存储每个棋子代表其合法移动的最小位数。

编辑1：忘记了对任何棋子的易位和典当升级。这可以使显式位置总数达到 557 步（棋子多 3 位，王多 2 位）

回复收藏 0 原文

余罪 2024-08-20 14:03:54

每个棋子可以用 4 位表示（兵到王，6 种类型），黑/白 = 12 个值

棋盘上的每个方格可以用 6 位表示（x 坐标，y 坐标）。

初始位置最多需要 320 位（32 个棋子，4 + 6 位）。

后续的每次移动可以用 16 位表示（起始位置、结束位置、棋子）。

易位需要额外的 16 位，因为它是双重移动。

皇后棋子可以由 4 位中的 4 个备用值之一表示。

无需详细计算，与存储 32 * 7 位（预定义的棋子数组）或 64 * 4 位（预定义的方格分配）相比，这在第一次移动后开始节省空间。

两侧 10 次移动后，最大空间所需的是 640 位

...但话又说回来，如果我们唯一地标识每个棋子（5 位）并添加第六位来标记皇后棋子，那么我们每次移动只需要棋子 id + 到位置。这会将计算更改为...

初始位置 = 最大 384 位（32 块，6 + 6 位）
每次移动 = 12 位（目标位置，棋子 ID）

然后每侧移动 10 次后，所需的最大空间为 624 位

回复收藏 0 原文

御弟哥哥 2024-08-20 14:03:54

和 Robert G 一样，我倾向于使用 PGN，因为它是标准的并且可以被多种工具使用。

然而，如果我正在玩一个在遥远的太空探测器上的国际象棋人工智能，因此每一点都很珍贵，这就是我会做的动作。稍后我将重点讨论对初始状态的编码。

这些动作不需要记录状态；解码器可以跟踪状态以及在任何给定点什么动作是合法的。需要记录的所有动作是选择各种合法替代方案中的哪一个。由于玩家轮流进行，因此移动不需要记录玩家颜色。由于玩家只能移动自己的颜色棋子，因此第一个选择是玩家移动哪个棋子（稍后我将回到使用另一个选择的替代方案）。最多 16 块，最多需要 4 位。当玩家失去棋子时，选择的数量就会减少。此外，特定的游戏状态可能会限制棋子的选择。如果国王无法在不受到制止的情况下移动，则选择数量会减少一个。如果国王处于受制状态，任何无法让国王摆脱控制的棋子都不是可行的选择。从 a1 开始（h1 在 a2 之前）对行主顺序中的片段进行编号。

一旦指定了该件，它就只有一定数量的合法目的地。确切的数字很大程度上取决于棋盘布局和游戏历史，但我们可以计算出某些最大值和预期值。对于除了马之外的所有人以及在易位期间，一个棋子不能穿过另一个棋子。这将是移动限制的一个重要来源，但很难量化。棋子无法离开棋盘，这也将在很大程度上限制目的地的数量。

我们通过按以下顺序沿线对方块进行编号来编码大多数块的目的地：W、NW、N、NE（黑边是 N）。一条线从给定方向最远的方格开始，并朝该方向前进。对于无阻碍的国王，移动列表为 W、E、NW、SE、N、S、NE、SW。对于马，我们从2W1N开始，顺时针方向进行；目的地 0 是此顺序中的第一个有效目的地。

棋子：未移动的棋子有 2 个目的地选择，因此需要 1 位。如果一个 pawn 可以捕获另一个，无论是正常的还是路过的（解码器可以确定，因为它跟踪状态），它也有 2 或 3 个移动选择。除此之外，棋子只能有 1 个选择，不需要任何位。当棋子处于第 7 等级时，我们还会添加升级选择。由于棋子通常会晋升为皇后，然后是骑士，因此我们将选择编码为：
- 女王：0
- 骑士：10
- 主教：110
- 车：111
主教：在 {d,e}{4,5} 时最多 13 个目的地（4 位）。
Rook：最多 14 个目的地，4 位。
骑士：最多8个目的地，3位。
国王：当可以选择易车时，国王会回到S并且不能向下移动；总共有 7 个目的地。其余时间，国王最多有 8 步，最多给出 3 位。
后：与象或车的选择相同，总共 27 个选择，即 5 位

由于选择的数量并不总是 2 的幂，所以上面仍然浪费位。假设选项数量为 C 并且特定选项编号为 c，并且 n = ceil(lg(C >)) （对选择进行编码所需的位数）。我们通过检查下一个选择的第一位来利用这些否则被浪费的值。如果为 0，则不执行任何操作。如果它是 1 且 c+C c 2ⁿ，然后将 C 添加到 c。解码数字则相反：如果接收到的 c >= C，则减去 C 并将下一个数字的第一位设置为 1。如果c< 2ⁿ-C，然后将下一个数字的第一位设置为 0。如果 2ⁿ-C <=c< C，然后什么也不做。将此方案称为“饱和度”。

可能会缩短编码的另一种潜在选择类型是选择要捕获的对手棋子。这增加了移动第一部分的选择数量，选择一个棋子，最多增加一点（确切的数量取决于当前玩家可以移动的棋子数量）。这个选择之后是选择捕获棋子，这可能比任何给定玩家棋子的移动次数要少得多。一个棋子只能被任一基本方向的棋子加上骑士攻击，总共最多有10个攻击棋子；这为捕获移动提供了总共最多 9 位，但我预计平均为 7 位。这对于女王的捕获特别有利，因为它通常有很多合法的目的地。

随着饱和，捕获编码可能不再具有优势。我们可以允许这两个选项，指定在初始状态下使用的选项。如果不使用饱和度，游戏编码还可以使用未使用的选择数字 (C <= c <2ⁿ) 来更改游戏过程中的选项。任何时候C都是2的幂，我们就无法改变选项。

Like Robert G, I'd tend to use PGN since it's standard and can be used by a wide range of tools.

If, however, I'm playing a chess AI that's on a distant space probe, and thus every bit is precious, this is what I'd do for the moves. I'll come bock to encoding the initial state later.

The moves don't need to record state; the decoder can take keep track of state as well as what moves are legal at any given point. All the moves need to record is which of the various legal alternatives is chosen. Since players alternate, a move doesn't need to record player color. Since a player can only move their own color pieces, the first choice is which piece the player moves (I'll come back to an alternative that uses another choice later). With at most 16 pieces, this requires at most 4 bits. As a player loses pieces, the number of choices decreases. Also, a particular game state may limit the choice of pieces. If a king can't move without placing itself in check, the number of choices is reduced by one. If a king is in check, any piece that can't get the king out of check isn't a viable choice. Number the pieces in row major order starting at a1 (h1 comes before a2).

Once the piece is specified, it will only have a certain number of legal destinations. The exact number is highly dependent on board layout and game history, but we can figure out certain maximums and expected values. For all but the knight and during castling, a piece can't move through another piece. This will be a big source of move-limits, but it's hard to quantify. A piece can't move off of the board, which will also largely limit the number of destinations.

We encode the destination of most pieces by numbering squares along lines in the following order: W, NW, N, NE (black side is N). A line starts in the square furthest in the given direction that's legal to move to and proceeds towards the. For an unencumbered king, the list of moves is W, E, NW, SE, N, S, NE, SW. For the knight, we start with 2W1N and proceed clockwise; destination 0 is the first valid destination in this order.

Pawns: An unmoved pawn has 2 choices of destinations, thus requiring 1 bit. If a pawn can capture another, either normally or en passant (which the decoder can determine, since it's keeping track of state), it also has 2 or 3 choices of moves. Other than that, a pawn can only have 1 choice, requiring no bits. When a pawn is in its 7^th rank, we also tack on the promotion choice. Since pawns are usually promoted to queens, followed by knights, we encode the choices as:
- queen: 0
- knight: 10
- bishop: 110
- rook: 111
Bishop: at most 13 destinations when at {d,e}{4,5} for 4 bits.
Rook: at most 14 destinations, 4 bits.
Knights: at most 8 destinations, 3 bits.
Kings: When castling is an option, the king has it's back to S and can't move downwards; this gives a total of 7 destinations. The rest of the time, a king has at most 8 moves, giving a maximum of 3 bits.
Queen: Same as choices for bishop or rook, for a total of 27 choices, which is 5 bits

Since the number of choices isn't always a power of two, the above still wastes bits. Suppose the number of choices is C and the particular choice is numbered c, and n = ceil(lg(C)) (the number of bits required to encode the choice). We make use of these otherwise wasted values by examining the first bit of the next choice. If it's 0, do nothing. If it's 1 and c+C < 2ⁿ, then add C to c. Decoding a number reverses this: if the received c >= C, subtract C and set the first bit for the next number to 1. If c < 2ⁿ-C, then set the first bit for the next number to 0. If 2ⁿ-C <= c < C, then do nothing. Call this scheme "saturation".

Another potential type of choice that might shorten the encoding is to choose an opponents piece to capture. This increases the number of choices for the first part of a move, picking a piece, for at most an additional bit (the exact number depends on how many pieces the current player can move). This choice is followed by a choice of capturing piece, which is probably much smaller than the number of moves for any of the given players pieces. A piece can only be attacked by one piece from any cardinal direction plus the knights for a total of at most 10 attacking pieces; this gives a total maximum of 9 bits for a capture move, though I'd expect 7 bits on average. This would be particularly advantageous for captures by the queen, since it will often have quite a few legal destinations.

With saturation, capture-encoding probably doesn't afford an advantage. We could allow for both options, specifying in the initial state which are used. If saturation isn't used, the game encoding could also use unused choice numbers (C <= c < 2ⁿ) to change options during the game. Any time C is a power of two, we couldn't change options.

回复收藏 0 原文

北斗星光 2024-08-20 14:03:54

托马斯有正确的方法来对棋盘进行编码。然而，这应该与 ralu 存储动作的方法结合起来。列出所有可能的移动，写出表达该数字所需的位数。由于解码器正在执行相同的计算，因此它知道有多少可能并且可以知道要读取多少位，因此不需要长度代码。

因此，我们得到 164 位的棋子，4 位的易位信息（假设我们存储的是游戏的一个片段，否则可以重建），3 位的过路资格信息——简单地存储移动发生的列（如果无法通过，则在不可能的地方存储一列（此类列必须存在），并且 1 表示要移动的人。

移动通常需要 5 或 6 位，但可以从 1 到 8 不等。

一个额外的快捷方式 - 如果编码以 12 1 位开始（一种无效的情况 - 甚至一个片段的一侧也不会有两个国王），您将中止解码、擦拭棋盘并设置新游戏。下一位将是移动位。

回复收藏 0 原文

听风吹 2024-08-20 14:03:54

算法应确定性地枚举每次移动的所有可能目的地。目的地数量：

2 个主教，每个 13 个目的地 = 26
2 个车，每个 14 个目的地 = 28
2 个骑士，每个 8 个目的地 = 16 个
皇后，27 个目的地
国王，8 个目的地

8 个爪子在最坏（枚举方面）的情况下都可能成为皇后，从而使最大数量的可能目的地
9*27+26+28+16+8=321。因此，任何移动的所有目的地都可以通过 9 位数字来枚举。

双方的最大步数都是100（如果我没记错的话，不是棋手）。因此任何游戏都可以用 900 位来记录。加上每个片段的初始位置可以使用 6 位数字来记录，总共 32*6 =192 位。再加上一位“谁先动”的记录。因此，任何游戏都可以使用900+192+1=1093位来记录。

回复收藏 0 原文

情归归情 2024-08-20 14:03:54

存储棋盘状态

我想到的最简单的方法是首先有一个 8*8 位的数组来表示每个棋子的位置（所以如果那里有棋子则为 1，如果没有则为 0' t）。由于这是固定长度，我们不需要任何终止符。

接下来按位置顺序表示每个棋子。每块使用 4 位，这需要 32*4 位（总共 128 位）。这真的很浪费。

使用二叉树，我们可以用一个字节表示棋子，用 3 表示马、车和象，用 4 表示国王和王后。由于我们还需要存储棋子的颜色，因此需要一个额外的字节作为（如果这是错误的，请原谅我，我之前从未详细研究过霍夫曼编码 ):

Pawn : 2
Rook: 4
Knight: 4
Bishop: 4
King: 5
Queen: 5

鉴于总数：

2*16 + 4*4 + 4*4 + 4*4 + 2*5 + 2*5 = 100

使用固定大小的位组击败 28 位。

所以我发现的最好方法是将其存储在 8² + 100 位数组中

8*8 + 100 = 164

存储动作
我们需要知道的第一件事是哪一块正在移动到哪里。考虑到棋盘上最多有 32 个棋子，并且我们知道每个棋子是什么，而不是用一个代表正方形的整数，我们可以用一个整数来表示棋子的偏移量，这意味着我们只需要拟合 32 个可能的值来表示一个棋子。片。

不幸的是，有各种特殊规则，例如王位继承或推翻国王并组建共和国（特里普拉切特参考），因此在我们存储要移动的棋子之前，我们需要一个位来指示它是否是特殊的移动。

因此，对于每个正常的移动，我们都有必要的 1 + 5 = 6 位。（1 位类型，5 位为棋子）

棋子编号解码后，我们就知道了棋子的类型，并且每个棋子应该以最有效的方式表示其走法。例如（如果我的国际象棋规则符合标准）一个棋子总共有 4 种可能的走法（向左走、向右走、向前移动一处、向前移动两处）。
因此，为了表示棋子的移动，我们需要“6 + 2 = 8”位。（6 位用于初始移动头，2 位用于移动）

皇后的移动会更复杂，因为最好有一个方向（8 个可能的方向，因此 3 位）和总共 8 个可能的方格向每个方向移动（因此另外 3 位）。因此，要表示移动皇后，需要 6 + 3 + 3 = 12 位。

我想到的最后一件事是我们需要存储轮到哪些玩家。这应该是一个位（下一步移动白色或黑色）

结果格式
因此文件格式看起来像这样

[64 位] 初始片段位置
[最多 100 位] 初始片段
[1 位] 玩家回合
[n 位] 移动

移动位置
[1 位] 移动类型（特殊或普通）
[n 位] 移动详细信息

如果移动是正常移动，则移动详细信息看起来类似于
[5 位] 片
[n 位] 特定棋子走法（通常在 2 到 6 位范围内）

如果是特殊走法
它应该有一个整数类型，然后是任何附加信息（例如是否是王位转换）。我不记得特殊动作的数量，所以只表明这是一个特殊动作可能就可以了（如果只有一个）

Storing the board state

The simplest way I thought of is too first have a array of 8*8 bits representing the location of each piece (So 1 if there is a chess piece there and 0 if there isn't). As this is a fixed length we don't need any terminator.

Next represent every chess piece in order of its location. Using 4 bits per piece, this takes 32*4 bits (128 in total). Which is really really wasteful.

Using a binary tree, we can represent a pawn in one byte, a knight and rook and bishop in 3 and a king and queen in 4. As we also need to store the color of the piece, which takes an extra byte it ends up as (forgive me if this is wrong, I have never looked at Huffman coding in any detail before):

Pawn : 2
Rook: 4
Knight: 4
Bishop: 4
King: 5
Queen: 5

Given the totals:

2*16 + 4*4 + 4*4 + 4*4 + 2*5 + 2*5 = 100

Which beats using a fixed size set of bits by 28 bits.

So the best method I have found is to store it in a 8² + 100 bit array

8*8 + 100 = 164

Storing Moves
The first thing we need to know with is which piece is moving to where. Given that there are at maximum 32 pieces on the board and we know what each piece is, rather than an integer representing the square, we can have an integer representing the piece offset, which means we only have to fit 32 possible values to represent a piece.

Unfortunately there are various special rules, like castling or overthrowing the king and forming a republic (Terry Pratchett reference), so before we store the piece to move we need a single bit indicating if it is a special move or not.

So for each normal move we have a necessary 1 + 5 = 6 bits. (1 bit type, 5 bits for the piece)

After the piece number has been decoded, we then know the type of piece, and each piece should represent its move in the most efficient way. For example (If my chess rules are up to scratch) a pawn has a total of 4 possible moves (take left, take right, move one forward, move two forward).
So to represent a pawn move we need '6 + 2 = 8' bits. (6 bit for initial move header, 2 bits for what move)

Moving for the queen would be more complex, in that it would be best to have a direction (8 possible directions, so 3 bits) and a total of 8 possible squares to move to for each direction (so another 3 bits). So to represent moving a queen it would require 6 + 3 + 3 = 12 bits.

The last thing that occurrences to me is that we need to store which players turn it is. This should be a single bit (White or black to move next)

Resultant Format
So the file format would look something like this

[64 bits] Initial piece locations
[100 bits max] Initial pieces
[1 bit] Player turn
[n bits] Moves

Where a Move is
[1 bit] Move type (special or normal)
[n bits] Move Detail

If the Move is a normal move, the Move Detail looks something like
[5 bits] piece
[n bits] specific piece move (usually in the range of 2 to 6 bits]

If it is a special move
It should have an integer type and then any additional information (like if it is castling). I cannot remember the number of special moves, so it may be OK just to indicate that it is a special move (if there is only one)

回复收藏 0 原文

孤独患者 2024-08-20 14:03:54

在初始棋盘加上后续走势的基本情况下，请考虑以下事项。

使用国际象棋程序为所有可能的走法分配概率。例如，e2-e4 为 40%，d2-d4 为 20%，依此类推。如果某些动作是合法的但不被该程序考虑，则给它们一些较低的概率。使用算术编码来保存所采取的选择，第一次移动将是 0 到 0.4 之间的某个数字，第二次移动将是 0.4 和 0.6 之间的某个数字，依此类推。

对另一侧做同样的事情。例如，如果 e7-e5 有 50% 的机会作为 e2-e4 的响应，则编码数将在 0 到 0.2 之间。重复直到游戏结束。结果可能是一个非常小的范围。找到适合该范围的具有最小基数的二进制分数。这就是算术编码。

这比霍夫曼更好，因为它可以被认为是小数位编码（加上游戏结束时的一些四舍五入到整数位）。

结果应该比霍夫曼更紧凑，并且没有晋级、过路、50规则走法和其他细节的特殊情况，因为它们是由国际象棋评估程序处理的。

要重播，请再次使用国际象棋程序来评估棋盘并将所有概率分配给每个动作。使用算术编码值来确定实际下的哪一步棋。重复直到完成。

如果您的国际象棋程序足够好，您可以使用双态编码器获得更好的压缩，其中概率是根据黑棋和白棋的移动来定义的。在大约 200 多个状态的最极端情况下，这对所有可能的国际象棋游戏的整个集合进行编码，因此是不可行的。

这与 Darius 已经写的内容有很大不同的说法，只是通过一些算术编码如何工作的例子，以及使用现有国际象棋程序来帮助评估下一步棋的概率的真实例子。

回复收藏 0 原文

程序员难题：在整个游戏中对棋盘状态进行编码

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

评论（30）

问题

棋盘状态与游戏状态

易位

En Passant

促销

僵局

当然我们还需要知道现在轮到

由于

简单内容

可变基本编码

可变宽度字母

霍夫曼编码

状态差异

棋子位置

组合方法

游戏状态

注释

代数表示法

结论

The Problem

Board State vs Game State

Castling

En Passant

Promotion

Stalemate

Whose Turn Is It?

Two Problems

Simple Contents

Variable Base Encoding

Variable Width Alphabets

Huffman Coding

State Difference

Piece Position

Combined Approaches

Game State

Annotations

Algebraic Notation

Conclusion

真正的大查找表方法

The really big lookup table approach

基本解决方案

进一步优化

Basic solution

Further optimizations

关于作者

相关话题

热门标签

推荐作者

Promise

qq_lbRlsh

待＂谢繁草

yy2010hell

漫无边际

傲娇萝莉攻

友情链接

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。