返回介绍

Backtracking

发布于 2025-02-27 23:45:45 字数 3292 浏览 0 评论 0 收藏 0

The regular expression /\b([01]+b|\d+|[\da-f]+h)\b/ matches either a binary number followed by a b, a regular decimal number with no suffix character, or a hexadecimal number (that is, base 16, with the letters a to f standing for the digits 10 to 15) followed by an h. This is the corresponding diagram:

Visualization of /\b([01]+b|\d+|[\da-f]+h)\b/

When matching this expression, it will often happen that the top (binary) branch is entered even though the input does not actually contain a binary number. When matching the string "103" , for example, it becomes clear only at the 3 that we are in the wrong branch. The string does match the expression, just not the branch we are currently in.

So the matcher backtracks. When entering a branch, it remembers its current position (in this case, at the start of the string, just past the first boundary box in the diagram) so that it can go back and try another branch if the current one does not work out. For the string "103" , after encountering the 3 character, it will start trying the branch for decimal numbers. This one matches, so a match is reported after all.

The matcher stops as soon as it finds a full match. This means that if multiple branches could potentially match a string, only the first one (ordered by where the branches appear in the regular expression) is used.

Backtracking also happens for repetition operators like + and * . If you match /^.*x/ against "abcxe" , the .* part will first try to consume the whole string. The engine will then realize that it needs an x to match the pattern. Since there is no x past the end of the string, the star operator tries to match one character less. But the matcher doesn’t find an x after abcx either, so it backtracks again, matching the star operator to just abc . Now it finds an x where it needs it and reports a successful match from positions 0 to 4.

It is possible to write regular expressions that will do a lot of backtracking. This problem occurs when a pattern can match a piece of input in many different ways. For example, if we get confused while writing a binary-number regular expression, we might accidentally write something like /([01]+)+b/ .

Visualization of /([01]+)+b/

If that tries to match some long series of zeros and ones with no trailing b character, the matcher will first go through the inner loop until it runs out of digits. Then it notices there is no b, so it backtracks one position, goes through the outer loop once, and gives up again, trying to backtrack out of the inner loop once more. It will continue to try every possible route through these two loops. This means the amount of work doubles with each additional character. For even just a few dozen characters, the resulting match will take practically forever.

This is a book about getting computers to do what you want them to do. Computers are about as common as screwdrivers today, but they contain a lot more hidden complexity and thus are harder to operate and understand. To many, they remain alien, slightly threatening things.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。
列表为空,暂无数据
    我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
    原文