Scanner with a lookaround delimiter does not work near the Scanner.BUFFER_SIZE boundary

Posted on 2025-01-17 18:08:50

If I use a Scanner with a delimiter that contains a lookbehind, like (?<=de)lim, the delimiter is not skipped when I design my input such that the internal buffer of the Scanner starts with "lim", even though in the source text it is preceded by "de":

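import java.io.StringReader;
import java.util.Scanner;

public class Scanning {
    // Padding is sized so that the first "lim" ends up at the start of the Scanner's internal buffer.
    final static int SCANNER_BUFFER_SIZE = 1024 * 2;
    final static int OFFSET = 5;
    // Layout: padding, then "delimdelim", then ten more '=' characters.
    final static String DATA = "=".repeat(SCANNER_BUFFER_SIZE - OFFSET) + "delimdelim" + "=".repeat(10);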

    public static void main(String[] args) {
        Scanner scanner = new Scanner(new StringReader(DATA));

        scanner.useDelimiter("(?<=de)lim");

        while(scanner.hasNext()) {
            System.out.println(scanner.next().replaceAll("=+", "="));
        }
    }
}

I think that DATA should be split on these delimiters:

===(...)===delimdelim==========
             ^~~  ^~~

And so the output should be:

=de
de
=
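
For comparison, here is a minimal sketch of the split I expect, assuming the whole input is held in memory so that the lookbehind can always see the preceding "de". The class name ExpectedSplit and the method tokens are hypothetical names for this illustration; String.split is used only as a reference point and says nothing about how Scanner works internally.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpectedSplit {
    // Same layout as DATA in the question: padding, "delimdelim", ten more '='.
    static final String DATA = "=".repeat(1024 * 2 - 5) + "delimdelim" + "=".repeat(10);

    // Splits the complete string on the lookbehind delimiter and
    // collapses each run of '=' to a single '=' for readable output.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        for (String token : input.split("(?<=de)lim")) {
            result.add(token.replaceAll("=+", "="));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints one token per line: "=de", "de", "="
        tokens(DATA).forEach(System.out::println);
    }
}
```

Because split works on the full string, no delimiter can lose its "de" prefix here.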

However, the actual output (openjdk version "17.0.2" 2022-01-18) is:

=de
limde
=

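I can see in the debugger that when the scanner is about to return "limde", scanner.buf and scanner.matcher.text contain "limdelim...", so I suspect that this is the cause: the "de" that precedes the first "lim" in the source text is no longer in the buffer, so the lookbehind cannot match there. If I alter OFFSET to e.g. 3 or 7, my expected behavior occurs.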
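
The suspicion can be checked outside of Scanner with a plain Matcher. The sketch below (LookbehindContext and matchStarts are hypothetical names for this illustration) only shows that the same pattern skips a leading "lim" when the text it is handed no longer contains the preceding "de". It does not reproduce Scanner's buffering itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindContext {
    static final Pattern DELIMITER = Pattern.compile("(?<=de)lim");

    // Returns the start index of every delimiter match in the given text.
    static List<Integer> matchStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher matcher = DELIMITER.matcher(text);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Full context: both "lim" occurrences are preceded by "de".
        System.out.println(matchStarts("delimdelim")); // [2, 7]
        // Text as seen in the debugger: the first "lim" has nothing before it.
        System.out.println(matchStarts("limdelim"));   // [5]
    }
}
```

With the truncated text, the first "lim" becomes part of the token, which matches the "limde" seen in the actual output.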

I could not find any reference to this behavior in the documentation of Scanner or Pattern, so is this intended? Is it not safe to use lookaround for the delimiter of a Scanner?
