Scanner with a lookbehind delimiter does not work near the Scanner.BUFFER_SIZE boundary
If I use a Scanner with a delimiter that contains a lookbehind, like (?<=de)lim, the delimiter is not skipped when I design my input such that the internal buffer of the Scanner starts with "lim", even though in the source text it is preceded by "de":
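import java.io.StringReader;
import java.util.Scanner;

public class Scanning {
    // Padding is sized so that, as described above, the Scanner's buffer ends up starting with "lim".
    final static int SCANNER_BUFFER_SIZE = 1024 * 2;
    final static int OFFSET = 5;
    final static String DATA = "=".repeat(SCANNER_BUFFER_SIZE - OFFSET) + "delimdelim" + "=".repeat(10);

    public static void main(String[] args) {
        Scanner scanner = new Scanner(new StringReader(DATA));
        // Delimiter: "lim", but only when it is preceded by "de".
        scanner.useDelimiter("(?<=de)lim");
        while (scanner.hasNext()) {
            // Collapse runs of '=' so the output stays readable.
            System.out.println(scanner.next().replaceAll("=+", "="));
        }
    }
}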
I think that DATA should be split on these delimiters:
===(...)===delimdelim==========
             ^~~  ^~~
And so the output should be:
=de
de
=
However, the program actually prints (openjdk version "17.0.2" 2022-01-18):
=de
limde
=
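For comparison, the expected tokens can be reproduced without a Scanner by splitting the complete string with Pattern.split, where the lookbehind always sees the full preceding text. This is only a sketch for cross-checking: the class name ExpectedSplit and the helper expectedTokens are made up for illustration, and String.repeat assumes Java 11 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original program.
class ExpectedSplit {
    static final int SCANNER_BUFFER_SIZE = 1024 * 2;
    static final int OFFSET = 5;
    static final String DATA =
            "=".repeat(SCANNER_BUFFER_SIZE - OFFSET) + "delimdelim" + "=".repeat(10);

    // Splits the whole string at once, so no buffering is involved,
    // then collapses runs of '=' exactly like the original program does.
    static List<String> expectedTokens(String data) {
        List<String> tokens = new ArrayList<>();
        for (String token : Pattern.compile("(?<=de)lim").split(data)) {
            tokens.add(token.replaceAll("=+", "="));
        }
        return tokens;
    }

    public static void main(String[] args) {
        expectedTokens(DATA).forEach(System.out::println);
    }
}
```

Run against DATA, this prints =de, de and = on separate lines, which matches the expectation stated above.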
I can see in the debugger that when the scanner is about to return "limde", scanner.buf and scanner.matcher.text contain "limdelim...", so I suspect that is the cause. If I change OFFSET to e.g. 3 or 7, I get the behavior I expect.
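The effect of the missing context can be illustrated with a plain Matcher, independent of Scanner. The sketch below (class and method names are hypothetical) lists where (?<=de)lim matches in the full text and in a text that starts with "lim", as the buffer did in the debugger. It does not prove what Scanner does internally; it only shows that the lookbehind cannot match when the preceding "de" is not part of the text being searched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original program.
class LookbehindContextDemo {
    static final Pattern DELIMITER = Pattern.compile("(?<=de)lim");

    // Returns the start index of every delimiter match in the given text.
    static List<Integer> matchStarts(String text) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = DELIMITER.matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Full context: both "lim" occurrences are preceded by "de".
        System.out.println(matchStarts("delimdelim")); // [2, 7]
        // Leading "de" cut off: the first "lim" no longer qualifies.
        System.out.println(matchStarts("limdelim"));   // [5]
    }
}
```

With the leading "de" gone, the first "lim" is treated as ordinary text, which is consistent with the "limde" token in the actual output.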
I could not find any reference to this behavior in the documentation of Scanner or Pattern, so is this intended? Is it not safe to use lookaround for the delimiter of a Scanner?