Scanner cuts off my String after about 2400 characters

Posted 2024-09-02 18:43:59 · 368 characters · 6 views · 0 comments

I've got some very basic code like

while (scan.hasNextLine())
{
    String temp = scan.nextLine();
    System.out.println(temp);
}

where scan is a Scanner over a file.

However, on one particular line, about 6k characters long, temp is cut off after roughly 2470 characters. There is nothing special about the cut-off point; it falls in the middle of the word "Australia". If I delete characters from the line, the cut-off point moves: for example, if I delete characters 0-100 in the file, Scanner returns what was previously characters 100-2570.

I've used Scanner for larger strings before. Any idea what could be going wrong?

Comments (1)

我不是你的备胎 2024-09-09 18:43:59

At a guess, you may have a rogue character at the cut-off point: look at the file in a hex editor instead of just a text editor. Perhaps there's an embedded null character, or possibly \r in the middle of the string? It seems unlikely to me that Scanner.nextLine() would just chop it arbitrarily.

As another thought, are you 100% sure that it's not all there? Perhaps System.out.println is chopping the string - again due to some "odd" character embedded in it? What happens if you print temp.length()?
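
To make the two checks above concrete, here is a minimal sketch that prints the length of a line and lists any control characters in it, such as an embedded null or a stray \r. The class name OddCharFinder, the method findControlChars, and the sample string are made up for illustration; in the real program you would pass each temp from the loop instead.

```java
import java.util.ArrayList;
import java.util.List;

class OddCharFinder {
    /** Returns the indices of all ISO control characters in the line, except tab. */
    static List<Integer> findControlChars(String line) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (Character.isISOControl(c) && c != '\t') {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical sample: a null character embedded in the middle of a word.
        String temp = "Austr\u0000alia";
        // The length tells you whether the String itself is short,
        // or whether only the console output looks truncated.
        System.out.println("length = " + temp.length());
        for (int i : findControlChars(temp)) {
            System.out.printf("control character at index %d: U+%04X%n", i, (int) temp.charAt(i));
        }
    }
}
```

If the reported length is the full 6k or so, the String is intact and the truncation happens at the printing or console stage rather than in Scanner.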

EDIT: I'd misinterpreted the bit about what happens if you cut out some characters. Sorry about that. A few other things to check:

  • If you read the lines with BufferedReader.readLine() instead of Scanner, does it get everything?
  • Are you specifying the right encoding? I can't see why this would show up in this particular way, but it's something to think about...
  • If you replace all the characters in the line with "A" (in the file) does that change anything?
  • If you add an extra line before this line (or remove a line before it) does that change anything?
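
The first two checks in this list could be sketched roughly as follows: read the same file once with BufferedReader.readLine() and once with Scanner, both with an explicitly chosen charset, and compare the line lengths. The class and method names are invented for the example, and UTF-8 is only an assumption; substitute whatever encoding the file really uses. The sketch also checks Scanner.ioException(), because Scanner does not throw when the underlying reader fails; it treats the failure as end of input and only records the exception.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class ReaderComparison {
    /** Length of every line as seen by BufferedReader.readLine(). */
    static List<Integer> lengthsViaBufferedReader(Path file, Charset cs) throws IOException {
        List<Integer> lengths = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(file, cs)) {
            String line;
            while ((line = reader.readLine()) != null) {
                lengths.add(line.length());
            }
        }
        return lengths;
    }

    /** Length of every line as seen by Scanner.nextLine(). */
    static List<Integer> lengthsViaScanner(Path file, Charset cs) throws IOException {
        List<Integer> lengths = new ArrayList<>();
        try (Scanner scan = new Scanner(Files.newBufferedReader(file, cs))) {
            while (scan.hasNextLine()) {
                lengths.add(scan.nextLine().length());
            }
            // Scanner records read errors instead of throwing them; surface any such error here.
            IOException suppressed = scan.ioException();
            if (suppressed != null) {
                throw suppressed;
            }
        }
        return lengths;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReaderComparison <file>");
            return;
        }
        Path file = Paths.get(args[0]);
        Charset cs = StandardCharsets.UTF_8; // assumption: change to the file's actual encoding
        System.out.println("BufferedReader: " + lengthsViaBufferedReader(file, cs));
        System.out.println("Scanner:        " + lengthsViaScanner(file, cs));
    }
}
```

If the two lists differ, or if one of the methods throws, that narrows things down to either the encoding or a character that Scanner treats as a line break but BufferedReader does not.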

Failing all of this, I'd just debug into Scanner.nextLine() - one of the nice things about Java is that you can debug into the standard libraries.
