How to skip invalid characters in a stream in Java/Scala?

Posted on 2024-12-02 21:15:02

For example, I have the following code:

Source.fromFile(new File(path), "UTF-8").getLines()

and it throws an exception:

Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:319)

I don't care if some lines are not read, but how can I skip invalid characters and continue reading lines?
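To see where `Input length = 1` in the stack trace comes from, here is a minimal Java sketch. It assumes the offending input is a single byte such as `0xFF`, which never appears in well-formed UTF-8; the class name `ReproduceMalformed` and the sample bytes are illustrative only. A decoder obtained from `newDecoder()` reports malformed input by default, so decoding throws `MalformedInputException`, and `getInputLength()` gives the length of the bad byte sequence.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class ReproduceMalformed {

    /** Decodes bytes strictly as UTF-8; returns the length of the malformed sequence, or -1 if there is none. */
    public static int firstMalformedLength(byte[] bytes) {
        // newDecoder() defaults to CodingErrorAction.REPORT, i.e. it throws on malformed input
        CharsetDecoder strict = StandardCharsets.UTF_8.newDecoder();
        try {
            strict.decode(ByteBuffer.wrap(bytes));
            return -1;
        } catch (MalformedInputException e) {
            // Same exception type and message as in the question's stack trace
            System.out.println(e);
            return e.getInputLength();
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample input: 0xFF sits between two valid ASCII bytes
        byte[] bytes = {'a', (byte) 0xFF, 'b'};
        System.out.println(firstMalformedLength(bytes));
    }
}
```

Running `main` should print the exception (`java.nio.charset.MalformedInputException: Input length = 1`) followed by `1`.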

迟月 2024-12-09 21:15:02

You can control how charset decoding handles invalid input by calling CharsetDecoder.onMalformedInput.

Usually you won't ever see a CharsetDecoder object directly, because it will be created behind the scenes for you. So if you need access to it, you'll need to use API that allows you to specify the CharsetDecoder directly (instead of just the encoding name or the Charset).

The most basic example of such API is the InputStreamReader:

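import java.io.*;
import java.nio.charset.*;

InputStream in = ...;
CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
// Silently drop malformed byte sequences instead of throwing MalformedInputException
decoder.onMalformedInput(CodingErrorAction.IGNORE);
Reader reader = new InputStreamReader(in, decoder);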

Note that this code uses StandardCharsets, which was added in Java 7; for earlier versions, you can simply replace it with Charset.forName("UTF-8") (or use the Charsets class from Guava).
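Putting the decoder to work for the question's goal (reading lines while skipping bad bytes) might look like the following Java sketch. The class name `SkipInvalid` and the sample bytes are hypothetical; the decoder is wrapped in a `BufferedReader` so that `readLine()` is available, and `onUnmappableCharacter` is set as well for symmetry, although UTF-8 decoding normally only produces malformed-input errors.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class SkipInvalid {

    /** Reads all lines as UTF-8, silently dropping malformed byte sequences. */
    public static List<String> readLines(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        List<String> lines = new ArrayList<>();
        // try-with-resources closes the reader and, with it, the underlying stream
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // 0xFF is never valid in UTF-8; 0xC3 0xA9 is the valid UTF-8 encoding of U+00E9
        byte[] bytes = {'a', (byte) 0xFF, 'b', '\n', (byte) 0xC3, (byte) 0xA9};
        List<String> lines = readLines(new ByteArrayInputStream(bytes));
        System.out.println(lines.size()); // 2
        System.out.println(lines.get(0)); // ab
    }
}
```

For a real file, you could pass `Files.newInputStream(Paths.get(path))` instead of the in-memory stream. Note that `IGNORE` leaves no trace of where bytes were dropped: the first line becomes `ab`, while the valid `é` on the second line is decoded normally.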

爱已欠费 2024-12-09 21:15:02

Well, if it isn't UTF-8, it is something else. The trick is finding out what that something else is, but if all you want is to avoid the errors, you can use an encoding in which every byte sequence is valid, such as latin1:

Source.fromFile(new File(path), "latin1").getLines()
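Why latin1 never fails: ISO-8859-1 (for which "latin1" is an alias in Java) assigns a character to each of the 256 byte values, with every byte decoding to the code point of the same numeric value. A small Java sketch (the class name `Latin1Demo` is illustrative) checks this over all byte values using a strict decoder, which would throw if any byte were malformed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class Latin1Demo {

    /** Strictly decodes bytes 0x00..0xFF as ISO-8859-1; true if each byte b decodes to the char with value b. */
    public static boolean decodesEveryByte() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        try {
            // newDecoder() uses CodingErrorAction.REPORT, so any malformed byte would throw here
            CharBuffer chars = StandardCharsets.ISO_8859_1.newDecoder().decode(ByteBuffer.wrap(all));
            if (chars.length() != 256) {
                return false;
            }
            for (int i = 0; i < 256; i++) {
                if (chars.charAt(i) != (char) i) {
                    return false;
                }
            }
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(decodesEveryByte());
    }
}
```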
始终不够爱げ你 2024-12-09 21:15:02

I had a similar issue, and one of Scala's built-in codecs did the trick for me:

Source.fromFile(new File(path))(Codec.ISO8859).getLines()
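`Codec.ISO8859` is Scala's codec for ISO-8859-1, the same charset as "latin1" in the previous answer, so it avoids the exception for the same reason. The trade-off is that text that really is UTF-8 gets misread: each multi-byte character turns into several Latin-1 characters. The Java sketch below (class name `MojibakeDemo` is hypothetical) demonstrates this, and also that nothing is lost, because re-encoding the result as ISO-8859-1 returns the original bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class MojibakeDemo {

    /** Decodes the UTF-8 bytes of text as ISO-8859-1, mimicking reading a UTF-8 file with a latin1 codec. */
    public static String misreadAsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Every decoded char is <= U+00FF, so re-encoding as ISO-8859-1 should reproduce the original bytes. */
    public static boolean roundTrips(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] back = misreadAsLatin1(text).getBytes(StandardCharsets.ISO_8859_1);
        return Arrays.equals(utf8, back);
    }

    public static void main(String[] args) {
        // U+00E9 is 0xC3 0xA9 in UTF-8, which ISO-8859-1 reads as U+00C3 followed by U+00A9
        System.out.println(misreadAsLatin1("caf\u00e9").equals("caf\u00c3\u00a9"));
        System.out.println(roundTrips("caf\u00e9"));
    }
}
```

In other words, "café" comes back as "cafÃ©": no exception, but the text is only correct if the file really was Latin-1.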
薄凉少年不暖心 2024-12-09 21:15:02

If you want to avoid invalid characters in Scala, this worked for me:

import java.nio.charset.CodingErrorAction
import scala.io._

object HelloWorld {

  def main(args: Array[String]): Unit = {
    // Picked up implicitly by Source.fromURL below
    implicit val codec: Codec = Codec("UTF-8")

    // Substitute a replacement character instead of throwing MalformedInputException
    codec.onMalformedInput(CodingErrorAction.REPLACE)
    codec.onUnmappableCharacter(CodingErrorAction.REPLACE)

    val dataSource = Source.fromURL("https://www.foo.com")
    try {
      for (line <- dataSource.getLines()) {
        println(line)
      }
    } finally {
      dataSource.close()
    }
  }
}
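For comparison with the `IGNORE` approach in the first answer, here is a Java sketch of the same `REPLACE` policy (the class name `ReplaceInvalid` is illustrative). With `REPLACE`, each malformed sequence becomes the decoder's replacement string, which for the JDK's UTF-8 decoder is U+FFFD, so the output still shows where bad bytes were.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class ReplaceInvalid {

    /** Reads lines as UTF-8, substituting U+FFFD for each malformed byte sequence. */
    public static List<String> readLines(InputStream in) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        // Collect inside the try block so the stream is consumed before the reader is closed
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, decoder))) {
            return reader.lines().collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {'a', (byte) 0xFF, 'b'};
        List<String> lines = readLines(new ByteArrayInputStream(bytes));
        // Prints "true": the bad byte survives as a visible U+FFFD marker
        System.out.println(lines.get(0).equals("a\uFFFDb"));
    }
}
```

Which policy to choose depends on whether you want to know that input was damaged (`REPLACE`) or just want clean text (`IGNORE`).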