How to skip invalid characters in a stream in Java/Scala?
For example, I have the following code:
Source.fromFile(new File(path), "UTF-8").getLines()
and it throws an exception:
Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:260)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:319)
I don't care if some lines are not read, but how can I skip the invalid characters and continue reading lines?
You can influence how charset decoding handles invalid input by calling CharsetDecoder.onMalformedInput.
Usually you won't ever see a CharsetDecoder object directly, because it is created behind the scenes for you. So if you need access to it, you'll need to use an API that lets you specify the CharsetDecoder directly (instead of just the encoding name or the Charset).
The most basic example of such an API is InputStreamReader:
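The answer's original code block did not survive on this page, so the following is only a minimal sketch of the approach it describes, not the author's exact code. It assumes the goal is to silently drop malformed UTF-8 bytes. The class name LenientLineReader and the method name readLinesIgnoringMalformed are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name is not from the original answer.
class LenientLineReader {

    // Reads every line from the stream as UTF-8, dropping any bytes that
    // do not form valid UTF-8 instead of throwing MalformedInputException.
    static List<String> readLinesIgnoringMalformed(InputStream in) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);

        List<String> lines = new ArrayList<>();
        // InputStreamReader has a constructor that accepts a CharsetDecoder directly.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, decoder))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of the file to read.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (String line : readLinesIgnoringMalformed(in)) {
                System.out.println(line);
            }
        }
    }
}
```

If you would rather see where the bad bytes were, CodingErrorAction.REPLACE substitutes the decoder's replacement character (U+FFFD by default) instead of dropping them.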
Note that this code uses the Java 7 class StandardCharsets; for earlier versions you can simply replace it with Charset.forName("UTF-8") (or use the Charsets class from Guava).
Well, if it isn't UTF-8, it is something else. The trick is finding out what that something else is, but if all you want is to avoid the errors, you can use an encoding that doesn't have invalid codes, such as latin1:
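The snippet that followed this answer is missing here, and it was presumably Scala. As a rough Java sketch of the same idea: ISO-8859-1 (latin1) maps every possible byte to a character, so decoding cannot fail. The class and method names below are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the name is not from the original answer.
class Latin1LineReader {

    // Decodes the stream as ISO-8859-1, where each byte 0x00-0xFF maps to
    // the code point with the same value, so no input is ever "malformed".
    static List<String> readLinesAsLatin1(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

The trade-off is that nothing is skipped. If the file really is mostly UTF-8, every multi-byte character comes out as two or more wrong latin1 characters. This therefore only suits cases where the non-ASCII content does not matter.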
I had a similar issue, and one of Scala's built-in codecs did the trick for me:
If you want to avoid invalid characters using Scala, I found this worked for me.