Convert ISO-8859-1 to UTF-8 using Groovy
I need to convert an ISO-8859-1 file to UTF-8 encoding without losing any content information...
I have a file which looks like this:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<HelloEncodingWorld>Üöäüßßß Test!!!</HelloEncodingWorld>
Now I want to encode it into UTF-8.
I tried the following:
f=new File('c:/temp/myiso88591.xml').getText('ISO-8859-1')
ts=new String(f.getBytes("UTF-8"), "UTF-8")
g=new File('c:/temp/myutf8.xml').write(ts)
This didn't work due to String incompatibilities.
Then I read something about byte stream readers/writers, StreamingMarkupBuilder and others...
Then I tried:
f=new File('c:/temp/myiso88591.xml').getText('ISO-8859-1')
mb = new groovy.xml.StreamingMarkupBuilder()
mb.encoding = "UTF-8"
new OutputStreamWriter(new FileOutputStream('c:/temp/myutf8.xml'),'utf-8') << mb.bind {
mkp.xmlDeclaration()
out << f
}
This was totally not what I wanted.
I just want to get the content of an XML file read with an ISO-8859-1 reader and then put it into a new (old) file... why is this so complicated :-/
The result should just be this, and the file should really be encoded in UTF-8:
<?xml version="1.0" encoding="UTF-8" ?>
<HelloEncodingWorld>Üöäüßßß Test!!!</HelloEncodingWorld>
Thanks for any answers
Cheers
Comments (2)
(I just gave it a try, it works :-)
Same as in Java: the libraries do the conversion for you...
As deceze said: when you specify an encoding while reading, the text is converted to an internal format (UTF-16, afaik). When you specify another encoding while writing the string, it is converted to that encoding.
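A minimal sketch of that read-then-write conversion. The helper name convertToUtf8 and the regex that rewrites the XML declaration are assumptions for illustration, and the demo uses temporary files rather than the c:/temp paths from the question so that it runs anywhere.

```groovy
// Hypothetical helper: re-encode a small text file from ISO-8859-1 to UTF-8.
def convertToUtf8(File source, File target) {
    // Decode the bytes as ISO-8859-1 into a String (UTF-16 internally).
    String text = source.getText('ISO-8859-1')
    // Assumption: the declaration names the encoding exactly like this,
    // so a plain replacement is enough to keep header and bytes in sync.
    text = text.replaceFirst(/encoding="ISO-8859-1"/, 'encoding="UTF-8"')
    // Encode the String as UTF-8 while writing.
    target.write(text, 'UTF-8')
}

// Demo input. Unicode escapes keep the script independent of its own
// source-file encoding; the escaped text is the umlaut sample from the question.
def sample = '<?xml version="1.0" encoding="ISO-8859-1" ?>\n' +
        '<HelloEncodingWorld>\u00dc\u00f6\u00e4\u00fc\u00df\u00df\u00df Test!!!</HelloEncodingWorld>'

def src = File.createTempFile('myiso88591', '.xml')
def dst = File.createTempFile('myutf8', '.xml')
src.write(sample, 'ISO-8859-1')

convertToUtf8(src, dst)
println dst.getText('UTF-8')
```

The first attempt in the question goes wrong at the write step: write(ts) without a charset argument uses the platform default encoding, so the charset has to be passed explicitly when writing.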
But if you work with XML, you shouldn't have to worry about the encoding anyway, because the XML parser will take care of it. It will read the first characters, <?xml, and determine the basic encoding from them. After that, it is able to read the encoding information from your XML header and use it.

Making it a little more Groovy, and not requiring the whole file to fit in memory, you can use readers and writers to stream the file. This was my solution when I had files too big for plain old Unix iconv(1).
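A sketch of how such a streaming conversion might look, not necessarily the exact code this answer used. The helper name streamToUtf8, the 8192-character buffer, and the handling of the first line are assumptions: it presumes the XML declaration sits alone on the first line, and it ends that line with a plain \n regardless of the original line ending.

```groovy
// Hypothetical helper: stream a file from ISO-8859-1 to UTF-8 in chunks,
// so the whole file never has to fit in memory.
def streamToUtf8(File source, File target) {
    source.withReader('ISO-8859-1') { reader ->
        target.withWriter('UTF-8') { writer ->
            // Assumption: the XML declaration is the first line of the file.
            String firstLine = reader.readLine()
            if (firstLine != null) {
                writer.write(firstLine.replaceFirst(/encoding="ISO-8859-1"/, 'encoding="UTF-8"'))
                writer.write('\n')
            }
            // Copy the rest in fixed-size chunks of characters.
            char[] buffer = new char[8192]
            int n
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n)
            }
        }
    }
}

// Demo on temporary files; the escapes spell the umlaut sample from the question.
def input = File.createTempFile('big-iso88591', '.xml')
def output = File.createTempFile('big-utf8', '.xml')
input.write('<?xml version="1.0" encoding="ISO-8859-1" ?>\n' +
        '<HelloEncodingWorld>\u00dc\u00f6\u00e4\u00fc\u00df\u00df\u00df Test!!!</HelloEncodingWorld>', 'ISO-8859-1')

streamToUtf8(input, output)
println output.getText('UTF-8')
```

withReader and withWriter close both files when their closures finish, including when an exception is thrown, so no explicit cleanup is needed.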