Unmarshal an ISO-8859-1 XML input in Go
When your XML input isn't encoded in UTF-8, the Unmarshal function of the xml package seems to require a CharsetReader.
Where do you find such a thing?
Updated answer for 2015 & beyond:
Expanding on @anschel-schaffer-cohen's suggestion and @mjibson's comment, the go-charset package mentioned in another answer lets you achieve the required result in three lines of code. Just remember to let charset know where its data files are by setting charset.CharsetDir at some point when the app starts up.
EDIT
Instead of setting charset.CharsetDir = etc., it's more sensible to just import the data files. They are treated as an embedded resource: go install will just do its thing, which also avoids the deployment headache (where and how do I get the data files relative to the executing app?). Using import with an underscore just calls the package's init() func, which loads the required data into memory.
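As a rough illustration of the decoder pattern this answer describes (create a decoder, set its CharsetReader, decode), here is a minimal sketch that runs on the standard library alone. The hand-written latin1Reader stands in for the reader that go-charset would supply, so it is a substitute technique rather than that package's code; the note type, the decodeNote helper and the sample document are all hypothetical.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// latin1Reader is a stand-in (an assumption of this sketch) for a charset
// library's reader constructor. It supports ISO-8859-1 only.
func latin1Reader(label string, input io.Reader) (io.Reader, error) {
	if !strings.EqualFold(label, "ISO-8859-1") {
		return nil, fmt.Errorf("unsupported charset %q", label)
	}
	raw, err := io.ReadAll(input)
	if err != nil {
		return nil, err
	}
	// Every ISO-8859-1 byte is the Unicode code point with the same value,
	// so writing each byte as a rune yields UTF-8.
	var out bytes.Buffer
	for _, b := range raw {
		out.WriteRune(rune(b))
	}
	return &out, nil
}

// note is a hypothetical target type for the sample document.
type note struct {
	Body string `xml:"body"`
}

// decodeNote shows the three steps: build a decoder, set the hook, decode.
func decodeNote(doc string) (note, error) {
	var parsed note
	decoder := xml.NewDecoder(strings.NewReader(doc))
	decoder.CharsetReader = latin1Reader
	err := decoder.Decode(&parsed)
	return parsed, err
}

func main() {
	// "\xe9" is the single ISO-8859-1 byte for "é".
	doc := "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><note><body>caf\xe9</body></note>"
	parsed, err := decodeNote(doc)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(parsed.Body)
}
```

Reading the whole input up front keeps the sketch short but gives up streaming; a charset library would normally convert incrementally and cover many more encodings.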
Here's a sample Go program which uses a CharsetReader function to convert XML input from ISO-8859-1 to UTF-8. The program prints the XML comments in the test file.
To unmarshal XML with encoding="ISO-8859-1" from an io.Reader r into a structure result, using the program's CharsetReader function to translate from ISO-8859-1 to UTF-8, create an xml.Decoder on r, assign the function to the decoder's CharsetReader field, and call Decode with a pointer to result.
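A program along those lines might look like the sketch below. It is not the poster's code: the latin1Stream type, the charsetReader and xmlComments helpers, the accepted labels and the sample document are all assumptions made for illustration. The converter works incrementally, and the token loop collects the XML comments.

```go
package main

import (
	"bufio"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
	"unicode/utf8"
)

// latin1Stream re-encodes ISO-8859-1 bytes as UTF-8 while they are read.
type latin1Stream struct {
	src        io.ByteReader
	pending    byte // second byte of a two-byte UTF-8 sequence
	hasPending bool
}

func (s *latin1Stream) Read(p []byte) (int, error) {
	n := 0
	for n < len(p) {
		if s.hasPending {
			p[n], s.hasPending = s.pending, false
			n++
			continue
		}
		b, err := s.src.ReadByte()
		if err != nil {
			if n > 0 {
				return n, nil // the source is expected to repeat the error next call
			}
			return 0, err
		}
		if b < utf8.RuneSelf {
			p[n] = b // ASCII is identical in both encodings
		} else {
			// U+0080..U+00FF encode as 110000xx 10xxxxxx.
			p[n] = 0xC0 | b>>6
			s.pending, s.hasPending = 0x80|b&0x3F, true
		}
		n++
	}
	return n, nil
}

// charsetReader matches the xml.Decoder hook; the label list is an assumption.
func charsetReader(label string, input io.Reader) (io.Reader, error) {
	if l := strings.ToLower(label); l != "iso-8859-1" && l != "latin1" {
		return nil, fmt.Errorf("unsupported charset %q", label)
	}
	src, ok := input.(io.ByteReader)
	if !ok {
		src = bufio.NewReader(input)
	}
	return &latin1Stream{src: src}, nil
}

// xmlComments returns the text of every comment in doc.
func xmlComments(doc string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	d.CharsetReader = charsetReader
	var comments []string
	for {
		tok, err := d.Token()
		if err == io.EOF {
			return comments, nil
		}
		if err != nil {
			return nil, err
		}
		if c, ok := tok.(xml.Comment); ok {
			comments = append(comments, string(c)) // copy: token bytes are reused
		}
	}
}

func main() {
	doc := "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><!--r\xe9sum\xe9--><root/>"
	comments, err := xmlComments(doc)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	fmt.Println(comments)
}
```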
There appears to be an external library which handles this: go-charset. I haven't tried it myself; does it work for you?
Edit: do not use this; use the go-charset answer instead.
Here's an updated version of @peterSO's code that works with Go 1:
Called with:
There aren't any provided in the Go distribution at the moment, or anywhere else I can find. That's not surprising, as the hook is less than a month old at the time of writing.
Since a CharsetReader is defined as CharsetReader func(charset string, input io.Reader) (io.Reader, os.Error), you could make your own. There's one example in the tests, but it might not be exactly useful to you.
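To make the hook's contract concrete, here is a small sketch of a home-made CharsetReader. Since Go 1 the second return value is the built-in error type rather than os.Error. The asciiOnly function, the rootName helper and the choice of US-ASCII are hypothetical; ASCII is used only because it is a subset of UTF-8, so the input can be passed through unchanged.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// asciiOnly has the Go 1 hook signature. The decoder calls it only when the
// XML declaration names an encoding other than UTF-8. A non-nil error
// aborts decoding.
func asciiOnly(label string, input io.Reader) (io.Reader, error) {
	if strings.EqualFold(label, "US-ASCII") {
		return input, nil // ASCII bytes are already valid UTF-8
	}
	return nil, fmt.Errorf("no converter for charset %q", label)
}

// rootName decodes doc with the hook installed and reports the root
// element's name.
func rootName(doc string) (string, error) {
	var root struct{ XMLName xml.Name }
	d := xml.NewDecoder(strings.NewReader(doc))
	d.CharsetReader = asciiOnly
	err := d.Decode(&root)
	return root.XMLName.Local, err
}

func main() {
	for _, enc := range []string{"US-ASCII", "KOI8-R"} {
		doc := "<?xml version=\"1.0\" encoding=\"" + enc + "\"?><root/>"
		name, err := rootName(doc)
		fmt.Println(enc, name, err)
	}
}
```

A real converter for ISO-8859-1 would return a reader that re-encodes the bytes instead of handing the input back.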
The accepted answer didn't work for me (probably because the XML I was receiving did not have the encoding="ISO-8859-1" field set as it should). So instead I found a library that just converts from ISO-8859-1 to UTF-8, and my code ended up looking something like this:
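The library is not named here, so the following is only a sketch of the same idea using the standard library: convert the raw bytes to UTF-8 first, then unmarshal as usual. The latin1ToUTF8 and parseLatin1 helpers and the item type are hypothetical, and the sketch assumes the input really is ISO-8859-1 and carries no encoding declaration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// latin1ToUTF8 re-encodes ISO-8859-1 bytes as UTF-8. Each input byte is the
// Unicode code point with the same value.
func latin1ToUTF8(in []byte) []byte {
	runes := make([]rune, len(in))
	for i, b := range in {
		runes[i] = rune(b)
	}
	return []byte(string(runes))
}

// item is a hypothetical target type for the sample document.
type item struct {
	Name string `xml:"name"`
}

// parseLatin1 converts first, so no CharsetReader is involved.
func parseLatin1(raw []byte) (item, error) {
	var it item
	if err := xml.Unmarshal(latin1ToUTF8(raw), &it); err != nil {
		return it, fmt.Errorf("unmarshal: %w", err)
	}
	return it, nil
}

func main() {
	// No XML declaration; "\xe9" is the ISO-8859-1 byte for "é".
	raw := []byte("<item><name>Jos\xe9</name></item>")
	it, err := parseLatin1(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(it.Name)
}
```

If the document did declare encoding="ISO-8859-1", xml.Unmarshal would still refuse it after conversion because no CharsetReader is set, so this approach suits only input without that declaration.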