Unescaping a string and converting its encoding
I have to parse a String into a Date object in Java. The string I receive follows the pattern MMM d yyyy HH:mm:ss z, with the locale set to French.
The problem occurs when the date is in February, August or December, because of how the French accents are encoded. For example, I get d&#195;&#169;c. 15 2011 16:55:38 CET for December 15th, 2011.
I can't change the way the string is created, so I have to deal with the bad encoding on my side. It seems that when the string is generated it is badly encoded (UTF-8 bytes interpreted as ISO 8859-1) and then escaped.
For now I use:
stringFromXML = stringFromXML.replaceAll("&#195;&#169;", "é");
stringFromXML = stringFromXML.replaceAll("&#195;&#187;", "û");
It works because the only accented characters in French month names are é and û, but is there a cleaner way to unescape and convert the characters?
Comments (3)
You need two steps:
1. Resolve the numeric character references, for example using StringEscapeUtils as suggested by Andy.
2. Fix the encoding by treating the resulting characters as UTF-8 code units.
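A minimal sketch of those two steps, using only the JDK so that it runs without extra dependencies. It assumes the input uses decimal references of the form &#NNN;. The class name MojibakeFixer and its method names are made up for illustration, and the small regex helper stands in for StringEscapeUtils rather than reproducing it.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative, not from any library.
public class MojibakeFixer {

    // Decimal numeric character references such as &#195; (assumed input form).
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    /** Step 1: replace each decimal reference with the character it names. */
    public static String resolveNumericReferences(String input) {
        Matcher m = DECIMAL_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            // Leave the reference untouched if it is not a valid code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Step 2: map each char back to one ISO 8859-1 byte, then decode as UTF-8. */
    public static String reinterpretAsUtf8(String input) {
        byte[] raw = input.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    /** Both steps in order: unescape first, then repair the encoding. */
    public static String fix(String input) {
        return reinterpretAsUtf8(resolveNumericReferences(input));
    }

    public static void main(String[] args) {
        // The month becomes "d\u00e9c." (d, e-acute, c) after both steps.
        System.out.println(fix("d&#195;&#169;c. 15 2011 16:55:38 CET"));
    }
}
```

The order matters: the references have to be resolved before the bytes are reinterpreted, because the mis-decoded characters only exist once the escapes are gone. Step 2 is only safe when every character fits in ISO 8859-1; anything outside that range would be replaced with a question mark by getBytes.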
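You could use Apache Commons StringEscapeUtils to do this, if you don't mind the dependency. According to the JavaDoc for StringEscapeUtils.unescapeHtml, it unescapes a string containing entity escapes into a string containing the corresponding Unicode characters. It should also work with numeric entities like the ones in your input.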
Just in case someone else is looking for the same solution as me: I was trying to decode characters that I got from okhttp (Android) requests, for example turning Ã into à. So, as suggested by @axtavt, I used StringEscapeUtils. To do so, I added the dependency that provides it to my Gradle build and fixed the character issues that way.