Unescape and convert the encoding of a String

Posted 2025-01-02 19:57:57 · 582 words · 3 views · 0 comments

I have to parse a String into a Date object in Java. The string I receive follows the pattern MMM d yyyy HH:mm:ss z, with the locale set to French.

The problem occurs when the date falls in February, August, or December, because of how the French accents are encoded. For example, I get dÃ©c. 15 2011 16:55:38 CET for December 15th, 2011.

I can't change the way the string is created, so I have to deal with the bad encoding on my side. It seems that the string is badly encoded when it is generated (UTF-8 content read as ISO 8859-1) and then escaped.

For now I use:

stringFromXML = stringFromXML.replaceAll("&#195;&#169;", "é");
stringFromXML = stringFromXML.replaceAll("&#195;&#187;", "û");

It works because the only accented characters in French month names are é and û, but is there a cleaner way to unescape and convert the characters?
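The diagnosis in the question (UTF-8 content read as ISO 8859-1, then escaped) can be reproduced with the JDK alone. The following is a minimal sketch, not the producer's actual code: the class name `MojibakeDemo` and its two helper methods are hypothetical and exist only to illustrate how a string such as `d&#195;&#169;c.` could come about.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how "déc." can turn into "d&#195;&#169;c.".
final class MojibakeDemo {

    /** Encodes the text as UTF-8, then (wrongly) decodes those bytes as ISO 8859-1. */
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    /** Escapes every non-ASCII character as a decimal numeric character reference. */
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 127) {
                sb.append("&#").append((int) c).append(';');
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String broken = misdecode("d\u00e9c.");     // 'é' becomes the two characters U+00C3 U+00A9
        System.out.println(escapeNonAscii(broken)); // the escaped form seen in the XML
    }
}
```

Each accented letter becomes two characters because its UTF-8 encoding is two bytes long, and ISO 8859-1 maps every byte to exactly one character.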


梨涡少年 2025-01-09 19:57:57

You need two steps:

Resolve the numeric character references, for example with StringEscapeUtils as suggested by Andy:
```
String unescaped = StringEscapeUtils.unescapeHtml(in);
```

Fix the encoding by treating each resulting character as a UTF-8 code unit (that is, one byte of the UTF-8 sequence):

String out = new String(unescaped.getBytes("ISO-8859-1"), "UTF-8");
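The two steps can be sketched without any third-party dependency. This is a minimal illustration under stated assumptions, not a replacement for StringEscapeUtils: the class `MojibakeFixer` and its method names are hypothetical, and step 1 only handles decimal numeric references such as `&#195;`, not named or hexadecimal entities.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: handles decimal numeric character references only.
final class MojibakeFixer {

    // Matches references such as &#195; (1 to 7 decimal digits).
    private static final Pattern NUMERIC_REF = Pattern.compile("&#(\\d{1,7});");

    /** Step 1: replace each decimal numeric reference with the character it names. */
    static String unescapeNumericRefs(String in) {
        Matcher m = NUMERIC_REF.matcher(in);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            // Leave the reference untouched if it does not name a valid code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    /** Step 2: turn each character back into one byte, then decode the bytes as UTF-8. */
    static String reinterpretLatin1AsUtf8(String s) {
        // Characters above U+00FF cannot be represented in ISO 8859-1 and become '?'.
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Applies both steps in order. */
    static String fix(String in) {
        return reinterpretLatin1AsUtf8(unescapeNumericRefs(in));
    }

    public static void main(String[] args) {
        System.out.println(fix("d&#195;&#169;c. 15 2011 16:55:38 CET"));
    }
}
```

Using the `StandardCharsets` constants instead of charset names also avoids the checked `UnsupportedEncodingException` that `getBytes(String)` declares. The repaired string can then be handed to the date parser with the French locale.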


黯然#的苍凉 2025-01-09 19:57:57

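If you don't mind the dependency, you can use Apache Commons StringEscapeUtils to do this.

From the JavaDoc of StringEscapeUtils.unescapeHtml:

Unescapes a string containing entity escapes to a string containing the actual Unicode characters corresponding to the escapes. Supports HTML 4.0 entities.
For example, the string "&lt;Fran&ccedil;ais&gt;" will become "<Français>"

It should also work for the numeric entities in your input.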


冷夜 2025-01-09 19:57:57

Just in case someone else is looking for the same solution as me: I was trying to decode characters that I got from okhttp (Android) requests, like:
Ã to à

So, as suggested by @axtavt, I used StringEscapeUtils. To do so, I added this dependency to my Gradle build:

compile 'org.apache.commons:commons-lang3:3.4'

And fixed the character issues with:

return StringEscapeUtils.unescapeHtml3(word);


About the author

马蹄踏│碎落叶
