How to convert special characters using Java?

Posted on 2024-08-22 04:38:16

I have strings like:

Avery® Laser &amp; Inkjet Self-Adhesive

I need to convert them to

Avery Laser & Inkjet Self-Adhesive.

That is, remove the special characters and convert HTML entities to regular characters.


Comments (4)

洋洋洒洒 2024-08-29 04:38:16

Avery® Laser &amp; Inkjet Self-Adhesive

First use StringEscapeUtils#unescapeHtml4() (or #unescapeXml(), depending on the original format) to unescape the &amp; into &. Then use String#replaceAll() with [^\x20-\x7e] to get rid of the characters that are outside the printable ASCII range.

Summarized:

String clean = StringEscapeUtils.unescapeHtml4(dirty).replaceAll("[^\\x20-\\x7e]", "");

which produces

Avery Laser & Inkjet Self-Adhesive

(without the trailing dot as in your example, but that wasn't present in the original ;) )
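
The stripping step on its own needs nothing beyond the JDK. Here is a minimal sketch of it (the class and method names are made up for illustration). It also shows a side effect worth knowing: the pattern removes every character outside 0x20-0x7E, so legitimate accented letters and control characters such as tabs disappear too.

```java
class AsciiStripDemo {
    // Removes every character outside the printable ASCII range 0x20-0x7E.
    static String stripNonPrintableAscii(String s) {
        return s.replaceAll("[^\\x20-\\x7e]", "");
    }

    public static void main(String[] args) {
        // \u00AE is the registered sign; it is written as an escape so the
        // example does not depend on the source file encoding.
        System.out.println(stripNonPrintableAscii("Avery\u00AE Laser & Inkjet Self-Adhesive"));
        // prints: Avery Laser & Inkjet Self-Adhesive

        // Side effect: the accented e and the tab are dropped as well.
        System.out.println(stripNonPrintableAscii("caf\u00E9\tau lait"));
        // prints: cafau lait
    }
}
```

If accented text has to survive, this blanket filter is probably too aggressive and a narrower character class would be needed.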

That said, this looks more like a request for a workaround than a request for a solution. If you elaborate on the functional requirement and/or where this string originated, we may be able to provide the right solution. The ® looks like it was caused by reading the string in with the wrong encoding, and the &amp; looks like it was caused by using a text-based parser to read the string in instead of a full-fledged HTML parser.
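
To illustrate the encoding point: the sketch below assumes, purely for the sake of example, that the text was written as UTF-8 and then read back as ISO-8859-1. The question does not say which encodings were actually involved, and the class and method names are hypothetical. Under that assumption a single ® turns into two characters, which is the kind of debris the regex above then has to clean up.

```java
import java.nio.charset.StandardCharsets;

class WrongEncodingDemo {
    // Encodes the text as UTF-8, then decodes the bytes with the wrong charset.
    static String misread(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Decodes the same bytes with the matching charset.
    static String readCorrectly(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "Avery\u00AE"; // 6 characters
        // The registered sign is 2 bytes in UTF-8 (0xC2 0xAE), so the
        // misread string gains one extra character.
        System.out.println(misread(original).length());       // prints: 7
        System.out.println(readCorrectly(original).length()); // prints: 6
    }
}
```

Reading the input with the correct charset in the first place avoids the problem without any stripping.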

浅忆流年 2024-08-29 04:38:16

You can use the StringEscapeUtils class from the Apache Commons Text project.

空心↖ 2024-08-29 04:38:16

Maybe you can use something like:

yourTxt = yourTxt.replace("&amp;", "&");

In one project I did something like:

public String replaceAcutesHTML(String str) {
    // Replace each named HTML entity with the character it stands for.
    // replace() matches literal text, so no regex is involved.
    str = str.replace("&aacute;", "á");
    str = str.replace("&eacute;", "é");
    str = str.replace("&iacute;", "í");
    str = str.replace("&oacute;", "ó");
    str = str.replace("&uacute;", "ú");
    str = str.replace("&Aacute;", "Á");
    str = str.replace("&Eacute;", "É");
    str = str.replace("&Iacute;", "Í");
    str = str.replace("&Oacute;", "Ó");
    str = str.replace("&Uacute;", "Ú");
    str = str.replace("&ntilde;", "ñ");
    str = str.replace("&Ntilde;", "Ñ");
    return str;
}
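
The same idea can be driven by a lookup table instead of one statement per entity, which makes it easier to add entries later. This is only a sketch: the class name is invented, and the table holds just a few of the entities from the method above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AcuteEntityDecoder {
    // Illustrative table: named HTML entity -> the character it stands for.
    // Unicode escapes keep the example independent of the file encoding.
    private static final Map<String, String> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("&aacute;", "\u00E1");
        ENTITIES.put("&eacute;", "\u00E9");
        ENTITIES.put("&ntilde;", "\u00F1");
        ENTITIES.put("&Ntilde;", "\u00D1");
    }

    static String decode(String str) {
        for (Map.Entry<String, String> e : ENTITIES.entrySet()) {
            // Literal replacement of every occurrence of the entity.
            str = str.replace(e.getKey(), e.getValue());
        }
        return str;
    }

    public static void main(String[] args) {
        String decoded = decode("Espa&ntilde;a y caf&eacute;");
        // Entities are 8 characters each and become 1 character each.
        System.out.println(decoded.length()); // prints: 13
    }
}
```

Entities that are not in the table are left untouched, so for anything beyond a handful of names a library such as the StringEscapeUtils class mentioned above is likely the safer choice.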

优雅的叶子 2024-08-29 04:38:16

In case you want to mimic what the PHP function htmlspecialchars_decode does, use the PHP function get_html_translation_table() to dump the table and then use Java code like:

    // Requires: import java.util.LinkedHashMap; import java.util.Map;
    // Insertion-ordered so that "&amp;" is decoded last; otherwise "&amp;lt;"
    // could be decoded twice and wrongly end up as "<".
    static final Map<String, String> html_specialchars_table = new LinkedHashMap<>();
    static {
            html_specialchars_table.put("&lt;", "<");
            html_specialchars_table.put("&gt;", ">");
            html_specialchars_table.put("&amp;", "&");
    }
    static String htmlspecialchars_decode_ENT_NOQUOTES(String s) {
            for (Map.Entry<String, String> entry : html_specialchars_table.entrySet()) {
                    // replace() treats the key as literal text, not as a regex
                    s = s.replace(entry.getKey(), entry.getValue());
            }
            return s;
    }
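
An alternative that does not depend on the order of the table is to decode in a single pass, so text produced by one replacement is never scanned again. The sketch below is an assumption-laden illustration rather than a port of the PHP function: the class name is made up, it covers only the three entities above, and it needs Java 9 or later for Map.of and Matcher.replaceAll with a function argument.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SinglePassEntityDecoder {
    // Same three entities as the table above.
    private static final Map<String, String> TABLE =
            Map.of("&lt;", "<", "&gt;", ">", "&amp;", "&");

    // Matches exactly the entities present in TABLE.
    private static final Pattern ENTITY = Pattern.compile("&(?:lt|gt|amp);");

    static String decode(String s) {
        // Each match is replaced once; the output is not rescanned.
        return ENTITY.matcher(s).replaceAll(
                match -> Matcher.quoteReplacement(TABLE.get(match.group())));
    }

    public static void main(String[] args) {
        System.out.println(decode("1 &lt; 2 &amp;&amp; 3 &gt; 2")); // prints: 1 < 2 && 3 > 2
        System.out.println(decode("&amp;lt;"));                     // prints: &lt;
    }
}
```

The second call shows the point of the single pass: an escaped entity is unescaped by one level only.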