
Android out-of-memory exception when HTML-unescaping a string

Posted 2024-11-09 06:59:27

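I have to HTML-unescape a string of about 1.4 MB so that I can use the unescaped XML in a parser. The string is an HTML-escaped XML file, so every value in it looks like &lt;TAG&gt;val&lt;/TAG&gt; and so on.

The problem is that I always get an out-of-memory exception when I try to obtain the unescaped string with StringEscapeUtils.unescapeHtml(String) (apache-commons-lang-2.6 library).

I also tried the method from the basic Android API to unescape the string, but besides being slow, it throws the out-of-memory exception even with smaller strings (~700 KB).

Can anyone suggest how to handle such a string conversion without hitting an out-of-memory exception?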


筑梦 2024-11-16 06:59:27


Java has some good core facilities that make this really simple.
The solution below uses a regular expression to go through your content and lets you replace the escape codes. It does require a little work, because you need to provide the escape codes yourself. You can find a list of them at http://www.w3.org/TR/html4/sgml/entities.html or search the web for others.

Here is the code:

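import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HtmlUnescape {
    public static void main(String[] args) {
        // Escape codes to replace; extend this map with any others you need.
        Map<String, String> codes = new HashMap<String, String>();
        codes.put("&lt;", "<");
        codes.put("&gt;", ">");
        codes.put("&quot;", "\"");

        String html = "&lt;html&gt;&lt;head&gt;&lt;title&gt;Hello&lt;/title&gt;&lt;/head&gt;&lt;body&gt;&lt;h1&gt;The great escape &quot;example&quot;&lt;/h1&gt;&lt;/body&gt;&lt;/html&gt;";

        // Matches '&', any number of '#', two to four word characters, then ';'.
        Matcher matcher = Pattern.compile("&#*\\w\\w\\w?\\w?;").matcher(html);
        StringBuffer matchBuffer = new StringBuffer();
        while (matcher.find()) {
            String replacement = codes.get(matcher.group());
            if (replacement == null) {
                // Unknown code: keep it as-is instead of passing null (which would throw).
                replacement = matcher.group();
            }
            // quoteReplacement stops '$' and '\' from being read as group references.
            matcher.appendReplacement(matchBuffer, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(matchBuffer);
        System.out.println(matchBuffer.toString());
    }
}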

What is going on in the code:

First, the hash map stores the escape codes and the characters they stand for.
Second, the variable html stores the escaped HTML to process.
Next, we use the regular expression to search for and replace the escape codes using the:
- Matcher.find(),
- Matcher.appendReplacement(), and
- Matcher.appendTail() methods.
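
As a rough sketch of how those three Matcher methods fit together in a reusable form, the same loop can be wrapped in a method. The class name EntityUnescaper, the CODES map and its starter entries are hypothetical, and decoding numeric references such as &#34; or &#x22; is an addition that the answer's code does not include; it is shown only because it avoids listing every numeric code by hand.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class EntityUnescaper {
    // Named codes without '&' and ';'. An assumed starter set, extend as needed.
    private static final Map<String, String> CODES = new HashMap<String, String>();
    static {
        CODES.put("lt", "<");
        CODES.put("gt", ">");
        CODES.put("amp", "&");
        CODES.put("quot", "\"");
        CODES.put("apos", "'");
    }

    // Group 1: decimal digits, group 2: hex digits, group 3: a named code.
    private static final Pattern ENTITY =
            Pattern.compile("&(?:#(\\d{1,7})|#[xX]([0-9a-fA-F]{1,6})|(\\w{2,8}));");

    static String unescape(String input) {
        Matcher matcher = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer(input.length());
        while (matcher.find()) {
            // Copies the text since the previous match, then the decoded replacement.
            matcher.appendReplacement(out, Matcher.quoteReplacement(decode(matcher)));
        }
        // Copies whatever follows the last match.
        matcher.appendTail(out);
        return out.toString();
    }

    private static String decode(Matcher m) {
        String digits = m.group(1) != null ? m.group(1) : m.group(2);
        if (digits != null) {
            int codePoint = Integer.parseInt(digits, m.group(1) != null ? 10 : 16);
            if (Character.isValidCodePoint(codePoint)) {
                return new String(Character.toChars(codePoint));
            }
            return m.group(); // out of range: leave the text unchanged
        }
        String named = CODES.get(m.group(3));
        return named != null ? named : m.group(); // unknown name: leave unchanged
    }
}
```

Calling EntityUnescaper.unescape(escapedXml) returns the unescaped text. Note that this still keeps the input string and the output buffer in memory at the same time, so on its own it may not be enough for very large inputs.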

Try that. I have no insight into how it performs on large files such as yours. But the code is simple enough that you can tweak it to get the desired performance.
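
One possible tweak for large inputs, offered as a sketch rather than a measured fix, is to swap the regex loop for a character-by-character pass that reads from a Reader and writes to a Writer, so the escaped and unescaped text never have to sit in memory as two complete strings. The class name StreamingUnescaper, the MAX_CODE_LENGTH limit and the entries in CODES are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical streaming variant, not part of the original answer.
final class StreamingUnescaper {
    // Longest "&...;" sequence we try to decode (an assumed limit).
    private static final int MAX_CODE_LENGTH = 10;

    // Assumed starter set of escape codes, extend as needed.
    private static final Map<String, String> CODES = new HashMap<String, String>();
    static {
        CODES.put("&lt;", "<");
        CODES.put("&gt;", ">");
        CODES.put("&amp;", "&");
        CODES.put("&quot;", "\"");
    }

    // Pass buffered streams: this method reads one character at a time.
    static void unescape(Reader in, Writer out) throws IOException {
        StringBuilder pending = new StringBuilder(MAX_CODE_LENGTH);
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (ch == '&') {
                // A new candidate starts; flush any unfinished one unchanged.
                out.write(pending.toString());
                pending.setLength(0);
                pending.append(ch);
            } else if (pending.length() == 0) {
                out.write(ch); // ordinary text outside any candidate code
            } else {
                pending.append(ch);
                if (ch == ';') {
                    String replacement = CODES.get(pending.toString());
                    out.write(replacement != null ? replacement : pending.toString());
                    pending.setLength(0);
                } else if (pending.length() >= MAX_CODE_LENGTH) {
                    // Too long to be a known code: emit it unchanged.
                    out.write(pending.toString());
                    pending.setLength(0);
                }
            }
        }
        out.write(pending.toString()); // trailing text after a final '&'
    }
}
```

In practice the Reader would typically be a BufferedReader over the escaped file and the Writer a BufferedWriter over a temporary file that the XML parser then reads; whether this actually avoids the out-of-memory exception on a given device is untested here.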
