I am using Html.fromHtml(STRING).toString() to convert a string that may or may not contain HTML and/or HTML entities into a plain-text string.
This is pretty slow: by my last measurement it took about 22 ms on average. With a large batch of these it can add over a minute. So I am looking for a faster option that is built for performance.
Is there any way to speed this up, or are there other decoding options available?
Edit: Since there doesn't appear to be a built-in method that is faster or built specifically for performance, I will award the bounty to anyone who can point me to a library that:
- Works well with Android
- Is licensed for free use
- Is faster than
Html.fromHtml(String).toString();
As a note, I already tried Jsoup with this method: Jsoup.parse(String).text()
and it was slower.
Comments (6)
What about unescapeHtml() from org.apache.commons.lang.StringEscapeUtils? The library is available on the Apache site.
(EDIT: June 2019 - See the comments below for updates about the library)
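Note that unescapeHtml() only decodes entities; it does not remove tags. To make the trade-off concrete, here is a minimal single-pass sketch in plain Java that does both. It is a hand-rolled illustration, not the Commons Lang or Android implementation: the class name HtmlText, the method toPlainText, and the tiny entity table are all assumptions made for this example. It skips untouched strings entirely, which matters when many inputs contain no HTML at all. It does not turn <br> or <p> into line breaks the way Html.fromHtml() does, and a literal "<" followed later by ">" in ordinary text will be treated as a tag.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration; not part of Android or Commons Lang. */
final class HtmlText {
    // Small illustrative subset of named entities; a real table is much larger.
    private static final Map<String, Character> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", '&');
        NAMED.put("lt", '<');
        NAMED.put("gt", '>');
        NAMED.put("quot", '"');
        NAMED.put("apos", '\'');
        NAMED.put("nbsp", '\u00A0');
    }

    private HtmlText() {}

    static String toPlainText(String input) {
        // Fast path: no tag or entity markers, so return the string untouched.
        if (input.indexOf('<') < 0 && input.indexOf('&') < 0) {
            return input;
        }
        StringBuilder out = new StringBuilder(input.length());
        int n = input.length();
        int i = 0;
        while (i < n) {
            char c = input.charAt(i);
            if (c == '<') {
                int close = input.indexOf('>', i + 1);
                if (close < 0) {
                    out.append(c); // unterminated tag: keep the character
                    i++;
                } else {
                    i = close + 1; // drop everything from '<' to '>'
                }
            } else if (c == '&') {
                int semi = input.indexOf(';', i + 1);
                // Only try to decode short candidates such as "&amp;" or "&#x42;".
                String decoded = (semi > 0 && semi - i <= 10)
                        ? decodeEntity(input.substring(i + 1, semi))
                        : null;
                if (decoded == null) {
                    out.append(c); // unknown entity: keep it literally
                    i++;
                } else {
                    out.append(decoded);
                    i = semi + 1;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // Returns the decoded text, or null if the name is not recognised.
    private static String decodeEntity(String name) {
        if (name.isEmpty()) {
            return null;
        }
        if (name.charAt(0) == '#') {
            try {
                boolean hex = name.length() > 1
                        && (name.charAt(1) == 'x' || name.charAt(1) == 'X');
                int codePoint = hex
                        ? Integer.parseInt(name.substring(2), 16)
                        : Integer.parseInt(name.substring(1), 10);
                if (!Character.isValidCodePoint(codePoint)) {
                    return null;
                }
                return new String(Character.toChars(codePoint));
            } catch (NumberFormatException e) {
                return null;
            }
        }
        Character ch = NAMED.get(name);
        return ch == null ? null : String.valueOf(ch);
    }
}
```

Calling HtmlText.toPlainText(raw) in place of Html.fromHtml(raw).toString() would be the intended usage. Whether it is actually faster on a given device is something to measure rather than assume.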
fromHtml() does not have a high-performance HTML parser, and I have no idea how quick the toString() implementation on SpannedString is. I doubt either was designed for your scenario.
Ideally, the strings are clean before they get to a low-power phone. Either clean them up in the build process (for resources/assets), or clean them up on a server (before you download them).
If, for whatever reason, you absolutely need to clean them up on the device, you can perhaps use the NDK to create a C/C++ library that does the cleaning for you faster.
This is an incredibly fast and simple option:
Unbescape
It greatly improved our parsing performance, since our parser has to run every string through a decoder.
Have you looked at "Strip HTML from Text JavaScript"?
Any parsing will take some time. 22 ms seems fast to me.
Anyway, can you do it in the background? Would some kind of caching help?
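To illustrate the caching idea, here is a minimal sketch of a small LRU cache wrapped around whatever decoder is in use, so repeated inputs are decoded only once. The class name DecodeCache and its constructor parameters are hypothetical, chosen for this example. It uses java.util.function.Function, which on Android assumes API level 24 or later; android.util.LruCache is a platform alternative to the LinkedHashMap used here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache in front of an arbitrary decode function. */
final class DecodeCache {
    private final Function<String, String> decoder;
    private final Map<String, String> cache;

    DecodeCache(final int maxEntries, Function<String, String> decoder) {
        this.decoder = decoder;
        // accessOrder = true makes iteration order least-recently-used first.
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries; // evict once the limit is exceeded
            }
        };
    }

    // synchronized so a background thread and the UI thread can share one cache.
    synchronized String decode(String raw) {
        String cached = cache.get(raw);
        if (cached == null) {
            cached = decoder.apply(raw);
            cache.put(raw, cached);
        }
        return cached;
    }
}
```

On Android the decoder argument could be something like s -> Html.fromHtml(s).toString(). This only pays off if the same strings recur within a batch or across batches; for all-unique input it adds overhead without saving any decoding.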
Although I have not tried them yet, I found some possible solutions:
I hope it helps.