Parsing text with simple wildcard logic in Java / C / Objective-C
I'm looking for a fast library/class to parse plain text using expressions like the ones below:
Text is: <b>Name:</b>John<br><i>Age</i>32<br>
Pattern is: {*}Name:</b>{%}<br>{*}Age</i>{%}<br>
And it will find me two values: John and 32.
The intent is to parse simple HTML web pages without involving heavy-duty tools. It should not use string operations or regexps internally, and should probably parse character by character instead.
Comments (3)
Since you appear to be asking the user to specify the HTML content you want, it's probably all right to use regular expressions here (why do you have an aversion to them?). It's no longer HTML parsing, just simple text matching, which is what regular expressions are designed for.
Here's an example:
Which will leave what you need in your capturing groups.
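The example in this answer did not survive on this page. As a stand-in, here is a minimal Java sketch of the capturing-group idea applied to the sample text from the question. The class name `CaptureGroupExample`, the method `extract`, and the exact regex are assumptions for illustration, not the answerer's original code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureGroupExample {
    // Hypothetical regex modelled on the question's pattern:
    // each {%} becomes a lazy capturing group, each {*} a lazy skip.
    static final Pattern PROFILE = Pattern.compile(
            "Name:</b>(.*?)<br>.*?Age</i>(.*?)<br>", Pattern.DOTALL);

    // Returns {name, age}, or null when the text does not match.
    static String[] extract(String html) {
        Matcher m = PROFILE.matcher(html);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] values = extract("<b>Name:</b>John<br><i>Age</i>32<br>");
        System.out.println(values[0] + " " + values[1]); // prints "John 32"
    }
}
```

The lazy quantifier `.*?` keeps each group from running past the first `<br>` that follows it, which matches the "up to the next literal" reading of `{%}`.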
A regex replacement would work. Just get it to return both values together like "John%32" and then split the response to get the two separate values.
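A rough sketch of that replace-then-split idea in Java might look like the following. The class name `ReplaceAndSplitExample` and the regex are assumptions, and the sketch also assumes that neither value contains a `%` character.

```java
class ReplaceAndSplitExample {
    // Hypothetical regex: it must match the whole input so that
    // replaceAll leaves only the two captured values behind.
    static final String WHOLE_TEXT =
            "(?s).*?Name:</b>(.*?)<br>.*?Age</i>(.*?)<br>.*";

    static String[] extract(String html) {
        // Collapse the entire text to "name%age"...
        String joined = html.replaceAll(WHOLE_TEXT, "$1%$2");
        // ...then split it back into the two values.
        // Note: if the regex does not match, joined is the unchanged input.
        return joined.split("%");
    }

    public static void main(String[] args) {
        String[] values = extract("<b>Name:</b>John<br><i>Age</i>32<br>");
        System.out.println(values[0] + " " + values[1]); // prints "John 32"
    }
}
```

Compared with reading capturing groups directly, this needs a separator that cannot occur in the data, so it is mostly useful where only a replace function is available.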
There's really no advantage to manually implementing character-by-character parsing here, as problems of this type have by and large been solved already.
Developing a character-by-character approach will probably end up being equivalent to manually implementing one of the above two options, which is not a trivial thing to implement.
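For a sense of what the hand-rolled route involves, here is a small Java sketch of a character-by-character matcher for the `{*}` / `{%}` syntax from the question. The semantics are an assumption: each wildcard runs up to the first occurrence of the literal that follows it, with no backtracking, and a wildcard with no literal after it consumes the rest of the text. `WildcardScanner` and its helpers are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardScanner {
    // Returns the {%} captures in order, or null if the text does not match.
    static List<String> match(String pattern, String text) {
        List<String> captures = new ArrayList<>();
        int p = 0; // position in pattern
        int t = 0; // position in text
        while (p < pattern.length()) {
            boolean capture = startsAt(pattern, p, "{%}");
            if (capture || startsAt(pattern, p, "{*}")) {
                p += 3;
                // The literal after the wildcard ends at the next wildcard.
                int litEnd = p;
                while (litEnd < pattern.length() && !isWildcard(pattern, litEnd)) {
                    litEnd++;
                }
                int start = t;
                if (litEnd == p) {
                    t = text.length(); // no following literal: take the rest
                } else {
                    // Advance until the literal matches at position t.
                    while (t <= text.length() && !regionEquals(pattern, p, litEnd, text, t)) {
                        t++;
                    }
                    if (t > text.length()) {
                        return null; // literal never found
                    }
                }
                if (capture) {
                    captures.add(text.substring(start, t));
                }
            } else {
                // Literal character: must match exactly.
                if (t >= text.length() || text.charAt(t) != pattern.charAt(p)) {
                    return null;
                }
                p++;
                t++;
            }
        }
        return captures;
    }

    static boolean isWildcard(String s, int i) {
        return startsAt(s, i, "{*}") || startsAt(s, i, "{%}");
    }

    static boolean startsAt(String s, int i, String token) {
        return regionEquals(token, 0, token.length(), s, i);
    }

    // True if a[from..to) occurs in b starting at index at.
    static boolean regionEquals(String a, int from, int to, String b, int at) {
        if (at + (to - from) > b.length()) {
            return false;
        }
        for (int k = 0; k < to - from; k++) {
            if (a.charAt(from + k) != b.charAt(at + k)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(match("{*}Name:</b>{%}<br>{*}Age</i>{%}<br>",
                "<b>Name:</b>John<br><i>Age</i>32<br>")); // prints [John, 32]
    }
}
```

Even this restricted version has to settle questions such as adjacent wildcards and missing literals, and it is effectively a cut-down version of the lazy matching a regex engine already provides.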