Rewriting URLs with ANTLR
My Java program needs to rewrite URLs in HTML on the fly. I am looking for the right tool and wonder whether ANTLR would do the job.
For example:
<html><body> <img src="foo.jpg" /> </body></html>
should be rewritten as:
<html><body> <img src="http://foo.com/foo.jpg" /> </body></html>
I want to read from a stream and write to a stream (byte by byte).
Comments (2)
As khmarbaise said, first check whether regular expressions can do it. But there are cases in which they can't [*], and then I think ANTLR might really be a legitimate choice.
[*] For the mathematical background on this, see http://en.wikipedia.org/wiki/Formal_grammar#The_Chomsky_hierarchy
Update
Now that you have updated your question, I see what you really want to do: for modifying a complete HTML file, I'd use a parser like NekoHTML, or something similar: http://www.benmccann.com/dev-blog/java-html-parsing-library-comparison/
You can then use the parser to extract the URLs.
Do not use regular expressions to parse the entire HTML file! You could use ANTLR for that in theory, but it would be very hard to make that work reliably.
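Once a parser has handed you the value of a src attribute, the rewrite from the question comes down to resolving a relative reference against a base URL. Here is a minimal sketch of that step using only java.net.URI from the JDK. The class name UrlRewriteSketch and the helper toAbsolute are made up for illustration, and the base http://foo.com/ (with its trailing slash) is assumed from the example in the question. Extracting the attribute with NekoHTML or another parser is not shown.

```java
import java.net.URI;

public class UrlRewriteSketch {

    // Hypothetical helper: resolves an attribute value against a base URL.
    // Assumes baseUrl ends with "/" and that both strings are syntactically
    // valid URIs; URI.create throws IllegalArgumentException otherwise.
    public static String toAbsolute(String baseUrl, String src) {
        // Relative references are resolved against the base;
        // references that are already absolute come back unchanged.
        return URI.create(baseUrl).resolve(src).toString();
    }

    public static void main(String[] args) {
        // Relative reference from the question's example
        System.out.println(toAbsolute("http://foo.com/", "foo.jpg"));
        // Already absolute, so it is left as it is
        System.out.println(toAbsolute("http://foo.com/", "http://bar.com/x.png"));
    }
}
```

Real-world pages often contain attribute values that are not valid URIs (unescaped spaces, for instance), so production code would probably want to catch the exception and pass such values through untouched.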
What about regular expressions?