Using grep to get the text inside an HTML tag from a local file
Possible Duplicate:
RegEx match open tags except XHTML self-contained tags
Excerpt From Input File
<TD class="clsTDLabelWeb" width="28%">Municipality: </TD>
<TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5">
<span id="DInfo1_Municipality">JUPITER</span></TD>
My Regular Expression
(?<=<span id="DInfo1_Municipality">)([^</span>]*)
I have an HTML file saved to disk. I would like to use grep to search through the file and output the contents of a specific span, though I don't know if this is a proper use of grep. When I run grep on the file with the expression read from another file (so I don't mess up escaping any special characters), it doesn't output anything. I have tested the expression in RegExr and it matches "JUPITER", which is exactly what I want returned. Thank you so much for your help!
Desired Output
JUPITER
Comments (3)
Give this a try:
or with GNU grep and your regex:
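The command that originally followed is not preserved here. As a minimal sketch of what a GNU grep invocation with a lookbehind might look like (not necessarily the answerer's exact command), the snippet below recreates the excerpt from the question in a temporary file and extracts the span contents. The variable names and the temporary file are assumptions made for illustration, and the `-P` option requires a GNU grep built with PCRE support.

```shell
# Recreate the excerpt from the question in a temporary file (illustrative only).
html_file=$(mktemp)
cat > "$html_file" <<'EOF'
<TD class="clsTDLabelWeb" width="28%">Municipality: </TD>
<TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5">
<span id="DInfo1_Municipality">JUPITER</span></TD>
EOF

# -P switches GNU grep to Perl-compatible regexes, which the lookbehind needs.
# -o prints only the matched text instead of the whole matching line.
municipality=$(grep -oP '(?<=<span id="DInfo1_Municipality">)[^<]*' "$html_file")
echo "$municipality"

rm -f "$html_file"
```

Note that the sketch uses `[^<]*` rather than the question's `[^</span>]*`. The latter is a character class, so it stops at the first `<`, `/`, `s`, `p`, `a`, `n`, or `>` rather than at the literal closing tag; it happens to work for "JUPITER" but would truncate a value containing any of those lowercase letters.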
Grep's default regex syntax doesn't support that type of regex (lookbehind assertions), and it's a very poor tool for this. For the example given it is workable, but it will break in many situations.
Something crazy like that is not a good idea.
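Where `grep -P` is not available, one way to avoid lookbehind altogether is a capture group in sed. The following is a hedged sketch under the same assumptions as the question's excerpt (the span opens and closes on one line, with exactly this `id` attribute); it is not the command from the answer above, and the variable names and temporary file are illustrative.

```shell
# Recreate the excerpt from the question in a temporary file (illustrative only).
html_file=$(mktemp)
cat > "$html_file" <<'EOF'
<TD class="clsTDLabelWeb" width="28%">Municipality: </TD>
<TD style="WIDTH: 394px" class="clsTDLabelSm" colSpan="5">
<span id="DInfo1_Municipality">JUPITER</span></TD>
EOF

# -n suppresses sed's default output; the trailing p prints only the lines
# where the substitution succeeded. \( \) captures the span text and \1
# replaces the whole line with it.
span_text=$(sed -n 's/.*<span id="DInfo1_Municipality">\([^<]*\)<\/span>.*/\1/p' "$html_file")
echo "$span_text"

rm -f "$html_file"
```

This shares the fragility noted above: it prints nothing if the span's text wraps onto another line, if the tag carries extra attributes, or if the markup uses different quoting, which is why an HTML parser is usually the safer choice for anything beyond a one-off extraction.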