Regex to get the value within a tag
I get back a sample set of XML like this:
<rsp stat="ok">
<site>
<id>1234</id>
<name>testAddress</name>
<hostname>anotherName</hostname>
...
</site>
<site>
<id>56789</id>
<name>ba</name>
<hostname>alphatest</hostname>
...
</site>
</rsp>
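I want to extract everything within <name></name>, but not the tags themselves, and only for the first instance (or for an item selected by some other test).
Is this possible with regex?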
<disclaimer>
I don't use Objective-C.</disclaimer>
You should be using an XML parser, not regexes. XML is not a regular language, so it is not easily parseable by a regular expression (see https://stackoverflow.com/questions/968919/when-not-to-use-regex-in-c-or-java-c-etc). Don't do it.
You could use Expat, which has Objective-C bindings.
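As a rough illustration of the parser route, here is a minimal sketch using Python's `xml.parsers.expat` module, which wraps Expat. Python is used only for illustration (the question targets Objective-C), the `...` placeholders are left out of the sample, and the helper name `name_texts` is hypothetical.

```python
import xml.parsers.expat

# The question's sample, without the elided "..." lines.
SAMPLE = """<rsp stat="ok">
<site>
<id>1234</id>
<name>testAddress</name>
<hostname>anotherName</hostname>
</site>
<site>
<id>56789</id>
<name>ba</name>
<hostname>alphatest</hostname>
</site>
</rsp>"""


def name_texts(xml_text):
    """Collect the text of every <name> element, in document order."""
    names = []     # finished <name> values
    buffer = None  # list of text chunks while inside <name>, else None

    def start(tag, attrs):
        nonlocal buffer
        if tag == "name":
            buffer = []

    def chars(data):
        # Expat may deliver one text node in several chunks.
        if buffer is not None:
            buffer.append(data)

    def end(tag):
        nonlocal buffer
        if tag == "name" and buffer is not None:
            names.append("".join(buffer))
            buffer = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return names


# The first instance is simply the first collected value.
print(name_texts(SAMPLE)[0])
```

Because the handlers see real element boundaries, picking "the first instance" or filtering on another element becomes ordinary program logic rather than pattern tuning.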
Without knowing your language or environment, I can only describe this in terms of Perl-style expressions. Hopefully it gives you the right idea for your application.
A regular expression that captures the text content of a tag will capture the content in each tag, so you have to loop over the matches to extract all of it. Note that this does not account for self-terminated tags; handling those needs a regex engine with negative lookbehinds, and without knowing your environment it's hard to say whether that is supported.
You could also just strip all tags from your source with a substitution that deletes them.
Also, depending on your environment, if you can use an XML-parsing library, it will make your life much easier. After all, by taking the regex approach you lose everything that XML really offers you (structured data, context awareness, etc.).
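The expressions this answer refers to are not shown above, so the following is only a sketch of the same two ideas in Python's `re` module. Both patterns and the shortened sample string are assumptions made for illustration, not the answer's original Perl.

```python
import re

# Shortened, hypothetical sample with one self-terminated tag.
SAMPLE = "<site><id>1234</id><name>testAddress</name><empty/></site>"

# Assumed pattern: an opening tag, text containing no '<', then the matching
# closing tag. The negative lookbehind (?<!/) rejects tags ending in "/>".
TAG_TEXT = re.compile(r"<(\w+)[^<>]*(?<!/)>([^<]*)</\1>")

# Loop over the matches to collect the content of each leaf tag.
contents = [m.group(2) for m in TAG_TEXT.finditer(SAMPLE)]
print(contents)

# Assumed pattern for stripping every tag and keeping only the text.
stripped = re.sub(r"<[^>]+>", "", SAMPLE)
print(stripped)
```

Note that stripping tags also discards which element each value came from, which is the loss of structure the answer warns about.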
The best tool for this kind of task is XPath.
If you want the name of the site which has id 56789, use this XPath:
/rsp/site[id='56789']/name
I suggest you read the W3Schools XPath tutorial for a quick overview of the XPath syntax.
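To see the XPath in action, here is a small sketch using Python's `xml.etree.ElementTree`, which supports a limited XPath subset. The paths are written relative to the `<rsp>` root element, and the positional query `site[1]/name` for "the first instance" is an assumption added here, not something taken from the answer.

```python
import xml.etree.ElementTree as ET

# The question's sample, without the elided "..." lines.
SAMPLE = """<rsp stat="ok">
<site>
<id>1234</id>
<name>testAddress</name>
<hostname>anotherName</hostname>
</site>
<site>
<id>56789</id>
<name>ba</name>
<hostname>alphatest</hostname>
</site>
</rsp>"""

root = ET.fromstring(SAMPLE)  # root is the <rsp> element

# Relative form of /rsp/site[1]/name: the name of the first site.
first = root.find("site[1]/name").text

# Relative form of /rsp/site[id='56789']/name.
by_id = root.find("site[id='56789']/name").text

print(first, by_id)
```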
As others say, you should really be using NSXMLParser for this sort of thing. However, if you only need to extract the stuff in the name tags, then RegexKitLite can do it quite easily.
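The RegexKitLite call itself is not shown above. As a language-neutral illustration of the same idea, this sketch uses Python's `re` module; the pattern is an assumption and this is not the RegexKitLite API.

```python
import re

# Shortened, hypothetical version of the question's sample.
SAMPLE = (
    "<site><id>1234</id><name>testAddress</name></site>"
    "<site><id>56789</id><name>ba</name></site>"
)

# Non-greedy capture between <name> and </name>; re.search stops at the
# first match, which covers the "first instance only" requirement.
match = re.search(r"<name>(.*?)</name>", SAMPLE, re.DOTALL)
first_name = match.group(1) if match else None
print(first_name)
```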
Careful about namespaces: equivalent XML that puts a namespace prefix on the tags will break regexp-based code. For XML, use an XML parser. XPath is your friend for things like this; an XPath expression can return a sequence of strings with the info you want.
Cocoa has NSXML support for XPath.
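A small sketch of the namespace problem, again in Python for illustration. The namespaced document, its prefix `r`, and the URI `http://example.com/rsp` are all made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical namespaced variant of the sample; prefix and URI are invented.
SAMPLE = """<r:rsp xmlns:r="http://example.com/rsp" stat="ok">
<r:site><r:id>1234</r:id><r:name>testAddress</r:name></r:site>
</r:rsp>"""

# A regex written for the unprefixed tag finds nothing.
regex_hit = re.search(r"<name>(.*?)</name>", SAMPLE)

# A parser resolves prefixes to URIs, so the query can bind any prefix it
# likes ("x" here) to the same URI and still match.
ns = {"x": "http://example.com/rsp"}
root = ET.fromstring(SAMPLE)
names = [el.text for el in root.findall("x:site/x:name", ns)]

print(regex_hit, names)
```

The parser-based query keeps working even if the document switches to a different prefix or a default namespace, as long as the URI stays the same.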