Regex to get the value within a tag

Posted on 2024-08-11 01:38:37

I have a sample set of XML returned back:

<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
    ...

  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
    ...
  </site>
</rsp>

I want to extract everything inside <name></name>, without the tags themselves, and only for the first instance (or for an item selected by some other test).

Is this possible with regex?

情未る 2024-08-18 01:38:37

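(Disclaimer: I don't use Objective-C.)

You should be using an XML parser, not regular expressions. XML is not a regular language, so it is not easily parsed by a regular expression (see https://stackoverflow.com/questions/968919/when-not-to-use-regex-in-c-or-java-c-etc). Don't do it.

Never use regular expressions or basic string parsing to process XML. Every language in common use today has perfectly good XML support. XML is a deceptively complex standard, and it is unlikely your code will be correct in the sense of properly parsing all well-formed XML input. Even if it is, you are wasting your time, because (as just mentioned) every language in common use has XML support. Parsing XML with regular expressions is unprofessional.

You could use Expat, which has Objective-C bindings.

Apple's options are:

The CF XML parser

The tree-based Cocoa parser (10.4 only)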
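
To make the parser suggestion concrete, here is a minimal sketch in Python (chosen only because the question does not state a language), using the standard library's Expat binding. The helper name first_name and the SAMPLE string are illustrative assumptions; SAMPLE is the XML from the question with the "..." placeholder lines left out.

```python
import xml.parsers.expat

# Sample from the question, with the "..." placeholder lines omitted.
SAMPLE = """<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
  </site>
</rsp>"""


def first_name(xml_text):
    """Return the text of the first <name> element, or None if there is none.

    Hypothetical helper written for this illustration.
    """
    state = {"inside": False, "chunks": [], "result": None}

    def start(tag, attrs):
        # Only react to the first <name>; later ones are ignored.
        if tag == "name" and state["result"] is None:
            state["inside"] = True

    def chars(data):
        # Expat may deliver one text node in several pieces, so collect them.
        if state["inside"]:
            state["chunks"].append(data)

    def end(tag):
        if tag == "name" and state["inside"]:
            state["inside"] = False
            state["result"] = "".join(state["chunks"])

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return state["result"]


print(first_name(SAMPLE))
```

The same event-driven shape (start, character data, end callbacks) is what an Expat binding in another language would expose, so the logic should carry over with only syntactic changes.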

只怪假的太真实 2024-08-18 01:38:37

Without knowing your language or environment, here are some Perl expressions. Hopefully they will give you the right idea for your application.

Your regular expression to capture the text content of a tag would look something like this:

m/>([^<]*)</

This captures the text between each > and the next <, so it also picks up the whitespace between adjacent tags. You will have to loop on the match to extract all content. Note that this does not account for self-terminated tags. You would need a regex engine with negative lookbehinds to accomplish that. Without knowing your environment, it's hard to say whether that would be supported.

You could also just strip all tags from your source using something like:

s/<[^>]*>//g

Also depending on your environment, if you can use an XML-parsing library, it will make your life much easier. After all, by taking the regex approach, you lose everything that XML really offers you (structured data, context awareness, etc).
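
As a rough illustration of the two Perl expressions above, here is a Python sketch using the standard re module (Python is an assumption, since the question does not name a language). SAMPLE is the question's XML without the "..." lines, and the variable names are made up for this example.

```python
import re

# Sample from the question, with the "..." placeholder lines omitted.
SAMPLE = """<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
  </site>
</rsp>"""

# Equivalent of looping on m/>([^<]*)</ : every run of text between a '>'
# and the next '<', including whitespace-only runs between adjacent tags.
raw_runs = re.findall(r">([^<]*)<", SAMPLE)

# Drop the whitespace-only runs to keep just the tag contents.
texts = [run for run in raw_runs if run.strip()]
print(texts)

# Equivalent of s/<[^>]*>//g : remove every tag, leaving only the text.
stripped = re.sub(r"<[^>]*>", "", SAMPLE)
print(stripped.split())
```

Both approaches return every value in document order with no indication of which tag it came from, which is the loss of structure the answer warns about.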

苦笑流年记忆 2024-08-18 01:38:37

The best tool for this kind of task is XPath.

NSURL *rspURL = [NSURL fileURLWithPath:[@"~/rsp.xml" stringByExpandingTildeInPath]];
NSXMLDocument *document = [[[NSXMLDocument alloc] initWithContentsOfURL:rspURL options:NSXMLNodeOptionsNone error:NULL] autorelease];

NSArray *nodes = [document nodesForXPath:@"/rsp/site[1]/name" error:NULL];
NSString *name = [nodes count] > 0 ? [[nodes objectAtIndex:0] stringValue] : nil;

If you want the name of the site that has id 56789, use this XPath instead: /rsp/site[id='56789']/name. I suggest you read the W3Schools XPath tutorial for a quick overview of the XPath syntax.
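
For readers outside Cocoa, the same two lookups can be sketched in Python with the standard xml.etree.ElementTree module, which supports a limited XPath subset. This is only an illustration under assumptions: SAMPLE is the question's XML without the "..." lines, and because the parsed root is already the rsp element, the paths are written relative to it rather than starting with /rsp.

```python
import xml.etree.ElementTree as ET

# Sample from the question, with the "..." placeholder lines omitted.
SAMPLE = """<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
  </site>
</rsp>"""

root = ET.fromstring(SAMPLE)  # root is the <rsp> element

# Name of the first <site>; findtext returns None when nothing matches.
first_name = root.findtext("site[1]/name")

# Name of the <site> whose <id> child has the text 56789.
name_by_id = root.findtext("site[id='56789']/name")

print(first_name)
print(name_by_id)
```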

转角预定愛 2024-08-18 01:38:37

As others have said, you should really be using NSXMLParser for this sort of thing.

HOWEVER, if you only need to extract the stuff in the name tags, then RegexKitLite can do it quite easily:

NSString * xmlString = ...;
NSArray * captures = [xmlString arrayOfCaptureComponentsMatchedByRegex:@"<name>(.*?)</name>"];
for (NSArray * captureGroup in captures) {
  NSLog(@"Name: %@", [captureGroup objectAtIndex:1]);
}
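
The same pattern can be tried without RegexKitLite. Below is a small Python sketch using the standard re module (Python is an assumption, as the question does not name a language; SAMPLE is the question's XML without the "..." lines). It also shows one way the pattern can miss otherwise ordinary XML, using a made-up element with an attribute.

```python
import re

# Sample from the question, with the "..." placeholder lines omitted.
SAMPLE = """<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
  </site>
</rsp>"""

# Non-greedy capture of whatever sits between <name> and </name>.
NAME = re.compile(r"<name>(.*?)</name>")

names = NAME.findall(SAMPLE)
first = names[0] if names else None
print(names)
print(first)

# Hypothetical input: an attribute on the tag is enough to defeat the pattern.
with_attribute = '<name lang="en">testAddress</name>'
print(NAME.findall(with_attribute))
```

Note that "." does not match newlines by default, so a name whose content spans several lines would also be missed unless re.DOTALL is passed.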

挽清梦 2024-08-18 01:38:37

Careful about namespaces:

<prefix:name xmlns:prefix="">testAddress</prefix:name>

is equivalent XML that will break regex-based code. For XML, use an XML parser. XPath is your friend for things like this. The XPath code below will return a sequence of strings with the info you want: