Regex to get the value inside a tag

Posted on 2024-08-11 01:38:37

I have a sample set of XML returned back:

<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
    ...

  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
    ...
  </site>
</rsp>

I want to extract everything within <name></name> but not the tags themselves, and to have that only for the first instance (or based on some other test select which item).

Is this possible with regex?

Comments (5)

情未る 2024-08-18 01:38:37

<disclaimer>I don't use Objective-C</disclaimer>

You should be using an XML parser, not regexes. XML is not a regular language, hence not easily parseable by a regular expression (see https://stackoverflow.com/questions/968919/when-not-to-use-regex-in-c-or-java-c-etc). Don't do it.

Never use regular expressions or basic string parsing to process XML. Every language in common usage right now has perfectly good XML support. XML is a deceptively complex standard and it's unlikely your code will be correct in the sense that it will properly parse all well-formed XML input, and even if it does, you're wasting your time because (as just mentioned) every language in common usage has XML support. It is unprofessional to use regular expressions to parse XML.

You could use Expat, which has Objective-C bindings.

Apple's options are:

  1. The CF xml parser
  2. The tree based Cocoa parser (10.4 only)
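
To illustrate the "use a parser" advice, here is a minimal sketch in Python (an assumption on my part, since the question does not name a language) using the standard library's `xml.etree.ElementTree`. The helper name `first_site_name` is hypothetical, and the sample document is the one from the question with the elided `...` lines left out.

```python
import xml.etree.ElementTree as ET

# Sample from the question, without the elided "..." lines.
XML = """<rsp stat="ok">
  <site>
    <id>1234</id>
    <name>testAddress</name>
    <hostname>anotherName</hostname>
  </site>
  <site>
    <id>56789</id>
    <name>ba</name>
    <hostname>alphatest</hostname>
  </site>
</rsp>"""


def first_site_name(xml_text):
    """Return the text of the first <site>/<name>, or None if there is none."""
    root = ET.fromstring(xml_text)   # root is the <rsp> element
    name = root.find("site/name")    # find() returns the first match in document order
    return name.text if name is not None else None


print(first_site_name(XML))
```

The parser deals with whitespace, attributes and entity escaping, so the calling code only states which element it wants.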

只怪假的太真实 2024-08-18 01:38:37

Without knowing your language or environment, here are some perl expressions. Hopefully it will give you the right idea for your application.

Your regular expression to capture the text content of a tag would look something like this:

m/>([^<]*)</

This will capture the content in each tag. You will have to loop on the match to extract all content. Note that this does not account for self-terminated tags. You would need a regex engine with negative lookbehinds to accomplish that. Without knowing your environment, it's hard to say if it would be supported.
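
A rough Python rendering of that loop, for illustration only (the answer itself uses Perl; Python and the helper name `text_chunks` are my assumptions). Note that the pattern matches the text after every tag, so empty and whitespace-only chunks have to be filtered out, and the result is not limited to `<name>`.

```python
import re

# Compact version of one <site> from the question.
XML = "<site><id>1234</id><name>testAddress</name></site>"

# Same pattern as the Perl expression: text between a '>' and the next '<'.
TEXT_BETWEEN_TAGS = re.compile(r">([^<]*)<")


def text_chunks(xml_text):
    """Collect every non-blank run of text found between two tags."""
    chunks = []
    for match in TEXT_BETWEEN_TAGS.finditer(xml_text):
        text = match.group(1).strip()
        if text:                      # skip the empty gaps between adjacent tags
            chunks.append(text)
    return chunks


print(text_chunks(XML))
```

The output mixes the id and the name, which is exactly the loss of context the answer warns about further down.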

You could also just strip all tags from your source using something like:

s/<[^>]*>//g
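
The same substitution sketched in Python (again an assumed language; `strip_tags` is a hypothetical helper name):

```python
import re

XML = "<site><id>1234</id><name>testAddress</name></site>"


def strip_tags(xml_text):
    """Delete everything that looks like a tag, keeping only the text between tags."""
    return re.sub(r"<[^>]*>", "", xml_text)


print(strip_tags(XML))
```

On this input the id and the name run together into one string, so this is only useful when you do not care which element the text came from.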

Also depending on your environment, if you can use an XML-parsing library, it will make your life much easier. After all, by taking the regex approach, you lose everything that XML really offers you (structured data, context awareness, etc).

苦笑流年记忆 2024-08-18 01:38:37

The best tool for this kind of task is XPath.

NSURL *rspURL = [NSURL fileURLWithPath:[@"~/rsp.xml" stringByExpandingTildeInPath]];
NSXMLDocument *document = [[[NSXMLDocument alloc] initWithContentsOfURL:rspURL options:NSXMLNodeOptionsNone error:NULL] autorelease];

NSArray *nodes = [document nodesForXPath:@"/rsp/site[1]/name" error:NULL];
NSString *name = [nodes count] > 0 ? [[nodes objectAtIndex:0] stringValue] : nil;

If you want the name of the site which has id 56789, use this XPath instead: /rsp/site[id='56789']/name. I suggest you read the W3Schools XPath tutorial for a quick overview of the XPath syntax.
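
For readers outside Cocoa, the two XPath queries above can be approximated with Python's `xml.etree.ElementTree`; this is a sketch under that assumption, not a translation of the NSXMLDocument API. ElementTree implements only a subset of XPath and does not accept absolute paths on an element, so the paths are written relative to the `<rsp>` root. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<rsp stat="ok">
  <site><id>1234</id><name>testAddress</name><hostname>anotherName</hostname></site>
  <site><id>56789</id><name>ba</name><hostname>alphatest</hostname></site>
</rsp>"""

root = ET.fromstring(XML)  # root is <rsp>, so paths start at its children


def name_of_first_site(rsp):
    """Rough equivalent of /rsp/site[1]/name."""
    return rsp.findtext("site[1]/name")


def name_of_site_with_id(rsp, site_id):
    """Rough equivalent of /rsp/site[id='...']/name; site_id must not contain quotes."""
    return rsp.findtext(f"site[id='{site_id}']/name")


print(name_of_first_site(root))
print(name_of_site_with_id(root, "56789"))
```

`findtext` returns `None` when nothing matches, which plays the role of the `nil` fallback in the Objective-C snippet.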

转角预定愛 2024-08-18 01:38:37

As others say, you should really be using NSXMLParser for this sort of thing.

HOWEVER, if you only need to extract the stuff in the name tags, then RegexKitLite can do it quite easily:

NSString * xmlString = ...;
NSArray * captures = [xmlString arrayOfCaptureComponentsMatchedByRegex:@"<name>(.*?)</name>"];
for (NSArray * captureGroup in captures) {
  NSLog(@"Name: %@", [captureGroup objectAtIndex:1]);
}
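
The same pattern can be tried in Python's `re` module; this is a sketch in an assumed language, not RegexKitLite. The `re.DOTALL` flag is my addition so that a name spanning several lines would still match, and `first_name` / `all_names` are hypothetical helper names. `first_name` covers the "first instance only" part of the question.

```python
import re

XML = """<rsp stat="ok">
  <site><id>1234</id><name>testAddress</name></site>
  <site><id>56789</id><name>ba</name></site>
</rsp>"""

# Non-greedy capture of whatever sits between <name> and the nearest </name>.
NAME_RE = re.compile(r"<name>(.*?)</name>", re.DOTALL)


def first_name(xml_text):
    """Return the first captured name, or None when the pattern does not match."""
    match = NAME_RE.search(xml_text)
    return match.group(1) if match else None


def all_names(xml_text):
    """Return every captured name in document order."""
    return NAME_RE.findall(xml_text)


print(first_name(XML))
print(all_names(XML))
```

This only works while the tag is written exactly as `<name>`: attributes, namespace prefixes or CDATA sections would all defeat it.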

挽清梦 2024-08-18 01:38:37

Careful about namespaces:

<prefix:name xmlns:prefix="">testAddress</prefix:name>

is equivalent XML that will break regexp based code. For XML, use an XML parser. XPath is your friend for things like this. The XPath code below will return a sequence of strings with the info you want:

./rsp/site/name/text()

Cocoa has NSXML support for XPath.
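
A small Python sketch of the namespace problem (Python is an assumed language here). The namespace URI `http://example.com/ns` and the prefix `p` are hypothetical placeholders I introduced so that the document parses; they are not taken from the snippet above. The regex looks for the literal `<name>` and finds nothing, while the parser resolves the prefix and still returns the value.

```python
import re
import xml.etree.ElementTree as ET

NS_URI = "http://example.com/ns"  # hypothetical namespace URI, for illustration only

XML = (
    '<rsp stat="ok"><site>'
    f'<p:name xmlns:p="{NS_URI}">testAddress</p:name>'
    "</site></rsp>"
)


def names_by_regex(xml_text):
    """Naive approach: only matches the literal, unprefixed <name> tag."""
    return re.findall(r"<name>(.*?)</name>", xml_text)


def names_by_parser(xml_text):
    """Parser approach: match on the namespace URI, whatever prefix was used."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("site/p:name", {"p": NS_URI})]


print(names_by_regex(XML))
print(names_by_parser(XML))
```

The prefix in the query mapping does not have to match the prefix in the document; only the URI matters.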
