Capture groups not working in NSRegularExpression

Posted on 2024-11-26 10:15:07

Why is this code only spitting out the entire regex match instead of the capture group?

Input

@"A long string containing Name:</td><td>A name here</td> amongst other things"

Output expected

A name here

Actual output

Name:</td><td>A name here</td>

Code

NSString *htmlString = @"A long string containing Name:</td><td>A name here</td> amongst other things";
NSRegularExpression *nameExpression = [NSRegularExpression regularExpressionWithPattern:@"Name:</td>.*\">(.*)</td>" options:NSRegularExpressionSearch error:nil];

NSArray *matches = [nameExpression matchesInString:htmlString
                                  options:0
                                    range:NSMakeRange(0, [htmlString length])];
for (NSTextCheckingResult *match in matches) {
    NSRange matchRange = [match range];
    NSString *matchString = [htmlString substringWithRange:matchRange];
    NSLog(@"%@", matchString);
}

The code is taken from the Apple docs.
I know there are other libraries that can do this, but I want to stick with what's built in for this task.

Comments (4)

樱花坊 2024-12-03 10:15:07

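You can get the range of the first capture group with rangeAtIndex:1 ([match range] is the range of the whole match, the same as rangeAtIndex:0):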

for (NSTextCheckingResult *match in matches) {
    //NSRange matchRange = [match range];
    NSRange matchRange = [match rangeAtIndex:1];
    NSString *matchString = [htmlString substringWithRange:matchRange];
    NSLog(@"%@", matchString);
}
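For readers working in current Swift, the same idea can be sketched as follows. This is only an illustration: the helper name `firstGroupCaptures` and the pattern `Name:</td><td>(.*?)</td>` are assumptions chosen to fit the sample input from the question, not the asker's original pattern. The pattern must contain at least one capture group, otherwise asking for index 1 raises an exception.

```swift
import Foundation

/// Hypothetical helper: returns the text captured by group 1 for every match.
/// Assumes `pattern` contains at least one capture group.
func firstGroupCaptures(of pattern: String, in text: String) throws -> [String] {
    let regex = try NSRegularExpression(pattern: pattern, options: [])
    let nsText = text as NSString
    // NSRange is measured in UTF-16 code units, hence NSString.length.
    let fullRange = NSRange(location: 0, length: nsText.length)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match -> String? in
        // range(at: 0) is the whole match; range(at: 1) is the first capture group.
        let groupRange = match.range(at: 1)
        // A group that did not take part in the match reports NSNotFound.
        guard groupRange.location != NSNotFound else { return nil }
        return nsText.substring(with: groupRange)
    }
}

let html = "A long string containing Name:</td><td>A name here</td> amongst other things"
// Assumed pattern, written to fit the sample input above.
let names = try firstGroupCaptures(of: "Name:</td><td>(.*?)</td>", in: html)
print(names)  // ["A name here"]
```

The lazy quantifier `(.*?)` stops at the first closing `</td>`, which keeps the capture from running on into later cells.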
另类 2024-12-03 10:15:07

Don't parse HTML with regular expressions or NSScanner. Down that path lies madness.

This has been asked many times on SO; see, for example, "parsing HTML on the iPhone".

"The data I am picking out is as simple as <td>Name: A name</td> and I think it's simple enough to just use regular expressions instead of including a full-blown HTML parser in the project."

That's up to you, and I'm a strong advocate of "first to market has a huge advantage".

The difference is that a proper HTML parser considers the structure of the document. With regular expressions, you are relying on the document never changing format in ways that are otherwise perfectly valid syntactically.

For example, what if the input were <td class="name">Name: A name</td>? Your regex parser would break on input that is both valid HTML and, from a tag-contents perspective, identical to the original input.
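The brittleness described above can be reproduced in a few lines. This is a minimal sketch: the function name `captureName` and the pattern `<td>Name: (.*?)</td>` are hypothetical, standing in for any regex that hard-codes the exact spelling of the tag.

```swift
import Foundation

/// Hypothetical extractor that hard-codes the exact tag spelling.
func captureName(in html: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: "<td>Name: (.*?)</td>", options: []) else {
        return nil
    }
    let nsHTML = html as NSString
    let fullRange = NSRange(location: 0, length: nsHTML.length)
    guard let match = regex.firstMatch(in: html, options: [], range: fullRange) else {
        return nil
    }
    // Group 1 holds the text after "Name: ".
    return nsHTML.substring(with: match.range(at: 1))
}

// The original markup matches.
print(captureName(in: "<td>Name: A name</td>") ?? "no match")                 // A name
// Equivalent markup with an added attribute does not.
print(captureName(in: "<td class=\"name\">Name: A name</td>") ?? "no match")  // no match
```

Loosening the pattern (for instance to `<td[^>]*>`) handles this one variation, but each further variation in valid markup needs another special case, which is the point the answer is making.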

自控 2024-12-03 10:15:07

In Swift 3:

//: Playground - noun: a place where people can play

import UIKit

/// Two groups. 1: [A-Z]+, 2: [0-9]+
var pattern = "([A-Z]+)([0-9]+)"

let regex = try NSRegularExpression(pattern: pattern, options:[.caseInsensitive])

let str = "AA01B2C3DD4"
let nsStr = str as NSString
// NSRange is measured in UTF-16 code units, so take the length from NSString
let strLen = nsStr.length
let results = regex.matches(in: str, options: [], range: NSMakeRange(0, strLen))

for a in results {

    let c = a.numberOfRanges 
    print(c)

    let m0 = a.rangeAt(0)  // Whole match, e.g. "AA01"
    let m1 = a.rangeAt(1)  // Group 1: letters, e.g. "AA"
    let m2 = a.rangeAt(2)  // Group 2: digits, e.g. "01"
    // let m3 = a.rangeAt(3) // Runtime exception: the pattern has no group 3

    let s = nsStr.substring(with: m2)
    print(s)
}
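In Swift 4 and later, `rangeAt(_:)` was renamed to `range(at:)`. A rough equivalent of the snippet above, looping up to `numberOfRanges` so that no out-of-range group index is ever requested, might look like this. The function name `allGroups` is an assumption made for illustration.

```swift
import Foundation

/// Hypothetical helper: for each match, returns the whole match followed by
/// the text of every capture group.
func allGroups(of pattern: String, in text: String) throws -> [[String]] {
    let regex = try NSRegularExpression(pattern: pattern, options: [.caseInsensitive])
    // Convert the full String range to an NSRange (UTF-16 based).
    let fullRange = NSRange(text.startIndex..., in: text)
    return regex.matches(in: text, options: [], range: fullRange).map { match -> [String] in
        // numberOfRanges counts the whole match (index 0) plus each group.
        (0..<match.numberOfRanges).map { index -> String in
            // Range(_:in:) returns nil for a group that did not participate.
            guard let range = Range(match.range(at: index), in: text) else { return "" }
            return String(text[range])
        }
    }
}

let groups = try allGroups(of: "([A-Z]+)([0-9]+)", in: "AA01B2C3DD4")
print(groups)
// [["AA01", "AA", "01"], ["B2", "B", "2"], ["C3", "C", "3"], ["DD4", "DD", "4"]]
```

Converting with `Range(_:in:)` avoids the bridge to `NSString` and stays correct for text whose `Character` count differs from its UTF-16 length.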
花桑 2024-12-03 10:15:07

HTML isn't a regular language and can't be properly parsed using regular expressions. A classic SO answer explains this common mistaken assumption among programmers.
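One way to see the problem is with nested elements. The sketch below uses a hypothetical function `innerDiv` and an illustrative pattern: a lazy `(.*?)` stops at the first closing tag it meets, so the capture is cut off in the middle of the nested element.

```swift
import Foundation

/// Hypothetical helper: tries to capture the contents of the first <div>.
func innerDiv(of html: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: "<div>(.*?)</div>", options: []) else {
        return nil
    }
    let fullRange = NSRange(html.startIndex..., in: html)
    guard let match = regex.firstMatch(in: html, options: [], range: fullRange),
          let range = Range(match.range(at: 1), in: html) else {
        return nil
    }
    return String(html[range])
}

// The outer <div> really contains "a<div>b</div>c", but the pattern
// pairs the outer opening tag with the inner closing tag.
print(innerDiv(of: "<div>a<div>b</div>c</div>") ?? "no match")  // a<div>b
```

Making the quantifier greedy only moves the failure: it would then swallow everything up to the last `</div>` in the input, including unrelated sibling elements.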
