Capture groups not working in NSRegularExpression
Why is this code only spitting out the entire regex match instead of the capture group?
Input
@"A long string containing Name:</td><td>A name here</td> amongst other things"
Output expected
A name here
Actual output
Name:</td><td>A name here</td>
Code
NSString *htmlString = @"A long string containing Name:</td><td>A name here</td> amongst other things";
NSRegularExpression *nameExpression = [NSRegularExpression regularExpressionWithPattern:@"Name:</td>.*\">(.*)</td>" options:NSRegularExpressionSearch error:nil];
NSArray *matches = [nameExpression matchesInString:htmlString
                                           options:0
                                             range:NSMakeRange(0, [htmlString length])];
for (NSTextCheckingResult *match in matches) {
    NSRange matchRange = [match range];
    NSString *matchString = [htmlString substringWithRange:matchRange];
    NSLog(@"%@", matchString);
}
Code taken from Apple docs.
I know there are other libraries to do this, but I want to stick with what's built in for this task.
Comments (4)
You can access the first group's range using:
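The snippet that originally followed this sentence did not survive on this page. As a minimal sketch of what it most likely pointed at, the helper below uses `-[NSTextCheckingResult rangeAtIndex:]`, where index 0 is the whole match (the same as `[match range]`) and index 1 is the first capture group. The helper name `FirstCapturedName` and the simplified pattern are assumptions made for illustration; the pattern in the question expects a `">` that the sample input does not contain, so it is not reused here.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the text of capture group 1 from the first
// match, or nil when nothing matches. The pattern is a simplified stand-in
// for the one in the question (an assumption, not the asker's pattern).
static NSString *FirstCapturedName(NSString *htmlString) {
    NSError *error = nil;
    NSRegularExpression *nameExpression =
        [NSRegularExpression regularExpressionWithPattern:@"Name:</td><td>(.*?)</td>"
                                                  options:0
                                                    error:&error];
    if (nameExpression == nil) {
        NSLog(@"Invalid pattern: %@", error);
        return nil;
    }

    NSTextCheckingResult *match =
        [nameExpression firstMatchInString:htmlString
                                   options:0
                                     range:NSMakeRange(0, htmlString.length)];
    if (match == nil || match.numberOfRanges < 2) {
        return nil;
    }

    // rangeAtIndex:0 covers the entire match; rangeAtIndex:1 covers only
    // the first parenthesized group.
    NSRange groupRange = [match rangeAtIndex:1];
    if (groupRange.location == NSNotFound) {
        return nil; // the group did not participate in the match
    }
    return [htmlString substringWithRange:groupRange];
}
```

Called with the input string from the question, this should return only the cell contents (A name here) rather than the surrounding markup. The same `rangeAtIndex:1` call can be dropped into the question's `for` loop in place of `[match range]`.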
Don't parse HTML with regular expressions or NSScanner. Down that path lies madness.
This has been asked many times on SO.
parsing HTML on the iPhone
It's up to you, though, and I'm a strong advocate of "first to market has a huge advantage".
The difference being that with a proper HTML parser, you are considering the structure of the document. Using regular expressions, you are relying on the document never changing format in ways that are syntactically otherwise perfectly valid.
I.e. what if the input were
<td class="name">Name: A name</td>
? Your regex parser just broke on input that is both valid HTML and, from a tag contents perspective, identical to the original input.
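To make the fragility concrete, here is a small sketch that counts matches for both shapes of markup. The function name `CountNameMatches` and the pattern are hypothetical, chosen only to illustrate a pattern that depends on the exact tag layout; they are not taken from the question.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: counts how often a layout-dependent pattern matches.
// The pattern assumes the label and the value sit in two adjacent <td> cells.
static NSUInteger CountNameMatches(NSString *html) {
    NSRegularExpression *expression =
        [NSRegularExpression regularExpressionWithPattern:@"Name:</td><td>(.*?)</td>"
                                                  options:0
                                                    error:NULL];
    // Messaging nil returns 0, so an invalid pattern would also report 0 matches.
    return [expression numberOfMatchesInString:html
                                       options:0
                                         range:NSMakeRange(0, html.length)];
}
```

With the two-cell markup `Name:</td><td>A name here</td>` this reports one match. With the single-cell variant `<td class="name">Name: A name</td>` it reports zero, even though both are valid HTML carrying the same information.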
In Swift 3
HTML isn't a regular language and can't be properly parsed using regular expressions. Here's a classic SO answer explaining this common programmer misconception.