Regular expression doesn't match

Posted on 2024-08-31 13:58:35

I've got a string with very unclean HTML. Before I parse it, I want to convert this:

<TABLE><TR><TD width="33%" nowrap=1><font size="1" face="Arial">
NE
</font> </TD>
<TD width="33%" nowrap=1><font size="1" face="Arial">
DEK
</font> </TD>
<TD width="33%" nowrap=1><font size="1" face="Arial">
143
</font> </TD>
</TR></TABLE>

into NE DEK 143 so it is a bit easier to parse. I've got this regular expression (RegexKitLite):

NSString *str = [dataString stringByReplacingOccurrencesOfRegex:@"<TABLE><TR><TD width=\"33%\" nowrap=1><font size=\"1\" face=\"Arial\">(.+?)<\\/font> <\\/TD>(.+?)<TD width=\"33%\" nowrap=1><font size=\"1\" face=\"Arial\">(.+?)<\\/font> <\\/TD>(.+?)<TD width=\"33%\" nowrap=1><font size=\"1\" face=\"Arial\">(.+?)<\\/font> <\\/TD>(.+?)<\\/TR><\\/TABLE>" 
                                                     withString:@"$1 $3 $5"];

I'm not an expert in regex. Can someone help me out here?

Regards, dodo

相权↑美人 2024-09-07 13:58:35

Amarghosh and bobince (the winning answerer of the linked question) are generally right about this. However, since you are just sanitising, regexps are actually just fine.

First, strip the tags:

s/<[^>]*>//g

Then collapse all extra spaces into one:

s/\s+/ /g

Then remove leading/trailing space:

s/^\s+|\s+$//g

Then get the values:

^([^ ]+) ([^ ]+) ([^ ]+)$
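
The four steps above can be sketched in Python as follows. This is only an illustration of the approach, not RegexKitLite code: the helper name clean is hypothetical, the sample input is copied from the question, and the substitutions are applied to every match rather than only the first.

```python
import re

# Sample input copied from the question.
html = """<TABLE><TR><TD width="33%" nowrap=1><font size="1" face="Arial">
NE
</font> </TD>
<TD width="33%" nowrap=1><font size="1" face="Arial">
DEK
</font> </TD>
<TD width="33%" nowrap=1><font size="1" face="Arial">
143
</font> </TD>
</TR></TABLE>"""


def clean(fragment):
    """Hypothetical helper: strip tags, collapse whitespace, trim the ends."""
    text = re.sub(r"<[^>]*>", "", fragment)  # step 1: strip every tag
    text = re.sub(r"\s+", " ", text)         # step 2: collapse whitespace runs
    return text.strip()                      # step 3: drop leading/trailing space


cleaned = clean(html)

# Step 4: pull the three space-separated values out of the cleaned string.
match = re.match(r"^([^ ]+) ([^ ]+) ([^ ]+)$", cleaned)
values = match.groups() if match else ()

print(cleaned)
print(values)
```

The final pattern assumes exactly three values with no spaces inside them; a cell containing a space would need a different extraction step.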

北风几吹夏 2024-09-07 13:58:35

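I have a few suspicions about why your regex might be failing (I don't know the string escaping rules in the iPhone SDK): the dot . is used in places where it has to match newlines, the slashes look like they are escaped unnecessarily, and so on.

But: in your example, the text you are trying to extract stands out because it sits on lines that contain no tags.

So searching for all occurrences of (?m)^[^<>\r\n]+$ should find all matches.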
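
As a rough check of both points, here is a sketch using Python's re module for illustration. The sample input is copied from the question, and the quantifier + is included so that lines longer than one character match. How RegexKitLite treats the dot and inline flags should be confirmed against its own documentation.

```python
import re

# Sample input copied from the question.
html = """<TABLE><TR><TD width="33%" nowrap=1><font size="1" face="Arial">
NE
</font> </TD>
<TD width="33%" nowrap=1><font size="1" face="Arial">
DEK
</font> </TD>
<TD width="33%" nowrap=1><font size="1" face="Arial">
143
</font> </TD>
</TR></TABLE>"""

# (?m) lets ^ and $ match at every line boundary; the character class rejects
# any line containing < or >, so only the bare text lines are returned.
values = re.findall(r"(?m)^[^<>\r\n]+$", html)
print(" ".join(values))

# By default "." stops at a newline, so a lazy (.+?) cannot span the line
# breaks that surround each value in the question's markup.
cell = '<font size="1" face="Arial">\nNE\n</font>'
without_dotall = re.search(r"<font[^>]*>(.+?)</font>", cell)
with_dotall = re.search(r"(?s)<font[^>]*>(.+?)</font>", cell)
print(without_dotall)                 # no match object is found here
print(with_dotall.group(1).strip())   # the value, once "." may cross newlines
```

The line-based pattern only works while every value sits alone on its own line; if the markup were reflowed onto one line it would find nothing.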

不甘平庸 2024-09-07 13:58:35

If you are sure of your HTML hierarchy, then you can just extract the text enclosed by the font tags:

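// C# example (needs: using System.Text.RegularExpressions;)
// txt is the input HTML and result is a string declared elsewhere.
Regex r = new Regex(@"<\s*font((\s+[^<>]*)|(\s*))>(?<desiredText>[^<>]*)<\s*/\s*font\s*>");
foreach (Match m in r.Matches(txt))
    result += m.Groups["desiredText"].Value.Trim() + " "; // space keeps the values apart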
The result is the text enclosed by the font tags, with leading and trailing whitespace trimmed.
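
For comparison, the same idea can be sketched in Python. The function name font_texts and the group name desired_text are made up for this example, Python spells named groups as (?P<name>...), and the case-insensitive flag is an added assumption so that FONT and font are both accepted.

```python
import re

# Same pattern as the C# example, with the unused capture groups made
# non-capturing and the named group written in Python's syntax.
FONT_TEXT = re.compile(
    r"<\s*font(?:\s+[^<>]*|\s*)>(?P<desired_text>[^<>]*)<\s*/\s*font\s*>",
    re.IGNORECASE,
)


def font_texts(markup):
    """Return the trimmed text of every font element that contains no nested tags."""
    return [m.group("desired_text").strip() for m in FONT_TEXT.finditer(markup)]


sample = (
    '<TD width="33%" nowrap=1><font size="1" face="Arial">\nNE\n</font> </TD>\n'
    '<TD width="33%" nowrap=1><font size="1" face="Arial">\nDEK\n</font> </TD>'
)
print(" ".join(font_texts(sample)))
```

Because the text group is [^<>]*, it matches across newlines without any dot-matches-newline flag, but it skips font elements that contain nested tags.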
