Using RegExp to get the content inside HTML tags

Posted on 2024-10-12 03:40:01

I'd like to extract the content from a large file of table cells using regexp and process the data using PHP.

Here's the data I would like to match:

<td>Current Value: </td><td>100.178</td>

I tried using this regexp to match and retrieve the text:

preg_match("<td>Current Value: </td><td>(.+?)</td>", $data, $output);

However, I get an "Unknown modifier" warning, and my variable $output comes out empty.

How can I accomplish this - and could you give me a brief summary of how the solution works so I can try to understand why my code didn't?

鸠书 2024-10-19 03:40:01

You need to add delimiters around your regex:

preg_match("#<td>Current Value: </td><td>(.+?)</td>#", $data, $output);

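The standard delimiter is /, but you can use any other character that is not alphanumeric, whitespace, or a backslash (which makes sense here because the regex itself contains slashes). In your case, PHP took the leading < as the opening delimiter and the first > as its closing partner, so the pattern was just td and the text after it was read as modifiers - hence the "Unknown modifier" warning.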
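
As a minimal sketch of the delimiter fix, the snippet below runs the corrected call against the sample row from the question. The inline $data string stands in for the real file contents, and $found and $value are illustrative names that are not part of the original post.

```php
<?php
// Sample row copied from the question; in practice $data would hold the file contents.
$data = '<td>Current Value: </td><td>100.178</td>';

// '#' delimits the pattern, so the '/' inside '</td>' needs no escaping.
$found = preg_match('#<td>Current Value: </td><td>(.+?)</td>#', $data, $output);

if ($found === 1) {
    // $output[0] is the whole match; $output[1] is the first capture group.
    $value = $output[1];
    echo $value, PHP_EOL; // prints 100.178
}
```

preg_match returns 1 on a match, 0 on no match, and false on error, so checking the return value before reading $output[1] avoids an undefined-index notice.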

One more tip (aside from the canonical exhortation "Thou shalt not parse HTML with regexen", which I think is perfectly OK to ignore in a specific case like this): Use ([^<>]+) instead of (.+?). Because the character class cannot match < or >, the captured text can never run across a tag boundary, a common source of errors when dealing with markup languages.
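
To illustrate the difference, here is a small sketch comparing the lazy group from the question with the negated character class. The input row is a made-up case (an empty value cell followed by another cell), and the variable names are only for illustration.

```php
<?php
// Hypothetical input: the value cell is empty and another cell follows it.
$data = '<td>Current Value: </td><td></td><td>other</td>';

// Lazy dot: it needs at least one character, so it runs past the empty cell.
preg_match('#<td>Current Value: </td><td>(.+?)</td>#', $data, $lazy);

// Negated class: it cannot consume '<' or '>', so it fails instead of spilling over.
$strictFound = preg_match('#<td>Current Value: </td><td>([^<>]+)</td>#', $data, $strict);

echo $lazy[1], PHP_EOL;     // prints </td><td>other
echo $strictFound, PHP_EOL; // prints 0
```

With the lazy dot the capture silently contains markup from the neighbouring cell, whereas the character class reports no match, which is usually the easier failure to notice and handle.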
