Regular expression to remove links using Yahoo Pipes

Posted on 2024-08-14 17:59:48

Hi everyone. I am working on a school project and have been struggling to strip all links from a feed using Yahoo Pipes.

For instance, I want to remove the link markup in <a href="http://mickey.com">Go to Source</a> from my item.description,

leaving "Go to Source" without the active link.

I am using the Regex module and tried this expression:

#</?a[^>]*>#iu

But no success. Can someone please help me with this?
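
For reference, here is a small Python sketch of what the pattern does once the `#` delimiters and the `iu` flags are taken out of the pattern text. It rests on an assumption that the thread does not confirm: that the Regex module expects a bare pattern and would treat the `#` characters as literal text. The sample sentence around the link is made up for illustration.

```python
import re

description = 'See <a href="http://mickey.com">Go to Source</a> for details'

# Bare pattern: matches an opening or closing <a ...> tag, ignoring case.
bare = re.compile(r"</?a[^>]*>", re.IGNORECASE)
# Same pattern with the PHP-style delimiters and flags left in as literal text.
with_delimiters = re.compile(r"#</?a[^>]*>#iu")

stripped = bare.sub("", description)
unchanged = with_delimiters.sub("", description)

print(stripped)
print(unchanged == description)
```

Note that the bare pattern also matches any other tag whose name starts with "a", such as `<abbr>`, so it is only a rough filter.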

如此安好 2024-08-21 17:59:48

Essentially, what you want is:

<a.*?>(.*?)</a>

This will capture the link text in $1. ".*?" is a non-greedy match, meaning that it will match anything, but as few characters as possible.
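
To illustrate the difference, here is a minimal Python sketch comparing the non-greedy pattern with a greedy variant on a string containing two links. The example URLs are placeholders, and Python writes the backreference as `\1` where the answer writes `$1`.

```python
import re

text = '<a href="http://a.example">one</a> and <a href="http://b.example">two</a>'

# Non-greedy: each link is matched separately and replaced by its own text.
lazy = re.sub(r"<a.*?>(.*?)</a>", r"\1", text)

# Greedy: the first ".*" runs as far as it can, so the whole string becomes
# a single match and only the text of the last link survives.
greedy = re.sub(r"<a.*>(.*)</a>", r"\1", text)

print(lazy)    # one and two
print(greedy)  # two
```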

To be extra safe, you may want to accept some spaces in odd places and case options:

<\s*[Aa].*?>(.*?)<\s*/[Aa]\s*>

Even this is not bulletproof, but should handle most cases.

Don't forget the g and s options if you are using the "regex" module rather than the "string regex" one.
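
As a rough Python analogue of those two options (an assumption about how they map, not a description of Yahoo Pipes itself): `re.sub` already replaces every match, which corresponds to "g", and `re.DOTALL` lets "." match newlines, which corresponds to "s". The sketch below applies the more permissive pattern to a link that spans several lines; the sample text is invented.

```python
import re

PATTERN = r"<\s*[Aa].*?>(.*?)<\s*/[Aa]\s*>"

description = (
    'Intro < A\nhref="http://mickey.com" >Go to\nSource< /A >'
    ' and <a href="#x">more</a>.'
)

# With DOTALL, "." also matches "\n", so the multi-line link is unwrapped.
cleaned = re.sub(PATTERN, r"\1", description, flags=re.DOTALL)

# Without it, only the single-line link is unwrapped.
without_s = re.sub(PATTERN, r"\1", description)

print(repr(cleaned))
print(repr(without_s))
```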

西瓜 2024-08-21 17:59:48

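Try this:

<?php
$html = 'This is some text <a href="http://mickey.com">Go to Source</a> more text';
// Replace each <a ...>...</a> element with its inner text (captured as $1).
// \s requires whitespace after "<a"; the s flag lets "." match newlines.
$result = preg_replace('%<a\s.*?>(.*?)</a>%is', '$1', $html);
echo $result; // prints "This is some text Go to Source more text"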

吹泡泡o 2024-08-21 17:59:48

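HTML is at least a context-free language. It is impossible to correctly parse a context-free language with regular expressions, so this cannot be done reliably. Use a proper HTML parsing library and rework the DOM tree or event stream (depending on the interface) to fit what you want to do.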
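
As a sketch of the event-stream approach outside Yahoo Pipes, the following uses Python's standard `html.parser` module. The class and function names (`LinkStripper`, `strip_links`) are made up for this example. It re-emits everything except `<a>` start and end tags, and silently drops comments and doctype declarations, which a fuller version would need to handle.

```python
from html.parser import HTMLParser


class LinkStripper(HTMLParser):
    """Re-emit the input markup, dropping <a> start and end tags."""

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are copied through unchanged.
        if tag != "a":
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag != "a":
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")


def strip_links(html):
    parser = LinkStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


print(strip_links('<p>See <a href="http://mickey.com">Go to Source</a> &amp; more<br/></p>'))
```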

美人骨 2024-08-21 17:59:48

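HTML is not a regular language and cannot be matched reliably by regular expressions. You can put together something that matches some HTML and works some of the time, but it will fail unexpectedly as soon as an odd case comes up.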
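
Two such odd cases can be reproduced with a short Python sketch. It uses the `<a.*?>(.*?)</a>` pattern suggested elsewhere in this thread; the input strings are invented for illustration.

```python
import re

LINK = re.compile(r"<a.*?>(.*?)</a>", re.IGNORECASE | re.DOTALL)

simple = '<a href="http://mickey.com">Go to Source</a>'
# A ">" inside a quoted attribute ends the tag match too early.
tricky = '<a title="size > 10" href="http://mickey.com">Go to Source</a>'
# "<a" also matches the start of other tag names such as <abbr>.
similar_tag = (
    '<abbr title="HyperText Markup Language">HTML</abbr>'
    ' and <a href="http://mickey.com">link</a>'
)

simple_out = LINK.sub(r"\1", simple)
tricky_out = LINK.sub(r"\1", tricky)
similar_out = LINK.sub(r"\1", similar_tag)

print(simple_out)   # Go to Source
print(tricky_out)   # leftover attribute text leaks into the output
print(similar_out)  # the <abbr> opening tag is lost and the link remains
```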

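Now, sadly, Yahoo Pipes does not seem to include an HTML parser. According to this blog entry, however, you can pipe the data through HTML Tidy and then use their Fetch Data module (http://pipes.yahoo.com/pipes/docs?doc=sources#FetchData), which can parse XML to extract data in a structured format. The tools for processing the XML afterwards are not ideal (they do not seem to support anything as useful as XPath or CSS selector queries), but at least you can work with the data in a structured format that has already been parsed by a proper HTML parser.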
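
The second half of that idea, working on already-tidied markup with an XML parser, can be sketched in Python with the standard `xml.etree.ElementTree` module. This is not Yahoo Pipes code. It assumes the input is well-formed XHTML without namespace declarations (real HTML Tidy output may include an `xmlns` attribute, which changes the tag names the parser reports), and the `render` helper is a made-up name.

```python
import xml.etree.ElementTree as ET
from html import escape


def render(element):
    """Serialize an element's content, omitting <a> tags but keeping their text."""
    parts = [escape(element.text or "", quote=False)]
    for child in element:
        inner = render(child)
        if child.tag == "a":
            # Unwrap the link: keep only what was inside it.
            parts.append(inner)
        else:
            attrs = "".join(
                f' {name}="{escape(value, quote=True)}"'
                for name, value in child.attrib.items()
            )
            parts.append(f"<{child.tag}{attrs}>{inner}</{child.tag}>")
        # Text that follows the child's closing tag belongs to the parent.
        parts.append(escape(child.tail or "", quote=False))
    return "".join(parts)


tidied = '<div>Read <a href="http://mickey.com">Go to <b>Source</b></a> now</div>'
root = ET.fromstring(tidied)
print(render(root))
```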