Replacing @replies in tweet text with HTML hyperlinks without replacing email addresses

Posted on 2024-07-13 22:16:19

I'm detecting @replies in a Twitter stream with the following PHP code using regexes. In the first pattern, I replace @replies at the beginning of the string; in the second, I replace the @replies which follow a space.

$text = preg_replace('!^@([A-Za-z0-9_]+)!', '<a href="http://twitter.com/$1" target="_blank">@$1</a>', $text);
$text = preg_replace('! @([A-Za-z0-9_]+)!', ' <a href="http://twitter.com/$1" target="_blank">@$1</a>', $text);

How can I best combine these two rules without falsely flagging [email protected] as a reply?

苏佲洛 2024-07-20 22:16:20

OK, on second thought: not flagging whatever@email means that the character before the @ has to be a "non-word" character, because anything that can appear inside a word could be part of an email address. That leads to:

!(^|\W)@([A-Za-z0-9_]+)!

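but then you have to use $2 instead of $1 for the username, and start the replacement with $1 so that the character matched before the @ is kept.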
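
A minimal sketch of this approach, under a few assumptions: the helper name linkReplies and the sample tweet are made up for illustration and do not come from the question. Group 1 holds whatever preceded the @ (nothing, or one non-word character), and group 2 holds the username.

```php
<?php
// Hypothetical helper: the name linkReplies is an assumption, not from the thread.
function linkReplies(string $text): string
{
    // Group 1: start of string or one non-word character.
    // Group 2: the username after the @.
    return preg_replace(
        '!(^|\W)@([A-Za-z0-9_]+)!',
        // $1 puts the preceding character back; $2 is the username.
        '$1<a href="http://twitter.com/$2" target="_blank">@$2</a>',
        $text
    );
}

// Made-up sample input: two replies and one email-like token.
echo linkReplies('@alice ping @bob, mail me at carol@example.com'), "\n";
```

The email-like token is left alone because the character before its @ is a word character, so neither alternative in group 1 can match there.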

南街女流氓 2024-07-20 22:16:20

Since the ^ does not have to stand at the beginning of the RE, you can use grouping and | to combine those REs.

If you don't want to re-insert the whitespace you captured, you have to use a "positive lookbehind":

$text = preg_replace('/(?<=^|\s)@(\w+)/',
    '<a href="http://twitter.com/$1" target="_blank">@$1</a>', $text);

or "negative lookbehind":

$text = preg_replace('/(?<!\S)@(\w+)/',
    '<a href="http://twitter.com/$1" target="_blank">@$1</a>', $text);

...whichever you find easier to understand.
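
As a rough check that the two lookbehind forms behave alike, here is a small sketch. The sample tweet and the variable names are assumptions made for illustration.

```php
<?php
// Made-up sample input: replies after start-of-string, a space, and a tab,
// plus an email-like token that should stay untouched.
$tweet = "@alice see @bob\t@carol or mail dave@example.com";

$replacement = '<a href="http://twitter.com/$1" target="_blank">@$1</a>';

// Positive lookbehind: start of string or a whitespace character precedes the @.
$positive = preg_replace('/(?<=^|\s)@(\w+)/', $replacement, $tweet);

// Negative lookbehind: no non-whitespace character precedes the @.
$negative = preg_replace('/(?<!\S)@(\w+)/', $replacement, $tweet);

// For this input both patterns should produce the same string.
var_dump($positive === $negative);
echo $negative, "\n";
```

One difference worth keeping in mind: without the m modifier, ^ only matches at the very start of the subject, while a preceding newline already counts as whitespace in both forms, so they agree on multi-line input as well.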

薄情伤 2024-07-20 22:16:20

Here's how I'd do the combination:

$text = preg_replace('!(^| )@([A-Za-z0-9_]+)!', '$1<a href="http://twitter.com/$2" target="_blank">@$2</a>', $text);

剪不断理还乱 2024-07-20 22:16:20

Use alternation inside a non-capturing group, and use \K to drop the space from the match when it is the space alternative that matched.

Use (\w+) to capture alphanumeric and underscore characters.

The full-string match ($0) retains the @.
Capture group 1 contains the text after the @.

Code:

echo preg_replace(
    '/(?:^| \K)@(\w+)/',
    '<a href="http://twitter.com/$1" target="_blank">$0</a>',
    $tweet
);
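
To make the roles of $0 and $1 concrete, here is a small sketch with a made-up sample tweet (the input and the variable name $linked are assumptions). Because \K resets the start of the reported match, the space stays in the text and is not part of what gets replaced.

```php
<?php
// Made-up sample input for illustration.
$tweet = '@alice thanks, cc @bob (not bob@example.com)';

$linked = preg_replace(
    // Either the start of the string, or a space that \K then drops from the match.
    '/(?:^| \K)@(\w+)/',
    // $0 is the whole match ("@name"); $1 is the username without the @.
    '<a href="http://twitter.com/$1" target="_blank">$0</a>',
    $tweet
);

echo $linked, "\n";
```

Note that this pattern only accepts a literal space before the @, so a reply that follows a tab or a newline would not be linked.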

烂人 2024-07-20 22:16:20

// $1 puts back the character matched before the @, which would otherwise be dropped
$text = preg_replace('/(^|\W)@(\w+)/', '$1<a href="http://twitter.com/$2" target="_blank">@$2</a>', $text);

夏末染殇 2024-07-20 22:16:20

preg_replace('%(?<!\S)@([A-Za-z0-9_]+)%', '<a href="http://twitter.com/$1" target="_blank">@$1</a>', $text);

(?<!\S) loosely translates to "not preceded by a non-whitespace character". It is a sort of double negation, but it also works at the start of the string or line.

This won't consume the preceding character, needs no extra capturing group for it, and won't match strings such as "[email protected]", which is a valid e-mail address.

Tested:

Input = 'foo bar [email protected] bee @def goo@doo @woo'
Output = 'foo bar [email protected] bee <a href="http://twitter.com/def" target="_blank">@def</a> goo@doo <a href="http://twitter.com/woo" target="_blank">@woo</a>'
