egrep regex works in PHP but not in the Unix shell: an escaping problem?

Posted 2024-12-08 18:01:57

I think my problem has something to do with differences in escaping between using a regex in PHP and using it at the Bash command line.

Here is the regex that works for me in PHP:

$emailregex = '^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$';

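So I tried the following at the command line, but it doesn't seem to match anything. (emails.txt is a long plain-text file containing thousands of email addresses, some possibly malformed, one per line.)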

 [root@host dir]# egrep '^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$' emails.txt

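I have tried surrounding the regex with double quotes instead of single quotes, but it made no difference. Do I need to add some backslashes to the regex?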

SOLVED! Thank you! My file was created on Windows, and the extra CR in each end-of-line marker kept the dollar sign in the regex from matching.
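The failure can be reproduced without the real file. The following is a minimal sketch that assumes bash (for the `$'\r'` quoting) and a grep that supports `-E` (`egrep` is the older spelling of `grep -E`). The addresses are made up, and `crlf_sample` is a hypothetical helper that stands in for a file saved on Windows.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the CRLF mismatch with made-up addresses.
EMAIL_RE='^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$'

# Hypothetical helper: emits two lines with Windows (CRLF) line endings.
crlf_sample() { printf 'alice@example.com\r\nbob@example.org\r\n'; }

# Count lines that end in a carriage return; a nonzero count means CRLF data.
crlf_sample | grep -c $'\r$'

# Anchored match on the raw data: the trailing \r sits between the TLD and
# the end of the line, so "$" never matches and nothing is printed.
crlf_sample | grep -E "$EMAIL_RE"

# Deleting every \r first (tr is POSIX) lets both lines match.
crlf_sample | tr -d '\r' | grep -E "$EMAIL_RE"
```

On the real file, the equivalent would be `tr -d '\r' < emails.txt | grep -E "$EMAIL_RE"`, or converting the file once with dos2unix.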

坏尐絯℡ 2024-12-15 18:01:57

Single quotes should work with bash...
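You can check directly why switching to double quotes changed nothing. The following sketch assumes bash. It stores the pattern both ways and compares the resulting strings. The `cost\$` pattern is a hypothetical contrast case and does not come from the thread.

```shell
#!/usr/bin/env bash
# Sketch: compare the exact string each quoting style hands to grep.
single='^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$'
double="^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$"

# Inside "...", a backslash is only special before $ ` " \ or newline, so
# "\." stays "\.", and a "$" with no name after it stays a literal "$".
if [ "$single" = "$double" ]; then
  echo "identical"
else
  echo "different"
fi

# Contrast (hypothetical pattern): here the quoting does matter, because "\$"
# is an escape inside double quotes but two literal characters inside '...'.
printf '%s\n' 'cost\$' "cost\$"
```

For this particular regex, the two quoting styles produce identical strings. No extra backslashes are needed, and the problem has to be in the input data rather than in the shell.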

It works for me with this simple case:

echo [email protected] | egrep '^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$'

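In your text file, each line must contain only the email address. Any additional spaces on the line will throw it off. For example, this doesn't print anything: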

echo " [email protected]" | egrep '^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$'
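If stray whitespace is expected, the anchors can be kept while tolerating it. The following is a sketch of a relaxed variant that does not come from the thread. It assumes a grep that supports POSIX character classes such as `[[:space:]]`. The `sample` helper and its addresses are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: a relaxed variant (not from the original answer) that tolerates
# whitespace before and after the address.
RELAXED_RE='^[[:space:]]*[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})[[:space:]]*$'

# Hypothetical input: a leading space, a trailing tab plus CR, and a non-address.
sample() { printf ' alice@example.com\nbob@example.org\t\r\nnot an email\n'; }

# [[:space:]] matches space, tab and \r, so the first two lines are selected.
sample | grep -E "$RELAXED_RE"
```

The selected lines are printed with their surrounding whitespace intact. Trim them separately if the output is fed into another tool.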

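Your problem might be that you have a DOS-formatted file. In that case, the extra \r keeps the regex from matching, because grep sees an extra character at the end of the line. You can run dos2unix on the file, or make your regex less restrictive by removing the beginning and end anchors: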

egrep '[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})'
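Dropping the anchors has a trade-off. A line now matches if an address appears anywhere in it. The following sketch illustrates this with made-up input. It assumes `-o`, which GNU and BSD grep support but POSIX does not require, to print only the matched text.

```shell
#!/usr/bin/env bash
# Sketch: the unanchored pattern from the answer, run on hypothetical lines.
LOOSE_RE='[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})'

# Hypothetical input: a CRLF line, an address embedded in other words, junk.
sample() { printf 'alice@example.com\r\ncontact: bob@example.org today\nnot an email\n'; }

# Both the CRLF line and the "contact: ..." line are selected.
sample | grep -cE "$LOOSE_RE"

# -o prints only the matched text, one match per line, so the trailing \r
# and the surrounding words are left out.
sample | grep -oE "$LOOSE_RE"
```

This approach suits extracting addresses from messy text. For validating that each whole line is an address, stripping the CRs and keeping the anchors is the stricter choice.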

南风几经秋 2024-12-15 18:01:57

Works for me:

JPP-MacBookPro-4:tmp jpp$ cat emails.txt
[email protected]
[email protected]
not an email
[email protected]

JPP-MacBookPro-4:tmp jpp$ egrep '^[_a-z0-9-]+(\.[_a-z0-9-]+)*@[a-z0-9-]+(\.[a-z0-9-]+)*(\.[a-z]{2,6})$' emails.txt
[email protected]
[email protected]
[email protected]
JPP-MacBookPro-4:tmp jpp$

Beware of trailing whitespace, tabs, and carriage returns; they have a way of biting regexes.

There is a great reference on shell quoting here: http://www.mpi-inf.mpg.de/~uwe/lehre/unixffb/quoting-guide.html
