Why does TextPad ask whether you want to use POSIX regular expression syntax?

Posted on 2024-07-28 23:46:25

I need to separate out a bunch of image urls from a document in which the images are associated with names like this:

bellpepper = "http://images.com/bellpepper.jpg"
cabbage = "http://images.com/cabbage.jpg"
lettuce = "http://images.com/lettuce.jpg"
pumpkin = "http://images.com/pumpkin.jpg"

I want to remove all text except the URLs from the file by deleting the variable name, equals sign and double quotes so I have a new file that is just a list of URLs, one per line.

I've tried various ways of identifying the non-URL data using regular expressions in TextPad, with the "Regular expression" checkbox ticked in the Find dialog, but TextPad doesn't seem to like any of them.

Under

Configure->Preferences->Editor

there's an option:

"Use POSIX regular expression syntax"

As opposed to what?

Is it possible that my problems performing this regex operation have to do with some quirk of Textpad's implementation of regex?

Comments (3)

翻身的咸鱼 2024-08-04 23:46:25

The POSIX alternative is as opposed to the TextPad default. From the Search/Replace help doc:

TextPad's regular expressions are based on POSIX standard P1003.2, but the syntax can be that of POSIX, or UNIX extended regular expressions (the default).

To get the job done in TextPad, use the following:

Find in: ^[^"]*"\([^"]*\)"
Replace with: \1

Edit: to break the expression down:

^ - start of line
[^"]* - in a set the caret ^ is for negation, 
        so a greedy match of anything that is not a "
        in this case, everything up to the first quote
" - the first quote per line in your source text
\(...\) - puts together a group that can be referenced later
[^"]* - same explanation as above, this time matching the url in question
" - the last quote on the line
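
As a cross-check outside TextPad, the same find/replace can be sketched in Python. This is only an illustration and not TextPad's engine: Python's `re` module writes groups with unescaped parentheses (like the POSIX option), and the `extract_urls` helper name is made up for this example. The sample lines are the ones from the question.

```python
import re

# Sample lines copied from the question.
SAMPLE = '''bellpepper = "http://images.com/bellpepper.jpg"
cabbage = "http://images.com/cabbage.jpg"
lettuce = "http://images.com/lettuce.jpg"
pumpkin = "http://images.com/pumpkin.jpg"'''

# Same pattern as the Find expression, with unescaped group parentheses.
PATTERN = re.compile(r'^[^"]*"([^"]*)"')


def extract_urls(text: str) -> str:
    """Apply the replacement line by line, as an editor would (hypothetical helper)."""
    return "\n".join(PATTERN.sub(r"\1", line) for line in text.splitlines())


if __name__ == "__main__":
    print(extract_urls(SAMPLE))
```

Lines that contain no quoted string are left unchanged, because the pattern simply does not match them.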

Also, looking through the help doc on regex in TextPad, there is a chart of legal expressions listing the 'Default' and 'POSIX' versions side by side. The only difference seems to be that the Default syntax escapes the grouping parentheses `\(` `\)` and the occurrence braces `\{` `\}`, while the POSIX syntax writes them unescaped.

With that in mind, to get the job done in TextPad with the 'use POSIX regular expression syntax' option checked, swap out the above 'Find in' expression with the following:

Find in: ^[^"]*"([^"]*)"
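
If the escaping of `()` and `{}` really is the only difference, as the chart suggests, converting an expression between the two syntaxes is a mechanical swap. The sketch below is a hypothetical helper, not anything TextPad provides, and it deliberately ignores bracket expressions such as `[(]`, where parentheses are literal in both syntaxes.

```python
# Characters whose escaped/unescaped meaning is assumed to swap between syntaxes.
SWAPPED = "(){}"


def swap_group_escapes(pattern: str) -> str:
    """Turn \\( into ( and ( into \\( (likewise for ), {, }); symmetric in both directions."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            # An escaped bracket loses its backslash; other escapes pass through.
            out.append(nxt if nxt in SWAPPED else ch + nxt)
            i += 2
        elif ch in SWAPPED:
            # An unescaped bracket gains a backslash.
            out.append("\\" + ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(swap_group_escapes(r'^[^"]*"\([^"]*\)"'))
```

Because the swap is symmetric, running the function on its own output gives back the original expression.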

岁月苍老的讽刺 2024-08-04 23:46:25

Besides POSIX there are also Perl style regular expressions.

拥抱没勇气 2024-08-04 23:46:25

The original basic regular expressions, such as those used by "sed", differ in some ways from what we most often use. For example, you use \( and \) to indicate groups, instead of ( and ), and there is no "+" modifier.

Also, I note that in the linked question your "*" is outside the parentheses instead of inside. That means the first group will capture only a single character.
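
The effect of putting the "*" inside or outside the group can be seen with a small Python sketch. Python's `re` module stands in for TextPad here, and both patterns and the sample string are assumptions made for illustration, since the linked question's expression is not quoted in this thread.

```python
import re

# A URL followed by its closing quote, as in the question's data.
text = 'http://images.com/cabbage.jpg"'

# "*" inside the group: the group captures the whole run of non-quote characters.
inside = re.match(r'([^"]*)"', text)

# "*" outside the group: the group matches one character at a time and is repeated,
# so only the character from the final repetition is kept.
outside = re.match(r'([^"])*"', text)

print(inside.group(1))   # the full URL
print(outside.group(1))  # a single character
```

Both patterns match the same text overall; only what ends up in group 1 differs.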
