Matching timestamps in a WebVTT file with sed

Posted 2025-01-28 18:17:35 · 428 words · 4 views · 0 comments

I have the following PCRE2 regex that works to match and remove timestamp lines in a .webVTT subtitle file (the default for YouTube):

^[0-9].:[0-9].:[0-9].+$

This changes this:

00:00:00.126 --> 00:00:10.058
How are you today?

00:00:10.309 --> 00:00:19.272
Not bad, you?

00:00:19.559 --> 00:00:29.365
Been better.

To this:

How are you today?

Not bad, you?

Been better.

How would I convert this PCRE2 regex to an idiomatic (read: sane-looking) equivalent for sed's flavour of regex?

Comments (2)

嘿嘿嘿 2025-02-04 18:17:35

Using your regex with sed

$ sed -En '/^[0-9].:[0-9].:[0-9].+$/!p' file
How are you today?

Not bad, you?

Been better.
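
As a side note, the -n plus !p combination can also be written with sed's d (delete) command, which some readers find easier to scan. The sketch below builds a small temporary sample (hypothetical data that mirrors the question) and runs both forms; the variable names are only for illustration.

```shell
# Build a small sample in a temporary file (hypothetical data mirroring the question).
sample=$(mktemp)
printf '%s\n' \
  '00:00:00.126 --> 00:00:10.058' \
  'How are you today?' \
  '' \
  '00:00:10.309 --> 00:00:19.272' \
  'Not bad, you?' > "$sample"

# -n suppresses automatic printing; !p prints only the lines that do NOT match.
printed=$(sed -En '/^[0-9].:[0-9].:[0-9].+$/!p' "$sample")

# d deletes the lines that match; every other line is printed by default.
deleted=$(sed -E '/^[0-9].:[0-9].:[0-9].+$/d' "$sample")

printf '%s\n' "$deleted"
rm -f "$sample"
```

Both commands should leave the same text behind, so the choice between them is a matter of taste.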

Or print only the lines that do not end with a digit:

$ sed -n '/[0-9]$/!p' file
How are you today?

Not bad, you?

Been better.
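
One thing to keep in mind with the shorter /[0-9]$/ form: it also drops any subtitle text that happens to end in a digit. The sketch below uses a made-up cue ("I was born in 1999") to show the difference; the sample data and variable names are assumptions for illustration only.

```shell
# Hypothetical sample where one line of subtitle text ends in a digit.
sample=$(mktemp)
printf '%s\n' \
  '00:00:00.126 --> 00:00:10.058' \
  'I was born in 1999' \
  '' \
  '00:00:10.309 --> 00:00:19.272' \
  'Not bad, you?' > "$sample"

# Loose form: removes every line that ends with a digit, including the subtitle text.
loose=$(sed -n '/[0-9]$/!p' "$sample")

# Anchored form from the question: removes only lines that start like a timestamp.
anchored=$(sed -En '/^[0-9].:[0-9].:[0-9].+$/!p' "$sample")

printf 'loose:\n%s\n\nanchored:\n%s\n' "$loose" "$anchored"
rm -f "$sample"
```

With this sample the loose form loses the "1999" line, while the anchored form keeps it.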
時窥 2025-02-04 18:17:35

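Your pattern is not specific to PCRE2. With sed's default basic regular expressions (BRE), you have to escape the + as \+ (a GNU extension) to make it a quantifier for one or more repetitions; with the -E flag (extended regular expressions) it works unescaped.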
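
To illustrate the quantifier difference, here is a minimal sketch on a two-line hypothetical sample. It avoids the GNU-only \+ and uses the POSIX interval \{1,\} for the BRE case, on the assumption that portability matters; the variable names are illustrative.

```shell
# Two-line hypothetical sample: one timestamp line, one text line.
sample=$(mktemp)
printf '%s\n' '00:00:00.126 --> 00:00:10.058' 'How are you today?' > "$sample"

# BRE (default): a bare + is a literal plus sign, so the timestamp line
# does not match and nothing is deleted.
bre_plain=$(sed '/^[0-9].:[0-9].:[0-9].+$/d' "$sample")

# BRE with the interval \{1,\}, meaning "one or more": the timestamp line is deleted.
bre_interval=$(sed '/^[0-9].:[0-9].:[0-9].\{1,\}$/d' "$sample")

# ERE (-E): a bare + is a quantifier, as in PCRE2.
ere=$(sed -E '/^[0-9].:[0-9].:[0-9].+$/d' "$sample")

printf 'BRE, bare +:\n%s\n\nBRE, interval:\n%s\n\nERE:\n%s\n' \
  "$bre_plain" "$bre_interval" "$ere"
rm -f "$sample"
```

Only the first command leaves the timestamp line in place, which is why the original pattern appears to do nothing without -E.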

Judging by the example data, every position where you use a dot to match any character actually holds a digit.

You could make the pattern a bit more specific and omit the quantifier altogether. Just prevent the line from printing if the pattern matches.

sed -n '/^[0-9][0-9]:[0-9][0-9]:[0-9]/!p' file
  • -n prevents the default printing in sed
  • !p prints the line if the pattern does not match

Output

How are you today?

Not bad, you?

Been better.
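
If the subtitle text itself might begin with something that looks like a time, the pattern could be tightened further to require the whole first timestamp and the " --> " separator. This is only a sketch under two assumptions: every cue timing line includes the hours field (as in the question's data), and the sample lines below, including the "align:start" cue setting, are hypothetical.

```shell
# Hypothetical sample: one cue text starts with a clock time, and one timing
# line carries a cue setting after the end timestamp.
sample=$(mktemp)
printf '%s\n' \
  '00:00:00.126 --> 00:00:10.058' \
  '10:30:00 is when we meet' \
  '' \
  '00:00:10.309 --> 00:00:19.272 align:start' \
  'Not bad, you?' > "$sample"

# Pattern from the answer above: also removes the "10:30:00 ..." text line.
short=$(sed -n '/^[0-9][0-9]:[0-9][0-9]:[0-9]/!p' "$sample")

# Stricter BRE: hh:mm:ss.ttt followed by " --> ", using POSIX intervals.
strict=$(sed '/^[0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\.[0-9]\{3\} --> /d' "$sample")

printf 'short:\n%s\n\nstrict:\n%s\n' "$short" "$strict"
rm -f "$sample"
```

The stricter form is longer, so whether it is worth the extra noise depends on how unusual the subtitle text can get.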