grep UTF-8/Unicode support / u modifier

Posted on 2025-01-11 16:53:05

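I'm trying to validate VTT files against a particular format. The regex itself works, but UTF-8 characters cause problems. I tried using (?u), with no luck.

The regex I'm using is:

grep -P '(?m)^(\d+:\d+[.]\d+\s*-->\s*\d+:\d+[.]\d+|\s*[\w\s]+)|^\s*$' . -r -v

The u flag makes the regex work as expected here: https://regex101.com/r/21HW2A/1, but I can't find a way to do the same in grep. Do I need to replace \w with every allowed alphanumeric character, or can the u modifier be used in grep somehow?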



自此以后，行同陌路 2025-01-18 16:53:05

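\w can be replaced with \p{L}, which matches Unicode letters without needing the u modifier. Note that \p{L} covers letters only, so add \p{N} and _ to the character class if digits and underscores should still be accepted.

Full solution:

grep -P '(?m)^(\d+:\d+[.]\d+\s*-->\s*\d+:\d+[.]\d+|\s*[\p{L}\s]+)|^\s*$' . -r -v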
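
To see the \p{L} version in action, here is a minimal sketch that runs the same pattern against a throwaway file. The temporary directory, the file name sample.vtt, and the sample lines are all made up for illustration, and the sketch assumes GNU grep built with -P support and a UTF-8 locale called C.UTF-8 (substitute something like en_US.UTF-8 if your system lacks it).

```shell
#!/usr/bin/env bash
# Hypothetical sample data: the directory, file name, and lines are invented for this sketch.
workdir=$(mktemp -d)
printf '%s\n' \
  '00:01.000 --> 00:04.000' \
  'école française' \
  '' \
  '!not a cue' > "$workdir/sample.vtt"

# Same pattern as the answer above; \p{L} matches any Unicode letter.
pattern='(?m)^(\d+:\d+[.]\d+\s*-->\s*\d+:\d+[.]\d+|\s*[\p{L}\s]+)|^\s*$'

# Assumption: the C.UTF-8 locale exists on this machine.
# -v prints the lines that do NOT match, i.e. the ones that fail validation.
flagged=$(LC_ALL=C.UTF-8 grep -P -v "$pattern" "$workdir/sample.vtt")
printf 'flagged: %s\n' "$flagged"

rm -rf "$workdir"
```

With a UTF-8 locale, only the line starting with "!" should be reported, since the accented text line is now accepted by \p{L}. The locale matters because grep -P only treats the input as UTF-8 when the active locale says so.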



About the author: 荒芜了季节
