grep UTF-8/Unicode support / u modifier
I'm trying to validate vtt files for a particular format. The regex is functional, but UTF-8 characters are causing issues. I tried using (?u) with no luck.
The regex I'm using is:
grep -P '(?m)^(\d+:\d+[.]\d+\s*-->\s*\d+:\d+[.]\d+|\s*[\w\s]+)|^\s*$' . -r -v
The u flag allows the regex to work as expected here, https://regex101.com/r/21HW2A/1, but I can't find a way to do that in grep. Do I need to swap the \w to all allowed alphanumeric chars, or can the u modifier be used in grep somehow?
Comments (1)
The \w can be converted to \p{L}, which doesn't require the u modifier for Unicode support. Full solution:
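A minimal sketch of that substitution, assuming GNU grep built with PCRE support (-P) and a UTF-8 locale. It reuses the command from the question with \w swapped for \p{L}\p{N}_ inside the bracket expression. Adding \p{N} and _ is an assumption made here so that digits and underscores, which \w also covers, still match; the answer itself names only \p{L}. The sample file and its contents are hypothetical, and the command runs against that single file rather than recursively.

```shell
#!/usr/bin/env bash
# Assumption: GNU grep with PCRE support (-P) and an available UTF-8 locale.
# C.UTF-8 is only an example; substitute e.g. en_US.UTF-8 if it is missing.
export LC_ALL=C.UTF-8

# Hypothetical sample file, made up for illustration.
sample=$(mktemp)
printf '%s\n' 'WEBVTT' '' '00:01.000 --> 00:04.000' 'Привет мир' '-- broken --' > "$sample"

# Same pattern as in the question, with \w replaced by \p{L}\p{N}_ inside the
# bracket expression. -v prints the lines that do NOT match the pattern.
invalid=$(grep -vP '(?m)^(\d+:\d+[.]\d+\s*-->\s*\d+:\d+[.]\d+|\s*[\p{L}\p{N}_\s]+)|^\s*$' "$sample")
printf '%s\n' "$invalid"

rm -f "$sample"
```

With this sample, only the "-- broken --" line should be reported; the Cyrillic cue text is accepted because \p{L} matches letters in any script. To check a whole directory tree as in the question, pass . -r in place of the file name.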