TCL regular expression example
I want to write a regexp that gets the word in a string that starts with abc_ or xyz_.
Here is my script:
[regexp -nocase -- {.*\s+(abc_|xyz_\S+)\s+.*} $str all necessaryStr]
So if I apply the regexp above to str1 and str2, I want to get "xyz_hello" from $str1 and "abc_bye" from $str2:
set str1 "gfrdgasjklh dlasd =-0-489 xyz_hello sddf 89rn sf n9"
set str2 "dytfasjklh abc_bye dlasd =-0tyj-489 sddf tyj89rn sjf n9"
But my regexp does not work. My questions are:
1) What is wrong with my regexp?
2) Is it good to use a regexp to find words starting with some predefined prefixes, or is it better to use string functions (string match or similar)?
It is not clear from your question what constitutes a word. Are further underscores permitted? Are digits permitted? What about "words" that consist of just the prefix, e.g. "abc_" or "xyz_"?
Making the conservative assumptions (based on your examples) that you expect only letters from the English alphabet, at least one further character after the prefix, and that you don't care about case, you can simplify your regexp:
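A minimal sketch of such a simplified pattern, under exactly those assumptions (prefix abc_ or xyz_, then one or more letters a-z, case ignored via -nocase). The pattern is an illustration consistent with that description, not necessarily the exact one intended:

```tcl
# Assumption: a "word" is abc_ or xyz_ followed by one or more letters a-z.
# -nocase makes [a-z] match upper-case letters as well.
set str1 "gfrdgasjklh dlasd =-0-489 xyz_hello sddf 89rn sf n9"
set str2 "dytfasjklh abc_bye dlasd =-0tyj-489 sddf tyj89rn sjf n9"

foreach str [list $str1 $str2] {
    # regexp returns 1 on a match and stores the whole match in "match"
    if {[regexp -nocase -- {(?:abc|xyz)_[a-z]+} $str match]} {
        puts $match
    }
}
# prints xyz_hello, then abc_bye
```

This pattern has no word-boundary anchor, so it would also find xyz_bar inside a longer token such as fooxyz_bar.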
This will set the variable match to the matching word. You can replace the contents of the square brackets if your definition of a word differs from my assumptions.
Your second question, about whether to prefer regexp to string functions, will depend upon context and could lead into subjective territory.
Some things to consider:
My recommendation would be to use whichever you are most comfortable with. Write a good set of unit tests for your code, then optimise later only if you have identified a bottleneck there during profiling.
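For comparison with the second question, here is a sketch of the string-function approach. It assumes whitespace-separated words, and the proc name prefixedWords is hypothetical. Note that the glob abc_* accepts any characters after the prefix (digits, punctuation, or nothing at all), so its semantics are closer to the \S+ version than to [a-z]+:

```tcl
# Hypothetical helper: return every whitespace-separated word that starts
# with abc_ or xyz_ (case-insensitive), using string functions only.
proc prefixedWords {str} {
    set result {}
    foreach word [split $str] {
        # split yields empty elements for runs of whitespace; skip them
        if {$word eq ""} continue
        if {[string match -nocase abc_* $word] || [string match -nocase xyz_* $word]} {
            lappend result $word
        }
    }
    return $result
}

puts [prefixedWords "gfrdgasjklh dlasd =-0-489 xyz_hello sddf 89rn sf n9"] ;# xyz_hello
puts [prefixedWords "dytfasjklh abc_bye dlasd =-0tyj-489 sddf tyj89rn sjf n9"] ;# abc_bye
```

Which version is faster depends on the input, so measuring both (for example with Tcl's time command) on realistic data is more reliable than guessing.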
On the basis of what you've written, you seem to be looking for words beginning with abc_ or xyz_ (in any case) and having just letters after that. A good first attempt at matching this is this:
The special features of this are:
- \y means this only matches at word start (theoretically at word end too, but we follow it by a letter in all cases!).
- (?:…) is grouping without capturing.
- You could also use \w or \S instead of [a-z], but these do change the semantics of what's matched (\w will give you roughly the symbols usually allowed in program identifiers, and \S will give you non-spaces).
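A small sketch of those features in action, assuming a pattern of the form \y(?:abc|xyz)_[a-z]+ (the test strings are made up for illustration). It uses -all -inline to collect every match, and shows how swapping [a-z] for \w changes the result:

```tcl
# Assumed pattern: word boundary, prefix abc_ or xyz_, then letters only.
set str "xyz_hello sddf ABC_bye fooxyz_no"

# \y requires a word boundary before the prefix, so "fooxyz_no" is skipped.
# Because (?:...) does not capture, -all -inline returns only whole matches.
set words [regexp -all -inline -nocase -- {\y(?:abc|xyz)_[a-z]+} $str]
puts $words ;# xyz_hello ABC_bye

# [a-z]+ stops at the first non-letter; \w+ also takes digits and underscores.
set s "xyz_hello_world abc_x1"
puts [regexp -all -inline -nocase -- {\y(?:abc|xyz)_[a-z]+} $s] ;# xyz_hello abc_x
puts [regexp -all -inline -nocase -- {\y(?:abc|xyz)_\w+} $s]    ;# xyz_hello_world abc_x1
```

If the group were capturing, -all -inline would also return each submatch ("xyz", "ABC") alongside the whole match, which is one practical reason to prefer (?:…) here.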
I have fixed it:
[regexp -nocase -- {.*\s+((abc_|xyz_)\S+)\s+.*} $str all necessaryStr ]
But I would still like to know whether a regexp is the best solution, or whether string functions are better (faster, more convenient, more flexible).
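What was wrong with the original pattern is alternation precedence: | binds most loosely, so (abc_|xyz_\S+) means either "abc_" alone or "xyz_ followed by non-spaces". An abc_ word therefore only matched if abc_ was immediately followed by whitespace. A quick check of both patterns, using the str1/str2 values from the question:

```tcl
set str1 "gfrdgasjklh dlasd =-0-489 xyz_hello sddf 89rn sf n9"
set str2 "dytfasjklh abc_bye dlasd =-0tyj-489 sddf tyj89rn sjf n9"

# Original: works for xyz_ words, because \S+ belongs to the xyz_ branch only.
puts [regexp -nocase -- {.*\s+(abc_|xyz_\S+)\s+.*} $str1 all necessaryStr] ;# 1
# Original: fails for abc_bye, since "abc_" must be followed by whitespace.
puts [regexp -nocase -- {.*\s+(abc_|xyz_\S+)\s+.*} $str2 all necessaryStr] ;# 0

# Fixed: the prefixes are grouped, so \S+ applies to both of them.
puts [regexp -nocase -- {.*\s+((abc_|xyz_)\S+)\s+.*} $str2 all necessaryStr] ;# 1
puts $necessaryStr ;# abc_bye

# Caveat: \s+ is required on both sides, so a word at the very start
# (or end) of the string is not found by the fixed pattern either.
puts [regexp -nocase -- {.*\s+((abc_|xyz_)\S+)\s+.*} "abc_start foo" all w] ;# 0
```

Dropping the leading .*\s+ and trailing \s+.* (regexp does not need to match the whole string) and anchoring with \y instead avoids that edge case.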