How do I make sure my regular expression doesn't match too much?
A file has a few words with numbers at the beginning of them. I want to extract a particular numbered line. When given 1, it extracts line 1, but also lines 11, 21, and so on.
FILE.txt has contents:
1.sample lines of 2.sentences present in ... ... 10.the 11.file
When executed as pro 1 file.txt, it gives results from lines 1, 10 and 11, since all three contain 1 in their string, i.e.
Output of the script:
1.sample
10.the
11.file
What I expect is only the contents of line 1, not the contents of line 10 or line 11, i.e.
Expected output:
1.sample
My current code:
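```tcl
# Usage: pro pattern filename
proc pro {pattern filename} {
    set file [open $filename r]
    set lnum 0   ;# current line number
    set occ 0    ;# number of matching lines
    while {[gets $file line] >= 0} {
        incr lnum
        # Unanchored match: the pattern "1" also matches inside "10", "11", "21", ...
        if {[regexp $pattern $line]} {
            incr occ
            puts "The pattern is present in line: $lnum"
            puts $line
        } else {
            puts "not found"
        }
    }
    puts "total number of occurrences: $occ"
    close $file
}
```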
The program works, but it also retrieves lines I don't want along with the expected line. Because the number I want to retrieve (1) also occurs in other strings such as 11, 21 and 14, those lines get printed as well.
Please bear with my unclear explanation of the question.
Answers (3)
You can solve the problem using word boundaries, as glen suggested, but you can also consider the following:
If every line number is followed by a period (.), you can use it as a delimiter in the regular expression. I would also suggest anchoring with ^ (match at the beginning of the line), so that the number is not counted when it appears elsewhere in the line.
Tcl word boundaries are well explained at http://www.regular-expressions.info/wordboundaries.html
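A minimal sketch of the anchor-plus-delimiter idea, assuming each wanted line starts with the number immediately followed by a period and that the number contains only digits (so it needs no regex escaping). The helper name starts_with_number is hypothetical.

```tcl
# Hypothetical helper: returns 1 if $line starts with number $n followed by a period.
# Assumes $n contains only digits, so it can be inserted into the pattern unescaped.
proc starts_with_number {n line} {
    # ^ anchors the match at the start of the line and \. is a literal period,
    # so "1" cannot match the beginning of "10." or "11.".
    return [regexp "^${n}\\." $line]
}

foreach line {1.sample 10.the 11.file} {
    if {[starts_with_number 1 $line]} {
        puts $line
    }
}
```

The same pattern can be handed directly to the pro procedure from the question, e.g. pro {^1\.} file.txt; the braces keep Tcl from substituting the backslash before regexp sees it.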
You have to ensure your pattern matches only between word boundaries:
See the documentation for regular expression syntax.
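A sketch of the word-boundary approach, assuming Tcl's advanced regular expression constraint escapes: \m matches at the start of a word and \M at the end of one (\y matches at either). The helper name has_word_number is hypothetical, and this is not necessarily the exact pattern the answer originally showed.

```tcl
# Hypothetical helper: returns 1 if number $n appears as a whole word in $line.
# Assumes $n contains only digits.
proc has_word_number {n line} {
    # In Tcl regexps, \b is a backspace, not a word boundary; use \m and \M instead.
    # "1" matches in "1.sample" (the period ends the word) but not inside "10" or "11".
    return [regexp "\\m${n}\\M" $line]
}

foreach line {1.sample 10.the 11.file {see 1 here}} {
    if {[has_word_number 1 $line]} {
        puts $line
    }
}
```

With the question's procedure this becomes pro {\m1\M} file.txt. Unlike the ^ anchor, a word-boundary pattern also matches a standalone 1 in the middle of a line, such as "see 1 here".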
If what you're looking to do is as constrained as what you're describing, why not just use something like
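One non-regex check in this spirit, offered as a sketch rather than as this answer's own snippet, is a glob test with string match, assuming every wanted line begins with the number and a period and the number contains only digits (digits are not glob metacharacters). The helper name starts_with_number_glob is hypothetical.

```tcl
# Hypothetical helper: glob-based prefix test, no regular expression involved.
proc starts_with_number_glob {n line} {
    # In string match patterns "." is a literal character and "*" matches the rest,
    # so "1.*" accepts "1.sample" but rejects "10.the" and "11.file".
    return [string match "${n}.*" $line]
}

foreach line {1.sample 10.the 11.file} {
    if {[starts_with_number_glob 1 $line]} {
        puts $line
    }
}
```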