How do I make sure my regular expression does not match too much?

Posted on 2024-11-30 14:07:12

A file has a few words, some of them with numbers at the beginning. I want to extract the line with a particular number. When given 1, it extracts line 1 but also lines such as 11 and 21.

FILE.txt has contents:

1.sample
lines of
2.sentences
present in
...
...
10.the 
11.file

When executed as pro 1 file.txt, it gives results for lines 1, 10 and 11, because all three have a 1 in their string, i.e.

Output of the script:

1.sample
10.the 
11.file

I expect only the contents of line 1, not those of line 10 or line 11, i.e.

Expected output:

1.sample

My current code:

proc pro { pattern args} {

   set file [open $args r]
   set lnum 0
   set occ 0 
   while {[gets $file line] >=0} {
      incr lnum
      if {[regexp $pattern $line]} {
          incr occ
          puts "The pattern is present in line: $lnum" 
          puts "$line"
      } else {
         puts "not found"
      }
   }
   puts "total number of occurrences : $occ"
   close $file
}

The program works, but it retrieves lines I do not want along with the expected line. Because the number I am looking for (1) also appears in other strings such as 11, 21 and 14, those lines are printed as well.

Please bear with my unclear way of explaining the question.

Comments (3)

零度℉ 2024-12-07 14:07:12

You can solve the problem using word boundaries, as glen suggested, but you can also consider the following:

If every line number is followed by a ., you can use it as a delimiter in the regular expression:

regexp "^$lineNo\\." $a

I would also suggest using ^ (match at the beginning of the line) so that the number is not counted if it appears elsewhere in the line.

Tcl word boundaries are well explained at http://www.regular-expressions.info/wordboundaries.html
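
As a minimal sketch of how the anchored pattern might be applied to the sample lines from the question: the proc name matchesLineNo and the lines list are assumptions made for illustration and do not appear in the original post.

```tcl
# Hypothetical helper: returns 1 when $line starts with "<lineNo>.", else 0.
proc matchesLineNo {lineNo line} {
    # ^ anchors the match at the start of the line;
    # \\. inside double quotes becomes \. which matches a literal dot.
    return [regexp "^$lineNo\\." $line]
}

# Sample lines taken from the question's FILE.txt (the "..." lines omitted).
set lines {1.sample {lines of} 2.sentences {present in} 10.the 11.file}

foreach line $lines {
    if {[matchesLineNo 1 $line]} {
        puts $line
    }
}
```

With these sample lines, only 1.sample is printed, because 10.the and 11.file do not have a dot directly after the leading 1. This sketch assumes lineNo contains only digits; other characters could be interpreted as regular expression syntax.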

诗化ㄋ丶相逢 2024-12-07 14:07:12

You have to ensure your pattern matches only between word boundaries:

if {[regexp "\\m$pattern\\M" $line]} { ...

See the documentation for regular expression syntax.
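
A small sketch of the word-boundary check in isolation, in case it helps to see what it accepts and rejects. The proc name hasWord is a hypothetical wrapper, not something from the original post; \m and \M are the Tcl constraint escapes for the beginning and end of a word.

```tcl
# Hypothetical helper: returns 1 when $pattern occurs in $line as a whole word.
proc hasWord {pattern line} {
    # \m matches only at the beginning of a word, \M only at the end of one.
    return [regexp "\\m$pattern\\M" $line]
}

foreach line {1.sample 10.the 11.file {see item 1 here}} {
    puts "$line -> [hasWord 1 $line]"
}
```

Here 1.sample matches while 10.the and 11.file do not, because in those lines the 1 is followed or preceded by another digit. Note that the last line also matches: word boundaries alone do not require the number to be at the start of the line.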

森林散布 2024-12-07 14:07:12

If what you're looking to do is as constrained as what you're describing, why not just use something like

if { [string range $line 0 [string length $pattern]] eq "${pattern}." } {
    ...
}
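
The comparison above could be wrapped up roughly as follows. This is only a sketch: the proc name startsWithNo and the sample list are assumptions for illustration. Because string range uses inclusive indices, taking characters 0 through [string length $pattern] yields the number plus the one character after it.

```tcl
# Hypothetical helper: returns 1 when $line begins with "<pattern>.", else 0.
proc startsWithNo {pattern line} {
    # Indices are inclusive, so this prefix is one character longer than
    # $pattern, which is exactly the length of "${pattern}.".
    set prefix [string range $line 0 [string length $pattern]]
    return [expr {$prefix eq "${pattern}."}]
}

set lines {1.sample {lines of} 2.sentences {present in} 10.the 11.file}
foreach line $lines {
    if {[startsWithNo 1 $line]} {
        puts $line
    }
}
```

With the sample lines this prints only 1.sample. Since no regular expression is involved, the value of pattern is compared literally, so it does not need escaping.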