Extracting two numbers preceded by two different strings from a paragraph using a Tcl regular expression

Posted 2024-08-31 15:12:19

I need to extract two different numbers, each preceded by a different string:
Employee ID --> Employee16 (I need 16), and
Employee links --> Employee links: 2 (I need 2).
The source string looks like the following:

Employee16, Employee name is QueenRose
  Working for 46w0d
  Billing is Distributed
  65537 assigned tasks, 0 reordered, 0 unassigned
  0 discarded, 0 lost received, 5/255 load
  received sequence unavailable, 0xC2E7 sent sequence
  Employee links: 2 active, 0 inactive (max not set, min not set)
    Dt3/5/10:0, since 46w0d, no tasks pending
    Dt3/5/10:10, since 21w0d, no tasks rcvd
 Employee is currently working in Hardware section.

Employee19, Employee name is Edward11
  Working  for 48w4d
  Billing is Distributed
  206801498 assigned tasks, 0 reordered, 0 unassigned
  655372 discarded, 0 lost received, 9/255 load
  received sequence unavailable, 0x23CA sent sequence
  Employee links: 7 active, 0 inactive (max not set, min not set)
    Dt3/5/10:0, since 47w2d, tasks pending
    Dt3/5/10:10, since 28w6d, no tasks pending
    Dt3/5/10:11, since 18w4d, no tasks pending
    Dt3/5/10:12, since 18w4d, no tasks pending
    Dt3/5/10:13, since 18w4d, no tasks pending
    Dt3/5/10:14, since 18w4d, no tasks pending
    Dt3/5/10:15, since 7w2d, no tasks pending
   Employee is currently working in Hardware sectione.

Employee6 (inactive)
  Employee links: 2
    Dt3/5/10:0 (inactive)
    Dt3/5/10:10 (inactive)

Employee7 (inactive)
  Employee links: 2
    Dt3/5/10:0 (inactive)
    Dt3/5/10:10 (inactive)

I tried the following:

Employee(\d+)[^\n\r]*[^M]*Employee links:\s+(\d+)

I expect the output to look like this:

16  2
19  7
 6  2
 7  2

But it does not list all the IDs and links.
Can anybody help me get this?

Comments (2)

云醉月微眠 2024-09-07 15:12:19

It's easiest to extract from the two different locations in two separate matching steps. It's also by far the easiest approach if you split the whole text up into paragraphs first.

Employee Id--> Employee16 (I need 16)

I'd extract a single one like this:

regexp -line {^Employee(\d+)} $paragraph -> employeeNumber

(You want line matching mode for this task, rather than the default "whole string" matching mode.)
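
To illustrate the difference between the two modes, here is a minimal sketch. The two-line sample string is made up for the demonstration, and the variable names are only illustrative. Without -line, ^ anchors only at the very start of the string, so an ID on a later line is missed.

```tcl
# Made-up two-line sample: the employee header is on the second line.
set sample "some header\nEmployee16, Employee name is QueenRose"

# Default mode: ^ matches only at the very start of the whole string.
set plain [regexp {^Employee(\d+)} $sample -> id1]

# -line mode: ^ also matches just after each newline.
set byLine [regexp -line {^Employee(\d+)} $sample -> id2]

puts "default: $plain, -line: $byLine (id $id2)"
```

The return value of regexp (1 on a match, 0 otherwise) is what tells the two cases apart here.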

Employee links--> Employee links:2 (I need 2)

For this one, again assuming that we're only looking at the overall record for a single employee:

regexp -line {^\s+Employee links:\s*(\d+)(.*)$} $paragraph -> links rest

In this case, I've extracted not just $links but also the $rest of the line, since you might need to decide whether what follows the number matters. Of course, the following might be even more useful:

regexp -line {^\s+Employee links:\s*(\d+)(?:\s+active,\s+(\d+)\s+inactive)?} \
        $paragraph -> activeLinks inactiveLinks

In this case, the $inactiveLinks will have an empty string if only the first number was present (which seems to happen when the employee is inactive; you'll need to do a trivial bit of logic to tidy up in that case).
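
One possible way to do that tidying up is sketched below. The proc name parseLinks is hypothetical, and defaulting a missing inactive count to 0 is an assumption made for illustration; depending on your data, the lone number might instead need to be counted as inactive links.

```tcl
# Hypothetical helper (not part of the original answer): returns a
# two-element list {first inactive}, or an empty list when the paragraph
# contains no "Employee links:" line.
proc parseLinks {paragraph} {
    set re {^\s+Employee links:\s*(\d+)(?:\s+active,\s+(\d+)\s+inactive)?}
    # regexp returns 0 when nothing matched, so check before using the vars.
    if {![regexp -line $re $paragraph -> first inactive]} {
        return {}
    }
    # Short form "Employee links: 2": the optional group did not take part,
    # so $inactive is the empty string. ASSUMPTION: treat it as 0.
    if {$inactive eq ""} {
        set inactive 0
    }
    return [list $first $inactive]
}

puts [parseLinks "Employee6 (inactive)\n  Employee links: 2\n"]
```

Returning an empty list for "no match" keeps the match check in one place, so callers only have to test the list length.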

Finally, when using regexp, don't forget to check the result to see if it matched!
Hope this helps.

霓裳挽歌倾城醉 2024-09-07 15:12:19

I was going to provide a complete answer, but then I read Donal's much more helpful tutorial and felt I just couldn't. I will show how to split the text up into paragraphs, though:

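foreach paragraph [regexp -all -inline {.*?\n{2,}} $text] {
    # Do something with $paragraph here, e.g. run the regexp calls on it.
    # Note: a final paragraph that is not followed by a blank line
    # is not returned by this pattern.
    puts $paragraph
}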

In your attempt, I see [^\n\r]* -- are you sure you have carriage returns in your text as well as newlines?
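
Putting the pieces from both answers together, a minimal sketch might look like the following. The proc name extractEmployeeLinks and the shortened sample text are made up for illustration. It also makes three assumptions that are not in the answers: line endings are normalized to LF first, a blank line is appended so the last record is picked up, and the ID pattern does not require a trailing comma so that headers like "Employee6 (inactive)" match too.

```tcl
# Hypothetical helper: returns a list of {id links} pairs, one per record.
proc extractEmployeeLinks {text} {
    # ASSUMPTION: convert CRLF and lone CR to LF so "\n{2,}" sees blank lines.
    set text [string map [list "\r\n" "\n" "\r" "\n"] $text]
    # ASSUMPTION: append a blank line so the final record is terminated too.
    append text "\n\n"
    set rows {}
    foreach paragraph [regexp -all -inline {.*?\n{2,}} $text] {
        # Skip paragraphs where either number is missing.
        if {![regexp -line {^Employee(\d+)} $paragraph -> id]} continue
        if {![regexp -line {^\s+Employee links:\s*(\d+)} $paragraph -> links]} continue
        lappend rows [list $id $links]
    }
    return $rows
}

# Sample shortened from the question; the last record has no trailing blank line.
set text [join [list \
    "Employee16, Employee name is QueenRose" \
    "  Working for 46w0d" \
    "  Employee links: 2 active, 0 inactive (max not set, min not set)" \
    "    Dt3/5/10:0, since 46w0d, no tasks pending" \
    "" \
    "Employee6 (inactive)" \
    "  Employee links: 2" \
    "    Dt3/5/10:0 (inactive)"] "\n"]

foreach row [extractEmployeeLinks $text] {
    puts [format "%2s  %s" [lindex $row 0] [lindex $row 1]]
}
```

Splitting first and matching per paragraph keeps each pattern short and avoids the cross-record matching that a single greedy expression over the whole text is prone to.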
