Using a Tcl regular expression to extract two numbers, each preceded by a different string, from a paragraph
I need to extract two different numbers, each preceded by a different string: the employee ID (from Employee16 I need 16) and the employee link count (from Employee links: 2 I need 2).
The source string looks like the following:
Employee16, Employee name is QueenRose
Working for 46w0d
Billing is Distributed
65537 assigned tasks, 0 reordered, 0 unassigned
0 discarded, 0 lost received, 5/255 load
received sequence unavailable, 0xC2E7 sent sequence
Employee links: 2 active, 0 inactive (max not set, min not set)
Dt3/5/10:0, since 46w0d, no tasks pending
Dt3/5/10:10, since 21w0d, no tasks rcvd
Employee is currently working in Hardware section.
Employee19, Employee name is Edward11
Working for 48w4d
Billing is Distributed
206801498 assigned tasks, 0 reordered, 0 unassigned
655372 discarded, 0 lost received, 9/255 load
received sequence unavailable, 0x23CA sent sequence
Employee links: 7 active, 0 inactive (max not set, min not set)
Dt3/5/10:0, since 47w2d, tasks pending
Dt3/5/10:10, since 28w6d, no tasks pending
Dt3/5/10:11, since 18w4d, no tasks pending
Dt3/5/10:12, since 18w4d, no tasks pending
Dt3/5/10:13, since 18w4d, no tasks pending
Dt3/5/10:14, since 18w4d, no tasks pending
Dt3/5/10:15, since 7w2d, no tasks pending
Employee is currently working in Hardware sectione.
Employee6 (inactive)
Employee links: 2
Dt3/5/10:0 (inactive)
Dt3/5/10:10 (inactive)
Employee7 (inactive)
Employee links: 2
Dt3/5/10:0 (inactive)
Dt3/5/10:10 (inactive)
I tried the following:
Employee(\d+)[^\n\r]*[^M]*Employee links:\s+(\d+)
I expect the output to be:
16 2
19 7
6 2
7 2
But it does not list all the IDs and links.
Can anybody help me get this?
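For reference, here is a small sketch that shows what the attempted pattern actually returns, run against an abbreviated two-record copy of the sample (the variable names are hypothetical). The sample contains no uppercase M, so [^M]* matches everything, newlines included, and because every quantifier in the pattern is greedy the match runs on to the last "Employee links:" in the whole text, pairing the first ID with the last link count.

```tcl
# Two records abbreviated from the sample in the question.
set text {Employee16, Employee name is QueenRose
Working for 46w0d
Employee links: 2 active, 0 inactive (max not set, min not set)
Employee is currently working in Hardware section.
Employee19, Employee name is Edward11
Working for 48w4d
Employee links: 7 active, 0 inactive (max not set, min not set)
Employee is currently working in Hardware sectione.}

# The pattern from the question, unchanged.
set pattern {Employee(\d+)[^\n\r]*[^M]*Employee links:\s+(\d+)}

# -all -inline returns, for each match, the whole match followed by its captures.
set matches [regexp -all -inline $pattern $text]

# Only one match comes back: ID 16 from the first record, link count 7 from the second.
set captured [lrange $matches 1 end]
puts $captured   ;# prints: 16 7
```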
It's easiest to extract the two numbers from their two different locations in two separate matching steps. It's also by far easiest if you split the whole text up into paragraphs, one per employee, first.
I'd extract the first number (the employee ID) on its own. You want line-matching mode (regexp -line) for this task, rather than the default "whole string" matching mode.
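The code for this step did not survive in this copy of the answer. A minimal sketch of what it could look like, assuming the hypothetical variable record already holds the text of one employee's paragraph (id is likewise an illustrative name):

```tcl
# One employee record, abbreviated from the question's sample.
set record {Employee16, Employee name is QueenRose
Working for 46w0d
Employee links: 2 active, 0 inactive (max not set, min not set)
Employee is currently working in Hardware section.}

# -line makes ^ and $ match at every line boundary, and stops . and [^...]
# at newlines. \d+ requires a digit straight after "Employee", so the
# "Employee links:" and "Employee name" text cannot match.
set id ""
if {[regexp -line {^Employee(\d+)} $record -> id]} {
    puts "id = $id"   ;# prints: id = 16
} else {
    puts "no employee ID found"
}
```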
For the link count, again assuming that we're only looking at the record for a single employee:
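Again the original code is missing. A sketch under the same assumption (the hypothetical variable record holds one employee's paragraph), capturing the first number after "Employee links:" into links and the remainder of that line into rest:

```tcl
# One employee record, abbreviated from the question's sample.
set record {Employee19, Employee name is Edward11
Working for 48w4d
Employee links: 7 active, 0 inactive (max not set, min not set)
Dt3/5/10:0, since 47w2d, tasks pending
Employee is currently working in Hardware sectione.}

# With -line, $ matches at the end of the line and (.*) cannot run past the
# newline, so rest holds only the tail of the "Employee links:" line.
set links ""
set rest ""
if {[regexp -line {^Employee links:\s*(\d+)(.*)$} $record -> links rest]} {
    puts "links = $links"                  ;# prints: links = 7
    puts "rest = [string trim $rest]"      ;# prints: rest = active, 0 inactive (max not set, min not set)
}
```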
In this case, I've extracted not just $links but also the $rest of the line, since it seems that you might need to think about whether that part matters. Of course, it might be even more useful to capture both numbers on that line separately; then $inactiveLinks will hold an empty string if only the first number was present (which seems to happen when the employee is inactive; you'll need a trivial bit of logic to tidy up in that case).

Finally, when using regexp, don't forget to check the result to see whether it matched! Hope this helps.
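A sketch of the variant that captures both numbers, with the regexp result checked as advised above. It assumes the two line shapes seen in the sample ("Employee links: N active, M inactive ..." and a bare "Employee links: N"); links, records and results are hypothetical names, while inactiveLinks follows the text:

```tcl
# Two record shapes from the question: an active employee, whose links line
# carries two counts, and an inactive one, whose links line has a single number.
set records [list \
    "Employee16, Employee name is QueenRose\nEmployee links: 2 active, 0 inactive (max not set, min not set)" \
    "Employee6 (inactive)\nEmployee links: 2\nDt3/5/10:0 (inactive)"]

# The (?:...)? group is optional; when it takes no part in the match,
# regexp sets inactiveLinks to the empty string.
set pattern {^Employee links:\s*(\d+)(?:\s+active,\s*(\d+)\s+inactive)?}

set results {}
foreach record $records {
    # regexp returns 1 on a match and 0 otherwise, so always test it.
    if {![regexp -line $pattern $record -> links inactiveLinks]} {
        puts "no links line found"
        continue
    }
    if {$inactiveLinks eq ""} {
        # Only one number present (seen for inactive employees in the sample).
        puts "links: $links (no active/inactive breakdown)"
    } else {
        puts "active: $links, inactive: $inactiveLinks"
    }
    lappend results [list $links $inactiveLinks]
}
```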
I was going to provide a complete answer, but then I read Donal's much more helpful tutorial and felt I just couldn't. I will show how to split the text up into paragraphs, though:
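The split code itself was not preserved. Below is a minimal sketch of one way to do it, not necessarily the answerer's: a hypothetical proc splitRecords starts a new paragraph at every line beginning with "Employee" immediately followed by a digit, which assumes every record in the real data starts that way, as it does in the sample. It then applies the two separate matching steps from the first answer to each paragraph.

```tcl
# Abbreviated copy of the question's sample: all four records, keeping the
# lines that matter for this task.
set text {Employee16, Employee name is QueenRose
Working for 46w0d
Employee links: 2 active, 0 inactive (max not set, min not set)
Dt3/5/10:0, since 46w0d, no tasks pending
Employee is currently working in Hardware section.
Employee19, Employee name is Edward11
Working for 48w4d
Employee links: 7 active, 0 inactive (max not set, min not set)
Dt3/5/10:0, since 47w2d, tasks pending
Employee is currently working in Hardware sectione.
Employee6 (inactive)
Employee links: 2
Dt3/5/10:0 (inactive)
Dt3/5/10:10 (inactive)
Employee7 (inactive)
Employee links: 2
Dt3/5/10:0 (inactive)
Dt3/5/10:10 (inactive)}

# Start a new paragraph at each line beginning with "Employee" plus a digit;
# "Employee links:" and "Employee is ..." lines stay in the current one.
proc splitRecords {text} {
    set records {}
    set current ""
    foreach line [split $text \n] {
        if {[regexp {^Employee\d} $line] && $current ne ""} {
            lappend records $current
            set current ""
        }
        append current $line \n
    }
    if {$current ne ""} {
        lappend records $current
    }
    return $records
}

set output {}
foreach record [splitRecords $text] {
    # Two separate matching steps per paragraph, each checked for success.
    if {[regexp -line {^Employee(\d+)} $record -> id]
            && [regexp -line {^Employee links:\s*(\d+)} $record -> links]} {
        lappend output "$id $links"
    }
}
puts [join $output \n]
```

Because each regexp only ever sees one paragraph, a link count can no longer be paired with the ID from a neighbouring record.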
In your attempt, I see [^\n\r]*. Are you sure you have carriage returns in your text as well as newlines?
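A quick way to check, sketched on a hypothetical CRLF string: string first returns -1 when a character is absent, and string map can normalize CRLF to LF. Note that text read from a file through a Tcl channel with the default -translation auto already has its line endings converted to \n, in which case the \r in the pattern is harmless but unnecessary.

```tcl
# Hypothetical input with Windows-style CRLF line endings.
set text "Employee16, Employee name is QueenRose\r\nEmployee links: 2 active\r\n"

# string first returns -1 when the character does not occur.
set hasCR [expr {[string first \r $text] >= 0}]
puts "contains carriage returns: $hasCR"   ;# prints: contains carriage returns: 1

# Normalize CRLF to LF so that patterns only have to deal with \n.
set text [string map [list \r\n \n] $text]
puts [string first \r $text]   ;# prints: -1
```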