Regex for 3 required groups in any order

Posted 2025-01-26 18:04:52

I have a single line of text, and I'm trying to split up the notes a user entered which has 3 required parts - a note of any length and content, a date of known formatting, and the user's 2-4 letter initials. All parts are required, but I have found that in the notes I'm parsing, users have regularly entered all 6 possible orderings:

1/1/21 PB This is a note
PB 1/1/21 This is a note
PB This is a note 1/1/21
1/1/21 This is a note PB
This is a note 1/1/21 PB
This is a note PB 1/1/21

As I am using .NET, I've used named capture groups to make my life easy; the regexes for the 3 parts are below. The date regex is long due to some industry-specific notation, so it is shown as TBD; suffice to say it is written and works well.

(?<note>.*?)
(?<initials>[A-Z]{2,4})
(?<date>TBD)

Users also always put some kind of visual separator between the parts, like the spaces I used above; [ :-]+ covers all the cases I've found. A regex for the first line above would look like:

^\s*(?<date>TBD)[ :-]+(?<initials>[A-Z]{2,4})[ :-]+(?<note>.*?)\s*$

The separator chars are intentionally dropped from the capture groups.
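A minimal C# sketch of that single-ordering regex in use. It assumes `\d{1,2}/\d{1,2}/\d{2,4}` as a stand-in for the real (elided) date pattern, and `SingleOrderParser` is a hypothetical name. It trims with `\s*` at both ends, because a trailing `\w*` after the lazy `(?<note>.*?)` would absorb the note's last word.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper: parses only the "date, initials, note" ordering.
public static class SingleOrderParser
{
    // Assumed stand-in for the question's TBD date regex (m/d/yy or m/d/yyyy).
    private const string DatePattern = @"\d{1,2}/\d{1,2}/\d{2,4}";

    // \s* trims surrounding whitespace without eating word characters of the note.
    private static readonly Regex DateInitialsNote = new Regex(
        @"^\s*(?<date>" + DatePattern + @")[ :-]+(?<initials>[A-Z]{2,4})[ :-]+(?<note>.*?)\s*$");

    // Returns the three named groups, or null if the line is in another order.
    public static (string Date, string Initials, string Note)? Parse(string line)
    {
        Match m = DateInitialsNote.Match(line);
        if (!m.Success) return null;
        return (m.Groups["date"].Value, m.Groups["initials"].Value, m.Groups["note"].Value);
    }
}
```

Lines in the other five orderings return null here, which is exactly the gap being asked about.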

So how would I write this so that each of the 3 required pieces is matched, in any order, and none is repeated? I believe either conditional groups or lookaround is the solution, but I'm having a lot of trouble figuring out how to arrive at anything that works.
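One way to avoid hand-writing all six alternatives is to generate them. This is a sketch, not a production parser: it relies on .NET allowing the same group name in different alternatives, uses a hypothetical `AnyOrderParser` class, and again assumes `\d{1,2}/\d{1,2}/\d{2,4}` in place of the real date regex. Because each generated alternative contains each part exactly once, no part can repeat.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical parser: one alternation covering every ordering of the three parts.
public static class AnyOrderParser
{
    private const string Sep = "[ :-]+";

    // Part patterns; the date pattern is an assumed stand-in for TBD.
    private static readonly string[] Parts =
    {
        @"(?<date>\d{1,2}/\d{1,2}/\d{2,4})",
        @"(?<initials>[A-Z]{2,4})",
        @"(?<note>.*?)",
    };

    // Yields every ordering of the items (3 items -> 6 orderings).
    private static IEnumerable<List<string>> Permutations(List<string> items)
    {
        if (items.Count <= 1)
        {
            yield return new List<string>(items);
            yield break;
        }
        for (int i = 0; i < items.Count; i++)
        {
            var rest = new List<string>(items);
            rest.RemoveAt(i);
            foreach (List<string> tail in Permutations(rest))
            {
                tail.Insert(0, items[i]);
                yield return tail;
            }
        }
    }

    // Each ordering becomes one branch; .NET merges the repeated group names.
    private static readonly Regex AnyOrder = new Regex(
        @"^\s*(?:" +
        string.Join("|", Permutations(Parts.ToList()).Select(p => string.Join(Sep, p))) +
        @")\s*$");

    public static (string Date, string Initials, string Note)? Parse(string line)
    {
        Match m = AnyOrder.Match(line);
        if (!m.Success) return null;
        return (m.Groups["date"].Value, m.Groups["initials"].Value, m.Groups["note"].Value);
    }
}
```

All six example lines parse to the same three values. It still picks `DO` as the initials for `DO NOT RENEW - KF 4/1/22`, because the initials-note-date branch is tried before note-initials-date.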

Also, as a bonus headache, I have noticed that notes like the following cause problems:

DO NOT RENEW - KF 4/1/22

My regex above can't tell whether "DO NOT RENEW" is the note and "KF" the initials, or "NOT RENEW - KF" is the note and "DO" the initials. As humans, we see the - separator and know which option is correct. So it would be amazing if, in an ambiguous situation, the regex preferred the match that uses a non-whitespace separator character.
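A sketch of one way to express that preference, under the same assumptions (hypothetical `PreferMarkedParser`, assumed date pattern). It relies on .NET trying alternation branches left to right: every ordering where the note and initials are adjacent is first tried with a separator between them that must contain `:` or `-`, and only if none of those match does it fall back to the plain `[ :-]+` versions of all six orderings.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical parser: prefers a ':' or '-' separator at the note/initials boundary.
public static class PreferMarkedParser
{
    private const string Loose = "[ :-]+";
    // Must contain at least one ':' or '-', optionally padded with spaces.
    private const string Marked = " *[:-][ :-]*";

    private static readonly Dictionary<string, string> Parts = new Dictionary<string, string>
    {
        ["date"] = @"(?<date>\d{1,2}/\d{1,2}/\d{2,4})", // assumed stand-in for TBD
        ["initials"] = "(?<initials>[A-Z]{2,4})",
        ["note"] = "(?<note>.*?)",
    };

    // All six orderings of the three parts.
    private static readonly string[][] Orders =
    {
        new[] { "date", "initials", "note" },
        new[] { "date", "note", "initials" },
        new[] { "initials", "date", "note" },
        new[] { "initials", "note", "date" },
        new[] { "note", "date", "initials" },
        new[] { "note", "initials", "date" },
    };

    private static bool IsNoteInitials(string a, string b) =>
        (a == "note" && b == "initials") || (a == "initials" && b == "note");

    private static bool HasNoteInitialsBoundary(string[] order) =>
        IsNoteInitials(order[0], order[1]) || IsNoteInitials(order[1], order[2]);

    // Joins one ordering; with marked == true the note/initials separator must contain ':' or '-'.
    private static string Build(string[] order, bool marked)
    {
        string pattern = Parts[order[0]];
        for (int i = 1; i < order.Length; i++)
        {
            string sep = marked && IsNoteInitials(order[i - 1], order[i]) ? Marked : Loose;
            pattern += sep + Parts[order[i]];
        }
        return pattern;
    }

    // Marked branches come first, so they win whenever they can match.
    private static readonly Regex Pattern = new Regex(
        @"^\s*(?:" +
        string.Join("|",
            Orders.Where(HasNoteInitialsBoundary).Select(o => Build(o, true))
                  .Concat(Orders.Select(o => Build(o, false)))) +
        @")\s*$");

    public static (string Date, string Initials, string Note)? Parse(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success) return null;
        return (m.Groups["date"].Value, m.Groups["initials"].Value, m.Groups["note"].Value);
    }
}
```

This is only a heuristic: a note that itself contains a dash followed by a short all-caps word can now be split at the wrong place, e.g. `KF Renew - ASAP 1/1/21` comes back with `ASAP` as the initials.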

Answers (1)

花落人断肠 2025-02-02 18:04:52

In C# (.NET) you can reuse the same named capture group in different branches of an alternation, so one pattern can match all the forms.

For example, to match the first 3 lines:

^(?:(?<date>\d+/\d+/\d+)[ :-]+(?<initials>[A-Z]{2,4})[ :-]+(?<note>.*)|(?<initials>[A-Z]{2,4})[ :-]+(?<date>\d+/\d+/\d+)[ :-]+(?<note>.*)|(?<initials>[A-Z]{2,4})[ :-]+(?<note>.*?)[ :-]+(?<date>\d+/\d+/\d+))$

See a regex demo.
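A quick way to confirm the reuse works in .NET (a sketch; `AnswerPatternCheck` and `Describe` are hypothetical names): the repeated `date`, `initials` and `note` groups collapse into single groups, so `GetGroupNames()` lists each name once, and lines in the remaining three orderings do not match until their branches are added.

```csharp
using System.Text.RegularExpressions;

// Hypothetical wrapper around the answer's pattern, split across lines for readability.
public static class AnswerPatternCheck
{
    public static readonly Regex Pattern = new Regex(
        @"^(?:(?<date>\d+/\d+/\d+)[ :-]+(?<initials>[A-Z]{2,4})[ :-]+(?<note>.*)" +
        @"|(?<initials>[A-Z]{2,4})[ :-]+(?<date>\d+/\d+/\d+)[ :-]+(?<note>.*)" +
        @"|(?<initials>[A-Z]{2,4})[ :-]+(?<note>.*?)[ :-]+(?<date>\d+/\d+/\d+))$");

    // Reads the shared groups regardless of which branch matched.
    public static string Describe(string line)
    {
        Match m = Pattern.Match(line);
        if (!m.Success) return "no match";
        return "date=" + m.Groups["date"].Value +
               "; initials=" + m.Groups["initials"].Value +
               "; note=" + m.Groups["note"].Value;
    }
}
```

Extending it to lines 4 to 6 means adding the three remaining orderings as further branches in the same way.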
