3 required groups in any order

Posted 2025-01-26 18:04:52

I have a single line of text, and I'm trying to split up the notes a user entered. Each note has 3 required parts: a note of any length and content, a date in a known format, and the user's 2-4 letter initials. All parts are required, but I have found that in the notes I'm parsing, users have regularly entered all 6 possible orderings:

1/1/21 PB This is a note
PB 1/1/21 This is a note
PB This is a note 1/1/21
1/1/21 This is a note PB
This is a note 1/1/21 PB
This is a note PB 1/1/21

As I am using .NET, I'm using named capture groups to make my life easy; the regexes for the 3 parts are below. The date regex is long due to some industry-specific notation, so it is shown here as TBD; suffice it to say it is written and works well.

(?<note>.*?)
(?<initials>[A-Z]{2,4})
(?<date>TBD)

Users also always put some kind of visual separator character between the parts, like the space I used above; [ :-]+ covers all the cases I've found. A regex for the first example above would look like:

^\s*(?<date>TBD)[ :-]+(?<initials>[A-Z]{2,4})[ :-]+(?<note>.*?)\s*$

The separator chars are intentionally dropped from the capture groups.
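For experimenting outside .NET, here is a minimal Python sketch of that single-ordering regex. Assumptions: Python's re module spells named groups (?P<name>...) rather than .NET's (?<name>...), the stand-in \d+/\d+/\d+ replaces the real (TBD) date regex, and \s* trims optional surrounding whitespace.

```python
import re

# Stand-in for the question's date regex (shown there as TBD); assumed to be m/d/yy here.
DATE = r"\d+/\d+/\d+"
INITIALS = r"[A-Z]{2,4}"
SEP = r"[ :-]+"  # separators are matched but deliberately not captured

# Ordering 1 (date, initials, note). Python writes named groups as (?P<name>...).
FIRST_ORDER = re.compile(
    rf"^\s*(?P<date>{DATE}){SEP}(?P<initials>{INITIALS}){SEP}(?P<note>.*?)\s*$"
)

m = FIRST_ORDER.match("1/1/21 PB This is a note")
print(m.groupdict())
# {'date': '1/1/21', 'initials': 'PB', 'note': 'This is a note'}
```

The lazy note group stops just before any trailing whitespace because \s*$ must still match after it.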

So how would I go about writing this so that each of the 3 required pieces is matched exactly once, in any order? I believe either conditional groups or lookaround is the solution, but I'm having a lot of trouble figuring out how to arrive at anything that works.
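One way to cover all 6 orderings without writing them by hand is to generate them. The Python sketch below (an illustration, not .NET code) builds one pattern per permutation of the three parts with itertools.permutations and returns the groups from the first pattern that matches. The part regexes are stand-ins: the date pattern replaces the question's TBD, and the note is made lazy and non-empty because every part is required.

```python
import itertools
import re

# Stand-in part regexes (assumptions); the real date regex is the question's TBD one.
PARTS = {
    "date": r"\d+/\d+/\d+",
    "initials": r"[A-Z]{2,4}",
    "note": r".+?",  # lazy; at least one character because every part is required
}
SEP = r"[ :-]+"  # separators are matched but not captured

# One compiled pattern per ordering: 3! = 6 patterns, each part appearing exactly once.
PATTERNS = [
    re.compile(r"^\s*" + SEP.join(f"(?P<{name}>{PARTS[name]})" for name in order) + r"\s*$")
    for order in itertools.permutations(PARTS)
]


def parse(line):
    """Return the date/initials/note groups of the first ordering that matches, else None."""
    for pattern in PATTERNS:
        m = pattern.match(line)
        if m:
            return m.groupdict()
    return None


for line in [
    "1/1/21 PB This is a note",
    "PB 1/1/21 This is a note",
    "PB This is a note 1/1/21",
    "1/1/21 This is a note PB",
    "This is a note 1/1/21 PB",
    "This is a note PB 1/1/21",
]:
    print(parse(line))
```

Because the first matching ordering wins, DO NOT RENEW - KF 4/1/22 comes back with DO as the initials and NOT RENEW - KF as the note, which is the ambiguity described in the bonus question.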

Also, as a bonus headache, I have noticed that notes like the following cause problems:

DO NOT RENEW - KF 4/1/22

My regex above doesn't know whether "DO NOT RENEW" is the note and "KF" the initials, or "NOT RENEW - KF" is the note and "DO" the initials. As humans we see the - separator and know which option is correct. So it would be amazing if, in an ambiguous situation, the regex preferred the match that has a non-whitespace separator character.
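A regex on its own has no notion of "prefer the split with a visible separator", so one option is to move that preference into code. The hypothetical parse() below is a Python sketch under stand-in assumptions (the date pattern replaces the TBD regex): it tries every ordering, captures the two separators only so they can be scored, and keeps the match whose separators contain the most non-whitespace characters, with ties keeping the earlier ordering. It scores only the single match the engine finds per ordering, not every possible split.

```python
import itertools
import re

# Stand-in part regexes (assumptions); the note is lazy and non-empty.
PARTS = {
    "date": r"\d+/\d+/\d+",
    "initials": r"[A-Z]{2,4}",
    "note": r".+?",
}
SEP = r"[ :-]+"


def build(order):
    # sep1/sep2 are captured only for scoring; they are dropped from the result.
    a, b, c = (f"(?P<{name}>{PARTS[name]})" for name in order)
    return re.compile(rf"^\s*{a}(?P<sep1>{SEP}){b}(?P<sep2>{SEP}){c}\s*$")


PATTERNS = [build(order) for order in itertools.permutations(PARTS)]


def parse(line):
    """Prefer the ordering whose separators contain the most non-whitespace chars."""
    best, best_score = None, -1
    for pattern in PATTERNS:
        m = pattern.match(line)
        if not m:
            continue
        score = sum(not ch.isspace() for ch in m["sep1"] + m["sep2"])
        if score > best_score:  # strict '>' keeps the earlier ordering on ties
            best, best_score = m, score
    if best is None:
        return None
    return {name: best[name] for name in ("date", "initials", "note")}


print(parse("DO NOT RENEW - KF 4/1/22"))
# {'date': '4/1/22', 'initials': 'KF', 'note': 'DO NOT RENEW'}
```

Both the (initials, note, date) and (note, initials, date) orderings match that line, but only the latter has " - " as a separator, so it wins.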

Answer by 花落人断肠, 2025-02-02 18:04:52

In C# (.NET) you can reuse the same named capture groups in different branches and use an alternation to match all forms.

For example, matching the first 3 lines:

^(?:(?<date>\d+/\d+/\d+)[ :-]+(?<initials>[A-Z]{2,4})[ :-]+(?<note>.*)|(?<initials>[A-Z]{2,4})[ :-]+(?<date>\d+/\d+/\d+)[ :-]+(?<note>.*)|(?<initials>[A-Z]{2,4})[ :-]+(?<note>.*?)[ :-]+(?<date>\d+/\d+/\d+))$

See a regex demo.
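The same alternation can be extended to all 6 orderings by generating it instead of typing it. Python's re module rejects a group name that appears twice, so this sketch (a Python port, not the .NET pattern itself) suffixes each branch's group names with the branch index and then reads back whichever branch participated. The part regexes are stand-ins, with the note made lazy and non-empty.

```python
import itertools
import re

# Stand-in part regexes (assumptions); the real date regex is the question's TBD one.
PARTS = {
    "date": r"\d+/\d+/\d+",
    "initials": r"[A-Z]{2,4}",
    "note": r".+?",
}
SEP = r"[ :-]+"
ORDERS = list(itertools.permutations(PARTS))

# Python's re refuses to reuse a group name, so branch i uses date{i}, initials{i}, note{i}.
# (.NET lets every branch reuse plain date/initials/note, as in the answer.)
BRANCHES = [
    SEP.join(f"(?P<{name}{i}>{PARTS[name]})" for name in order)
    for i, order in enumerate(ORDERS)
]
ALL_ORDERS = re.compile(r"^\s*(?:" + "|".join(BRANCHES) + r")\s*$")


def parse(line):
    """Match all 6 orderings with one regex and return the winning branch's groups."""
    m = ALL_ORDERS.match(line)
    if m is None:
        return None
    # Groups of branches that did not participate are None; find the one that did.
    i = next(i for i in range(len(ORDERS)) if m[f"date{i}"] is not None)
    return {name: m[f"{name}{i}"] for name in PARTS}


print(parse("This is a note PB 1/1/21"))
# {'date': '1/1/21', 'initials': 'PB', 'note': 'This is a note'}
```

Branch order decides which reading wins, so like any plain alternation this still returns DO as the initials for DO NOT RENEW - KF 4/1/22.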
