Modify a regex to match dates with the ordinals "st", "nd", "rd", "th"

Posted on 2024-08-19 08:36:27

How can the regex below be modified to match dates with ordinals on the day part? This regex matches "Jan 1, 2003 | February 29, 2004 | November 02, 3202" but I need it to match also: "Jan 1st, 2003 | February 29th, 2004 | November 02nd, 3202 | March 3rd, 2010"

^(?:(((Jan(uary)?|Ma(r(ch)?|y)|Jul(y)?|Aug(ust)?|Oct(ober)?|Dec(ember)?)\ 31)|((Jan(uary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sept|Nov|Dec)(ember)?)\ (0?[1-9]|([12]\d)|30))|(Feb(ruary)?\ (0?[1-9]|1\d|2[0-8]|(29(?=,\ ((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))))\,\ ((1[6-9]|[2-9]\d)\d{2}))

Thank you.

Comments (2)

月光色 2024-08-26 08:36:27

This will depend on your use case, but in the interest of pragmatism, you might do well to just match anything of the form:
(1) any month name or abbreviation;
(2) whitespace;
(3) any one or two digits;
(4) optionally, any of st, nd, rd, th (directly after the digits);
(5) whitespace OR comma + optional whitespace;
(6) any four digits.

I'm not sure what you're matching in, but if I had Jan 35nd,3001, I think I'd rather capture it now and invalidate it later than to just skip over it right at the get-go.

Also, depending on your data set, consider case sensitivity issues and common international English variants, such as "1 Jan 2004", "1st Jan, 2004", or "January, 2004".

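Line breaks added for readability (this needs your engine's extended/free-spacing mode, plus case-insensitive matching, since the month names are written in lowercase):

^(?:j(?:an(?:uary)?|un(?:e)?|ul(?:y)?)|feb(?:ruary)?|ma(?:r(?:ch)?|y)
|a(?:pr(?:il)?|ug(?:ust)?)|sep(?:t|tember)?|oct(?:ober)?|(?:nov|dec)(?:ember)?)
\s+\d{1,2}(?:st|nd|rd|th)?(?:\s+|,\s*)\d{4}\b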
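
As a rough illustration, here is how the pattern above might be written in Python, using `re.VERBOSE` for the line breaks and `re.IGNORECASE` for the lowercase month names. The name `MONTH_DAY_YEAR` is made up for this sketch, and the `j` branch here requires `an`, `un`, or `ul` to follow, so a bare "j" is not accepted as a month.

```python
import re

# Hypothetical name; the pattern follows the answer's permissive approach.
MONTH_DAY_YEAR = re.compile(
    r"""
    ^(?:j(?:an(?:uary)?|une?|uly?)      # jan, january, jun, june, jul, july
       |feb(?:ruary)?
       |ma(?:r(?:ch)?|y)                # mar, march, may
       |a(?:pr(?:il)?|ug(?:ust)?)       # apr, april, aug, august
       |sep(?:t|tember)?                # sep, sept, september
       |oct(?:ober)?
       |(?:nov|dec)(?:ember)?
    )
    \s+\d{1,2}(?:st|nd|rd|th)?          # day with optional ordinal suffix
    (?:\s+|,\s*)                        # whitespace, or comma plus optional whitespace
    \d{4}\b                             # four-digit year
    """,
    re.VERBOSE | re.IGNORECASE,
)

samples = ["Jan 1st, 2003", "February 29th, 2004",
           "November 02nd, 3202", "March 3rd, 2010", "Jan 1, 2003"]
for s in samples:
    print(s, bool(MONTH_DAY_YEAR.search(s)))
```

Because the day and suffix are not checked against each other or against the month, strings such as "Jan 35nd,3001" also match, which is the intended trade-off of this approach.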

Even more pragmatic (and readable), unless you have a very bizarre dataset, is to allow anything after the common prefixes:

(?:jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*?\s+\d{1,2}(?:[a-z]{2})?(?:\s+|,\s*)\d{4}\b

Would this match octagenarianism 99xx, 0000 ? Yes. Is that likely to be an issue? I doubt it.
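
The "capture it now and invalidate it later" idea could look something like the following Python sketch. It reuses the loose pattern above with capture groups added, then lets `datetime.date` reject impossible dates. `LOOSE`, `MONTHS`, and `find_dates` are names invented for this example.

```python
import re
from datetime import date

# The loose pattern from above, with capture groups for month prefix, day, and year.
LOOSE = re.compile(
    r"(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)[a-z]*?"
    r"\s+(\d{1,2})(?:[a-z]{2})?(?:\s+|,\s*)(\d{4})\b",
    re.IGNORECASE,
)

# Map each three-letter prefix to its month number.
MONTHS = {name: number for number, name in enumerate(
    "jan feb mar apr may jun jul aug sep oct nov dec".split(), start=1)}

def find_dates(text):
    """Return (matched_text, date_or_None) for every loose match in text."""
    results = []
    for m in LOOSE.finditer(text):
        month = MONTHS[m.group(1).lower()]
        day, year = int(m.group(2)), int(m.group(3))
        try:
            parsed = date(year, month, day)   # raises ValueError for e.g. Jan 35
        except ValueError:
            parsed = None                     # captured, but flagged as invalid
        results.append((m.group(0), parsed))
    return results

print(find_dates("Jan 35nd,3001 and March 3rd, 2010"))
```

Invalid candidates are kept with `None` rather than dropped, so they can be logged or reviewed instead of silently skipped.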

╄→承喏 2024-08-26 08:36:27

That regex is doing waaaaay too much. You'd be much better off using your language's equivalent of strptime(). However, the regex below will match ordinals:

^(?:(((Jan(uary)?|Ma(r(ch)?|y)|Jul(y)?|Aug(ust)?|Oct(ober)?|Dec(ember)?)\ 31(st)?)|((Jan(uary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?|(Sept|Nov|Dec)(ember)?)\ (0?[1-9]|([12]\d)|30))(st|nd|rd|th)?|(Feb(ruary)?\ (0?[1-9]|1\d|2[0-8]|(29(th)?(?=,\ ((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))(st|nd|rd|th)?))\,\ ((1[6-9]|[2-9]\d)\d{2}))
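
A quick way to sanity-check the pattern above is to run it against the strings from the question. The sketch below does that in Python; the regex is split across several string literals only to keep the lines short, and `ORDINAL_DATE` is a name chosen for this example.

```python
import re

# The regex above, split at its top-level alternatives for readability.
ORDINAL_DATE = re.compile(
    r"^(?:(((Jan(uary)?|Ma(r(ch)?|y)|Jul(y)?|Aug(ust)?|Oct(ober)?|Dec(ember)?)\ 31(st)?)"
    r"|((Jan(uary)?|Ma(r(ch)?|y)|Apr(il)?|Ju((ly?)|(ne?))|Aug(ust)?|Oct(ober)?"
    r"|(Sept|Nov|Dec)(ember)?)\ (0?[1-9]|([12]\d)|30))(st|nd|rd|th)?"
    r"|(Feb(ruary)?\ (0?[1-9]|1\d|2[0-8]|(29(th)?(?=,\ ((1[6-9]|[2-9]\d)"
    r"(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00)))))(st|nd|rd|th)?))"
    r"\,\ ((1[6-9]|[2-9]\d)\d{2}))"
)

tests = [
    "Jan 1, 2003", "Jan 1st, 2003", "February 29th, 2004",
    "November 02nd, 3202", "March 3rd, 2010",
    "February 29th, 2003",   # not a leap year
    "Jan 20nd, 2003",        # suffix is not checked against the day
]
for t in tests:
    print(f"{t!r}: {bool(ORDINAL_DATE.match(t))}")
```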

Note that it will also match things like "20nd" but the likelihood of encountering that in real data is way too low to bother caring in most cases.
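
For comparison, the `strptime()` route might look like this in Python: strip the ordinal suffix with a small regex, then let `datetime.strptime` handle month lengths and leap years. This is only a sketch under a few assumptions: `parse_date` and `ORDINAL` are made-up names, the two format strings cover just the "Month day, year" layout from the question, and month names are parsed according to the current locale (English is assumed here).

```python
import re
from datetime import datetime

# Removes st/nd/rd/th only when it directly follows a digit.
ORDINAL = re.compile(r"(?<=\d)(?:st|nd|rd|th)\b", re.IGNORECASE)

def parse_date(text):
    """Return a date for strings like 'Jan 1st, 2003', or None if invalid."""
    cleaned = ORDINAL.sub("", text.strip())
    # Try the abbreviated month name first, then the full name.
    for fmt in ("%b %d, %Y", "%B %d, %Y"):
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None

for s in ["Jan 1st, 2003", "February 29th, 2004",
          "November 02nd, 3202", "February 29th, 2003"]:
    print(s, "->", parse_date(s))
```

Note that `%b` in an English locale expects "Sep", so the "Sept" spelling accepted by the regex would need separate handling.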
