How to match a DateTime with an optional fractional-seconds part using a Lua pattern

Posted on 2025-01-21 12:10:13

I need to come up with a pattern to match YYYY-MM-DDTHH:MM:SS.s+Z with the milliseconds part being optional. The regex is simple and looks like this:

^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$

It matches these strings:

"2022-04-02T11:24:59Z"
"2022-04-02T11:24:59.123Z"

In Lua, this isn't as straightforward as I thought. I've tried a couple of patterns, but ultimately only got this one to work:

local pat3 = "^%d%d%d%d%-%d%d%-%d%dT%d%d:%d%d:%d%d[%.%d+]*Z$"

local dt1 = "2022-04-02T11:24:59Z"
local dt2 = "2022-04-02T11:24:59.123Z"
local dt_invalid = "2022-04-02T11:24:59.123.000.000Z"

print(dt1:match(pat3))
print(dt2:match(pat3))
print(dt_invalid:match(pat3))

That pattern meets most of my needs, but it's bothering me that strings like dt_invalid match too. I've also tried the following patterns with no success:

local pat1 = "^%d%d%d%d%-%d%d%-%d%dT%d%d:%d%d:%d%d[%.%d+]?Z$"
local pat2 = "^%d%d%d%d%-%d%d%-%d%dT%d%d:%d%d:%d%d(%.%d+)?Z$"

Lua's pattern matching is a simplified facility, yet these patterns look much like the regex. I don't know Lua well enough to see the difference or what I'm missing. Why do pat1 and pat2 not work? Is there a better pattern than pat3?


Comments (2)

农村范ル 2025-01-28 12:10:13

I strongly suggest opening a standalone Lua interpreter and experimenting yourself.
A very good tool for me is string.gsub(), and every string has all the string functions attached as methods.
That makes things much easier...

> _VERSION
Lua 5.4
> ("2022-04-02T11:24:59.123Z"):gsub('^%d+%-%d+%-%d+%u%d+%:%d+%:%d+%.%d+%u$', 'MATCH ALL')
MATCH ALL   1
> ("2022-04-02T11:24:59.123Z"):gsub('^%d+%-%d+%-%d+%u%d+%:%d+%:%d+%.%d+%u$', 'Replaced with MATCH: %1')
Replaced with MATCH: 2022-04-02T11:24:59.123Z    1
> -- Let's replace "T" with a space
> ("2022-04-02T11:24:59.123Z"):gsub('T', ' ')
2022-04-02 11:24:59.123Z    1
> -- Cut off the last part
> ("2022-04-02T11:24:59.123Z"):gsub('%.%d+%u$', '')
2022-04-02T11:24:59     1
> -- Finally
> do local date, count = ("2022-04-02T11:24:59.123Z"):gsub('T', ' '):gsub('%.%d+%u$', '') print(date) end
2022-04-02 11:24:59
> -- Let's do a gsub() chain for all three cases
> do local date, count = ("2022-04-02T11:24:59.123Z 2022-04-02T11:24:59Z 2022-04-02T11:24:59.123.000.000Z"):gsub('T', ' '):gsub('%.%d+',''):gsub('%u', '') print(date) end
2022-04-02 11:24:59 2022-04-02 11:24:59 2022-04-02 11:24:59

—━☆沉默づ 2025-01-28 12:10:13

The problem here is that for a set of characters to take a quantifier, the elements of the set must be enclosed in square brackets, and the quantifier must come after the closing bracket.

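In your pat1 case, the + sits inside the brackets, so it is treated as a literal character rather than a quantifier: [%.%d+]? matches at most one character that is a dot, a digit, or a plus sign. In your pat2 case, on the other hand, the ? is not treated as a quantifier at all, because Lua quantifiers apply only to a single character class, not to a parenthesized capture.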
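To see concretely what each pattern accepts, here is a small sketch that runs pat1 and pat2 against a few inputs. The helper name check is mine, not from the question, and the two extra test strings (one with a literal + and one with a literal ?) are made up to expose how Lua reads the patterns.

```lua
local pat1 = "^%d%d%d%d%-%d%d%-%d%dT%d%d:%d%d:%d%d[%.%d+]?Z$"
local pat2 = "^%d%d%d%d%-%d%d%-%d%dT%d%d:%d%d:%d%d(%.%d+)?Z$"

-- Hypothetical helper: true if the string matches the pattern.
local function check(s, pat)
  return s:find(pat) ~= nil
end

-- pat1: [%.%d+] is a set containing '.', any digit, and a literal '+';
-- the trailing ? makes ONE such character optional.
print(check("2022-04-02T11:24:59Z", pat1))       -- true
print(check("2022-04-02T11:24:59.123Z", pat1))   -- false (more than one extra character)
print(check("2022-04-02T11:24:59+Z", pat1))      -- true  (the '+' is a set member)

-- pat2: the ? after the capture is a literal '?', not a quantifier.
print(check("2022-04-02T11:24:59Z", pat2))       -- false
print(check("2022-04-02T11:24:59.123Z", pat2))   -- false
print(check("2022-04-02T11:24:59.123?Z", pat2))  -- true  (literal '?' before Z)
```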

Moreover, in Lua you can't nest sets, so a pattern like [%.[%d]+]? does not work either: Lua reads [%.[%d] as the set (with [ as a literal member), applies the + to it, and then treats ]? as an optional literal ].

My solution would be a workaround that is less restrictive (so it may match some other strings) but still catches the parts of the time you need:

%d%d%d%d%-%d%d%-%d%dT%d%d:%d%d:%d%d[%.]?[%d]*Z

Known false positives (strings that shouldn't match but do):

  • "2022-04-02T11:24:59.Z"
  • "2022-04-02T11:24:59123Z"
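A quick way to check these cases is to run the workaround pattern over a handful of strings. The sketch below does that; the variable names workaround and samples are mine. Note that the pattern as written has no ^ and $ anchors, so it can also match a timestamp embedded in a longer string.

```lua
local workaround = "%d%d%d%d%-%d%d%-%d%dT%d%d:%d%d:%d%d[%.]?[%d]*Z"

local samples = {
  "2022-04-02T11:24:59Z",              -- valid, matches
  "2022-04-02T11:24:59.123Z",          -- valid, matches
  "2022-04-02T11:24:59.123.000.000Z",  -- invalid, rejected
  "2022-04-02T11:24:59.Z",             -- invalid, but matches
  "2022-04-02T11:24:59123Z",           -- invalid, but matches
}

-- Print each sample next to what the pattern matched (nil when no match).
for _, s in ipairs(samples) do
  print(s, s:match(workaround))
end
```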

Does this cover your case, given the full set of strings you have?
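If the false positives matter, one common way around the lack of optional groups in Lua patterns is to try two anchored patterns in turn: one without a fractional part and one with it. A minimal sketch, where is_iso_utc and the constant names are my own and not from the question:

```lua
-- Shared prefix: YYYY-MM-DDTHH:MM:SS, anchored at the start of the string.
local BASE = "^%d%d%d%d%-%d%d%-%d%dT%d%d:%d%d:%d%d"
local NO_FRACTION   = BASE .. "Z$"        -- e.g. 2022-04-02T11:24:59Z
local WITH_FRACTION = BASE .. "%.%d+Z$"   -- e.g. 2022-04-02T11:24:59.123Z

-- Hypothetical helper: true only if s has one of the two exact shapes.
local function is_iso_utc(s)
  return s:find(NO_FRACTION) ~= nil or s:find(WITH_FRACTION) ~= nil
end

print(is_iso_utc("2022-04-02T11:24:59Z"))              -- true
print(is_iso_utc("2022-04-02T11:24:59.123Z"))          -- true
print(is_iso_utc("2022-04-02T11:24:59.123.000.000Z"))  -- false
print(is_iso_utc("2022-04-02T11:24:59.Z"))             -- false
print(is_iso_utc("2022-04-02T11:24:59123Z"))           -- false
```

This only checks the shape of the string; it does not validate ranges, so a month of 13 or an hour of 25 would still pass.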
