How to match a datetime with an optional fractional-seconds part using Lua pattern matching
I need to come up with a pattern to match YYYY-MM-DDTHH:MM:SS.s+Z
with the milliseconds part being optional. The regex is simple and looks like this:
^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$
It matches these strings:
"2022-04-02T11:24:59Z"
"2022-04-02T11:24:59.123Z"
In Lua, this isn't as straightforward as I thought. I've tried a couple of patterns but ultimately only got this one to work:
local pat3 = "^%d%d%d%d%-%d%d%-%d%dT%d%d:%d%d:%d%d[%.%d+]*Z$"
local dt1 = "2022-04-02T11:24:59Z"
local dt2 = "2022-04-02T11:24:59.123Z"
local dt_invalid = "2022-04-02T11:24:59.123.000.000Z"
print(dt1:match(pat3))
print(dt2:match(pat3))
print(dt_invalid:match(pat3))
That pattern meets most of my needs, but it's bothering me that strings like dt_invalid
match too. I've also tried the following patterns with no success:
local pat1 = "^%d%d%d%d%-%d%d%-%d%dT%d%d:%d%d:%d%d[%.%d+]?Z$"
local pat2 = "^%d%d%d%d%-%d%d%-%d%dT%d%d:%d%d:%d%d(%.%d+)?Z$"
Lua has simplified pattern-matching functionality, but these patterns look more like regex. I'm not knowledgeable enough in Lua to know the difference or what I'm missing. Why do pat1 and pat2 not work? Is there a better pattern than pat3?
Comments (2)
I strongly suggest opening a standalone Lua interpreter and experimenting for yourself.
A very good tool for me is string.gsub(), and every string has all the string functions attached as methods. That makes things much easier...
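A short session of the kind suggested here might look like the sketch below. It assumes a stock Lua 5.1 or later interpreter; the sample fragments (59Z, 59+Z and so on) are made up for illustration and only exercise the tail of the patterns from the question.

```lua
-- Method-call syntax: s:match(p) is shorthand for string.match(s, p).
local s = "2022-04-02T11:24:59.123Z"
assert(s:match("%d+") == string.match(s, "%d+"))

-- string.gsub returns the rewritten string plus the number of substitutions,
-- which makes it easy to see exactly what a class or set matches.
local masked, count = s:gsub("%d", "#")
print(masked, count)

-- Tail of pat1: [%.%d+] is one set holding '.', digits and a literal '+',
-- and the trailing ? makes that single character optional.
local tail1 = "^%d%d[%.%d+]?Z$"
print(("59Z"):match(tail1))      -- matches
print(("59+Z"):match(tail1))     -- also matches: '+' is a member of the set
print(("59.123Z"):match(tail1))  -- nil: the set covers at most one character

-- Tail of pat2: a capture cannot be quantified, so the ? is a literal '?'.
local tail2 = "^%d%d(%.%d+)?Z$"
print(("59.123Z"):match(tail2))  -- nil: there is no literal '?' before the Z
print(("59.123?Z"):match(tail2)) -- matches and returns the capture ".123"
```

Shrinking the pattern to its last few items, as done here, tends to make it easier to see which item is responsible for a surprising match.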
The problem here is that a Lua quantifier (*, +, - or ?) applies only to a single character class: a literal character, a class such as %d, or a bracketed set such as [%.%d]. It cannot be applied to a group.
In your pat1 case, the + sits inside the brackets, so it is treated as a literal character belonging to the set rather than as a quantifier; [%.%d+]? therefore matches at most one character that is a dot, a digit or a plus sign. In your pat2 case, the parentheses form a capture, and a capture cannot take a quantifier, so the ? after it is treated as a literal question mark. Moreover, in Lua you can't nest sets, so a pattern like [%.[%d]+]? does not work either: the set ends at the first ], the + applies to that set, and the ? applies only to the leftover literal ]. My solution would be to use a workaround that may be less restrictive (it can match some other strings) but still captures the parts of the time you need:
Known false positives (strings that should not match but do):
Does this help in your case, given the whole set of strings you have?
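If strings such as dt_invalid must be rejected, one alternative is to test two anchored patterns instead of one, since Lua cannot make a multi-character group optional. This is a sketch rather than the workaround this answer originally showed; the names BASE and is_utc_timestamp are hypothetical, and it checks only the shape of the string, not whether the date and time values are in range.

```lua
-- Shared prefix: YYYY-MM-DDTHH:MM:SS (the hyphen is escaped because an
-- unescaped '-' can act as a quantifier in Lua patterns).
local BASE = "^%d%d%d%d%-%d%d%-%d%dT%d%d:%d%d:%d%d"

-- Hypothetical helper: true when s is either the prefix followed by Z,
-- or the prefix followed by a dot, one or more digits, and Z.
local function is_utc_timestamp(s)
  return s:match(BASE .. "Z$") ~= nil
      or s:match(BASE .. "%.%d+Z$") ~= nil
end

print(is_utc_timestamp("2022-04-02T11:24:59Z"))
print(is_utc_timestamp("2022-04-02T11:24:59.123Z"))
print(is_utc_timestamp("2022-04-02T11:24:59.123.000.000Z"))
```

The first two calls should print true and the third false, because each alternative allows at most one fractional part before the final Z.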