Parsing date strings in PHP
Given an arbitrary string, for example "I'm going to play croquet next Friday" or "Gadzooks, is it 17th June already?", how would you go about extracting the dates from it?
If this is looking like a good candidate for the too-hard basket, perhaps you could suggest an alternative. I want to be able to parse Twitter messages for dates. The tweets I'd be looking at would be ones which users are directing at this service, so they could be coached into using an easier format, however I'd like it to be as transparent as possible. Is there a good middle ground you could think of?
If you have the horsepower, you could try the following algorithm. I'm showing an example, and leaving the tedious work up to you :)
And we can assume that strtotime("17th June") is more accurate than strtotime("17th") simply because it contains more words, i.e. "next Friday" will always be more accurate than "Friday".
I would do it this way:
First check if the entire string is a valid date with strtotime(). If so, you're done.
If not, determine how many words are in your string (split on whitespace for example). Let this number be n.
Loop over every n-1 word combination and use strtotime() to see if the phrase is a valid date. If so you've found the longest valid date string within your original string.
If not, loop over every n-2 word combination and use strtotime() to see if the phrase is a valid date. If so you've found the longest valid date string within your original string.
...and so on until you've found a valid date string or searched every single/individual word. By finding the longest matches, you'll get the most informed dates (if that makes sense). Since you're dealing with tweets, your strings will never be huge.
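The steps above can be sketched roughly as follows. This is only a minimal sketch under a few assumptions of mine: "word combination" is read as a run of consecutive words, punctuation is trimmed from each word, and the function name findLongestDatePhrase and its injectable $parse callback are hypothetical (the callback defaults to strtotime() and exists only so the search can be exercised without depending on strtotime()'s quirks).

```php
<?php
// Hypothetical helper: returns [phrase, timestamp] for the longest run of
// consecutive words that the parser accepts as a date, or null if none does.
function findLongestDatePhrase(string $text, ?callable $parse = null, ?int $now = null): ?array
{
    $now = $now ?? time();
    // Default parser: strtotime(), which returns false when parsing fails.
    $parse = $parse ?? fn(string $phrase) => strtotime($phrase, $now);

    // Split on whitespace and trim surrounding punctuation from each word.
    $words = [];
    foreach (preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $word = trim($word, ".,;:!?\"'()");
        if ($word !== '') {
            $words[] = $word;
        }
    }

    $n = count($words);
    // Try windows of n words first, then n-1, and so on down to one word.
    for ($len = $n; $len >= 1; $len--) {
        for ($start = 0; $start + $len <= $n; $start++) {
            $phrase = implode(' ', array_slice($words, $start, $len));
            $timestamp = $parse($phrase);
            if ($timestamp !== false) {
                return [$phrase, $timestamp]; // longest match found first
            }
        }
    }
    return null;
}
```

Calling findLongestDatePhrase($tweet) uses strtotime() directly. Since strtotime() is fairly permissive with short inputs, it may be worth requiring a minimum window length or filtering the candidates before trusting a one-word match.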
Inspired by Juan Cortes's broken link based off Dolph's algorithm, I went ahead and wrote it up myself. Note that I decided to just return on first successful match.
Inputs
Outputs
Something like the following might do it:
You would probably add another loop to check for other weekdays or other formats, or just nest the loops.
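A minimal sketch of the nested-loop idea, assuming the text is scanned for a modifier followed by a weekday name. The function name findRelativeWeekday and the list of modifiers are my own assumptions, not something taken from this answer.

```php
<?php
// Hypothetical helper: scans for "<modifier> <weekday>" (e.g. "next Friday")
// and returns the matching timestamp, or null when nothing is found.
function findRelativeWeekday(string $text, ?int $now = null): ?int
{
    $modifiers = ['next', 'last', 'this'];
    $weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'];

    foreach ($modifiers as $modifier) {
        foreach ($weekdays as $day) {
            $phrase = $modifier . ' ' . $day;
            // stripos() is case-insensitive and returns false when not found.
            if (stripos($text, $phrase) !== false) {
                $timestamp = strtotime($phrase, $now ?? time());
                return $timestamp === false ? null : $timestamp;
            }
        }
    }
    return null;
}
```

The optional $now argument is only there so the result can be checked against a fixed reference date; leaving it out uses the current time.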
Use the strtotime PHP function. Of course, you would need to set up some rules to parse them, since you need to get rid of all the extra content in the string, but aside from that, it's a very flexible function that will more than likely help you out here.
For example, it can take strings like "next Friday" and "June 15th" and return the appropriate UNIX timestamp for the date in the string. I guess that if you consider some basic rules like looking for "next X" and week and month names you would be able to do this.
If you could locate the "next Friday" in "I'm going to play croquet next Friday", you could extract the date. Looks like a fun project to do! But keep in mind that strtotime only takes English phrases and will not work with any other language.
For example, a rule that will locate all the "next weekday" cases would be as simple as:
This will return the date of the next weekday mentioned in the string as long as it follows the rule! In this particular case, the output was June 18, 2010, 12:00 am. With a few (maybe more than a few!) of those rules you will more than likely extract the correct date in a high percentage of cases, provided that users spell things correctly.
Like it's been pointed out, with regular expressions and a little patience you can do this. The hardest part of coding is deciding what way you are going to approach your problem, not coding it once you know what!
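The "next weekday" rule mentioned above could look something like this. It is a sketch rather than the original rule: the function name extractNextWeekday, the exact pattern, and the optional $now reference time are assumptions of mine, and the output format is chosen only to mirror the sample output quoted above.

```php
<?php
// Hypothetical rule: find "next <weekday>" with a regular expression and
// hand the matched phrase to strtotime(). Returns a formatted date or null.
function extractNextWeekday(string $text, ?int $now = null): ?string
{
    $pattern = '/\bnext\s+(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b/i';
    if (preg_match($pattern, $text, $matches) !== 1) {
        return null; // the text does not follow the rule
    }
    $timestamp = strtotime('next ' . $matches[1], $now ?? time());
    return $timestamp === false ? null : date('F j, Y, g:i a', $timestamp);
}
```

Each further rule ("17th June", "June 17th", "tomorrow" and so on) would be another pattern of this kind, tried in turn until one matches.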
Following Dolph Mathews' idea and basically ignoring my previous answer, I built a pretty nice function that does exactly that. It returns the string it thinks is the one that matches a date, the Unix timestamp of it, and the date itself, either in the user-specified format or the predefined one (F j, Y). I wrote a small post about it, "Extracting a date from a string with PHP". As a teaser, here's the output for the two example strings:
Input: "I'm going to play croquet next Friday"
Input: "Gadzooks, is it 17th June already?"
I hope it helps someone.
Based on Dolph's suggestion, I wrote out a function that I think serves the purpose.
You would call it like this:
parse_date('Setting the due date january 5th 2017 now', 0 , 0)
What you're looking for is a temporal expression parser. You might look at the Wikipedia article to get started. Keep in mind that the parsers can get pretty complicated, because this is really a language recognition problem. That is commonly a problem tackled by the artificial intelligence/computational linguistics field.
The majority of the suggested algorithms are in fact pretty lame. I suggest using some nice regex for dates and testing the sentence with it. Use this as an example:
I skipped months, since I'm not sure I remember them in the right order.
This is the easiest solution, yet it will do the job better than the other compute-power based solutions. (And yeah, it's hardly a fail-proof regex, but you get the point.) Then apply the strtotime function to the matched string. This is the simplest and the fastest solution.
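As a rough illustration of the regex-then-strtotime approach, here is a sketch with the month names filled in. The pattern is my own guess at what such a regex might cover (relative weekdays, "17th June" and "June 17th"), not the regex from this answer, and the function name matchDateWithRegex is hypothetical.

```php
<?php
// Hypothetical helper: match one date-like phrase with a regex, then let
// strtotime() interpret it. Returns [matched phrase, timestamp] or null.
function matchDateWithRegex(string $text, ?int $now = null): ?array
{
    $weekday = '(?:mon|tues|wednes|thurs|fri|satur|sun)day';
    $month = '(?:january|february|march|april|may|june|july|august'
        . '|september|october|november|december)';
    $day = '\d{1,2}(?:st|nd|rd|th)?';

    // Alternatives: "next Friday", "17th June", "June 17th".
    $pattern = '/\b(?:(?:next|last|this)\s+' . $weekday
        . '|' . $day . '\s+' . $month
        . '|' . $month . '\s+' . $day . ')\b/i';

    if (preg_match($pattern, $text, $matches) !== 1) {
        return null;
    }
    $timestamp = strtotime($matches[0], $now ?? time());
    return $timestamp === false ? null : [$matches[0], $timestamp];
}
```

Only one pass over the string is needed, which is where the speed advantage comes from; the trade-off is that any phrasing the pattern does not anticipate is silently missed.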