Is there a faster way to match an arbitrary string to a month name in Java?
I want to determine if a string is the name of a month and I want to do it relatively quickly. The function that is currently stuck in my brain is something like:
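    import java.text.DateFormatSymbols;

    boolean isaMonth( String str ) {
        String[] months = DateFormatSymbols.getInstance().getMonths();
        String[] shortMonths = DateFormatSymbols.getInstance().getShortMonths();
        // length is an array field, not a method. Both arrays have 13 entries;
        // the 13th is empty for the 12-month Gregorian calendar.
        for( int i = 0; i < months.length; ++i ) {
            if( months[i].isEmpty() ) continue; // don't report "" as a month
            if( months[i].equals(str) ) return true;
            if( shortMonths[i].equals(str) ) return true;
        }
        return false;
    }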
However, I will be processing lots of text, passed one string at a time to this function, and most of the time I will be getting the worst case of going through the entire loop and returning false.
I saw another question that talked about a Regex to match a month name and a year which could be adapted for this situation. Would the Regex be faster? Is there any other solution that might be faster?
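For comparison, a regex version has to be compiled once and reused; compiling it on every call would cost far more than the loop. The sketch below is a minimal illustration, not a recommendation: the class name RegexMonthMatcher and the hand-written alternation of English full and short names are assumptions, and matching is exact and case-sensitive. A backtracking engine such as java.util.regex still tries the alternatives in turn, so it is not obviously faster than a hash lookup; it is worth measuring before choosing.

```java
import java.util.regex.Pattern;

// Hypothetical class name; the alternation is hand-written for English only.
final class RegexMonthMatcher {
    // Compile once and reuse; Pattern objects are immutable and thread-safe.
    private static final Pattern MONTH = Pattern.compile(
            "Jan(uary)?|Feb(ruary)?|Mar(ch)?|Apr(il)?|May|Jun(e)?|Jul(y)?"
            + "|Aug(ust)?|Sep(tember)?|Oct(ober)?|Nov(ember)?|Dec(ember)?");

    static boolean isaMonth(String str) {
        // matches() succeeds only if the entire string matches, so "Marc" fails.
        return MONTH.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(isaMonth("March")); // true
        System.out.println(isaMonth("Marc"));  // false
    }
}
```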
Comments (3)
Why not store the month names in a HashSet? This will give you constant-time lookup instead of the linear-time lookup you are getting from your loop.
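A minimal sketch of the HashSet idea, assuming the input text is English (Locale.ENGLISH is passed explicitly) and exact, case-sensitive matching; the class name HashSetMonthMatcher is hypothetical. The set is built once when the class loads, which also avoids re-fetching the DateFormatSymbols arrays on every call as the loop in the question does.

```java
import java.text.DateFormatSymbols;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical class name; Locale.ENGLISH is an assumption about the input text.
final class HashSetMonthMatcher {
    // Built once when the class loads, then shared by every call.
    private static final Set<String> MONTHS = new HashSet<>();

    static {
        DateFormatSymbols symbols = DateFormatSymbols.getInstance(Locale.ENGLISH);
        for (String[] names : new String[][] {symbols.getMonths(), symbols.getShortMonths()}) {
            for (String name : names) {
                // Both arrays end with an empty 13th entry for the Gregorian calendar.
                if (!name.isEmpty()) {
                    MONTHS.add(name);
                }
            }
        }
    }

    static boolean isaMonth(String str) {
        // Expected constant time: hash the string, then check one bucket.
        return MONTHS.contains(str);
    }
}
```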
Merge months and shortMonths into a single sorted array and do a binary search on the array. Or merge them both into a Set (HashSet) and use contains. If you want the match to be case-insensitive, change all the month names to lowercase and do the same with the search value.
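A minimal sketch of the sorted-array variant with case-insensitive matching, assuming English month names via Locale.ENGLISH; the class name SortedMonthMatcher is hypothetical. Duplicates such as "may" (full and short form are identical) are harmless, because Arrays.binarySearch only requires the array to be sorted.

```java
import java.text.DateFormatSymbols;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical class name; Locale.ENGLISH is an assumption about the input text.
final class SortedMonthMatcher {
    private static final String[] SORTED = build();

    private static String[] build() {
        DateFormatSymbols symbols = DateFormatSymbols.getInstance(Locale.ENGLISH);
        List<String> names = new ArrayList<>();
        for (String[] group : new String[][] {symbols.getMonths(), symbols.getShortMonths()}) {
            for (String name : group) {
                if (!name.isEmpty()) { // skip the empty 13th entry
                    // Lowercase so the lookup can be case-insensitive.
                    names.add(name.toLowerCase(Locale.ROOT));
                }
            }
        }
        String[] sorted = names.toArray(new String[0]);
        Arrays.sort(sorted); // binarySearch requires a sorted array
        return sorted;
    }

    static boolean isaMonth(String str) {
        // Lowercase the query the same way; a non-negative index means found.
        return Arrays.binarySearch(SORTED, str.toLowerCase(Locale.ROOT)) >= 0;
    }
}
```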
If you want to be able to retrieve the number of the month, merge them all into a Map (HashMap) with the value being the month number.
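A sketch of the map variant, assuming English names (Locale.ENGLISH), lowercased keys for case-insensitive lookup, and 1-based month numbers (January = 1); the class name MonthNumberLookup and the -1 "not a month" return value are hypothetical choices for illustration.

```java
import java.text.DateFormatSymbols;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical class name; Locale.ENGLISH and 1-based numbering are assumptions.
final class MonthNumberLookup {
    private static final Map<String, Integer> MONTH_NUMBERS = new HashMap<>();

    static {
        DateFormatSymbols symbols = DateFormatSymbols.getInstance(Locale.ENGLISH);
        String[] months = symbols.getMonths();
        String[] shortMonths = symbols.getShortMonths();
        for (int i = 0; i < months.length; i++) {
            if (months[i].isEmpty()) {
                continue; // the empty 13th entry is not a real month
            }
            // Full and short names map to the same number: "january" and "jan" -> 1.
            MONTH_NUMBERS.put(months[i].toLowerCase(Locale.ROOT), i + 1);
            MONTH_NUMBERS.put(shortMonths[i].toLowerCase(Locale.ROOT), i + 1);
        }
    }

    // Returns 1..12 for a month name in any letter case, or -1 if it is not one.
    static int monthNumber(String str) {
        Integer n = MONTH_NUMBERS.get(str.toLowerCase(Locale.ROOT));
        return n == null ? -1 : n;
    }
}
```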
HashSet is a good general purpose solution - but I think you can do better. Take a look at the first letter of the months - jfmasond - if you pre-filter on those, and only do the HashSet check if it passes, it will take care of a huge number of your 'return false' scenarios.
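A minimal sketch of the first-letter pre-filter, assuming case-insensitive matching against a hard-coded list of English full and short names (the list and the class name PrefilteredMonthMatcher are illustrative assumptions). The String.indexOf check against "jfmasond" is the simplest possible filter: a word such as "the" is rejected without being lowercased or hashed.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical class name and hard-coded English list.
final class PrefilteredMonthMatcher {
    private static final Set<String> MONTHS = new HashSet<>(Arrays.asList(
            "january", "february", "march", "april", "may", "june", "july",
            "august", "september", "october", "november", "december",
            "jan", "feb", "mar", "apr", "jun", "jul", "aug", "sep", "oct", "nov", "dec"));

    // Distinct first letters of the English month names.
    private static final String FIRST_LETTERS = "jfmasond";

    static boolean isaMonth(String str) {
        if (str.isEmpty()
                || FIRST_LETTERS.indexOf(Character.toLowerCase(str.charAt(0))) < 0) {
            return false; // cheap rejection, no hashing needed
        }
        // Only words that pass the filter pay for toLowerCase() and the hash lookup.
        return MONTHS.contains(str.toLowerCase(Locale.ROOT));
    }
}
```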
You can set this up a couple of ways - one super easy way to do it is to use a switch statement, although a lookup table would be faster. Note also that every month name starts with a letter between a and s (once lowercased), so any other first character can be rejected immediately, and a lookup table only needs entries for that range rather than the full Unicode code space.
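The lookup-table form of the filter might look like this: a boolean array with one slot per letter from 'a' to 's' (19 slots), filled from the month list, so rejecting a word costs a range check and an array read. Again a sketch under assumptions: English names are hard-coded, matching is case-insensitive, and the class name TableMonthMatcher is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical class name and hard-coded English list.
final class TableMonthMatcher {
    private static final Set<String> MONTHS = new HashSet<>(Arrays.asList(
            "january", "february", "march", "april", "may", "june", "july",
            "august", "september", "october", "november", "december",
            "jan", "feb", "mar", "apr", "jun", "jul", "aug", "sep", "oct", "nov", "dec"));

    // One slot per letter 'a'..'s'; true where some month starts with that letter.
    private static final boolean[] FIRST = new boolean['s' - 'a' + 1];

    static {
        for (String m : MONTHS) {
            FIRST[m.charAt(0) - 'a'] = true;
        }
    }

    static boolean isaMonth(String str) {
        if (str.isEmpty()) {
            return false;
        }
        char c = Character.toLowerCase(str.charAt(0));
        // Range check first: nothing outside 'a'..'s' can start a month name.
        if (c < 'a' || c > 's' || !FIRST[c - 'a']) {
            return false;
        }
        return MONTHS.contains(str.toLowerCase(Locale.ROOT));
    }
}
```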
To make this even more effective, you can construct your lookup table so it contains the first 2 characters of each month - the resulting lookup table is still small, and this would drastically reduce the number of words that need to be checked against the HashSet.
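Extending the table to the first two characters, as a sketch: a 19 x 26 boolean array indexed by the first letter ('a' to 's') and the second letter ('a' to 'z'). With the hard-coded English list assumed here, only ten cells are set (ja, fe, ma, ap, ju, au, se, oc, no, de), so a word such as "family" is rejected before the HashSet is touched, while "matrix" still falls through to it. The class name TwoCharMonthMatcher and the minimum-length check (every name in this list has at least 3 letters) are assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical class name and hard-coded English list.
final class TwoCharMonthMatcher {
    private static final Set<String> MONTHS = new HashSet<>(Arrays.asList(
            "january", "february", "march", "april", "may", "june", "july",
            "august", "september", "october", "november", "december",
            "jan", "feb", "mar", "apr", "jun", "jul", "aug", "sep", "oct", "nov", "dec"));

    // PREFIX[first - 'a'][second - 'a'] is true if some month starts with that pair.
    private static final boolean[][] PREFIX = new boolean['s' - 'a' + 1]['z' - 'a' + 1];

    static {
        for (String m : MONTHS) {
            PREFIX[m.charAt(0) - 'a'][m.charAt(1) - 'a'] = true;
        }
    }

    static boolean isaMonth(String str) {
        if (str.length() < 3) {
            return false; // shortest names in this list ("may", "jan") have 3 letters
        }
        char c0 = Character.toLowerCase(str.charAt(0));
        char c1 = Character.toLowerCase(str.charAt(1));
        if (c0 < 'a' || c0 > 's' || c1 < 'a' || c1 > 'z' || !PREFIX[c0 - 'a'][c1 - 'a']) {
            return false;
        }
        return MONTHS.contains(str.toLowerCase(Locale.ROOT));
    }
}
```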
PS - before you do any of this, you should do some profiling and make sure that this is the area of your code that is actually the bottleneck.