How to match URLs against wildcard patterns in Java
I'm trying to match a given URL against a set of filter patterns; depending on the result, the URL is either accepted or discarded. Here are some sample patterns:
http://test.blogs.com/between_the/
http://test.blogs.com/between_the/page*
http://test.blogs.com/between_the/archives*
*index.html*
*/page/*
http://abc.blogs.com/
http://area.test.com/index.php/blogs_a/blog_list/
http://area.test.com/index.php/blogs_b/blog_list/*/
Based on these patterns, the following URLs will be accepted:
http://test.blogs.com/between_the/2012/02/autocad-ws-update-coming.html
http://abc.blogs.com/test
http://area.test.com/index.php/blogs_b/blog_list/page/2
while the ones below will be filtered out:
http://test.blogs.com/between_the/page/2
http://test.blogs.com/index.html
http://area.test.com/index.php/blogs_b/blog_list/1/
I'm wondering what the best approach for this is. I'm not sure it can be handled with a single complex generic regex, as the exclusion patterns are not predictable. I was thinking of removing the wildcards and creating two separate Lists, one for exact matches and one for "contains" matches, then checking the input URL against both lists.
Any pointers will be appreciated.
Thanks
Comments (1)
You can simply create a List of regular expressions and accept a URL only when it matches none of them; a URL is discarded as soon as it matches any one regex. This should be much easier to write and maintain than a single complex regular expression.
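A minimal sketch of that idea, under a few assumptions that are not stated in the question: `*` stands for any sequence of characters (including none), every other character is literal, and a pattern must match the whole URL, so a pattern without a wildcard is an exact match. The names `UrlFilter`, `toRegex` and `accepts` are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: holds one compiled regex per exclusion pattern.
final class UrlFilter {
    private final List<Pattern> exclusions = new ArrayList<>();

    UrlFilter(List<String> wildcardPatterns) {
        for (String wildcard : wildcardPatterns) {
            exclusions.add(toRegex(wildcard));
        }
    }

    // Assumption: '*' means "any characters"; everything else is quoted
    // so that '.', '?', '/' and so on are matched literally.
    static Pattern toRegex(String wildcard) {
        String[] literals = wildcard.split("\\*", -1); // -1 keeps trailing empty parts
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // A URL is accepted only if no exclusion pattern matches it in full.
    boolean accepts(String url) {
        for (Pattern exclusion : exclusions) {
            if (exclusion.matcher(url).matches()) {
                return false; // discarded on the first match
            }
        }
        return true;
    }

    public static void main(String[] args) {
        UrlFilter filter = new UrlFilter(Arrays.asList(
                "http://test.blogs.com/between_the/",
                "http://test.blogs.com/between_the/page*",
                "*index.html*",
                "http://abc.blogs.com/",
                "http://area.test.com/index.php/blogs_b/blog_list/*/"));

        System.out.println(filter.accepts("http://abc.blogs.com/test"));                // true
        System.out.println(filter.accepts("http://test.blogs.com/between_the/page/2")); // false
        System.out.println(filter.accepts("http://test.blogs.com/index.html"));         // false
    }
}
```

One caveat with this literal reading: if `*/page/*` is added to the list, it also matches `http://area.test.com/index.php/blogs_b/blog_list/page/2`, which the question lists as accepted, so that pattern may be meant to behave differently and would need its own rule.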