Regex for a specific URL format
I am trying to write a regex to match a specific URL format, specifically the API URLs for Stack Exchange. For example, I want both of these to match:
http://api.**stackoverflow**.com/1.**1**/questions/**1234**/answers
http://api.**physics.stackexchange**.com/1.**0**/questions/**5678**/answers
where:
- Everything not in bold must be identical.
- The first bold part can only be made of the letters a to z, with either one full stop or none.
- It would also be good if, when there is a full stop, the word "stackexchange" had to follow it. However, this isn't crucial.
- The second bold part can only be a 1 or a 0.
- The last bold part can contain only the digits 0 to 9, and can be any length.
- There can't be anything at all before or after the URL, not even a trailing slash.
Answers (3)
The ^ makes sure the match starts at the start of the input, and the \\z makes sure it ends at the end of the input. All the dots are escaped, so they are literal. The (?i:...) part makes the domain and scheme case-insensitive, as per the URL spec. The [01] matches only the character 0 or 1. The [0-9]+ matches one or more Arabic digits. The rest is self-explanatory.
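As a rough illustration of the pieces described in this answer, here is a minimal Java sketch. The pattern is a reconstruction from the explanation and the question's rules, not necessarily the exact regex the answer used; the class name `StackApiUrlMatcher` and the method `isApiUrl` are made up for the example.

```java
import java.util.regex.Pattern;

class StackApiUrlMatcher {
    // Hypothetical pattern, rebuilt from the explanation above (an assumption,
    // not the answerer's verbatim regex).
    private static final Pattern API_URL = Pattern.compile(
        "^(?i:http://api\\.[a-z]+(?:\\.stackexchange)?\\.com)" // scheme and host, case-insensitive
        + "/1\\.[01]"                                          // version 1.0 or 1.1
        + "/questions/[0-9]+/answers"                          // one or more digits as the question ID
        + "\\z");                                              // absolute end of input

    static boolean isApiUrl(String s) {
        // find() is used so that ^ and \z do the anchoring themselves.
        return API_URL.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isApiUrl("http://api.stackoverflow.com/1.1/questions/1234/answers"));         // true
        System.out.println(isApiUrl("http://api.physics.stackexchange.com/1.0/questions/5678/answers")); // true
        System.out.println(isApiUrl("http://api.stackoverflow.com/1.1/questions/1234/answers/"));        // false
    }
}
```

Because only the scheme and host sit inside (?i:...), a path such as /Questions/ with a capital Q is still rejected.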
^ matches start-of-string, $ matches end-of-line, and [.] is an alternative way to escape the dot instead of a backslash (which itself would need to be escaped as \\.).
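One practical consequence of using $ rather than \z in Java: without the MULTILINE flag, $ also matches just before a final line terminator, so a URL followed by a single newline is still accepted. The sketch below shows this with the [.] style of escaping. The pattern body and the names `DollarVersusSlashZ`, `WITH_DOLLAR`, `WITH_SLASH_Z`, and `found` are assumptions made for the example, not taken from the answer.

```java
import java.util.regex.Pattern;

class DollarVersusSlashZ {
    // Hypothetical pattern body; [.] is a character class holding a literal dot,
    // so no backslashes (and no doubled backslashes in the Java string) are needed.
    private static final String BODY =
        "http://api[.][a-z]+(?:[.]stackexchange)?[.]com/1[.][01]/questions/[0-9]+/answers";

    static final Pattern WITH_DOLLAR = Pattern.compile("^" + BODY + "$");
    static final Pattern WITH_SLASH_Z = Pattern.compile("^" + BODY + "\\z");

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String url = "http://api.stackoverflow.com/1.1/questions/1234/answers";
        System.out.println(found(WITH_DOLLAR, url));          // true
        System.out.println(found(WITH_DOLLAR, url + "\n"));   // true: $ tolerates one trailing newline
        System.out.println(found(WITH_SLASH_Z, url + "\n"));  // false: \z demands the real end of input
    }
}
```

If "nothing at all after the URL" is meant strictly, \z (or calling matches() instead of find()) is the safer choice.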
This tested Java program has a commented regex which should do the trick:
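The following is not the answerer's program, only a minimal sketch of what a commented regex for these rules could look like in Java, using Pattern.COMMENTS so that whitespace and # comments inside the pattern are ignored. The class name `CommentedStackApiRegex`, the method `parse`, and the group names `site`, `minor`, and `id` are all assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentedStackApiRegex {
    // Hypothetical commented pattern; each line ends in \n so the # comment stops there.
    static final Pattern API_URL = Pattern.compile(
        "^                                      # start of input\n" +
        "http://api\\.                          # fixed scheme and api. prefix\n" +
        "(?<site>[a-z]+(?:\\.stackexchange)?)   # site name, optionally followed by .stackexchange\n" +
        "\\.com/1\\.                            # fixed .com and major version 1\n" +
        "(?<minor>[01])                         # minor version: 0 or 1\n" +
        "/questions/                            # fixed path segment\n" +
        "(?<id>[0-9]+)                          # question ID: one or more digits\n" +
        "/answers                               # fixed path segment\n" +
        "\\z                                    # end of input, no trailing slash or newline",
        Pattern.COMMENTS);

    // Returns {site, minor, id}, or null when the input is not a matching URL.
    static String[] parse(String url) {
        Matcher m = API_URL.matcher(url);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group("site"), m.group("minor"), m.group("id") };
    }

    public static void main(String[] args) {
        String[] parts = parse("http://api.physics.stackexchange.com/1.0/questions/5678/answers");
        System.out.println(parts[0] + " " + parts[1] + " " + parts[2]); // physics.stackexchange 0 5678
    }
}
```

The named groups are an extra convenience for pulling out the three variable parts; the question itself only asks for a yes/no match.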