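Regular expression for a W3C-compliant URL?
I'm trying to find a regular expression for URLs that complies with the W3C specification for the HTML5 "url" input type (to be used in JavaScript).
See the W3C specification of the requirements.
Two possibilities:
I found this other StackOverflow question about URL regexes, which looks quite promising:
There is an HTML5 form validation jQuery plugin that is supposed to emulate HTML5 form validation.
This script uses the following regular expression: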
/(https?|ftp):\/\/(((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:)*@)?(((\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]))|((([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|\d|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.)+(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])*([a-zA-Z]|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])))\.?)(:\d*)?)(\/((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)+(\/(([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)*)*)?)?(\?((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|[\uE000-\uF8FF]|\/|\?)*)?(\#((([a-zA-Z]|\d|-|\.|_|~|[\u00A0-\uD7FF\uF900-\uFDCF\uFDF0-\uFFEF])|(%[\da-f]{2})|[!\$&'\(\)\*\+,;=]|:|@)|\/|\?)*)?/
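I don't know about the rest of it, but at first glance it seems to fail, at the very least, on the "potentially surrounded by spaces" part of the specification.
Has anyone tried to do this before? Does anyone know where I could find a compliant regex?
Ta, Robin.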
Comments (2)
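Right, I think I have the answer. I found another StackOverflow question containing what seems to be a very good regex for RFC 3987. I can't be sure, because I haven't gone through the spec in detail, but I think it's compliant.
I had to refactor it a bit to work in JavaScript: I removed the curly braces from the character declarations and replaced \x with \u, and now I'm pretty sure it works. I also added \s* to the beginning and end to comply with that part of the W3C specification.
Here's the final regex. If anyone tries it and has any problems, or thinks it should be different in any way, let me know. I'm going for the most accurate solution possible here: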
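As a rough illustration of the refactoring described above (this is not the answerer's actual regex), the sketch below converts PCRE-style `\x{HHHH}` escapes into JavaScript `\uHHHH` escapes and wraps the result in `\s*`. The helper name `pcreToJsSource` and the tiny stand-in fragment are assumptions made for the example, and the helper only handles escapes of up to four hex digits.

```javascript
// Hypothetical helper: rewrite PCRE-style \x{HHHH} escapes as JavaScript \uHHHH.
// Only code points up to U+FFFF (1 to 4 hex digits) are handled here.
function pcreToJsSource(source) {
  return source.replace(/\\x\{([0-9A-Fa-f]{1,4})\}/g, (match, hex) =>
    "\\u" + hex.toUpperCase().padStart(4, "0"));
}

// Tiny stand-in fragment for illustration, NOT the full RFC 3987 expression.
const pcreFragment = "[a-z\\x{00A0}-\\x{D7FF}]+";
const jsFragment = pcreToJsSource(pcreFragment);

// Anchor the pattern and allow optional whitespace at both ends.
const wrapped = new RegExp("^\\s*(?:" + jsFragment + ")\\s*$", "i");

console.log(jsFragment);                  // [a-z\u00A0-\uD7FF]+
console.log(wrapped.test("  héllo  "));   // true
console.log(wrapped.test("hello world")); // false
```

Building the pattern from a string keeps the conversion mechanical; the same substitution could equally be done once by hand in an editor.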
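Your link provides your answer:
So you could strip the whitespace manually and then apply your regular expression, or use this one:
to account for leading and trailing whitespace.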
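A minimal sketch of the first option, stripping the whitespace and then applying the regular expression, might look like the following. The `urlPattern` here is a deliberately simplified placeholder rather than a compliant URL regex, and `isUrlAfterStripping` is a hypothetical helper name. The whitespace class follows the HTML standard's definition of ASCII whitespace (tab, LF, FF, CR, space), which is narrower than what `String.prototype.trim()` removes.

```javascript
// ASCII whitespace per the HTML standard: tab, LF, FF, CR, space.
const ASCII_WS_ENDS = /^[ \t\n\f\r]+|[ \t\n\f\r]+$/g;

// Placeholder pattern for illustration only; substitute the real URL regex.
const urlPattern = /^(?:https?|ftp):\/\/\S+$/i;

// Hypothetical helper: strip surrounding whitespace, then validate the rest.
function isUrlAfterStripping(value, pattern = urlPattern) {
  const stripped = value.replace(ASCII_WS_ENDS, "");
  return pattern.test(stripped);
}

console.log(isUrlAfterStripping("  http://example.com/path \n")); // true
console.log(isUrlAfterStripping("http://exa mple.com"));          // false
```

Stripping first keeps the URL pattern itself free of whitespace handling, so the same pattern can be reused on values that are already trimmed.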