Regular expression to validate string length without counting HTML tags
I am using Umbraco, where validation on fields is done with regular expressions. In one field I want to allow users to style their text using the rich text editor (TinyMCE), but I still want to limit the number of characters they can enter.
I'm currently using this regular expression, but it checks the total number of characters, so it includes the HTML.
^[\s\S]{0,250}$
Is there a regular expression that wouldn't count the characters in HTML tags?
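To make the problem concrete, here is a minimal sketch in Python. Python's re module stands in for whatever engine Umbraco actually uses, and the sample strings are hypothetical. It shows that the pattern counts markup along with the visible text, so styled input is rejected even though the user typed fewer than 250 characters:

```python
import re

# The pattern from the question: 0 to 250 characters of anything, newlines included.
LIMIT_PATTERN = re.compile(r"^[\s\S]{0,250}$")

plain = "x" * 240                                  # 240 visible characters
styled = "<p><strong>" + plain + "</strong></p>"   # same text wrapped in 24 characters of markup

print(len(plain), LIMIT_PATTERN.match(plain) is not None)    # 240 True
print(len(styled), LIMIT_PATTERN.match(styled) is not None)  # 264 False
```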
Answers (2)
The short answer is no. At least, not with any sane regex, not without an advanced regex engine that allows recursion or balanced groups, and maybe not at all. A regex that can recognize and ignore HTML tags would have to parse the HTML to do it, and down that road lies madness.
However, you could use some sort of preprocessing, such as jQuery on the client-side or something else on the server-side, to parse the HTML and strip out the tags before you apply length validation.
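The answer mentions jQuery on the client; as a language-neutral illustration of the same preprocessing idea, here is a minimal server-side-style sketch using Python's standard html.parser module. The helper names visible_text and within_limit are hypothetical, and the 250 limit is taken from the question. Because convert_charrefs is enabled, an entity such as &amp; counts as one character rather than five:

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only text content, discarding tags and their attributes."""

    def __init__(self) -> None:
        # convert_charrefs=True turns entities like &amp; into single characters.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def visible_text(html: str) -> str:
    """Return the HTML with all markup removed (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


def within_limit(html: str, max_chars: int = 250) -> bool:
    """Apply the length check to the text only, after stripping the tags."""
    return len(visible_text(html)) <= max_chars


print(visible_text("<p>Fish &amp; chips</p>"))                 # Fish & chips
print(within_limit("<p><strong>" + "x" * 240 + "</strong></p>"))  # True
```

Note that text inside elements such as script or style would also be counted; rich-text editor output normally doesn't contain them, but a stricter sanitizer may be wanted if it can.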
Are you sure you want to do this, though? If you're storing the styled input in a database, then those HTML tags are going to count against your column size just like everything else will. If you're storing these in a varchar(250) column, you're going to have to either count the HTML tags as part of that 250, or else strip them out and lose all the style information.
It's going to be hard (nigh impossible) to do this in one step, since the grammar you're trying to detect is not regular. Two steps would be easy; just do an s/<.+?>//g substitution first to remove all the tags, then count again.
On a related note, you may not need the "whitespace OR not-whitespace" trick in your regex above: the . character represents any character, provided the engine runs in single-line ("dotall") mode. Without that mode, . does not match newlines, which multi-paragraph rich-text input can contain, so [\s\S] is the safer choice when you can't set the flag.
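A minimal sketch of the two-step approach in Python, chosen only for illustration (the s/// in the answer is Perl-style syntax). The names TAG, LIMIT and text_within_limit are hypothetical. re.DOTALL plays the role of the single-line mode mentioned above, so "." can replace [\s\S]. Like the answer's pattern, this assumes every "<" in the input starts a tag (editors normally encode a literal "<" as &lt;), and it counts an entity such as &amp; as several characters:

```python
import re

# Step 1: strip anything that looks like a tag. DOTALL lets "." cross newlines,
# so a tag whose attributes wrap onto a second line is removed as well.
TAG = re.compile(r"<.+?>", re.DOTALL)

# Step 2: the length check, using "." in DOTALL mode instead of [\s\S].
LIMIT = re.compile(r".{0,250}", re.DOTALL)


def text_within_limit(html: str) -> bool:
    """Remove every tag, then require at most 250 remaining characters."""
    stripped = TAG.sub("", html)  # sub() replaces all matches, like s///g
    return LIMIT.fullmatch(stripped) is not None


print(text_within_limit("<p><b>" + "x" * 250 + "</b></p>"))  # True
print(text_within_limit("<p><b>" + "x" * 251 + "</b></p>"))  # False
```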