Negative lookahead to exclude HTML tags

Posted on 2024-08-30 16:36:08

I'm trying to come up with a validation expression to prevent users from entering html or javascript tags into a comment box on a web page.

The following works fine for a single line of text:

^(?!.*(<|>)).*$

..but it won't allow any newline characters because of the dot(.). If I go with something like this:

^(?!.*(<|>))(.|\s)*$

it will allow multiple lines but the expression only matches '<' and '>' on the first line. I need it to match any line.
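
To make the behaviour described above concrete, here is a minimal sketch in Python. Python's `re` module is used purely for illustration (the validator in the question would run a different regex engine), and the sample strings are made up. The relevant default, that `.` does not match a newline, is the same.

```python
import re

# The two patterns from the question, unchanged.
single_line = re.compile(r'^(?!.*(<|>)).*$')
multi_line = re.compile(r'^(?!.*(<|>))(.|\s)*$')

# Hypothetical sample inputs.
two_lines = "first line\nsecond line"
tag_on_line_two = "first line\n<b>second line</b>"
tag_on_line_one = "<b>first line</b>\nsecond line"

# '.' stops at the newline, so '.*$' can never reach the end of the text.
rejects_newlines = single_line.match(two_lines) is None

# '(.|\s)*' consumes newlines, but the lookahead's '.*' still only scans
# the first line, so a tag on a later line goes unnoticed.
misses_later_tag = multi_line.match(tag_on_line_two) is not None
catches_first_line_tag = multi_line.match(tag_on_line_one) is None

print(rejects_newlines, misses_later_tag, catches_first_line_tag)
```

The problem is in the lookahead, not in the part that consumes the text: the `.*` inside `(?!...)` is what needs to cross line breaks.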

This works fine:

^[-_\s\d\w"'\.,:;#/&\$\%\?!@\+\*\\(\)]{0,4000}$

but it's ugly and I'm concerned that it's going to break for some users because it's a multi-lingual application.

Any ideas? Thanks!

Comments (2)

谈场末日恋爱 2024-09-06 16:36:08

Note that your RE prevents users from entering < and >, in any context. "2 > 1", for example. This is very undesirable.

Rather than trying to use regular expressions to match HTML (which they aren't well suited to do), simply escape < and > by transforming them to &lt; and &gt;. Alternatively, find a package for your language-of-choice that implements whitelisting to allow a limited subset of HTML, or that supports its own markup language (I hear markdown is nice).
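
As a rough illustration of the escaping approach, here is a small Python sketch using the standard library's `html.escape`. The function name `escape_comment` and the sample inputs are made up for this example; the original poster's application is ASP.NET, so treat this as the idea rather than the code to deploy.

```python
import html


def escape_comment(text: str) -> str:
    """Return text with &, < and > replaced by HTML entities."""
    # quote=False leaves " and ' alone; only &, < and > are rewritten.
    return html.escape(text, quote=False)


print(escape_comment("2 > 1"))
print(escape_comment("<script>alert('hi')</script>"))
```

With this approach "2 > 1" is accepted and displayed as typed, while a tag is rendered as literal text instead of being interpreted. If the escaped text may end up inside an HTML attribute value, the quotes need escaping too (the default `quote=True`).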

As for "." not matching newline characters, some regexp implementations support a flag (usually "m" for "multi-line" and "s" for "single line"; the latter causes "." to match newlines) to control this behavior.
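
A short Python sketch of the difference between the two flags, assuming Python's names for them (`re.DOTALL` for "s", `re.MULTILINE` for "m"); other engines spell these options differently, and the sample strings are invented.

```python
import re

pattern = r'^(?!.*[<>]).*$'
clean = "first line\nsecond line"
tagged = "first line\n<b>second line</b>"

# "s" flag (re.DOTALL): '.' also matches '\n', so both the lookahead and
# the trailing '.*' see the whole text.
dotall_accepts_clean = re.match(pattern, clean, re.DOTALL) is not None
dotall_rejects_tagged = re.match(pattern, tagged, re.DOTALL) is None

# "m" flag (re.MULTILINE): only changes '^' and '$' to match at line
# boundaries; '.' still stops at '\n', so only the first line is checked.
multiline_match = re.match(pattern, tagged, re.MULTILINE)
multiline_matched_text = multiline_match.group(0) if multiline_match else None

print(dotall_accepts_clean, dotall_rejects_tagged, multiline_matched_text)
```

For this validation problem the "s" behaviour is the one that helps; "m" on its own lets the tagged input through because the match ends at the first line break.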

The first two are basically equivalent to /^[^<>]*$/, except this one works on multiline strings. Any reason why you didn't write the RE that way?
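
A quick check of the character-class version, again sketched in Python with made-up inputs and a hypothetical helper name `is_allowed`. No flag is needed because a negated class such as `[^<>]` matches newlines on its own.

```python
import re

no_angle_brackets = re.compile(r'^[^<>]*$')


def is_allowed(text: str) -> bool:
    """True if the text contains no '<' or '>' on any line."""
    return no_angle_brackets.match(text) is not None


print(is_allowed("first line\nsecond line"))
print(is_allowed("first line\n<b>second line</b>"))
print(is_allowed("你好,世界"))
print(is_allowed("2 > 1"))
```

Non-ASCII text passes because the pattern excludes two characters instead of listing the allowed ones. The last call shows the drawback already mentioned: a harmless "2 > 1" is rejected as well.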

冧九 2024-09-06 16:36:08

So, I looked into it and there is a .NET 'Singleline' option for regular expressions that causes "." to also match the newline character. Unfortunately, this isn't available in the ASP.NET RegularExpressionValidator. As far as I can see, there's no way to make something like ^(?!.*(<\w+>)).*$ work on a multi-line textbox without doing server-side validation.

I took your advice and went the route of escaping the tags on the server side. This requires setting the request validation page directive (ValidateRequest) to 'false', but in this particular instance that isn't a big deal because the comment box is really the only thing to worry about.
