jQuery: Sanitizing Comments and Linking URLs
In terms of jQuery (or JavaScript), what happens behind the scenes when a person posts a comment on Facebook, Twitter, or a blog?
For instance, do they sanitize the text first, and then pattern-match URLs into actual links? Are there other items of concern that the client side should check, in addition to the checks done on the backend?
I have found a few regexes for turning URLs into links, but I'm not sure if there are better solutions.
I'm trying to wrap my head around the problem, but I'm having a difficult time knowing where to start. Any guidance you can provide is greatly appreciated!
Comments (1)
This is a matter of opinion (in my opinion), so I'll CW this answer. Here's my opinion as a bona fide citizen of the Internet:
As to point 1, I think that defensive sanitization is misguided because it ignores point 2 above: without knowing what environment you're defending from malicious input, you can't really sanitize it without greatly restricting the input alphabet, and even then the process may be fighting against itself. It's user-hostile because it needlessly restricts what legitimate users can do with the data they want to keep in their account. Who is to say that I shouldn't be able to include, in my "comments" or "nickname" or "notes" fields, characters that look like XML, or SQL, or any other language's special characters? If there's no semantic reason to filter inputs, why do that to your users?
Point 2 is really the crux of this. User input can be dangerous because server-side code (or client-side code, for that matter) can hand it over directly to unsuspecting interpretation environments where meta-characters important to each distinct environment can cause unexpected behavior. If you hand untouched user input directly to SQL by pasting it directly into a query template, then special SQL meta-characters like quotes can be used by a malicious user to control the database in ways you definitely don't want. However, that alone is no reason to prevent me from telling you that my name is "O'Henry".
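To make the "O'Henry" case concrete, here is a minimal sketch in JavaScript. The function names `naiveQuery` and `parameterizedQuery` and the `users` table are hypothetical, made up for illustration. The `{ text, values }` shape with a `$1` placeholder resembles what drivers such as node-postgres accept, but nothing here talks to a real database; it only shows where the input ends up.

```javascript
// Hypothetical helpers for illustration; no real database is involved.

// Unsafe: pastes user input straight into the query template.
function naiveQuery(name) {
  return "SELECT * FROM users WHERE name = '" + name + "'";
}

// Safer: the SQL text is constant and the input travels separately,
// so the driver (not string concatenation) decides how to quote it.
function parameterizedQuery(name) {
  return { text: "SELECT * FROM users WHERE name = $1", values: [name] };
}

const legit = "O'Henry";
const hostile = "x' OR '1'='1";

console.log(naiveQuery(legit));         // unbalanced quote: broken SQL
console.log(naiveQuery(hostile));       // input has rewritten the WHERE clause
console.log(parameterizedQuery(legit)); // the name is carried as plain data
```

Note that the legitimate name and the hostile one break the naive version in the same way, which is why filtering out the apostrophe would punish the wrong person.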
The key issue with point 2 is that there are many different interpretation environments, and each of them is completely distinct as far as the threat posed by user input is concerned. Let's list a few:
The key point here is that the exact techniques necessary to protect those environments from malformed or malicious input differ significantly from one to the next. Protecting your SQL server from malicious quotes is a completely different problem from guarding those quotes in HTML or JavaScript (and note that both of those are totally different from each other too!).
The bottom line: my opinion, therefore, is that the proper focus of attention when worrying about potentially malformed or malicious input is the process of writing user data, not reading it. As each fragment of user-supplied data is used by your software in cooperation with each interpreting environment, a "quoting" or "escaping" operation has to be done, and it has to be an operation specific to the target environment. How exactly that's arranged may vary all over the place. Traditionally in SQL, for example, one uses prepared statements, though there are times when the deficiencies of prepared statements make that approach difficult. When spitting out HTML, most server-side frameworks have all sorts of built-in hooks for HTML or XML escaping with entity notation (like &amp;amp; for "&"). Nowadays, the simplest way to protect things for JavaScript is to leverage a JSON serializer, though of course there are other ways to go.
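As a rough illustration of escaping per target environment, the sketch below handles the same comment text for two different contexts. The helper names `escapeHtml` and `toJsLiteral` are hypothetical, and this is an assumption-laden example rather than a complete XSS defense; a framework's built-in escaping is usually the better choice where one is available.

```javascript
// Hypothetical helper names; a sketch, not a complete XSS defense.

// Escape for an HTML text or attribute context using entity notation.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;") // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Escape for a JavaScript context: JSON.stringify yields a valid JS literal.
// Replacing "<" additionally keeps "</script>" from ending an inline script.
function toJsLiteral(value) {
  return JSON.stringify(value).replace(/</g, "\\u003c");
}

const comment = `O'Henry says "1 < 2" & </script>`;
console.log(escapeHtml(comment));
console.log(toJsLiteral(comment));
```

The stored comment is never altered; only its representation changes at the moment it is written into a particular environment. On the client, jQuery's `.text()` (as opposed to `.html()`) gives a similar guarantee for the HTML case, because it inserts the string as text rather than parsing it as markup.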