What should be allowed in HTML form fields?
I have a few HTML forms, and I am implementing filtering of these fields on the server-side (using Java Servlets), and I was wondering what I should allow, or perhaps what I should disallow. For e-mail addresses I remove anything that matches this:
[^A-Za-z0-9._%-@]
What are some similar rules I could apply to name, message and phone number fields?
I'm assuming that < and > should be escaped as &lt; and &gt;. What else should I replace?
Along those lines, are there any recommendations for the maximum length allowed for such fields?
You need to escape & to &amp; first, then < to &lt;. Contrary to popular belief, it is not necessary to escape > to &gt;. There is no need to protect the bracket that closes an HTML tag if there is no way to open one.
It is your call whether the text should be escaped before it is written to the database, or each time it is read back out. Doing it on the input side is going to be faster. Doing it on the output side is going to be more secure, and it also makes interchanging data with other apps easier, because you don't have to unescape everything before sending it off to another app. I personally would pay the performance price and escape on the output side. Caching can help.
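The escaping order described above can be sketched in Java roughly as follows. This is only a minimal illustration: `HtmlText` and `escape` are hypothetical names, and the method covers text placed in element content only. Text written into an attribute value would also need its quote characters escaped, which this sketch does not do.

```java
// Minimal sketch of the two replacements described above.
// HtmlText and escape are hypothetical names, not a library API.
final class HtmlText {
    private HtmlText() {}

    // Escapes text for use in HTML element content (not attribute values).
    static String escape(String raw) {
        // '&' must be handled first; otherwise the '&' in "&lt;"
        // produced by the second step would be escaped again.
        return raw.replace("&", "&amp;")
                  .replace("<", "&lt;");
    }
}
```

Calling `HtmlText.escape` at output time, just before the value is written into the page, matches the output-side approach preferred above.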
The rest of the validation you'll want to do depends on the type of data. For an e-mail address, check that it has an @ and at least one . after that. Then, if you care whether it's valid or not, send the address a test e-mail. It is next to impossible to validate an e-mail address much further than that, and even if the address is syntactically valid, that still doesn't mean mail to it can be delivered. Similarly, allow almost anything as a URL and then try to retrieve it to see if it's valid. For a billing/shipping address, use the USPS Web service to validate it and get the data in the best format (for U.S. addresses).
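The e-mail check described above (an @ followed later by at least one .) might look like this in Java. It is a sketch only: `EmailCheck` and `looksPlausible` are hypothetical names, and using the last @ as the separator is an assumption made here, since a quoted local part may itself contain one.

```java
// Loose plausibility check for an e-mail address; it does not prove
// the address exists. EmailCheck is a hypothetical helper name.
final class EmailCheck {
    private EmailCheck() {}

    static boolean looksPlausible(String address) {
        if (address == null) {
            return false;
        }
        // Assumption: treat the last '@' as the local/domain separator.
        int at = address.lastIndexOf('@');
        if (at == -1) {
            return false;
        }
        // Require at least one '.' somewhere after the '@'.
        return address.indexOf('.', at + 1) != -1;
    }
}
```

Anything that passes would still need the test e-mail to confirm that it can actually receive mail.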
You should allow anything through for names. Consider "O'Malley" or "Hudson-Walker". Some languages (such as Salish) include numbers so you can have "Sqwxwu7mish". Then there are accented characters, Hebrew, Cyrillic, Greek, Chinese, Korean, and even the musician formerly known as Prince.
Message text should be similarly unconstrained. If messages can contain HTML then you'll have to parse the HTML (with a real HTML parser) and apply tag and attribute whitelists to only allow things through that you are expecting.
Phone numbers should be pretty close to free form too. North American formats are different from European ones. Some people like to write "(555) 555-5555" while others like "555-555-5555", and some phone numbers have extensions while others don't.
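One way to accept free-form phone numbers and still compare them is to store the text exactly as typed and derive a digits-only key for matching. The sketch below assumes that approach; `PhoneInput` and `digitsOnly` are hypothetical names, and extensions are simply folded into the digit string.

```java
// Derives a digits-only key from a free-form phone number.
// The original text should still be stored as entered.
final class PhoneInput {
    private PhoneInput() {}

    static String digitsOnly(String raw) {
        StringBuilder digits = new StringBuilder();
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            // Keep ASCII digits; drop spaces, dashes, parentheses, etc.
            if (c >= '0' && c <= '9') {
                digits.append(c);
            }
        }
        return digits.toString();
    }
}
```

With this, "(555) 555-5555" and "555-555-5555" produce the same key while the user's own formatting is left untouched.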
The only encoding that you should worry about on input is that everything is in UTF-8 (including your database). And, when talking to your database, don't try to encode anything yourself. Use the database driver's quoting mechanism and placeholders.
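With JDBC, the driver's placeholder mechanism is `PreparedStatement`. The following is a minimal sketch; the `contacts` table, its columns and the `ContactDao` class are hypothetical and stand in for whatever schema the application really uses.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical DAO showing placeholders instead of string concatenation.
final class ContactDao {
    private ContactDao() {}

    // Table and column names are assumptions for illustration.
    static final String INSERT_SQL =
            "INSERT INTO contacts (name, message) VALUES (?, ?)";

    static int insert(Connection conn, String name, String message)
            throws SQLException {
        try (PreparedStatement stmt = conn.prepareStatement(INSERT_SQL)) {
            // The driver handles quoting, so "O'Malley" needs no
            // special treatment by the application.
            stmt.setString(1, name);
            stmt.setString(2, message);
            return stmt.executeUpdate();
        }
    }
}
```

The user-supplied values never become part of the SQL text, so no manual escaping is involved.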
Lengths should generally be a lot bigger than you think they should be, so double (at least) your first guess at a reasonable maximum. The storage difference between 20 characters for a name and 100 isn't going to be important for most applications, so be generous.
You shouldn't worry about HTML encoding until output. At that point, use whatever HTML and URL encoding tools your environment supports; do not try to build your own.
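For URL encoding in Java, the standard library already provides `java.net.URLEncoder`. A minimal sketch of encoding a value at output time follows; `LinkBuilder`, `searchUrl` and the `q` parameter name are hypothetical, and the `Charset` overload used here assumes Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that encodes a user-supplied value only when
// it is placed into a URL query string.
final class LinkBuilder {
    private LinkBuilder() {}

    static String searchUrl(String base, String query) {
        // URLEncoder applies form encoding: spaces become '+',
        // reserved characters become %XX escapes.
        return base + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8);
    }
}
```

The stored value stays unencoded; only the copy written into the link is transformed.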
Don't over-constrain your inputs; be as loose and forgiving as possible. Be very strict with your outputs, though.
Maximum length: I always apply a max length on my fields on the client side and server side. The values match the max values set in the database.
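A server-side length check along these lines could be sketched as below. `FieldLimits` and the limit of 100 are assumptions for illustration; the real limit should come from the database column. Counting code points rather than `length()` is also an assumption, since databases differ in whether they count characters, UTF-16 units or bytes.

```java
// Hypothetical server-side length check mirroring a database limit.
final class FieldLimits {
    private FieldLimits() {}

    // Assumed to match the column size of the name field.
    static final int NAME_MAX = 100;

    static boolean withinLimit(String value, int maxChars) {
        if (value == null) {
            return false;
        }
        // Count Unicode code points so that characters outside the
        // Basic Multilingual Plane are not counted twice.
        return value.codePointCount(0, value.length()) <= maxChars;
    }
}
```

The same number would be used for the `maxlength` attribute on the client side, so both checks agree with the database.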
I agree with escaping < and > as &lt; and &gt;.
I think it is a good habit to have very good validation. If I were working with name, message and phone number fields, I would do the following.
For each text box, make it so that the box will not accept invalid values at all.
Name: letters only (a-z, A-Z)
Message: letters (a-z, A-Z), digits (0-9) and punctuation such as '.', ',' and ';'
Phone number: digits (0-9). Don't allow spaces, but do allow '-'; you can always parse the string on the server side.