Should we HTML-encode special characters before storing them in the database?
I use MySQL to store data and my web pages are all encoded as UTF-8. I have a lot of Portuguese characters such as ç and õ and I'm wondering if I should HTML-escape them before storage.
Should we store & as &amp;, for example? And why (not)? What are the advantages and disadvantages / best practices?
Comments (6)
Don't HTML-encode your characters before storage. You should store as pure a form of your data as possible. HTML encoding is needed because you are going to display the data on an HTML page, so do the encoding during the processing of the data to create the page. For example, suppose you decide you're also going to send the data in plain text emails. If you've HTML-encoded the data, now the HTML encoding is a barrier that you have to undo.
Choose a canonical form for your data, and store that. UTF-8 is wonderful, and your database supports it (assuming you've created all your tables properly). Just store UTF-8.
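To make the "store the raw form, encode while building the page" idea concrete, here is a minimal Python sketch. It uses the standard-library sqlite3 module as a stand-in for MySQL so that it runs anywhere; the products table and the two render_* helpers are made-up names for illustration, not something from the question.

```python
import html
import sqlite3

# sqlite3 stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")

raw_name = "Pão & Açúcar <promoção>"
# Store the text exactly as entered; the parameterized query handles SQL quoting.
conn.execute("INSERT INTO products (name) VALUES (?)", (raw_name,))

stored = conn.execute("SELECT name FROM products").fetchone()[0]

def render_html(name):
    # HTML-escape only at the point where HTML is produced.
    return "<li>" + html.escape(name) + "</li>"

def render_plain_text_email(name):
    # Plain text needs no HTML escaping, so the stored value is used as-is.
    return "Product: " + name
```

With this layout, render_html(stored) gives <li>Pão &amp; Açúcar &lt;promoção&gt;</li>, while the email helper can use the same stored value without undoing anything.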
Given the purpose of a database, it's not advisable to HTML-encode data before storing it. Doing so makes the data suitable only for rendering on HTML pages (one purpose), while every other operation (and there are many) requires decoding it again. This degrades the data consistency of the database, since validity, accuracy, and usability are all hampered.
Do you ever need to search for them? I'm not a MySQL expert, but you may have to jump through hoops to do searches.
Are you concerned about the HTML-ness of the data or the character encoding?
I would say try not to do any special encoding of characters in the DB if you can avoid it. It complicates searching, and you have to remember the special inbound/outbound processing everywhere the data is used.
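The search problem is easy to reproduce. The sketch below again assumes sqlite3 as a stand-in for MySQL, with two hypothetical tables: one holding the raw name and one holding the same name with its non-ASCII characters turned into numeric character references.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_names (name TEXT)")
conn.execute("CREATE TABLE encoded_names (name TEXT)")

name = "João Gonçalves"
# Replace every non-ASCII character with a numeric character reference.
encoded = name.encode("ascii", "xmlcharrefreplace").decode("ascii")

conn.execute("INSERT INTO raw_names VALUES (?)", (name,))
conn.execute("INSERT INTO encoded_names VALUES (?)", (encoded,))

def count_matches(table, term):
    # `table` is one of the two fixed names above, never user input.
    sql = f"SELECT COUNT(*) FROM {table} WHERE name LIKE ?"
    return conn.execute(sql, ("%" + term + "%",)).fetchone()[0]
```

Searching for "João" finds the row in raw_names but not in encoded_names, where the stored value is Jo&#227;o Gon&#231;alves. To search the encoded table you would have to encode every search term the same way first.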
I wouldn't encode it in the database unless there's a clear and definite value to doing that. You (and anyone else who will ever work with the data) will have to remember to un-escape when using that data or escape whatever data you insert, update, or compare to that field. I'm not sure what the benefit is to escaping it, but it's probably not worth it.
If you are serving hundreds or thousands of page views for each write, then encoding on the way in is going to be more efficient. But in most circumstances I guess the difference would be negligible.
But the other reasons (to not encode) are good, no doubt about it. And anyway, it's pointless to encode characters that UTF-8 already represents directly.
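As a quick check of that last point, this small Python sketch (standard library only; the sample string is made up) shows that html.escape leaves ç and õ alone and only touches the characters that are special in HTML. The accented letters are simply carried as ordinary two-byte UTF-8 sequences.

```python
import html

text = "ação & reação: ç, õ"

# html.escape only rewrites &, <, > and, by default, both quote characters.
escaped = html.escape(text)

# The accented letters are plain UTF-8 byte sequences, valid on a UTF-8 page.
c_cedilla_bytes = "ç".encode("utf-8")
o_tilde_bytes = "õ".encode("utf-8")
```

Here escaped is "ação &amp; reação: ç, õ", so on a page served as UTF-8 the Portuguese characters need no entity form at all.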
I would argue that encoding on the way into the database is actually a security risk, because it means you presumably won't be encoding between database and browser (as this would lead to double encoding). That means that if there is a route either now or in future for unencoded data to get into your database then that will be sent to the browser unencoded. Better to encode between database and browser and therefore store unencoded IMHO.
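A small Python sketch of both failure modes described here, under the assumption of two hypothetical page templates: one that trusts the database to hold already-escaped text, and one that escapes on output regardless of where the value came from.

```python
import html

def render_trusting_db(value):
    # Hypothetical template that assumes the database already holds escaped text.
    return "<p>" + value + "</p>"

def render_escaping(value):
    # Hypothetical template that escapes on output, whatever the source.
    return "<p>" + html.escape(value) + "</p>"

# What an encode-on-write application would have stored.
pre_encoded = html.escape("Tom & Jerry")

# A row that reached the database by some other route, never encoded.
unencoded_import = "<script>alert(1)</script>"

double_encoded_page = render_escaping(pre_encoded)
unsafe_page = render_trusting_db(unencoded_import)
safe_page = render_escaping(unencoded_import)
```

Escaping pre-encoded data on output produces the double-encoded &amp;amp;, which is why an encode-on-write design tends to skip output escaping. Once it does, the unencoded row is sent to the browser as a live script tag, whereas the escape-on-output template neutralizes it.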