Markdown 或 HTML
我需要用户创建、修改和删除自己的文章。我计划使用 SO 用于创建文章的 WMD 编辑器。
据我所知,SO 存储了 markdown 和 HTML。为什么这样做——有什么好处?
我无法决定是存储 Markdown、HTML 还是两者都存储。如果我存储两者,我将检索并转换哪一个以显示给用户。
更新:
好的,我认为从到目前为止的答案来看,我应该存储 markdown 和 HTML。看起来很酷。我还阅读了 Jeff 发表的有关 XSS 漏洞的博客文章。因为 WMD 编辑器允许您输入任何 HTML,这可能会让我有些头痛。
有问题的博客文章位于此处。我猜我将不得不遵循与 SO 相同的方法 - 并清理服务器端的输入。
SO 使用的清理代码是否可以作为开源提供,或者我必须从头开始?
任何帮助将不胜感激。
谢谢
I have a requirement for users to create, modify and delete their own articles. I plan on using the WMD editor that SO uses to create the articles.
From what I can gather SO stores the markdown and the HTML. Why does it do this - what is the benefit?
I can't decide whether to store the markdown, HTML or both. If I store both which one do I retrieve and convert to display to the user.
UPDATE:
Ok, I think from the answers so far, i should be storing both the markdown and HTML. That seems cool. I have also been reading a blog post from Jeff regarding XSS exploits. Because the WMD editor allows you to input any HTML this could cause me some headaches.
The blog post in question is here. I am guessing that I will have to follow the same approach as SO - and sanitize the input on the server side.
Is the sanitize code that SO uses available as Open Source or will I have to start this from scratch?
Any help would be much appreciated.
Thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
就性能和兼容性(最终还有社会控制)而言,存储两者都非常有用/有帮助。
如果您仅存储 Markdown(或任何非 HTML 标记),那么每次将其解析为 HTML 风格都会产生性能成本。这并不总是明显便宜。
如果您仅存储 HTML,那么您将面临错误在生成的 HTML 中悄然蔓延的风险。这将导致大量的维护和错误修复令人头痛。您还将失去社交控制,因为您不再知道用户实际填写了什么。例如,作为管理员,您还想知道哪些用户试图使用 XSS 进行攻击
等等。此外,最终用户将无法编辑 Markdown 格式的数据。您需要将其从 HTML 转换回来。
要在 Markdown 版本的每次更改时更新 HTML,您只需添加一个额外的字段来表示用于生成 HTML 输出的 Markdown 版本。每当您检索行时服务器端发生更改时,请使用新版本重新解析数据并更新数据库中的行。这只是一次性的额外费用。
Storing both is extremely useful/helpful in terms of performance and compatiblity (and eventually also social control).
If you store only Markdown (or whatever non-HTML markup), then there's a performance cost by parsing it into HTML flavor everytime. This is not always noticeably cheap.
If you store only HTML, then you'll risk that bugs are silently creeping in the generated HTML. This would lead to lot of maintenance and bugfixing headache. You'll also lose social control because you don't know anymore what the user has actually filled in. You'd for example as being an admin also like to know which users are trying to do XSS using
<script>
and so on. Also, the enduser won't be able to edit the data in Markdown format. You'd need to convert it back from HTML.To update the HTML on every change of Markdown version, you just add one extra field representing the Markdown version being used for generating the HTML output. Whenever this has been changed in the server side at the moment you retrieve the row, re-parse the data using the new version and update the row in the DB. This is only an one-time extra cost.
通过存储两者,您只需处理一次降价(发布时)。然后,您将检索 HTML,以便可以更快地加载页面。
By storing both you only have to process the markdown once (when it is posted). You would then retrieve the HTML so that you can load your pages faster.
如果您只存储了一个,则永远必须为显示视图或编辑视图重新创建另一个。
If you only stored one, you'd forever have to recreate the other for either the display view or the edit view.