Best future-proof markup format for large chunks of text?
I have a number of records (<= 100) that contain sizeable chunks of text that require marking up (semantically: lists, headings, tables, links, quotations, etc.) before storing in a re-usable file format.
When stored, it is likely to remain more or less unchanged for as many years into the future as possible.
It contains some non-ASCII characters, so UTF-8 is required. I started using HTML, then considered Markdown, but would like to know what people think is the most future-proof markup format for long-term storage. The content is initially for a (mostly static) website, but may be used as content for other outputs.
Finally, opinions on the choice of storage for long-term use (database, separate documents, ...)? Changes to records will be infrequent and made by only 1-3 people, and read access should increase over time.
Update:
I've finally chosen the common features (e.g. for tables) between MultiMarkdown, PHP Markdown Extra and Kramdown as the text format (plain Markdown omits too many HTML tags), and am converting the resulting files to HTML with Kramdown. Now I'm trying out iOS Markdown editors that can handle an extended Markdown and sync via Dropbox to my desktop/laptop.
Comments (1)
Any storage not designed for long-term archiving will break.
It is not so much a question of database vs. filesystem, but how to ensure that no (silent) data corruption happens, and how to migrate data. I can give you no definitive answers, because it depends on a lot of factors (incl. costs), but here are a few resources:
I have no real answer for the format question, but I think HTML + UTF-8 should still be readable decades from now. Whichever format you choose, document it.
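To make the point about silent data corruption concrete, here is a minimal Python sketch of one common approach: record a SHA-256 checksum for every stored file, then re-check the files periodically and after each migration. This is only an illustration under assumptions, not something taken from the resources mentioned above. The function names (`sha256_of`, `build_manifest`, `find_problems`) and the `*.html` pattern are hypothetical, and it assumes the records live as separate UTF-8 files under one directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path, pattern: str = "*.html") -> dict[str, str]:
    """Map each matching file (path relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob(pattern))
        if p.is_file()
    }


def find_problems(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing, changed, or no longer valid UTF-8."""
    problems = []
    for name, expected in manifest.items():
        path = root / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        if sha256_of(path) != expected:
            problems.append(f"{name}: checksum mismatch")
        try:
            # Decoding the raw bytes catches encoding damage explicitly.
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            problems.append(f"{name}: not valid UTF-8")
    return problems
```

In practice the manifest would be saved (for example as JSON) somewhere other than the disk holding the records, and rebuilt only after an intentional edit. A checksum mismatch then means either an unrecorded edit or corruption, and the affected file can be restored from a known-good copy.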