Best database for storing HTML files (or files in general)
What is the best type of database (document-oriented, relational, key-value, etc.) for storing HTML files (small, at most about 700 KB each)?
Currently I'm using sqlite3 with Python, but it seems to get pretty slow once the number of entries/files exceeds 3000 (the .db file is about 260 MB at that point). Besides that, SQLite is not well suited to multiprocessing use cases.
The SQLite schema looks like this:
CREATE TABLE articles (
    url TEXT NOT NULL, published DATETIME, title TEXT,
    fetched TEXT NOT NULL, section TEXT,
    PRIMARY KEY (url),
    FOREIGN KEY (url) REFERENCES contents(url)
);
CREATE TABLE contents (
    url TEXT NOT NULL, date DATETIME, content TEXT,
    PRIMARY KEY (url)
);
CREATE TABLE shares (
    url TEXT NOT NULL, date DATETIME,
    likes INTEGER NOT NULL, totals INTEGER NOT NULL, clicks INTEGER,
    comments INTEGER NOT NULL, share INTEGER NOT NULL, tweets INTEGER NOT NULL,
    PRIMARY KEY (date, url),
    FOREIGN KEY (url) REFERENCES articles(url)
);
The HTML files go into the contents table.
Comments (2)
For a document-centric database that uses a URL as the primary key, and which also has to support multiple concurrent writers, you might wish to consider one of the NoSQL databases rather than SQLite. There are currently 122 of them listed here.
What does "pretty slow" mean to you? And are you certain the perceived slowness is in the database?
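One way to answer that question is to time the database writes in isolation, away from fetching and parsing. The following is a minimal sketch, not a benchmark: the function name `time_inserts`, its parameters, and the synthetic payload are assumptions made for illustration, and only the `contents` table from the question's schema is used.

```python
import os
import sqlite3
import tempfile
import time


def time_inserts(n_rows=200, size_bytes=100_000, commit_each=False):
    """Insert n_rows synthetic documents into an on-disk SQLite file.

    Returns (elapsed_seconds, rows_stored). Hypothetical helper for
    illustration; the payload is filler text, not real HTML.
    """
    payload = "x" * size_bytes
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "timing.db"))
        conn.execute(
            "CREATE TABLE contents (url TEXT NOT NULL, date DATETIME, "
            "content TEXT, PRIMARY KEY (url))"
        )
        conn.commit()

        start = time.perf_counter()
        for i in range(n_rows):
            conn.execute(
                "INSERT INTO contents (url, date, content) "
                "VALUES (?, datetime('now'), ?)",
                (f"http://example.com/{i}", payload),
            )
            if commit_each:
                # One transaction per row: each commit is a separate disk sync.
                conn.commit()
        conn.commit()
        elapsed = time.perf_counter() - start

        rows = conn.execute("SELECT COUNT(*) FROM contents").fetchone()[0]
        conn.close()
    return elapsed, rows


if __name__ == "__main__":
    for commit_each in (False, True):
        elapsed, rows = time_inserts(commit_each=commit_each)
        print(f"commit_each={commit_each}: {rows} rows in {elapsed:.3f} s")
```

Comparing the two runs, and comparing both against the total runtime of the real script, gives a rough idea of how much of the perceived slowness comes from the database and how much from commit frequency or from work outside it.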
So you think SQLite should be scalable enough in general?
There is no "in general" scenario in the real world. No, I do not think it would scale well for a document-centric application where the records can be 500K. SQLite is not optimized to scale well in a busy scenario with multiple concurrent writers, where "busy" is a multivariable function of the number of writes per second, the size of the record being written, and how many indexes are on the table. In brief, the more disk-intensive (ergo time-consuming) the write operation, the less well it will scale. In other words, the larger the record and/or the more heavily indexed the table, the fewer writes per second can be accommodated. And a 500K record is a very large record indeed. You'd be better served by a database that uses MVCC.
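The concurrent-writer limitation can be seen directly with Python's sqlite3 module. The sketch below is an illustration under assumptions: the function name `second_writer_is_blocked`, the simplified two-column table, and the 0.1 s timeout are all invented for the example. It opens two connections to the same file and shows that while one holds a write transaction, the other cannot start its own.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked():
    """Return (blocked, rows): whether writer B was refused while
    writer A held a write transaction, and the final row count."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None puts the connections in autocommit mode,
        # so transactions are controlled with explicit BEGIN/COMMIT.
        writer_a = sqlite3.connect(path, isolation_level=None)
        writer_b = sqlite3.connect(path, timeout=0.1, isolation_level=None)

        writer_a.execute(
            "CREATE TABLE contents (url TEXT PRIMARY KEY, content TEXT)"
        )

        # Writer A takes the database's write lock and keeps it.
        writer_a.execute("BEGIN IMMEDIATE")
        writer_a.execute(
            "INSERT INTO contents VALUES (?, ?)",
            ("http://example.com/a", "<html>a</html>"),
        )

        try:
            # Writer B waits up to `timeout` seconds, then gives up.
            writer_b.execute("BEGIN IMMEDIATE")
            writer_b.execute("ROLLBACK")
            blocked = False
        except sqlite3.OperationalError:  # "database is locked"
            blocked = True

        writer_a.execute("COMMIT")

        # Once A has committed, B can write normally.
        writer_b.execute(
            "INSERT INTO contents VALUES (?, ?)",
            ("http://example.com/b", "<html>b</html>"),
        )
        rows = writer_b.execute("SELECT COUNT(*) FROM contents").fetchone()[0]

        writer_a.close()
        writer_b.close()
    return blocked, rows


if __name__ == "__main__":
    print(second_writer_is_blocked())
```

Because writers are serialized like this, the longer each write transaction takes (large records, many indexes), the longer every other writer waits, which is the scaling behaviour described above.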