Best database for storing HTML files (or files in general)

Posted on 2024-11-28 22:22:13

What is the best type of database (document-oriented, relational, key-value, etc.) for storing HTML files (small, up to about 700 KB each)?

Currently I'm using sqlite3 with Python, but it seems to get pretty slow once the number of entries/files exceeds 3000 (the .db file is about 260 MB at that point). Besides that, SQLite is not well suited to multiprocessing use cases.

The SQLite schema looks like this:

CREATE TABLE articles (
    url TEXT NOT NULL, published DATETIME, title TEXT,
    fetched TEXT NOT NULL, section TEXT,
    PRIMARY KEY (url),
    FOREIGN KEY (url) REFERENCES contents(url)
);

CREATE TABLE contents (
    url TEXT NOT NULL, date DATETIME, content TEXT,
    PRIMARY KEY (url)
);

CREATE TABLE shares (
    url TEXT NOT NULL, date DATETIME,
    likes INTEGER NOT NULL, totals INTEGER NOT NULL,
    clicks INTEGER, comments INTEGER NOT NULL,
    share INTEGER NOT NULL, tweets INTEGER NOT NULL,
    PRIMARY KEY (date, url),
    FOREIGN KEY (url) REFERENCES articles(url)
);

The HTML files are stored in the content column of the contents table.
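For readers who want to see the storage path concretely, here is a minimal sketch of writing and reading one HTML document with Python's sqlite3 module. It uses only the contents table from the schema above; the helper names store_html and load_html, the example URL, and the in-memory database are assumptions made for illustration, not part of the original post.

```python
import sqlite3

# Only the contents table from the question's schema is needed here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS contents (
    url TEXT NOT NULL, date DATETIME, content TEXT,
    PRIMARY KEY (url)
);
"""


def store_html(conn, url, html, fetched_at):
    """Insert the HTML for one URL, replacing any earlier copy (hypothetical helper)."""
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT OR REPLACE INTO contents (url, date, content) VALUES (?, ?, ?)",
            (url, fetched_at, html),
        )


def load_html(conn, url):
    """Return the stored HTML for a URL, or None if the URL is unknown."""
    row = conn.execute(
        "SELECT content FROM contents WHERE url = ?", (url,)
    ).fetchone()
    return row[0] if row else None


# Assumption: an in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
store_html(
    conn,
    "http://example.com/a",
    "<html><body>hello</body></html>",
    "2024-11-28 22:22:13",
)
print(load_html(conn, "http://example.com/a"))
```

Because url is the primary key, the lookup in load_html goes through the key's index rather than scanning the table.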

Comments (2)

幻想少年梦 2024-12-05 22:22:13

For a document-centric database that uses a URL as the primary key, and which also has to support multiple concurrent writers, you might wish to consider one of the NoSQL databases instead of SQLite. There are currently 122 of them listed here.

What does "pretty slow" mean to you? And are you certain the perceived slowness is in the database?
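One way to check whether the slowness really comes from the database is to time the SQLite calls on their own, separately from fetching and parsing. The following is a rough sketch under stated assumptions: the function name time_bulk_insert, the row count, and the document size are all made up for illustration, and the table is a simplified copy of contents. Passing a file path instead of ":memory:" would include disk I/O in the measurement.

```python
import sqlite3
import time


def time_bulk_insert(path=":memory:", n_rows=200, doc_size=50_000):
    """Time one batched insert and one primary-key lookup (hypothetical helper)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE contents (url TEXT PRIMARY KEY, date DATETIME, content TEXT)"
    )
    doc = "x" * doc_size  # stand-in for an HTML document
    rows = [(f"http://example.com/{i}", "2024-11-28", doc) for i in range(n_rows)]

    start = time.perf_counter()
    with conn:  # a single transaction for the whole batch
        conn.executemany("INSERT INTO contents VALUES (?, ?, ?)", rows)
    insert_s = time.perf_counter() - start

    start = time.perf_counter()
    row = conn.execute(
        "SELECT content FROM contents WHERE url = ?", (rows[-1][0],)
    ).fetchone()
    lookup_s = time.perf_counter() - start

    count = conn.execute("SELECT COUNT(*) FROM contents").fetchone()[0]
    conn.close()
    return {
        "rows": count,
        "doc_len": len(row[0]),
        "insert_s": insert_s,
        "lookup_s": lookup_s,
    }


result = time_bulk_insert()
print(result)
```

If these numbers stay small while the whole program is slow, the time is probably being spent somewhere other than SQLite, for example in downloading or parsing, or in committing after every single row.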

好听的两个字的网名 2024-12-05 22:22:13

"So you think SQLite should be scalable enough in general?"

There is no "in general" scenario in the real world. No, I do not think it would scale well for a document-centric application where the records can be 500K. SQLite is not optimized to scale well in a BUSY MULTIPLE CONCURRENT WRITERS SCENARIO, where "busy" is a multivariable function of the number of writes per second, the size of the record being written, and how many indexes are on the table. In brief, the more disk-intensive (ergo time-consuming) the write operation, the less well it will scale. In other words, the larger the record and/or the more heavily indexed the table, the fewer writes per second can be accommodated. And a 500K record is a very large record indeed. You'd be better served by a database that uses MVCC (multi-version concurrency control).
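The concurrent-writers point can be seen in a small experiment. SQLite allows only one write transaction on a database file at a time, so a second writer has to wait or fail. The sketch below is illustrative only: the function name second_writer_is_blocked, the simplified table, and the temporary file are assumptions, and timeout=0 is chosen so the second writer fails immediately instead of waiting.

```python
import os
import sqlite3
import tempfile


def second_writer_is_blocked(db_path):
    """Return True if a second connection cannot start a write transaction
    while the first one holds the write lock (hypothetical helper)."""
    # isolation_level=None: BEGIN/COMMIT are issued explicitly below.
    # timeout=0: report "database is locked" at once rather than waiting.
    writer_a = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    writer_b = sqlite3.connect(db_path, timeout=0, isolation_level=None)
    try:
        writer_a.execute(
            "CREATE TABLE IF NOT EXISTS contents (url TEXT PRIMARY KEY, content TEXT)"
        )
        writer_a.execute("BEGIN IMMEDIATE")  # first writer takes the write lock
        writer_a.execute(
            "INSERT INTO contents VALUES (?, ?)",
            ("http://example.com/a", "<html></html>"),
        )
        try:
            writer_b.execute("BEGIN IMMEDIATE")  # second writer asks for the same lock
        except sqlite3.OperationalError:
            blocked = True
        else:
            writer_b.execute("ROLLBACK")
            blocked = False
        writer_a.execute("COMMIT")
        return blocked
    finally:
        writer_a.close()
        writer_b.close()


with tempfile.TemporaryDirectory() as tmp:
    blocked = second_writer_is_blocked(os.path.join(tmp, "demo.db"))
print(blocked)
```

With a non-zero timeout the second writer would wait for the lock instead of failing, which is why long writes of large records reduce the number of writes per second that several processes can achieve together.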
