Best database for high writes (10,000 inserts/hour) and low reads (10 reads/second)?
I'm developing a web app and currently using SQL Server 2008 for it. But I'm considering moving to another database (SimpleDB) for improved performance.
I have a background process that inserts up to 10,000 rows every hour into one specific table. That table is also read from to display data in the web application. When the background process runs, the web app is unusable because the database connection times out.
As a result, I'm thinking of moving to Amazon's SimpleDB to improve performance. Is SimpleDB optimized for this use case? If not, is there another solution I could use?
4 Answers
Your problem is the isolation level you are using. Unless you change it, SQL Server (and many other databases) operates in a mode where a SELECT blocks on rows with uncommitted writes. Change SQL Server to use MVCC instead (the default in Oracle; MySQL and SQL Server support it too) and your problem will go away.
From SET TRANSACTION ISOLATION LEVEL (Transact-SQL): the behavior of READ COMMITTED depends on the READ_COMMITTED_SNAPSHOT database option. When it is OFF (the default), shared locks make reads block behind uncommitted writes; when it is ON, row versioning lets reads see the last committed version of each row without blocking.
Change your database configuration to set READ_COMMITTED_SNAPSHOT to ON.
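A minimal sketch of that change, assuming your database is named YourDatabase (substitute your own name):

```sql
-- Turn on row versioning so readers see the last committed version of a row
-- instead of blocking behind an uncommitted writer.
ALTER DATABASE YourDatabase   -- placeholder name
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;      -- roll back open transactions so the option can take effect
```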
Also, try to keep your transactions as short-lived as possible, and make sure the background process (the one doing the 10,000 inserts an hour) actually commits its transaction, because if it never commits then SELECTs will block forever under the default settings.
As others have said, the amount of data you are writing into the database isn't an issue. SQL Server can easily handle much more than that. Personally, I've got tables that take hundreds of thousands to millions of rows per hour without issue, and people read from them all day without any slowdown.
You may need to look at doing dirty reads by changing the isolation level of the read statements, or using the WITH (NOLOCK) hint.
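For example, a dirty read against a hypothetical dbo.HourlyData table (NOLOCK may return uncommitted or inconsistent rows, so only use it where that's tolerable):

```sql
-- Read without taking shared locks; may see uncommitted ("dirty") rows.
-- Table and column names are hypothetical.
SELECT TOP (100) *
FROM dbo.HourlyData WITH (NOLOCK)
ORDER BY CreatedAt DESC;

-- The session-level equivalent of the hint:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
```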
You should look at using the bulk upload object in .NET to load your data into the database. Use batches of 1,000-5,000 rows depending on the performance you see during testing; you'll need to experiment with the number to find the best value. Bulk inserting data into the table gives dramatically better performance than inserting records row by row. Make sure you don't do the entire upload in a single transaction: do one transaction per batch.
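The .NET object referred to is presumably SqlBulkCopy; as a rough T-SQL sketch of the same batching idea, BULK INSERT with BATCHSIZE commits each batch as its own transaction (the file path and table name below are made up):

```sql
-- Load a staged file in 5,000-row batches, one transaction per batch.
BULK INSERT dbo.HourlyData           -- hypothetical target table
FROM 'C:\feeds\hourly_load.csv'      -- hypothetical staging file
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 5000,          -- commit every 5,000 rows
    TABLOCK                          -- one bulk-update lock instead of row locks
);
```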
What does the disk IO look like when you are writing into the database?
What recovery model do you have set for the database? FULL recovery requires much more log IO than the SIMPLE recovery model. Only use FULL recovery if you actually need the point-in-time restores that come with it.
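If you don't need point-in-time restore, switching the recovery model is a one-liner (YourDatabase is again a placeholder):

```sql
-- SIMPLE recovery truncates the log at checkpoints and permits minimally
-- logged bulk operations, cutting log IO during the hourly load.
ALTER DATABASE YourDatabase SET RECOVERY SIMPLE;
```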
10,000 inserts an hour is under 3 inserts per second, which is not going to give any DBMS a workout unless the amount of data in each insert operation is phenomenal. Likewise, 10 reads per second is unlikely to over-stress any competent DBMS unless there is some complicating factor you haven't mentioned (such as the reads being aggregates of aggregates over the entire table, which at this insert rate would take 100,000 hours, roughly 4,200 days or about 11 years, to accumulate its first billion records).
In a follow-up to Joel's answer, you may need to look at setting appropriate values for PAD_INDEX and FILLFACTOR on your indexes. If you haven't specified those options, your inserts may be causing a lot of page splits in your indexes, which would slow down your write times significantly.
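As a sketch, assuming a hypothetical index on the table being loaded, a rebuild that leaves 20% free space per page might look like:

```sql
-- Leave 20% free space on leaf pages (and, via PAD_INDEX, on intermediate
-- pages) so incoming rows cause fewer page splits.
ALTER INDEX IX_HourlyData_CreatedAt ON dbo.HourlyData  -- hypothetical names
REBUILD WITH (FILLFACTOR = 80, PAD_INDEX = ON);
```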