Best database for high write volume (10,000 inserts/hour) and low read volume (10 reads/second)?

Asked 2024-08-04 09:53:29

I'm developing a web app and currently using SQL Server 2008 for it. But I am considering moving to another database (SimpleDB) for improved performance.

I have a background process that inserts up to 10,000 rows every hour into one specific table. That table is also read from to display data in the web application. When the background process runs, the web app is unusable because the DB connection times out.

As a result, I'm thinking of moving to Amazon's SimpleDB to improve performance. Is Amazon's SimpleDB optimized for this use case? If not, is there another solution that I could use?

Comments (4)

落在眉间の轻吻 2024-08-11 09:53:29

Your problem is the isolation level you are using. Unless you change it, SQL Server (and many other databases) operates in a mode where a SELECT will block on rows that have been modified but not yet committed by another transaction. You want to change SQL Server so that it uses MVCC instead (the default for Oracle; MySQL and SQL Server both have it too), and your problem will go away.

From SET TRANSACTION ISOLATION LEVEL (Transact-SQL):

READ COMMITTED

Specifies that statements cannot read data that has been modified but not committed by other transactions. This prevents dirty reads. Data can be changed by other transactions between individual statements within the current transaction, resulting in nonrepeatable reads or phantom data. This option is the SQL Server default.

The behavior of READ COMMITTED depends on the setting of the READ_COMMITTED_SNAPSHOT database option:

  • If READ_COMMITTED_SNAPSHOT is set to OFF (the default), the Database Engine uses shared locks to prevent other transactions from modifying rows while the current transaction is running a read operation. The shared locks also block the statement from reading rows modified by other transactions until the other transaction is completed.

    The shared lock type determines when it will be released. Row locks are released before the next row is processed. Page locks are released when the next page is read, and table locks are released when the statement finishes.

  • If READ_COMMITTED_SNAPSHOT is set to ON, the Database Engine uses row versioning to present each statement with a transactionally consistent snapshot of the data as it existed at the start of the statement. Locks are not used to protect the data from updates by other transactions.

When the READ_COMMITTED_SNAPSHOT database option is ON, you can use the READCOMMITTEDLOCK table hint to request shared locking instead of row versioning for individual statements in transactions running at the READ COMMITTED isolation level.

(emphasis added)

Change your database configuration to set READ_COMMITTED_SNAPSHOT to ON.
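A minimal sketch of that change, assuming a database named MyAppDb (a placeholder); turning the option on needs exclusive access to the database, so run it during a quiet window or push other sessions off with ROLLBACK IMMEDIATE:

    -- MyAppDb is a placeholder; substitute your own database name.
    ALTER DATABASE MyAppDb
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;

    -- Confirm the setting took effect.
    SELECT name, is_read_committed_snapshot_on
    FROM sys.databases
    WHERE name = 'MyAppDb';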

Also, try to keep your transactions as short-lived as possible, and make sure you are committing the transaction in your background process (the one doing the 10,000 inserts an hour), because if it never commits, selects will block forever (under the default settings).

∞梦里开花 2024-08-11 09:53:29

As others have said, the amount of data that you are writing into the database isn't an issue. SQL Server can easily handle much more data than that. Personally, I've got tables that take hundreds of thousands to millions of rows per hour without issue, and people read from them all day without any slowdown.

  1. You may need to look at allowing dirty reads by changing the isolation level of the read statements, or by using the WITH (NOLOCK) hint (a read-side sketch follows this list).

  2. You should look at using the bulk upload object in .NET to load your data into the database. Use batches of 1,000-5,000 rows depending on the performance that you see during testing; you'll need to play with the number to get the best performance. Bulk-inserting data into the table will give you dramatically better performance than inserting the records row by row. Make sure that you don't do the entire upload in a single transaction; you should do one transaction per batch.

  3. What does the disk IO look like when you are writing into the database?

  4. What recovery model do you have set for the database? FULL recovery on the database will require much more IO than using the SIMPLE recovery model. Only use FULL recovery if you actually need the point-in-time restores that come with it (a recovery-model sketch also follows this list).
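For item 1, a minimal read-side sketch, assuming a placeholder table dbo.Readings with the columns shown; WITH (NOLOCK) accepts dirty reads in exchange for not being blocked by the inserts, while SNAPSHOT isolation avoids both blocking and dirty reads but must first be enabled on the database:

    -- Option A: accept dirty reads so the SELECT is not blocked by the writer.
    -- dbo.Readings and its columns are placeholder names.
    SELECT TOP (100) ReadingId, LoggedAt, ReadingValue
    FROM dbo.Readings WITH (NOLOCK)
    ORDER BY LoggedAt DESC;

    -- Option B: read a consistent snapshot instead
    -- (requires ALTER DATABASE ... SET ALLOW_SNAPSHOT_ISOLATION ON first).
    SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
    SELECT TOP (100) ReadingId, LoggedAt, ReadingValue
    FROM dbo.Readings
    ORDER BY LoggedAt DESC;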
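For item 4, checking and changing the recovery model is a one-liner each; MyAppDb is a placeholder name, and SIMPLE recovery gives up point-in-time restore, so switch only if that trade-off is acceptable:

    -- Check the current recovery model (MyAppDb is a placeholder).
    SELECT name, recovery_model_desc
    FROM sys.databases
    WHERE name = 'MyAppDb';

    -- Switch to SIMPLE recovery to cut transaction-log IO during heavy inserts.
    ALTER DATABASE MyAppDb SET RECOVERY SIMPLE;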

微凉徒眸意 2024-08-11 09:53:29

Under 3 inserts per second is not going to give any DBMS a workout unless the amount of data to be inserted in each insert operation is phenomenal. Likewise, 10 reads per second is unlikely to over-stress any competent DBMS unless there is some complicating factor that you've not mentioned (such as 'the reads are aggregates of aggregates over the entire DBMS which will accumulate billions of records after a period of ... well, 100,000 hours for the first billion records, which is roughly 4,000 days, or roughly 10 years').
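For scale, a quick back-of-the-envelope check of those figures:

    -- Rough arithmetic behind the numbers quoted above.
    SELECT
        10000.0 / 3600.0                    AS inserts_per_second,    -- about 2.8
        1000000000.0 / 10000.0              AS hours_to_one_billion,  -- 100,000
        1000000000.0 / 10000.0 / 24         AS days_to_one_billion,   -- about 4,200
        1000000000.0 / 10000.0 / 24 / 365   AS years_to_one_billion;  -- about 11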

一花一树开 2024-08-11 09:53:29

In a follow-up to Joel's answer, you may need to look at setting appropriate values for PAD_INDEX and FILLFACTOR on your indexes. If you haven't specified those options, your inserts may be causing a lot of page splits in your indexes, which would slow down your write times significantly.
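A hedged sketch of what that tuning might look like, using placeholder table and index names; FILLFACTOR = 80 leaves 20% of each leaf page free for new rows, and PAD_INDEX = ON applies the same fill factor to the intermediate index pages, which reduces page splits during the hourly bursts of inserts:

    -- Placeholder index and table names; pick a fill factor based on testing.
    ALTER INDEX IX_Readings_LoggedAt ON dbo.Readings
    REBUILD WITH (FILLFACTOR = 80, PAD_INDEX = ON);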
