Best database for high writes (10,000 inserts/hour) and low reads (10 reads/second)?
I'm developing a web app and currently using SQL Server 2008 for it. But I'm considering moving to another database (SimpleDB) for improved performance.
I have a background process that inserts up to 10,000 rows every hour into one specific table. That table is also read from to display data in the web application. When the background process runs, the web app is unusable because the database connection times out.
As a result, I'm thinking of moving to Amazon's SimpleDB to improve performance. Is SimpleDB optimized for this use case? If not, is there another solution I could use?
4 Answers
Your problem is the isolation level you are using. Unless you change it, SQL Server (and many other databases) operates in a mode where a SELECT blocks on rows with uncommitted writes. Change SQL Server to use MVCC instead (the default in Oracle; MySQL and SQL Server support it too) and your problem will go away.
From SET TRANSACTION ISOLATION LEVEL (Transact-SQL): the behavior of READ COMMITTED depends on the READ_COMMITTED_SNAPSHOT database option. When it is OFF (the default), shared locks make reads block behind uncommitted writes; when it is ON, row versioning lets reads see the last committed version of each row without blocking.
Change your database configuration to set READ_COMMITTED_SNAPSHOT to ON.
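A minimal sketch of that change, assuming your database is named YourDatabase (substitute your own name):

```sql
-- Turn on row versioning so readers see the last committed version of a row
-- instead of blocking behind an uncommitted writer.
ALTER DATABASE YourDatabase   -- placeholder name
SET READ_COMMITTED_SNAPSHOT ON
WITH ROLLBACK IMMEDIATE;      -- roll back open transactions so the option can take effect
```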
Also, try to keep your transactions as short-lived as possible, and make sure the background process (the one doing the 10,000 inserts an hour) actually commits its transaction, because if it never commits then SELECTs will block forever under the default settings.
As others have said, the amount of data you are writing into the database isn't an issue. SQL Server can easily handle much more than that. Personally, I've got tables that take hundreds of thousands to millions of rows per hour without issue, and people read from them all day without any slowdown.
You may need to look at doing dirty reads by changing the isolation level of the read statements, or using the WITH (NOLOCK) hint.
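For example, a dirty read against a hypothetical dbo.HourlyData table (NOLOCK may return uncommitted or inconsistent rows, so only use it where that's tolerable):

```sql
-- Read without taking shared locks; may see uncommitted ("dirty") rows.
-- Table and column names are hypothetical.
SELECT TOP (100) *
FROM dbo.HourlyData WITH (NOLOCK)
ORDER BY CreatedAt DESC;

-- The session-level equivalent of the hint:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
```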
You should look at using the bulk upload object in .NET to load your data into the database. Use batches of 1,000-5,000 rows depending on the performance you see during testing; you'll need to experiment with the number to find the best value. Bulk inserting data into the table gives dramatically better performance than inserting records row by row. Make sure you don't do the entire upload in a single transaction: do one transaction per batch.
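The .NET object referred to is presumably SqlBulkCopy; as a rough T-SQL sketch of the same batching idea, BULK INSERT with BATCHSIZE commits each batch as its own transaction (the file path and table name below are made up):

```sql
-- Load a staged file in 5,000-row batches, one transaction per batch.
BULK INSERT dbo.HourlyData           -- hypothetical target table
FROM 'C:\feeds\hourly_load.csv'      -- hypothetical staging file
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 5000,          -- commit every 5,000 rows
    TABLOCK                          -- one bulk-update lock instead of row locks
);
```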
What does the disk IO look like when you are writing into the database?
What recovery model do you have set for the database? FULL recovery requires much more log IO than the SIMPLE recovery model. Only use FULL recovery if you actually need the point-in-time restores that come with it.
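If you don't need point-in-time restore, switching the recovery model is a one-liner (YourDatabase is again a placeholder):

```sql
-- SIMPLE recovery truncates the log at checkpoints and permits minimally
-- logged bulk operations, cutting log IO during the hourly load.
ALTER DATABASE YourDatabase SET RECOVERY SIMPLE;
```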
10,000 inserts an hour is under 3 inserts per second, which is not going to give any DBMS a workout unless the amount of data in each insert operation is phenomenal. Likewise, 10 reads per second is unlikely to over-stress any competent DBMS unless there is some complicating factor you haven't mentioned (such as the reads being aggregates of aggregates over the entire table, which at this insert rate would take 100,000 hours, roughly 4,200 days or about 11 years, to accumulate its first billion records).
In a follow-up to Joel's answer, you may need to look at setting appropriate values for PAD_INDEX and FILLFACTOR on your indexes. If you haven't specified those options, your inserts may be causing a lot of page splits in your indexes, which would slow down your write times significantly.
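As a sketch, assuming a hypothetical index on the table being loaded, a rebuild that leaves 20% free space per page might look like:

```sql
-- Leave 20% free space on leaf pages (and, via PAD_INDEX, on intermediate
-- pages) so incoming rows cause fewer page splits.
ALTER INDEX IX_HourlyData_CreatedAt ON dbo.HourlyData  -- hypothetical names
REBUILD WITH (FILLFACTOR = 80, PAD_INDEX = ON);
```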