How can I improve read operations on a large table?
I have a huge table with around 350 million rows, which are updated every now and then (roughly every row once every 5 minutes). The updates are done in batches (multiple rows together, e.g. 100 rows at a time).
Now, in a separate application, I need to run queries at regular intervals (say every 5 minutes), and these take a long time if done in the normal way.
I need suggestions on how I can do this in a performance-efficient way. Is Analysis Services meant for such a task?
Please provide your inputs / thoughts / comments on this.
Regards
Shruti
Comments (4)
1 - Make sure you have a proper index structure. Bear in mind that indexes slow down writes and updates: every index is updated whenever you insert a row, and every index that contains a changed column is updated whenever you update one. This can lead to locking as well.
2 - How important is consistency between reads and writes? Is it mission-critical that each query sees the most recent record for every row in the last update? If it's not, you can use locking hints such as WITH (NOLOCK) in your select statements. This will give you potentially out-of-date or uncommitted records (if they are being updated as you are reading them), but it will bypass row-level locks and not wait for the UPDATE or INSERT to complete.
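As a rough sketch of the two points above, the following T-SQL creates a narrow covering index and then reads through it with the NOLOCK hint. The table dbo.Readings, its columns and the index name are all hypothetical, since the question does not show the schema; the right key columns depend entirely on what the periodic query filters on.

```sql
-- Hypothetical table, columns and index name, for illustration only.
-- A narrow nonclustered index that covers the periodic query, so the
-- read can be answered from the index instead of scanning the table.
CREATE NONCLUSTERED INDEX IX_Readings_RegionId
    ON dbo.Readings (RegionId)
    INCLUDE (Value);

-- Read without taking shared row or page locks. This is a dirty read:
-- rows that are mid-update may be returned in an uncommitted state.
SELECT RegionId, Value
FROM dbo.Readings WITH (NOLOCK)
WHERE RegionId = 42;
```

Keeping the index narrow matters here because each extra index adds work to the frequent batch updates.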
As this is an extremely generic question, I can only provide an extremely generic reply.
1) Indexes do add overhead to inserts etc., but it is often marginal compared to the performance gains elsewhere. Use appropriate indexes to improve your app's select performance.
2) I'm inferring that you're reading more often than you're writing. You may benefit from having triggers pre-calculate an intermediate result for you, stored in another table. Depending on the nature of the writes, you may need to delete and re-calculate from scratch, or just calculate the effects of the change (using both the deleted and inserted tables).
EDIT
If you're monitoring for a particular write, create a trigger on the table.
You then only need to check the data being written, not the whole table.
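A minimal sketch of the "calculate the effects of the change" variant might look like the following. It assumes a hypothetical dbo.Readings table with non-null RegionId and Value columns and a hypothetical summary table dbo.RegionTotals; none of these names come from the question. GO is the batch separator used by SSMS and sqlcmd, not a T-SQL statement.

```sql
-- Hypothetical summary table: one row per region with a running total.
CREATE TABLE dbo.RegionTotals (
    RegionId   INT            NOT NULL PRIMARY KEY,
    TotalValue DECIMAL(18, 2) NOT NULL
);
GO

-- CREATE TRIGGER must be the first statement in its batch.
CREATE TRIGGER trg_Readings_MaintainTotals
ON dbo.Readings
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- Net change per region for this statement: add the new values
    -- (inserted) and subtract the old ones (deleted).
    WITH delta AS (
        SELECT RegionId, SUM(Amount) AS Amount
        FROM (
            SELECT RegionId, Value AS Amount FROM inserted
            UNION ALL
            SELECT RegionId, -Value AS Amount FROM deleted
        ) AS changes
        GROUP BY RegionId
    )
    MERGE dbo.RegionTotals AS t
    USING delta AS d
        ON t.RegionId = d.RegionId
    WHEN MATCHED THEN
        UPDATE SET TotalValue = t.TotalValue + d.Amount
    WHEN NOT MATCHED THEN
        INSERT (RegionId, TotalValue) VALUES (d.RegionId, d.Amount);
END;
```

The reading application would then query dbo.RegionTotals instead of the 350-million-row table. The trigger runs once per statement, so a 100-row batch update costs one summary update. Concurrent batches that both introduce a new region could still collide on the insert, which this sketch does not handle.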
And while you are at it, do NOT ever use SELECT * queries in production, especially when you have joins. You are returning more data than you need (the join fields contain the same data), which wastes network and server resources. Take the extra ten seconds once to list only the fields you need, and you get a better-performing query every time it runs.
Then look at the queries you are running for other performance enhancements. There are a lot of ways to write a badly performing query, and in a system like the one you describe you simply can't afford to do that. Get yourself a book on performance tuning and thoroughly read the chapters on tuning queries. Then read the rest. In a system this large, performance tuning is a critical skill. You can't get away with not knowing database concepts, internals and in-depth querying when you work on something this large.
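To illustrate the SELECT * point, here is a small before-and-after sketch. The tables dbo.Readings and dbo.Regions and their columns are hypothetical.

```sql
-- Hypothetical tables and columns, for illustration only.
-- Avoid: returns every column of both tables, including RegionId twice.
-- SELECT * FROM dbo.Readings AS r JOIN dbo.Regions AS g ON g.RegionId = r.RegionId;

-- Prefer: name only the columns the application actually uses.
SELECT r.ReadingId, r.Value, g.RegionName
FROM dbo.Readings AS r
JOIN dbo.Regions AS g
    ON g.RegionId = r.RegionId
WHERE g.RegionId = 42;
```

A narrow column list also makes it more likely that a covering index can satisfy the query without touching the base table.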
Perhaps you could also consider a row-versioning-based isolation level (though this is unlikely to be an option if you cannot change the database structure).
Remarks: in this case 1) selects will not block updates and vice versa; 2) you will "see the latest committed data"; 3) there is an extra overhead to store the row versions in tempdb.
See the last paragraphs of this good post.
See also: Choosing Row Versioning-based Isolation Levels
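The statement this answer originally pointed to is not preserved here. As a sketch of what enabling row versioning typically looks like in SQL Server, assuming a hypothetical database named MyDatabase:

```sql
-- Hypothetical database name. Changing this option requires that no
-- other connections are active in the database; WITH ROLLBACK IMMEDIATE
-- forcibly rolls back any open transactions, so schedule it carefully.
ALTER DATABASE MyDatabase
    SET READ_COMMITTED_SNAPSHOT ON
    WITH ROLLBACK IMMEDIATE;

-- Alternatively, allow snapshot isolation and opt in per session:
ALTER DATABASE MyDatabase
    SET ALLOW_SNAPSHOT_ISOLATION ON;

SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
```

With READ_COMMITTED_SNAPSHOT on, ordinary READ COMMITTED queries read the last committed row version instead of waiting on writers, so the reading application needs no query changes.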