Concurrent queries on different tables (same database) in PostgreSQL
Could anyone please help me understand the internals of PostgreSQL? If I access two separate tables (same database) from two clients at the same time, the query response time increases compared to when I run the queries on those tables at different times. I am confused about how simultaneous queries on different tables result in longer execution times. In theory, if I run queries simultaneously on different tables, neither client is blocked by a lock held by the other, so the execution time should remain the same.
I am wondering whether there is any overhead from the use of common shared resources, since the database is the same.
Please help!
Comments (2)
Well, let's take the most basic case.
You have two large tables stored on the disk.
You then run two queries against those two separate tables.
But, while they're two separate logical tables as represented by the DB, they are in fact sharing the same disk.
So, as each individual postgres process reads its table, the disk head is skating and dancing back and forth trying to serve up the data, and thus the processes are fighting each other over the disk drive, the single shared resource.
It's like having a bank with two teller windows and two lines, but only one teller in the back doing all of the work.
This doesn't take into account any of the other zillion factors that could be coming into play to slow or speed up your query. It's just a basic example of one case that could be happening.
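To make the teller analogy concrete, here is a rough toy model in Python. It is only a sketch under loud assumptions: the block numbers and table layout are made up, a single head serves requests strictly in arrival order, and it ignores read-ahead, I/O scheduling, and caching, all of which real systems have. It simply adds up how far the head travels when two scans run one after the other versus interleaved.

```python
def head_travel(requests):
    """Total distance a single disk head moves serving requests in order."""
    position = 0
    travelled = 0
    for block in requests:
        travelled += abs(block - position)
        position = block
    return travelled


# Hypothetical layout: table A sits in blocks 0..999,
# table B sits far away in blocks 100000..100999.
table_a = list(range(0, 1000))
table_b = list(range(100_000, 101_000))

# Queries run at different times: the head crosses the gap only once.
serial = head_travel(table_a + table_b)

# Queries run at the same time: requests alternate between the tables,
# so the head crosses the gap on nearly every request.
interleaved = [block for pair in zip(table_a, table_b) for block in pair]
concurrent = head_travel(interleaved)

print("serial head travel:    ", serial)
print("concurrent head travel:", concurrent)
```

In this model the interleaved case moves the head vastly further for the same amount of data, which is the contention described above. On real hardware the gap is usually much smaller, because the operating system batches and reorders requests.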
There are a bunch of things to keep in mind here.
First, there is a huge difference between sequential and random disk I/O. With random disk I/O the operating system cannot help with prefetching, but with sequential access it can. So in a sequential read the operating system will pull in much more data than we initially asked for, and we will work through it while the operating system pulls data from the other table for the other query. With random access you do get the effect described above, but you have to wait for the disk heads to move anyway, even without any concurrency, because, well, you are doing random reads.
The second thing to keep in mind is that different query plans have different I/O profiles. If we are pulling, say, 10% of the pages of the table off disk, and we have an index, we might choose to load the index sequentially and then traverse it in logical order to find our records (while the disk is doing work for the other query), and then accept the overhead of pulling just a few pages from the disk by random access. This of course involves waiting for heads to move, and there may or may not be concurrency issues that make it worse. So it is entirely possible that query 1 will do an index scan and query 2 will do a sequential scan, and that query 2 will pull most of the data it needs while query 1 is processing the index. Then maybe you get some of the issues Will describes, but probably not much.
The final thing, and this is really critical, is caching. Databases tend to cache a lot of records because this avoids the disk I/O altogether. So in this case, you may actually see something very different. Query 1 might work from memory, or mostly from memory, while query 2 might hit the disk. In general, if you have enough memory that most of the data you are interested in fits in memory comfortably, with room to spare for other pieces of software like the kernel, then none of the possible disk I/O problems will be actual problems, and the only time you will typically be hitting disk will be to write WAL when transactions commit.
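One rough way to check whether caching is hiding the disk for you is to look at the `blks_hit` and `blks_read` counters in the `pg_stat_database` statistics view. The sketch below is only an illustration: the helper name `cache_hit_ratio` is made up, and fetching the two numbers (with whatever client library you use) is left out. Keep in mind that `blks_read` counts blocks not found in PostgreSQL's shared buffers; they may still have been served from the operating system's cache rather than the physical disk.

```python
# Query against the standard pg_stat_database view; run it with your
# client of choice and pass the two counters to cache_hit_ratio().
HIT_RATIO_SQL = """
SELECT blks_hit, blks_read
FROM pg_stat_database
WHERE datname = current_database();
"""


def cache_hit_ratio(blks_hit, blks_read):
    """Fraction of block requests served from shared buffers.

    Returns None when no blocks have been requested yet.
    (Hypothetical helper, not part of PostgreSQL or any driver.)
    """
    total = blks_hit + blks_read
    if total == 0:
        return None
    return blks_hit / total


# Example with made-up counter values: 3 hits out of 4 requests.
print(cache_hit_ratio(3, 1))
```

A ratio close to 1 suggests the working set mostly fits in memory, in which case the disk contention discussed earlier is unlikely to be the main reason two concurrent queries slow each other down.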
So the answer is "it depends." It depends on your system. It depends on your RAM and data size. It depends on your hard drives and operating system. It depends on the specific queries. It depends on other use patterns. It depends.