The select() call has you create three bitmasks to mark which sockets and file descriptors you want to watch for reading, writing, and errors, and then the operating system marks which ones in fact have had some kind of activity; poll() has you create a list of descriptor IDs, and the operating system marks each of them with the kind of event that occurred.
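As a rough illustration of the two calling styles, here is a minimal sketch using Python's standard `select` module on a Unix-like system. That choice is an assumption of this example; the description above is about the C calls. Python's `select.select()` takes three lists rather than three bitmasks, but the shape is the same: three interest sets go in and three ready sets come out. `select.poll()` instead hands back (descriptor, event mask) pairs. The helper names `ready_with_select` and `ready_with_poll` are made up for this sketch.

```python
import os
import select


def ready_with_select(read_fds, write_fds, timeout=1.0):
    """Ask select() which of the given descriptors are ready."""
    # Three interest sets in (read, write, exceptional), three ready sets out.
    watched = list(read_fds) + list(write_fds)
    readable, writable, exceptional = select.select(
        read_fds, write_fds, watched, timeout
    )
    return readable, writable, exceptional


def ready_with_poll(read_fds, write_fds, timeout_ms=1000):
    """Ask poll() which of the given descriptors are ready."""
    # Build one event mask per descriptor instead of one set per event type.
    masks = {}
    for fd in read_fds:
        masks[fd] = masks.get(fd, 0) | select.POLLIN
    for fd in write_fds:
        masks[fd] = masks.get(fd, 0) | select.POLLOUT

    poller = select.poll()
    for fd, mask in masks.items():
        poller.register(fd, mask)
    # Each result is a (descriptor, event_mask) pair.
    return poller.poll(timeout_ms)


if __name__ == "__main__":
    r, w = os.pipe()
    os.write(w, b"x")  # make the read end readable
    print(ready_with_select([r], [w]))
    print(ready_with_poll([r], [w]))
    os.close(r)
    os.close(w)
```

Note that the `select()` variant has to be handed its interest lists again on every call, while the `poll()` variant could keep its `poller` object around between calls.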
The select() method is rather clunky and inefficient.
A process typically has more than a thousand potential file descriptors available to it. If a long-running process has only a few descriptors open, but at least one of them has been assigned a high number, then the bitmask passed to select() has to be large enough to accommodate that highest descriptor. Whole ranges of hundreds of unset bits then have to be looped across by the operating system on every select() call, just to discover that they are unset.
Once select() returns, the caller has to loop over all three bitmasks to determine which events took place. In many typical applications only one or two file descriptors will get new traffic at any given moment, yet all three bitmasks must be read all the way to the end to discover which descriptors those are.
Because the operating system signals you about activity by rewriting the bitmasks, they are ruined and are no longer marked with the list of file descriptors you want to listen to. You either have to rebuild the whole bitmask from some other list that you keep in memory, or you have to keep a duplicate copy of each bitmask and memcpy() the block of data over on top of the ruined bitmasks after each select() call.
So the poll() approach works much better because you can keep re-using the same data structure.
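To make the re-use point concrete, here is a small sketch, again assuming Python's `select` module on a Unix-like system. The descriptor is registered once, and the same poll object is then queried repeatedly without being rebuilt. In C the same thing holds because `struct pollfd` keeps the requested `events` field separate from the returned `revents` field. `poll_repeatedly` is a made-up helper name.

```python
import os
import select


def poll_repeatedly(messages):
    """Register a pipe once, then poll it once per message written."""
    read_fd, write_fd = os.pipe()
    poller = select.poll()
    poller.register(read_fd, select.POLLIN)  # done once, never repeated

    received = []
    try:
        for message in messages:
            os.write(write_fd, message)
            # The registration survives each call; nothing needs re-arming.
            for fd, event in poller.poll(1000):
                if fd == read_fd and event & select.POLLIN:
                    received.append(os.read(read_fd, 4096))
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return received
```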
In fact, poll() has inspired yet another mechanism in modern Linux kernels: epoll(), which improves on the mechanism even further and allows yet another leap in scalability, as today's servers often want to handle tens of thousands of connections at once. This is a good introduction to the effort: http://scotdoyle.com/python-epoll-howto.html
This link has some nice graphs showing the benefits of epoll() (you will note that select() is by this point considered so inefficient and old-fashioned that it does not even get a line on these graphs!): http://lse.sourceforge.net/epoll/index.html
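For a feel of what epoll looks like from user code, here is a minimal, Linux-only sketch using Python's `select.epoll` (the same interface the first link walks through). `wait_with_epoll` is a hypothetical helper name, and a real server would create the epoll object once and keep it alive rather than building it on every call as this sketch does.

```python
import os
import select


def wait_with_epoll(read_fds, timeout=1.0):
    """Return the descriptors from read_fds that epoll reports as readable."""
    epoll = select.epoll()
    try:
        for fd in read_fds:
            epoll.register(fd, select.EPOLLIN)
        # Only descriptors with pending events come back, so the caller
        # never has to scan the idle ones.
        return [fd for fd, event in epoll.poll(timeout) if event & select.EPOLLIN]
    finally:
        epoll.close()
```

With five pipes registered and data written to only one of them, the returned list contains just that one read descriptor.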
I think that this answers your question:
The basic difference is that select()'s fd_set is a bit mask and therefore has some fixed size. It would be possible for the kernel to not limit this size when the kernel is compiled, allowing the application to define FD_SETSIZE to whatever it wants (as the comments in the system header imply today), but it takes more work. 4.4BSD's kernel and the Solaris library function both have this limit. But I see that BSD/OS 2.1 has now been coded to avoid this limit, so it's doable, just a small matter of programming. :-) Someone should file a Solaris bug report on this, and see if it ever gets fixed.
With poll(), however, the user must allocate an array of pollfd structures, and pass the number of entries in this array, so there's no fundamental limit. As Casper notes, fewer systems have poll() than select(), so the latter is more portable. Also, with original implementations (SVR3) you could not set the descriptor to -1 to tell the kernel to ignore an entry in the pollfd structure, which made it hard to remove entries from the array; SVR4 gets around this. Personally, I always use select() and rarely poll(), because I port my code to BSD environments too. Someone could write an implementation of poll() that uses select(), for these environments, but I've never seen one. Both select() and poll() are being standardized by POSIX 1003.1g.
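The idea of a poll() built on top of select() can be sketched roughly as follows. This is only an illustration in Python, not something taken from the email: `poll_via_select` is a made-up name, the three flag constants are local to the sketch (they mirror the usual Linux values), and the emulation still inherits select()'s FD_SETSIZE limit and cannot report hang-ups or errors separately.

```python
import os
import select

# Flags local to this sketch, mirroring the usual Linux values.
POLLIN = 0x001
POLLPRI = 0x002
POLLOUT = 0x004


def poll_via_select(interest, timeout=None):
    """Emulate poll() on top of select().

    interest maps descriptor -> requested event mask. The result is a sorted
    list of (descriptor, returned event mask) pairs for active descriptors.
    timeout is in seconds; None blocks until something happens.
    """
    # Split the per-descriptor masks into select()'s three interest sets.
    readers = [fd for fd, mask in interest.items() if mask & POLLIN]
    writers = [fd for fd, mask in interest.items() if mask & POLLOUT]
    urgent = [fd for fd, mask in interest.items() if mask & POLLPRI]

    readable, writable, exceptional = select.select(
        readers, writers, urgent, timeout
    )

    # Fold the three ready sets back into one event mask per descriptor.
    results = {}
    for fd in readable:
        results[fd] = results.get(fd, 0) | POLLIN
    for fd in writable:
        results[fd] = results.get(fd, 0) | POLLOUT
    for fd in exceptional:
        results[fd] = results.get(fd, 0) | POLLPRI
    return sorted(results.items())
```

Calling `poll_via_select({fd: POLLIN}, 0)` on an idle descriptor returns an empty list, which matches poll()'s behaviour with a zero timeout.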
October 2017 Update:
The email referenced above is at least as old as 2001; the poll() call is now (2017) supported across all modern operating systems, including BSD. In fact, some people believe that select() should be deprecated. Opinions aside, portability issues around poll() are no longer a concern on modern systems. Furthermore, epoll() has since been developed (you can read the man page), and continues to rise in popularity.
For modern development you probably don't want to use select(), although there's nothing explicitly wrong with it. poll(), and its more modern evolution epoll(), provide the same features (and more) as select() without suffering from the limitations therein.
Both of them are slow and work in mostly the same way, but they differ in their size limits and in some of their features.
When you call select() in a loop, you need to copy the descriptor set before every call, because the call overwrites it. poll() fixes this problem, which makes for cleaner code. Another difference is that poll() can handle more than 1024 file descriptors (FDs) by default. poll() also reports different kinds of events per descriptor, which makes the program more readable than juggling a lot of variables for the same job. Operations in both poll() and select() are linear in the number of descriptors watched, and slow because of all the checks involved.
Update: Here is another Stack Overflow question, whose answer gives even more detail about the differences: "Caveats of select/poll vs. epoll reactors in Twisted".