Pinging ~100,000 servers: is multithreading or multiprocessing better?

Posted on 2025-02-07 07:19:48


I have created a simple script that iterates through a list of servers that I need to both ping and nslookup. The issue is that pinging can take some time, especially since I am pinging more servers than there are seconds in a day.

I'm fairly new to programming, and I understand that multiprocessing or multithreading could be a solution to make my job run faster.

My plan is to take my server list and either 1. break it into evenly sized lists, with the number of lists matching the number of threads/processes, or 2. if one of these options supports it, loop through the single list, passing a new server name to each thread or process as soon as it finishes its previous ping and nslookup. Option 2 is preferable, since it ensures I spend the least time waiting: with option 1, if list 1 has 200 offline servers and list 6 has 2,000, everything would have to wait for the process working through list 6 to finish, even though all the others would be free by that point.

  1. Which one is superior for this task and why?

  2. If possible, how would I make sure that each thread or process has essentially the same runtime?

Here is the code snippet, even though it is rather simple right now:

import subprocess
import time

server_file = open(r"myfilepath", "r")
initial_time = time.time()
for i in range(1000):
    server = server_file.readline().strip()  # read each line once, not twice
    # Print the server name and ping's return code (0 means reachable)
    returncode = subprocess.run(['ping', server]).returncode
    print(server + ' ' + str(returncode))
print(time.time() - initial_time)

The issue arises because a failed ping takes over 3 seconds on average. I am also aware that removing the print statement would make it faster, but I wanted to monitor the small test case. I am pinging something on the order of 100,000 servers, this will need to be done routinely, and the list will keep growing.


4 Answers

我三岁 2025-02-14 07:19:48


For best performance you want neither; with 100,000 active jobs it's best to use asynchronous processing, in a single thread or possibly a handful of threads or processes (but not exceeding the number of available cores).

With async I/O, many networking tasks can be performed in a single thread, easily achieving rates of 100,000 per second or more thanks to the savings on context switching (that is, you can theoretically ping 100,000 machines in 1 second).

Python supports asynchronous I/O via asyncio (here's a nice intro to asyncio and coroutines: https://faculty.ai/blog/a-guide-to-using-asyncio/).

It is also important to not depend on an external process like ping, because spawning a new process is a very costly operation.

aioping is an example of a native Python ping done using asyncio (note that a ping is actually a pair of ICMP request/reply packets). It should be easy to adapt it to perform multiple pings simultaneously.
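A minimal sketch of this fan-out pattern, with a bounded semaphore capping concurrency; the `probe` coroutine is a placeholder (a real implementation would call an async ICMP library such as aioping, which may need raw-socket privileges), and `asyncio.sleep` stands in for time spent waiting on the network:

```python
import asyncio
import time

async def probe(host, sem):
    # Placeholder for a real async ICMP ping; asyncio.sleep stands in
    # for the time spent waiting on the network reply.
    async with sem:
        await asyncio.sleep(0.1)
        return host, True

async def ping_all(hosts, concurrency=500):
    sem = asyncio.Semaphore(concurrency)  # cap the number of in-flight probes
    return await asyncio.gather(*(probe(h, sem) for h in hosts))

start = time.time()
results = asyncio.run(ping_all([f"server{i}" for i in range(1000)]))
print(len(results), round(time.time() - start, 1))  # 1000 probes in ~0.2 s
```

Because the 1,000 sleeps overlap (500 at a time), the whole run takes about two sleep intervals rather than 100 seconds, which is exactly the saving async I/O buys you on real pings.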

难忘№最初的完美 2025-02-14 07:19:48


tl;dr: multithreading is the solution for you.
The threading module uses threads, the multiprocessing module uses processes.
The difference is that threads run in the same memory space, while processes have separate memory.

As for question 1-

For I/O tasks, like querying a database or loading a webpage, the CPU does nothing but wait for an answer, which is a waste of resources; thus multithreading is the answer (:

As for question 2-

You can just create a pool of threads; the pool manages running them simultaneously without you needing to break your head over scheduling.
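A minimal sketch of such a pool using the standard library's `ThreadPoolExecutor`, assuming a Unix-like `ping` with the `-c` count flag (on Windows the flag is `-n`):

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess

def ping(host):
    # One echo request; output is discarded, only the exit status matters.
    rc = subprocess.run(['ping', '-c', '1', host],
                        stdout=subprocess.DEVNULL,
                        stderr=subprocess.DEVNULL).returncode
    return host, rc == 0

def ping_all(hosts, workers=100):
    # The pool hands each idle thread the next host, so no thread sits
    # waiting behind an unevenly sized sub-list.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(ping, hosts))

# Usage: up = ping_all(['127.0.0.1', 'example.com'])
```

This is the "option 2" scheduling from the question: hosts are dealt out one at a time, so all workers finish at roughly the same moment.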

嘦怹 2025-02-14 07:19:48


+1 to Yoel's answer. Threading is definitely the way to go.

I was curious how much time it would actually save, so I just wrote a script to ping Google over and over:

import subprocess
import os

import threading
from multiprocessing import Process

FNULL = open(os.devnull, 'w')

def ping(host):
    command = ['ping', '-c', '1', host]
    return subprocess.call(command, stdout=FNULL, stderr=subprocess.STDOUT) == 0

def ping_hosts(hosts, i):
    for h in hosts:
        ping(h)
        # print("%d: %s" % (i, str(ping(h))))

hosts = ["www.google.com"] * 1000
num_threads = 5

# Serial baseline:
for i in range(num_threads):
    ping_hosts(hosts, i)

# Multiprocessing variant:
#for i in range(num_threads):
#    p = Process(target=ping_hosts, args=(hosts, i))
#    p.start()

# Threading variant:
#for i in range(num_threads):
#    t = threading.Thread(target=ping_hosts, args=(hosts, i))
#    t.start()

The results:

# no threading no multiprocessing
$ time python ping_hosts.py  # 5000 in a row
real    0m34.657s
user    0m5.817s
sys 0m11.436s

# multiprocessing
$ time python ping_hosts.py
real    0m8.119s
user    0m6.021s
sys 0m16.365s

# threading
$ time python ping_hosts.py
real    0m8.392s
user    0m7.453s
sys 0m16.376s

Obviously there are flaws in the test, but it's pretty clear that you get a significant boost from adding either library. Note that the savings are about the same. But, as Yoel said, since you spend most of your time just waiting, threading is the way to go. It's easy enough to dump your host names into a queue and have a pool of worker threads churn through it.
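The queue-plus-workers idea sketched out; the reachability check here is a placeholder standing in for the real `ping(host)` call:

```python
import queue
import threading

def worker(q, results):
    while True:
        try:
            host = q.get_nowait()  # grab the next host, or stop when empty
        except queue.Empty:
            return
        results[host] = len(host) > 0  # placeholder for ping(host)
        q.task_done()

# Dump all host names into the queue up front
q = queue.Queue()
for i in range(20):
    q.put(f'server{i}')

results = {}
threads = [threading.Thread(target=worker, args=(q, results)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 20
```

Each worker pulls the next host as soon as it finishes the last one, so a batch of slow or offline servers never stalls the other workers.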

情绪少女 2025-02-14 07:19:48


Multithreading can run faster than multiprocessing here, since pings are not CPU intensive, and it is possible to run a large number of threads. Below is my working code:

import subprocess
import threading

raw_list = []

def ping(host):
    # Append "hostname returncode"; list.append is atomic in CPython
    result = subprocess.run(['ping', host], stdout=subprocess.DEVNULL)
    raw_list.append(host + ' ' + str(result.returncode))

with open(r'RedactedFilePath', "r") as server_list_file:
    hosts_list = server_list_file.read().split('\n')

num_threads = 75
for start in range(0, len(hosts_list), num_threads):
    print(start)
    batch = hosts_list[start:start + num_threads]  # slicing avoids IndexError
    threads = [threading.Thread(target=ping, args=(host,)) for host in batch]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for the whole batch, not just the last thread

There may be a more Pythonic way to ping, which might make it faster since it would not launch a subprocess for every single host.

Also, tuning the proper number of threads is machine-dependent, and depends even more on what you are doing with the threads.

And of course, the case where the server list is not evenly divisible by the number of threads has to be handled (for example by slicing each batch, or with a try statement) so it does not raise index errors.

Also, the split function should be used to remove your separator. In my case the server list is a text file with each host on a new line, but a CSV would be .split(',') and a TSV .split('\t'), etc.
