Poor Haskell network performance
I am writing an OpenVPN-like program and thought it would be a good way to improve my Haskell knowledge. However, I ran into quite severe performance problems.
What it does: it opens a TUN device, binds to a UDP port, and starts 2 threads (forkIO, but compiled with -threaded because of the fdRead). I have not used the tuntap package; I did it all myself in Haskell.
thread 1: read a packet (fdRead) from the TUN device; send it over the UDP socket.
thread 2: read a packet (recv) from the UDP socket; write it to the TUN device (fdWrite).
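For reference, the two-thread layout described above could be sketched roughly like this. The names `pump` and `startTunnel` are made up for illustration, and the four endpoint actions are passed in as plain IO actions, so this is only the shape of the program, not the actual TUN or socket code:

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Monad (forever)

-- | Copy packets from a source action to a sink action, forever.
-- (Hypothetical helper; the packet type is left abstract.)
pump :: IO pkt -> (pkt -> IO ()) -> IO ()
pump readPkt writePkt = forever (readPkt >>= writePkt)

-- | Start one forwarding thread per direction and return immediately.
-- Thread 1: TUN device -> UDP socket. Thread 2: UDP socket -> TUN device.
startTunnel
  :: IO pkt          -- ^ read a packet from the TUN device
  -> (pkt -> IO ())  -- ^ send a packet over UDP
  -> IO pkt          -- ^ receive a packet from UDP
  -> (pkt -> IO ())  -- ^ write a packet to the TUN device
  -> IO ()
startTunnel readTun sendUdp recvUdp writeTun = do
  _ <- forkIO (pump readTun sendUdp)
  _ <- forkIO (pump recvUdp writeTun)
  return ()
```

Keeping the endpoints abstract means the same loop works whether the packet type is String or ByteString, which makes it easy to swap one for the other when measuring.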
Problem 1: In this configuration fdRead returns String, and I have used the Network.Socket functions that accept String. I set up a test configuration on the local system (some iptables magic) and can push 15MB/s through it on localhost, with the program running at basically 100% CPU. That's slow. Is there anything I could do to improve the performance?
Problem 2: I will have to prepend something to the packets I am sending; however, the sendMany network function takes only ByteString, while reading from an Fd returns String. Conversion is pretty slow. Converting the Fd to a Handle doesn't seem to work well enough with the TUN device.
Problem 3: I wanted to store some information in Data.Heap (a functional heap). (I need 'takeMin', and although a heap is overkill for 3 items, it is easy to do :) ) So I created an MVar, and on each received packet I take the Heap out of the MVar, update it with the new info, and put it back into the MVar. Now the program simply starts to eat A LOT of memory. Probably because the old heaps don't get garbage collected soon/frequently enough..?
Is there a way to solve these problems, or do I have to go back to C...? What I am doing should be mostly zero-copy operations - am I using the wrong libraries to achieve it?
==================
What I did:
- when putting the value into the MVar, did:
a `seq` putMVar mvar a
That completely fixed the memory leak.
- changed to ByteString; now I get 42MB/s when doing just 'read/write' with no further processing. The C version does about 56MB/s, so this is acceptable.
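A minimal sketch of what reading and writing ByteString directly on an Fd might look like, assuming the unix package's `fdReadBuf`/`fdWriteBuf` and bytestring's `createAndTrim`. The helper names `readPacket`, `writePacket`, and `pipeDemo` are hypothetical, and the pipe merely stands in for the TUN device:

```haskell
import qualified Data.ByteString as B
import Data.ByteString.Internal (createAndTrim)
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Foreign.Ptr (castPtr)
import System.Posix.IO (closeFd, createPipe, fdReadBuf, fdWriteBuf)
import System.Posix.Types (Fd)

-- | Read at most maxLen bytes from the descriptor straight into a freshly
-- allocated ByteString buffer; no intermediate String is built.
readPacket :: Fd -> Int -> IO B.ByteString
readPacket fd maxLen =
  createAndTrim maxLen $ \buf ->
    fromIntegral <$> fdReadBuf fd buf (fromIntegral maxLen)

-- | Hand the ByteString's own buffer to write(2) without copying it.
-- Returns the number of bytes actually written.
writePacket :: Fd -> B.ByteString -> IO Int
writePacket fd pkt =
  unsafeUseAsCStringLen pkt $ \(ptr, len) ->
    fromIntegral <$> fdWriteBuf fd (castPtr ptr) (fromIntegral len)

-- | Round-trip a few bytes through a pipe (a stand-in for the TUN device).
pipeDemo :: IO B.ByteString
pipeDemo = do
  (readEnd, writeEnd) <- createPipe
  _ <- writePacket writeEnd (B.pack [0x45, 0x00, 0x00, 0x1c])
  pkt <- readPacket readEnd 1500
  closeFd readEnd
  closeFd writeEnd
  return pkt
```

The 1500-byte read size is just a typical MTU picked for the example; a real tunnel would size the buffer to the device's MTU.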
Comments (3)
String is slow. Really, really, really slow. It's a singly-linked list of cons cells containing one Unicode character each. Writing one to a socket requires converting each character to bytes, copying those bytes into an array, and handing that array to the system call. What part of this sounds like what you want to be doing? :)
You want to be using ByteString exclusively. The ByteString IO functions actually use zero-copy IO where possible. Especially look at the network-bytestring package on Hackage. It contains versions of all the network functions that are optimized to work efficiently with ByteString.
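For the "prepend something to each packet" part, a rough sketch of how a header can be attached without copying the payload: keep the header and payload as separate chunks and pass the list to a vectored send. The 4-byte big-endian sequence number is an invented header format, the names `encodeSeq`, `framePacket`, and `sendFramed` are hypothetical, and the send action is taken as a parameter so the example needs only the bytestring package:

```haskell
import Data.Bits (shiftR)
import qualified Data.ByteString as B
import Data.Word (Word32)

-- | Encode a sequence number as 4 big-endian bytes (made-up header format).
encodeSeq :: Word32 -> B.ByteString
encodeSeq n = B.pack [fromIntegral (n `shiftR` s) | s <- [24, 16, 8, 0]]

-- | Pair the header with the payload as separate chunks.
-- The payload bytes are shared, not copied.
framePacket :: Word32 -> B.ByteString -> [B.ByteString]
framePacket seqNo payload = [encodeSeq seqNo, payload]

-- | Send a framed packet through any action that accepts a chunk list.
sendFramed :: ([B.ByteString] -> IO ()) -> Word32 -> B.ByteString -> IO ()
sendFramed sendChunks seqNo payload = sendChunks (framePacket seqNo payload)
```

With a connected socket `sock`, the intended call would be something like `sendFramed (sendMany sock) 1 payload`, since `sendMany` from `Network.Socket.ByteString` takes a socket and a list of ByteString chunks; for an unconnected UDP socket, `sendManyTo` has the same shape with an extra address argument.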
Below are two example programs: client and server. Using GHC 7.0.1 and network-2.3 I got more than 7500 Mbps over loopback, on my pretty new dual-core laptop (~90% total CPU usage). I don't know how much overhead UDP introduces, but nevertheless this is quite a number.
Carl is right with regard to your first two questions. As for your last one, consider using the strict-concurrency package.
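The idea behind that suggestion (never leave an unevaluated thunk in the MVar) can be sketched with base alone. `modifyMVarStrict` and `recordPacket` are hypothetical names, and a plain Int running total stands in for the heap. Note that `evaluate` only forces to weak head normal form, so a nested structure such as a heap may need deeper forcing, which, as far as I know, is what the strict-concurrency package does through NFData:

```haskell
import Control.Concurrent.MVar
import Control.Exception (evaluate)

-- | Update the MVar's contents and force the new value (to weak head
-- normal form) before it is stored, so thunks do not pile up.
modifyMVarStrict :: MVar a -> (a -> a) -> IO ()
modifyMVarStrict var f = modifyMVar_ var (\x -> evaluate (f x))

-- | Example state update: add a packet's length to a running byte count.
recordPacket :: MVar Int -> Int -> IO ()
recordPacket var len = modifyMVarStrict var (+ len)
```

With the lazy `modifyMVar_ var (return . f)` instead, each packet would add another layer to a chain of suspended `(+ len)` applications, which matches the kind of memory growth described in the question.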