Ruby on Rails performance on ARM
I was wondering whether we could replace our Atom N270 based nettops, which run a Rails (ruby 1.8.6...) webapp, with an equivalent ARM based device (we like the fanless setup, the power consumption, etc.).
The ARM device was an XScale-PXA270 @ 520 MHz with 128MB (and probably somewhat slower SDRAM), running Linux. There was always enough free memory, and its performance was comparable to a jailbroken iPhone.
Benchmarking the production database (SQLite) gave us promising results (ARM was just 20-30% slower), so I tried to build ruby (1.9.2p0).
The Rails app ran very slowly on ARM (fetching from SQL and generating templates was 10-20x slower). I decided to run some benchmarks to find the bottlenecks.
Again, some results were OK (on par with the older ruby 1.8.6 we are using now, 6x slower than ruby 1.9.2), and some were very slow (20-30x slower). For example, it looks like hash methods are 40x slower on ARM. Running the Ruby Benchmark Suite showed more bottlenecks: strings, threads, arrays...
I knew ARM was slower than Atom; I just was not expecting such a huge difference, especially after SQLite ran fine.
Is there some flaw in Ruby on ARM? Do I need to apply some patches? Is this hopeless, so that I should rewrite the whole app in C if I want to use the ARM device, or does the device simply not have enough computing power?
Examples
def fib(n)
  return 1 if n < 2
  fib(n-1)+fib(n-2)
end
Benchmark.bm do |x|
  x.report { fib(32) }
  x.report { fib(36) }
  x.report { h = {}; (0..10**3).each {|i| h[i] = i} }
  x.report { h = {}; (0..10**4).each {|i| h[i] = i} }
  x.report { h = {}; (0..10**5).each {|i| h[i] = i} }
end
ruby -rbenchmark bench.rb
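One way to narrow down the hash slowdown is to label each report and repeat the hash cases with the garbage collector switched off. This is a sketch only: that GC or allocation plays a part is an assumption, not something the question establishes, and `hash_fill` and `without_gc` are hypothetical helper names. The sizes are reduced so it finishes quickly; the results listed after this come from the original script.

```ruby
require 'benchmark'

# Same recursive Fibonacci as in the question (fib(0) == fib(1) == 1).
def fib(n)
  return 1 if n < 2
  fib(n - 1) + fib(n - 2)
end

# Fill a hash with the integer keys 0..size, as the question's blocks do.
def hash_fill(size)
  h = {}
  (0..size).each { |i| h[i] = i }
  h
end

# Hypothetical helper: run a block with the garbage collector disabled,
# re-enabling it afterwards even if the block raises.
def without_gc
  GC.disable
  yield
ensure
  GC.enable
end

# Kept small on purpose; raise to 32/36 and 10**5 to match the question.
FIB_N = 20
HASH_SIZES = [10**3, 10**4]

Benchmark.bm(20) do |x|
  x.report("fib(#{FIB_N})") { fib(FIB_N) }
  HASH_SIZES.each do |size|
    x.report("hash #{size}") { hash_fill(size) }
    x.report("hash #{size} (no GC)") { without_gc { hash_fill(size) } }
  end
end
```

If the "no GC" rows are much faster than the plain ones on the ARM board but not on the Atom, memory management would be worth a closer look; if both rows are equally slow, the cost is more likely in the hash code itself.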
Atom N270, 1GB
ruby 1.9.2p0 (2010-08-18) [i686-linux]
      user     system      total        real
  2.440000   0.000000   2.440000 (  2.459400)
 16.780000   0.030000  16.810000 ( 17.293015)
  0.000000   0.000000   0.000000 (  0.001180)
  0.020000   0.000000   0.020000 (  0.012180)
  0.160000   0.000000   0.160000 (  0.161803)
ruby 1.8.6 (2008-08-11 patchlevel 287) [i686-linux]
      user     system      total        real
 12.500000   0.020000  12.520000 ( 12.628106)
 84.450000   0.170000  84.620000 ( 85.879380)
  0.010000   0.000000   0.010000 (  0.002216)
  0.040000   0.000000   0.040000 (  0.032939)
  0.240000   0.010000   0.250000 (  0.255756)
XScale-PXA270 @ 520, 128MB
ruby 1.9.2p0 (2010-08-18) [arm-linux]
      user     system      total        real
 12.470000   0.000000  12.470000 ( 12.526507)
 85.480000   0.000000  85.480000 ( 85.939294)
  0.060000   0.000000   0.060000 (  0.060643)
  0.640000   0.000000   0.640000 (  0.642136)
  6.460000   0.130000   6.590000 (  6.605553)
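To put a number on the gap, the wall-clock ("real") column can be turned into slowdown ratios. A small sketch; the constant and method names are illustrative only, and the figures are copied from the benchmark output in the question:

```ruby
# Wall-clock ("real") seconds copied from the benchmark output in the question.
LABELS   = ["fib(32)", "fib(36)", "hash 10**3", "hash 10**4", "hash 10**5"]
ATOM_192 = [2.459400, 17.293015, 0.001180, 0.012180, 0.161803]
ATOM_186 = [12.628106, 85.879380, 0.002216, 0.032939, 0.255756]
ARM_192  = [12.526507, 85.939294, 0.060643, 0.642136, 6.605553]

# Element-wise slowdown of `slow` relative to `fast`, rounded to one decimal.
def slowdown(slow, fast)
  slow.zip(fast).map { |s, f| (s / f).round(1) }
end

vs_192 = slowdown(ARM_192, ATOM_192)
vs_186 = slowdown(ARM_192, ATOM_186)

LABELS.each_with_index do |label, i|
  puts format("%-12s ARM 1.9.2 vs Atom 1.9.2: %5.1fx   vs Atom 1.8.6: %5.1fx",
              label, vs_192[i], vs_186[i])
end
```

On these figures the fib cases come out at about 5x against the Atom running the same ruby 1.9.2, while the hash cases land in the 40x to 50x range. Against ruby 1.8.6 on the Atom, fib is on par and the hash cases are roughly 20x to 27x slower, so the hash results are the ones that stand out.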
Build with:
./configure --host=arm-linux --without-X11 --disable-largefile \
  --enable-socket=yes --without-Win32API --disable-ipv6 \
  --disable-install-doc --prefix=/opt --with-openssl-include=/opt/include/ \
  --with-openssl-lib=/opt/include/lib
ENV:
PFX=arm-iwmmxt-linux-gnueabi
export DISCIMAGE="/opt"
export CROSS_COMPILE="arm-linux-"
export HOST="arm-linux"
export TARGET="arm-linux"
export CROSS_COMPILING=1
export CC=$PFX-gcc
export CFLAGS="-O3 -I/opt/include"
export LDFLAGS="-O3 -L/opt/lib/"
#LIBS=
#CPPFLAGS=
export CXX=$PFX-g++
#CXXFLAGS=
export CPP=$PFX-cpp
export OBJCOPY="$PFX-objcopy"
export LD="$PFX-ld"
export AR="$PFX-ar"
export RANLIB="$PFX-ranlib"
export NM="$PFX-nm"
export STRIP="$PFX-strip"
export ac_cv_func_setpgrp_void=yes
export ac_cv_func_isinf=no
export ac_cv_func_isnan=no
export ac_cv_func_finite=no
Comments (3)
It seems you're complaining that optimizations new in Ruby 1.9.2 (when compared to 1.8.x) are x86 specific. The Atom and ARM performance is comparable for Ruby 1.8.x. Perhaps you could ask a ruby-specific mailing list. A quick search shows that yes, there were many changes in Ruby 1.9.x:
Perhaps the right question is "Does YARV have x86 specific optimizations? Could these optimizations be duplicated in the ARM port?"
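Before taking that question to a mailing list, it may help to record how each interpreter was actually built, so the x86 and ARM binaries can be compared like for like. A minimal sketch, assuming the usual `RbConfig::CONFIG` keys are present (some may be empty or missing on a given build); `build_summary` is a hypothetical helper name:

```ruby
require 'rbconfig'

# Keys commonly found in RbConfig::CONFIG; a missing key simply yields nil.
BUILD_KEYS = %w[host arch CC CFLAGS optflags configure_args]

# Hypothetical helper: pick the build-related entries out of a config hash.
def build_summary(config = RbConfig::CONFIG)
  BUILD_KEYS.each_with_object({}) { |key, out| out[key] = config[key] }
end

puts RUBY_DESCRIPTION
build_summary.each do |key, value|
  puts format("%-15s %s", key, value)
end
```

Running this on both machines and comparing the output side by side shows whether the two builds differ in compiler, optimization flags, or configure arguments before any interpreter-level explanation is considered.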
The same benchmark on a Raspberry Pi with somewhat newer packages:
Update for an RP2 (in 2015):
Update for an RP3-B (in 2017, Raspbian Jessie):
Using the code quoted in the question's example, these are my results on a Raspberry Pi running Raspbian with an armv6l processor: