Unicorn eating memory

Published 2024-12-18 15:28:17 · 4368 characters · 3 views · 0 comments

I have an m1.small instance on Amazon with 8 GB of hard disk space, on which my Rails application runs. It runs smoothly for two weeks and then crashes, saying the memory is full. The app runs on Rails 3.1.1, Unicorn, and nginx.

I simply don't understand what is taking up 13 GB.
I killed Unicorn, and the 'free' command shows some free memory, while df still says 100%.
I rebooted the instance and everything started working fine.

free (before killing unicorn)

             total       used       free     shared    buffers     cached  
Mem:       1705192    1671580      33612          0     321816     405288  
-/+ buffers/cache:     944476     760716   
Swap:       917500      50812     866688 

df -l (before killing unicorn)

Filesystem           1K-blocks      Used Available Use% Mounted on  
/dev/xvda1             8256952   7837520         4 100% /  
none                    847464       120    847344   1% /dev  
none                    852596         0    852596   0% /dev/shm  
none                    852596        56    852540   1% /var/run  
none                    852596         0    852596   0% /var/lock  
/dev/xvda2           153899044    192068 145889352   1% /mnt  
/dev/xvdf             51606140  10276704  38707996  21% /data  

sudo du -hc --max-depth=1 (before killing unicorn)

28K ./root  
6.6M    ./etc  
4.0K    ./opt  
9.7G    ./data  
1.7G    ./usr  
4.0K    ./media  
du: cannot access `./proc/27220/task/27220/fd/4': No such file or directory  
du: cannot access `./proc/27220/task/27220/fdinfo/4': No such file or directory  
du: cannot access `./proc/27220/fd/4': No such file or directory  
du: cannot access `./proc/27220/fdinfo/4': No such file or directory  
0   ./proc  
14M ./boot  
120K    ./dev  
1.1G    ./home  
66M ./lib  
4.0K    ./selinux  
6.5M    ./sbin  
6.5M    ./bin  
4.0K    ./srv  
148K    ./tmp  
16K ./lost+found  
20K ./mnt  
0   ./sys  
253M    ./var  
13G .  
13G total   

free (after killing unicorn)

             total       used       free     shared    buffers     cached    
Mem:       1705192     985876     719316          0     365536     228576    
-/+ buffers/cache:     391764    1313428    
Swap:       917500      46176     871324  

df -l (after killing unicorn)

Filesystem           1K-blocks      Used Available Use% Mounted on  
/dev/xvda1             8256952   7837516         8 100% /  
none                    847464       120    847344   1% /dev  
none                    852596         0    852596   0% /dev/shm  
none                    852596        56    852540   1% /var/run  
none                    852596         0    852596   0% /var/lock  
/dev/xvda2           153899044    192068 145889352   1% /mnt  
/dev/xvdf             51606140  10276704  38707996  21% /data  

unicorn.rb

rails_env = 'production'  

working_directory "/home/user/app_name"  
worker_processes 5  
preload_app true  
timeout 60  

rails_root = "/home/user/app_name"  
listen "#{rails_root}/tmp/sockets/unicorn.sock", :backlog => 2048  
# listen 3000, :tcp_nopush => false  

pid "#{rails_root}/tmp/pids/unicorn.pid"  
stderr_path "#{rails_root}/log/unicorn/unicorn.err.log"  
stdout_path "#{rails_root}/log/unicorn/unicorn.out.log"  

GC.copy_on_write_friendly = true if GC.respond_to?(:copy_on_write_friendly=)  

before_fork do |server, worker|  
  ActiveRecord::Base.connection.disconnect!  

  ##  
  # When sent a USR2, Unicorn will suffix its pidfile with .oldbin and  
  # immediately start loading up a new version of itself (loaded with a new  
  # version of our app). When this new Unicorn is completely loaded  
  # it will begin spawning workers. The first worker spawned will check to  
  # see if an .oldbin pidfile exists. If so, this means we've just booted up  
  # a new Unicorn and need to tell the old one that it can now die. To do so  
  # we send it a QUIT.  
  #  
  # Using this method we get 0 downtime deploys.  

  old_pid = "#{rails_root}/tmp/pids/unicorn.pid.oldbin"  
  if File.exists?(old_pid) && server.pid != old_pid  
    begin  
      Process.kill("QUIT", File.read(old_pid).to_i)  
    rescue Errno::ENOENT, Errno::ESRCH  
      # someone else did our job for us  
    end  
  end  
end  


after_fork do |server, worker|  
  ActiveRecord::Base.establish_connection  
  worker.user('rails', 'rails') if Process.euid == 0 && rails_env == 'production'  
end  

Comments (5)

紅太極 2024-12-25 15:28:17

I've just released the 'unicorn-worker-killer' gem. This enables you to kill Unicorn workers based on 1) a maximum number of requests and 2) process memory size (RSS), without affecting in-flight requests.

It's really easy to use. No external tool is required. First, add this line to your Gemfile.

gem 'unicorn-worker-killer'

Then, please add the following lines to your config.ru.

# Unicorn self-process killer
require 'unicorn/worker_killer'

# Max requests per worker
use Unicorn::WorkerKiller::MaxRequests, 10240 + Random.rand(10240)

# Max memory size (RSS) per worker
use Unicorn::WorkerKiller::Oom, (96 + Random.rand(32)) * 1024**2

It's highly recommended to randomize the threshold to avoid killing all workers at once.

森末i 2024-12-25 15:28:17

I think you are conflating memory usage and disk space usage. It looks like Unicorn and its children were using around 500 MB of memory; look at the second "-/+ buffers/cache:" number to see the real free memory.

As far as disk space goes, my bet is on some sort of log file, or something like that, going nuts. You should run du -h in the data directory to find out what exactly is using so much storage.

As a final suggestion, it's a little-known fact that Ruby never returns memory back to the OS once it has allocated it. It DOES still use that memory internally, but once Ruby grabs some memory, the only way to get it to yield the unused portion back to the OS is to quit the process. For example, if you happen to have a process that spikes your memory usage to 500 MB, you won't be able to reclaim that 500 MB, even after the request has completed and a GC cycle has run. However, Ruby will reuse that allocated memory for future requests, so the process is unlikely to grow much further.
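
This behavior can be observed directly with a small, self-contained Ruby script (a sketch; it is Linux-only, since it reads /proc, and the allocation size is illustrative). RSS grows during a large allocation and typically does not fall back to its starting level after GC, even though Ruby reuses the freed heap internally:

```ruby
# Read this process's resident set size (RSS) in kB from /proc (Linux).
def rss_kb
  File.read("/proc/#{Process.pid}/status")[/VmRSS:\s+(\d+)/, 1].to_i
end

before = rss_kb
big = Array.new(1_000_000) { "x" * 100 }   # allocate roughly 100+ MB of strings
during = rss_kb
big = nil                                  # drop the only reference
GC.start                                   # force a full garbage-collection cycle

puts "RSS before: #{before} kB, during: #{during} kB, after GC: #{rss_kb} kB"
```

The "after GC" number usually stays far closer to "during" than to "before", which is exactly the effect described above.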

Finally, Sergei mentions God to monitor the process memory. If you are interested in using this, there is already a good config file here. Be sure to read the associated article as there are key things in the unicorn config file that this god config assumes you have.

爱你是孤单的心事 2024-12-25 15:28:17

As Preston mentioned you don't have a memory problem (over 40% free), you have a disk full problem. du reports most of the storage is consumed in /root/data.

You could use find to identify very large files; e.g., the following will show all files under that directory greater than 100 MB in size.

sudo find /root/data -size +100M

If unicorn is still running, lsof (LiSt Open Files) can show which files are in use by your running programs, or by a specific set of processes (-p PID), e.g.:

sudo lsof | awk  '$5 ~/REG/ && $7 > 100000000 { print }'

will show you open files greater than 100 MB in size.
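
The same large-file search can be sketched in Ruby with the stdlib Find module (the /root/data path comes from the answer above; it is guarded so the script also runs where that directory does not exist):

```ruby
require 'find'

# Walk a directory tree and return [path, size] pairs for files larger
# than min_bytes, sorted biggest first.
def large_files(dir, min_bytes)
  results = []
  Find.find(dir) do |path|
    next unless File.file?(path)
    size = File.size(path) rescue next   # skip files that vanish mid-scan
    results << [path, size] if size > min_bytes
  end
  results.sort_by { |_, size| -size }
end

if Dir.exist?('/root/data')
  large_files('/root/data', 100 * 1024**2).each do |path, size|
    puts format('%6d MB  %s', size / 1024**2, path)
  end
end
```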

残花月 2024-12-25 15:28:17

You can set up god to watch your unicorn workers and kill them if they eat too much memory. Unicorn master process will then fork another worker to replace this one. Problem worked around. :-)
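
For reference, a hedged sketch of what such a god config might look like; every path, name, and threshold here is an assumption for illustration, not taken from the question. Note that god's :memory_usage condition watches the PID it is given, so pointed at the Unicorn pid file it restarts the master; per-worker RSS limits are usually easier with something like unicorn-worker-killer, as described in the first answer.

```ruby
# Hypothetical god watch for a Unicorn master (paths/thresholds assumed).
rails_root = "/home/user/app_name"

God.watch do |w|
  w.name     = "unicorn"
  w.interval = 30.seconds
  w.pid_file = "#{rails_root}/tmp/pids/unicorn.pid"
  w.start    = "cd #{rails_root} && bundle exec unicorn -c config/unicorn.rb -E production -D"
  w.stop     = "kill -QUIT `cat #{rails_root}/tmp/pids/unicorn.pid`"
  w.behavior(:clean_pid_file)

  # Restart if memory stays above the threshold for 3 of the last 5 checks.
  w.restart_if do |restart|
    restart.condition(:memory_usage) do |c|
      c.above = 300.megabytes
      c.times = [3, 5]
    end
  end
end
```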

蓝色星空 2024-12-25 15:28:17

Try removing New Relic from your app if you are using it. The newrelic_rpm gem itself was leaking memory. I had the same issue, and I scratched my head for almost 10 days to figure it out.

Hope that helps you.

I contacted the New Relic support team, and below is their reply.

Thanks for contacting support. I am deeply sorry for the frustrating
experience you have had. As a performance monitoring tool, our
intention is "first do no harm", and we take these kind of issues very
seriously.

We recently identified the cause of this issue and have released a
patch to resolve it. (see https://newrelic.com/docs/releases/ruby). We
hope you'll consider resuming monitoring with New Relic with this fix.
If you are interested in doing so, make sure you are using at least
v3.6.8.168 from now on.

Please let us know if you have any additional questions or concerns.
We're eager to address them.

Even after I tried updating the newrelic gem, it still leaked memory. In the end I had to remove New Relic; it is a great tool, but we could not use it at such a cost (a memory leak).

Hope that helps you.
