Linux memory management and large files
I'm acquiring image objects from a remote server, then attempting to upload them to Rackspace's Cloud Files using their API. Wondering a) how I can make this process more efficient, and b) assuming I'll need to purchase more memory, what a reasonable amount of RAM might be to accomplish this task (current development server is just 512MB).
In executing the script I'm:
- Querying my local database for a set of ids (around 1 thousand)
- For each id, querying a remote server, which returns between 10-20 image objects, each image is 25-30k
- Creating a Cloud Files container, based on the id from my db
- For each image object returned from the remote server, creating an image object in my container and writing the image data to that object
- Updating the row in my local db with the datetime the images were added (a stripped-down sketch of this loop follows the list)
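For scale, each id's batch is at most about 20 × 30 KB ≈ 600 KB of image data, so per-iteration memory should stay small if nothing accumulates. A stripped-down sketch of the loop described above, with hypothetical helper names standing in for the remote fetch and the Cloud Files calls (not the real API method names), looks roughly like this:

```php
<?php
// Rough sketch of the loop described above. get_ids_from_local_db(),
// fetch_remote_images(), create_cloudfiles_container(), upload_to_container()
// and mark_id_done() are hypothetical helpers, not the real API calls.

set_time_limit(0);                      // run length is controlled by batch size instead
$ids = get_ids_from_local_db();         // ~1,000 ids

foreach ($ids as $id) {
    $images = fetch_remote_images($id);               // 10-20 objects, 25-30 KB each
    $container = create_cloudfiles_container($id);    // container named after the db id

    foreach ($images as $key => $image) {
        upload_to_container($container, $image);      // write image data to the object
        unset($images[$key]);                         // drop the local copy once uploaded
    }
    unset($images, $container);

    mark_id_done($id, date('Y-m-d H:i:s'));           // record the upload datetime locally

    // log peak memory per id so any steady climb is visible
    error_log(sprintf('id %s: peak %.1f MB', $id, memory_get_peak_usage(true) / 1048576));
}
```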
This executes relatively quickly on a small set of ids; however, 100 ids (so 700-1,000 images) can take 5-10 minutes, and anything larger seems to run indefinitely. I've tried the following, with little success:
- using PHP's set_time_limit() to kill the script after a couple of minutes, figuring that would purge the memory allocated to execution and let me pick up where I left off, working through the set in smaller pieces. However, this timeout is never triggered
- unsetting the array key containing the image object after it's uploaded, not just the reference inside the loop (see the snippet after this list)
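A minimal way to verify whether those unset() calls actually free anything would be to log memory use on each pass through the inner loop (memory_get_usage() and gc_collect_cycles() are core PHP, 5.3+ for the latter; the variable names are just placeholders):

```php
<?php
// Quick check: does unset() actually return memory, or does usage keep climbing?
foreach ($images as $key => $image) {
    // ... upload $image here ...

    unset($images[$key]);      // remove the array element itself, not just the loop copy
    gc_collect_cycles();       // collect any circular references (PHP 5.3+)

    printf("after %s: %.1f MB in use\n", $key, memory_get_usage(true) / 1048576);
}
```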
PHP's memory_limit is set to 128MB, and running the 'top' command I saw that the user 'www-data' was consuming 16% of memory. However, it no longer appears in the process list, but I continue to see this:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2400 mysql 20 0 161m 8220 2808 S 0 1.6 11:12.69 mysqld
...but the TIME+ never changes. I see that there is still 1 task running, yet these values never change:
Mem: 508272k total, 250616k used, 257656k free, 4340k buffers
Apologies for the lengthy post - not entirely sure what (if any) of that is useful. This isn't my area of expertise, so I'm grasping at straws a little. Thanks in advance for your help.
Comments (1)
MySQL's a daemon - it'll keep running and sit in memory until it dies or you kill it. TIME+ is how much CPU time it has used since the last restart. If it's idle (%CPU = 0), TIME+ won't increment, since no CPU time is being consumed.
Have you checked if the cloudfiles API is leaking handles of some sort? You may be unsetting the image object you've retrieved from your service (service->you), but the Cloudfiles API still has to send that image back out the door (you->Rackspace), and that could be leaking somewhere.
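One way to narrow that down (a sketch, assuming the upload happens through a single wrapper call; upload_to_container() is a hypothetical stand-in for whatever the API actually exposes) is to measure memory growth around just the upload step:

```php
<?php
// Sketch: log per-call memory growth around the suspect upload.
// upload_to_container() is a hypothetical wrapper around whatever the
// Cloud Files API actually exposes for writing object data.
function upload_and_measure($container, $image)
{
    $before = memory_get_usage(true);
    upload_to_container($container, $image);          // the call under suspicion
    $growth = memory_get_usage(true) - $before;

    if ($growth > 0) {
        error_log(sprintf('upload retained %d bytes', $growth));
    }
}
```

If that number climbs steadily, the retention is inside the API layer rather than in the calling code, and a blunt workaround is to split the run into smaller batches - say 50-100 ids per invocation from a shell loop or cron - so each PHP process starts with a fresh heap.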