open() very slow (six+ seconds) on a full UFS filesystem where a mass delete has just started?
We have a UFS partition on Solaris.
The volume becomes full. We're still trying to write to it - and naturally open() returns -1 immediately.
When a cronjob that does a mass delete fires up, it looks like open() no longer returns in a timely manner - it's taking at least six seconds, because that's how long the watchdog waits before killing the process.
Now, the obvious thought is that the deletes are keeping the file system busy and open() just takes forever... but is there any concrete knowledge out there about this behaviour?
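To pin down whether it really is open() that stalls (rather than the watchdog measuring something else), one could time the call directly on the suspect filesystem. A minimal sketch - the path is a placeholder, and gethrtime() is the Solaris-native high-resolution timer:

    /* timeopen.c - measure how long a single open() takes.
     * Compile: cc -o timeopen timeopen.c
     */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/time.h>   /* gethrtime() on Solaris */
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/full-ufs/probe.tmp";  /* placeholder path */

        hrtime_t t0 = gethrtime();
        int fd = open(path, O_WRONLY | O_CREAT, 0644);
        hrtime_t t1 = gethrtime();

        double elapsed = (double)(t1 - t0) / 1e9;   /* ns -> seconds */

        if (fd == -1) {
            printf("open() failed (%s) after %.3f s\n",
                   strerror(errno), elapsed);
        } else {
            printf("open() succeeded after %.3f s\n", elapsed);
            close(fd);
            unlink(path);
        }
        return 0;
    }

Running this from cron while the mass delete is in progress would show whether open() itself crosses the six-second watchdog threshold.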
Comments (2)
Perhaps the program doing the 'mass delete' could be changed to operate more smoothly on a filesystem that's having problems. If it runs queries to find the files to delete, it might not be the open() call that's timing out. To test the theory, is there some way to set up a cron job which simply removes a single file with a known name during the disk-full state? How does the 'mass delete' program decide which open() calls to make?
It's also possible to control the percentage of disk utilization at which writes stop working; you could try setting this to a lower percentage. If you are detecting the 'disk full' state by waiting until a file-creation step returns -1, consider adding an explicit check to your code so you can take corrective action once the filesystem exceeds a certain percentage full (a sketch of such a check follows).
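A minimal sketch of that proactive check using statvfs(), which is standard on Solaris; the mount point and the 90% threshold are assumptions, not values from the question:

    /* fscheck.c - warn before the filesystem is completely full. */
    #include <stdio.h>
    #include <sys/statvfs.h>

    /* Returns 1 if the filesystem holding `path` is more than
     * `limit_pct` percent full, 0 if not, -1 on error. */
    int fs_over_limit(const char *path, double limit_pct)
    {
        struct statvfs vfs;

        if (statvfs(path, &vfs) != 0)
            return -1;

        /* f_blocks = total blocks; f_bavail = blocks available to
         * unprivileged users (excludes the reserved area). */
        double used_pct = 100.0 *
            (double)(vfs.f_blocks - vfs.f_bavail) / (double)vfs.f_blocks;

        return used_pct > limit_pct ? 1 : 0;
    }

    int main(void)
    {
        /* "/full-ufs" and 90.0 are placeholders to tune. */
        int r = fs_over_limit("/full-ufs", 90.0);
        if (r == 1)
            printf("over limit - take corrective action\n");
        else if (r == 0)
            printf("within limit\n");
        else
            perror("statvfs");
        return 0;
    }

Triggering the cleanup from this check, rather than waiting for open() to fail, means the delete runs before the filesystem is under disk-full stress.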
A mass delete causes a storm of random IO, which really hurts performance. It also generates a lot of journal/log transactions to commit (try the nologging mount option?). Moreover, if your filesystem is nearly full, open() will take some time anyway to find space for a new inode. Deleting files more often, fewer at a time, may get you lower response times. Or simply delete them more slowly, sleeping between each rm - see the sketch below.
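A minimal sketch of the 'sleep between deletes' idea: unlink each path given on the command line, pausing between files. The 100 ms pause is an arbitrary placeholder you would tune against your IO load:

    /* slowrm.c - throttled delete to avoid a burst of random IO
     * and log transactions on a struggling filesystem.
     */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        /* 100 ms between unlinks - a placeholder value to tune. */
        struct timespec pause = { 0, 100 * 1000 * 1000 };
        int i;

        for (i = 1; i < argc; i++) {
            if (unlink(argv[i]) != 0)
                fprintf(stderr, "unlink %s: %s\n",
                        argv[i], strerror(errno));
            nanosleep(&pause, NULL);
        }
        return 0;
    }

The cron job could invoke this in place of a plain rm, feeding it the file list in batches, so the writers on the same filesystem see fewer latency spikes.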