How do I exclude files from a search that may be in use or being copied to in Python?

Posted 2024-09-30 12:44:41 · 1999 characters · 13 views · 0 comments

I'm new to Python, so this might end up having a simple solution.

At my house, I have 3 computers that are relevant to this situation:
- File Server (linux)
- My main PC (windows)
- Girlfriend's MacBook Pro

My file server is running Ubuntu and Samba. I've installed Python 3.1 and written my code for 3.1.

I've created a daemon that detects when files matching a given pattern appear in the uploads directory. Upon finding such a file, it renames it and moves it to a different location on a different drive. It also rewrites the owner, group, and permissions. All of this works great. The process runs every minute.

If I copy files from my main PC (running a flavor of Windows), the process always works. (I believe Windows locks the file until it's done copying; I could be wrong.)
If my girlfriend copies a file, the daemon picks up the file before the copy is complete and things get messy. (Underscored versions of the files with improper permissions are created, and occasionally the file will go into the correct place.)
I am guessing here that her MacBook does not lock the file while copying. I could also be wrong there.

What I need is a way to exclude files that are either in use or, failing that, are being created.

For reference, the method I've created to find the files is:

# _GetFileListing(filter)
# Description: Gets a list of relevant files based on the filter
#
# Parameters: filter - a compiled regex query
# Returns:
#   Nothing. It populates self.fileList
def _GetFileListing(self, filter):
    self.fileList = []
    for file in os.listdir(self.dir):
        filterMatch = filter.search(file)
        filepath = os.path.join(self.dir, file)

        # Keep only regular files whose name matches the pattern
        if os.path.isfile(filepath) and filterMatch is not None:
            self.fileList.append(filepath)

Note, this is all in a class.

The method I've created to manipulate the files is:

# _ArchiveFile(filepath, outpath)
# Description: Renames/moves the file to outpath and rewrites the file permissions to those used for
#   the output directory. See self.mask, self.group, and self.owner for the actual values.
#
# Parameters: filepath - path to the file
#             outpath - path to the file to output
def _ArchiveFile(self, filepath, outpath):
    dir,filename,filetype = self._SplitDirectoryAndFile(outpath)

    try:
        os.makedirs(dir, self.mask)
    except OSError:
        # Directory already exists (or could not be created); nothing to do here
        pass

    uid = pwd.getpwnam(self.owner).pw_uid
    gid = grp.getgrnam(self.group).gr_gid
    #os.rename(filepath, outpath)
    # shutil.move copies and then deletes when outpath is on a different filesystem
    shutil.move(filepath, outpath)
    os.chmod(outpath, self.mask)
    os.chown(outpath, uid, gid)

I've stopped using os.rename because it stopped working once I started moving files to a different drive.

Short Version:
How do I prevent myself from picking up files in my search that are currently being transferred?

Thank you in advance for any help you might be able to provide.

Comments (3)

岁月流歌 2024-10-07 12:44:42

Turns out the write lock approach didn't work. I guess I didn't properly test it before updating here.

What I've decided to do for now is:

  • Reduce the time between checks to 30s
  • Keep a list of files found in the previous iteration and their respective file sizes
  • Check the new list of files against the old list

If the new list contains the same file with the same file size as the old list, put it in a list to be transferred. The remaining files in the new list become the old list and the process continues.
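The size comparison described above could be sketched roughly as follows. This is only an illustration, not the code actually used by the daemon: the function name `find_stable_files` and the dictionary `previous_sizes` are hypothetical, and it assumes that a file whose size is unchanged between two passes has finished copying.

```python
import os

def find_stable_files(directory, previous_sizes):
    """Run one polling pass.

    previous_sizes maps path -> size recorded on the previous pass.
    Returns (stable, pending): the paths whose size did not change,
    and the path -> size mapping to remember for the next pass.
    """
    current_sizes = {}
    for name in os.listdir(directory):
        if name.startswith("._"):  # skip the companion files the Mac creates
            continue
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            current_sizes[path] = os.path.getsize(path)

    # A file is considered stable when it was seen before with the same size
    stable = sorted(path for path, size in current_sizes.items()
                    if previous_sizes.get(path) == size)
    stable_set = set(stable)
    # Everything else becomes the "old list" for the next pass
    pending = {path: size for path, size in current_sizes.items()
               if path not in stable_set}
    return stable, pending
```

The daemon would call this once per interval, move the files in `stable`, and pass `pending` back in on the next call. A transfer that stalls for longer than one interval would still be picked up too early, so the interval is a trade-off.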

I'm sure the lsof method would work, but I'm not sure how to use it from Python. The size-comparison method should also work quite well for my situation, since I am mostly concerned with not moving files while they're in transit.
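For reference, one way lsof might be called from Python is through the `subprocess` module. This is a sketch under assumptions: it assumes lsof is installed and that the daemon has enough privileges to see the files held open by the Samba process, and it relies on lsof exiting with status 0 when it lists at least one open file. The function name `is_open_by_any_process` and the `run_lsof` hook are hypothetical.

```python
import subprocess

def is_open_by_any_process(path, run_lsof=None):
    """Return True if lsof reports that some process has `path` open."""
    if run_lsof is None:
        def run_lsof(target):
            # lsof exits with 0 when it found an open file to list,
            # and with a non-zero status when it found none or hit an error
            return subprocess.call(
                ["lsof", "--", target],
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
    return run_lsof(path) == 0
```

Because a non-zero status covers both "not open" and "lsof failed", a False result is weaker evidence than a True one.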

I would also have to exclude all files that start with "._", since the Mac creates those and I'm not sure whether they grow in size over time.

Alternatively, I have the option to handle just the cases where a file is being transferred by her Mac. I know that when the Mac is transferring a file, it creates:

  • filename.ext
  • ._filename.ext

I could check the list for every filename that also appears with a ._ prefix and exclude those files.
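A minimal sketch of that second option, assuming (as described above) that a ._ companion is present while the Mac is transferring a file. Whether the companion disappears once the transfer finishes is not confirmed here. The function name `exclude_files_with_companion` is hypothetical.

```python
def exclude_files_with_companion(names):
    """Drop the ._ companions and every file that currently has one."""
    present = set(names)
    return [name for name in names
            if not name.startswith("._") and "._" + name not in present]
```

For a directory listing of `a.mkv`, `._a.mkv` and `b.mkv`, only `b.mkv` would be kept.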

I'll probably try the second option first. It's a little dirty but hopefully it will work.

北方。的韩爷 2024-10-07 12:44:42

The ._ files from the Mac contain resource forks. More information can be found here: http://support.apple.com/kb/TA20578

I don't have enough rep to make a comment, hence the answer.

For the most part you can safely ignore them, since other operating systems most likely cannot do anything with them anyway. More info on them here:
http://en.wikipedia.org/wiki/Resource_fork

残月升风 2024-10-07 12:44:41

You can try taking an exclusive write lock on the file before moving it. This can be done with the fcntl module:

http://docs.python.org/library/fcntl.html
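A rough sketch of what such a lock attempt might look like with `fcntl.lockf`. The function name `try_exclusive_lock` is hypothetical. These locks are advisory, so the check only helps if the process writing the file also takes a lock on it, which is an assumption that may not hold for files arriving over Samba.

```python
import fcntl

def try_exclusive_lock(path):
    """Return True if an exclusive advisory lock could be taken on `path`.

    The lock is released again before returning; this only probes
    whether another process currently holds a conflicting lock.
    """
    # Append mode gives write access without truncating the file
    with open(path, "ab") as handle:
        try:
            # LOCK_NB makes the call fail immediately instead of waiting
            fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(handle, fcntl.LOCK_UN)
        return True
```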

Barring that, you can use the lsof utility to see which files the system has open. That requires more drudgery.

Note that os.rename() will work on the same filesystem, and would actually be immune to this issue (the inode gets moved, no data gets moved). Using shutil will do as mv does, which is either relink the file if it's the same filesystem, or copy + delete if the filesystems are different.
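To make the distinction concrete, here is a small sketch that compares device numbers to decide whether a plain rename is possible. The helper names `on_same_filesystem` and `move_file` are hypothetical, and `shutil.move` already performs an equivalent fallback internally, so this is purely illustrative.

```python
import os
import shutil

def on_same_filesystem(source_path, target_dir):
    """True when both paths live on the same device (filesystem)."""
    return os.stat(source_path).st_dev == os.stat(target_dir).st_dev

def move_file(source_path, target_dir):
    """Move source_path into target_dir and return the new path."""
    target = os.path.join(target_dir, os.path.basename(source_path))
    if on_same_filesystem(source_path, target_dir):
        # Only the directory entry changes; no file data is copied
        os.rename(source_path, target)
    else:
        # Different filesystem: the data is copied, then the source is deleted
        shutil.move(source_path, target)
    return target
```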
