File.lastModified() is very slow!

Posted 2024-10-05 04:36:19

I'm doing a recursive copy of files and, like xcopy /D, I only want to copy files that are newer than the destination files (I can't use xcopy directly because I need to alter some files during the copy).

In Java I use lastModified() to check whether the destination file is older than the source file, and it's very slow.

  • Can I speed up the process (maybe using JNI?)?
  • Are there other copy scripts that do this job better (copy newer files + apply regex changes to some text files)?

Copying the files anyway is not an option, since that takes more time than checking the last-modified dates (the copy goes over the network).
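For reference, here is a minimal sketch of the xcopy /D-style loop the question describes, written against java.io.File. The class name `NewerFileCopier`, the rule that only `.txt` files get the regex rewrite, and UTF-8 as the text encoding are hypothetical choices for illustration. Every file costs several metadata queries (`isDirectory()`, `src.lastModified()`, `dst.lastModified()`), and when the destination is a network share each of those is a round trip, which is where the time goes.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.regex.Pattern;

public class NewerFileCopier {
    private final Pattern pattern;      // hypothetical text rewrite rule
    private final String replacement;

    public NewerFileCopier(Pattern pattern, String replacement) {
        this.pattern = pattern;
        this.replacement = replacement;
    }

    /** xcopy /D rule: copy when the destination is missing or older than the source. */
    static boolean needsCopy(File src, File dst) {
        // lastModified() returns 0L for a file that does not exist
        return src.lastModified() > dst.lastModified();
    }

    /** Recursively copies newer files; returns how many files were written. */
    public int copyTree(File srcDir, File dstDir) throws IOException {
        File[] children = srcDir.listFiles();
        if (children == null) {
            return 0; // not a directory, or it could not be listed
        }
        if (!dstDir.isDirectory() && !dstDir.mkdirs()) {
            throw new IOException("Cannot create " + dstDir);
        }
        int copied = 0;
        for (File src : children) {
            File dst = new File(dstDir, src.getName());
            if (src.isDirectory()) {
                copied += copyTree(src, dst);
            } else if (needsCopy(src, dst)) {
                if (src.getName().endsWith(".txt")) {
                    // Hypothetical rule: only .txt files are rewritten with the regex
                    String text = new String(Files.readAllBytes(src.toPath()), StandardCharsets.UTF_8);
                    String changed = pattern.matcher(text).replaceAll(replacement);
                    Files.write(dst.toPath(), changed.getBytes(StandardCharsets.UTF_8));
                } else {
                    Files.copy(src.toPath(), dst.toPath(), StandardCopyOption.REPLACE_EXISTING);
                }
                // Give dst the source's timestamp so the next run treats it as up to date
                dst.setLastModified(src.lastModified());
                copied++;
            }
        }
        return copied;
    }
}
```

Copying the source timestamp onto the destination is what lets a second run skip unchanged files; on file systems with coarse timestamp resolution the comparison may need a small tolerance.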

Comments (4)

嘿哥们儿 2024-10-12 04:36:19

You need to determine why it is so slow.

When you run the program, what is the CPU utilisation of your process? If it is more than 50% user time, you should be able to optimise your program; if it's less than 20%, there isn't much you can do.
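One rough way to get that number from inside the JVM is to time a batch of lastModified() calls and compare the scanning thread's user-mode CPU time with the elapsed wall-clock time. This is a per-thread stand-in for the process-level figure the answer mentions, not the same measurement, and `ScanProfiler` / `userCpuFraction` are hypothetical names. A fraction near zero means the thread spends most of its time waiting on the file system rather than computing.

```java
import java.io.File;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ScanProfiler {
    /**
     * Calls lastModified() on each file and returns the share of elapsed wall-clock time the
     * calling thread spent in user-mode CPU, or -1 if the JVM cannot measure thread CPU time.
     */
    public static double userCpuFraction(File[] files) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isCurrentThreadCpuTimeSupported()) {
            return -1;
        }
        long user0 = mx.getCurrentThreadUserTime();   // -1 if measurement is disabled
        long wall0 = System.nanoTime();
        for (File f : files) {
            f.lastModified();                          // the call being investigated
        }
        long wall = System.nanoTime() - wall0;
        long user1 = mx.getCurrentThreadUserTime();
        if (user0 < 0 || user1 < 0 || wall <= 0) {
            return -1;
        }
        // CPU-time counters can be coarse (milliseconds or worse), so use a large directory
        return (double) (user1 - user0) / wall;
    }
}
```

Following the answer's rule of thumb, a result well above 0.5 suggests the Java side is worth optimising, while one below about 0.2 suggests the time is spent waiting on the disk or network.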

Usually this method is slow because the file you are examining is on disk rather than in memory. If that is the case, you need to speed up how you access your disk, or get a faster drive; e.g. an SSD can be 10-100x faster at this.

A bulk query might help. You can do this by using multiple threads to check the lastModified date. e.g. have a fixed size thread pool and add a task for each file. The size of the thread pool determines the number of files polled at once.

This allows the OS to reorder the requests to suit the layout on the disk. Note: this is fine in theory, but you have to test whether it makes things faster on your OS/hardware, as it's just as likely to make things slower. ;)
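A minimal sketch of the thread-pool idea, assuming plain java.util.concurrent and one task per file. `ParallelLastModified` and `lastModifiedAll` are hypothetical names, and the pool size is the knob the answer describes; as the note says, whether this is actually faster on a given OS and share has to be measured.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLastModified {
    /**
     * Returns lastModified() for every file, with at most {@code threads} queries in flight.
     * Missing files map to 0L, exactly as File.lastModified() reports them.
     */
    public static Map<File, Long> lastModifiedAll(List<File> files, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // One task per file; the pool size bounds how many are outstanding at once
            List<Future<Long>> futures = new ArrayList<>(files.size());
            for (File f : files) {
                Callable<Long> task = () -> f.lastModified();
                futures.add(pool.submit(task));
            }
            // Collect the results in the original order
            Map<File, Long> result = new LinkedHashMap<>();
            for (int i = 0; i < files.size(); i++) {
                result.put(files.get(i), futures.get(i).get());
            }
            return result;
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating the pool once and reusing it across directories would avoid repeated thread startup; it is created per call here only to keep the sketch short.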

素手挽清风 2024-10-12 04:36:19

So I ran into this on network drives. Painful. I had a directory with 17,000+ files in it. On a local drive it took less than 2 seconds to check the last-modified dates; on a network drive it took 58 seconds! Of course my app is interactive, so I got some complaints.

After some research I decided it would be possible to write some JNI code that calls the Windows Kernel32 FindFirstFile/FindNextFile/FindClose functions to dramatically speed this up, but then I'd need 32-bit and 64-bit builds, etc. Ugh. And I'd lose the cross-platform capabilities.

Although it's a bit of a nasty hack, here is what I did. My app runs mostly on Windows, but I didn't want to restrict it to that, so: check whether I'm running on Windows; if so, check whether the file is on a local hard disk; if not, use the hackish method.

I stored everything case-insensitively. That's probably not a great idea on other OSes, where a directory may contain both 'ABC' and 'abc'. If you need to care about this, you can decide by creating new File("ABC") and new File("abc") and comparing them with equals(): on case-insensitive file systems like Windows it returns true, but on Unix systems it returns false.
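The equals() check described above can be wrapped in a small helper so the cache only folds case where java.io.File itself does. Note that it reports how `File.equals` compares paths on this platform, not how a particular volume behaves (for example, a case-insensitive volume on macOS may still report false). `CacheKeys` and its methods are hypothetical names.

```java
import java.io.File;
import java.util.Locale;

public class CacheKeys {
    /** True if java.io.File treats "ABC" and "abc" as the same path on this platform. */
    public static boolean fileNamesIgnoreCase() {
        return new File("ABC").equals(new File("abc"));
    }

    /** Normalizes a path for use as a cache key, folding case only where File.equals does. */
    public static String cacheKey(File f) {
        String path = f.getAbsolutePath();
        return fileNamesIgnoreCase() ? path.toLowerCase(Locale.ROOT) : path;
    }
}
```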

Hackish as it may be, it cut the time on a network drive from 58 seconds to 1.6 seconds, so I can live with the hack.

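    import java.io.BufferedReader;
    import java.io.File;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.text.DateFormat;
    import java.text.SimpleDateFormat;
    import java.util.Date;
    import java.util.HashMap;
    import java.util.Map;
    import javax.swing.filechooser.FileSystemView;

    public class LastModifiedCache
    {
        // Keyed by lower-cased absolute path: everything is stored case-insensitively.
        private final Map<String, Long> dirCache = new HashMap<>();

        public long lastModified(File f)
        {
            String key = f.getAbsolutePath().toLowerCase();
            // A previous "dir" of the same folder may already have cached this file
            Long cached = dirCache.get(key);
            if(cached != null)
                return cached;

            boolean useJavaDefaultMethod = true;

            if(System.getProperty("os.name").startsWith("Windows"))
            {
                // Walk up to the root of the path, e.g. "C:\" or a mapped "Z:\"
                File root = f.getAbsoluteFile();
                while(root.getParentFile() != null)
                    root = root.getParentFile();

                FileSystemView fsv = FileSystemView.getFileSystemView();
                String s = fsv.getSystemTypeDescription(root);
                // "Local Disk" is the English description; other Windows display languages differ
                useJavaDefaultMethod = fsv.isDrive(root) && "Local Disk".equalsIgnoreCase(s);
            }
            if(!useJavaDefaultMethod)
            {
                File dir = f.getAbsoluteFile().getParentFile();
                try
                {
                    // List the whole directory in one call instead of one query per file.
                    // The path is its own argument so paths with spaces get quoted.
                    ProcessBuilder pb = new ProcessBuilder("cmd.exe", "/C", "dir", dir.getPath());
                    pb.redirectErrorStream(true);
                    Process process = pb.start();

                    // Both the date pattern and the column-38 file name offset depend on the
                    // Windows version and regional settings; dir also only shows minutes.
                    DateFormat df = new SimpleDateFormat("dd-MMM-yy hh:mm a");
                    try(BufferedReader br = new BufferedReader(new InputStreamReader(process.getInputStream())))
                    {
                        String line;
                        while((line = br.readLine()) != null)
                        {
                            try
                            {
                                Date filedate = df.parse(line);
                                String filename = line.substring(38);
                                dirCache.put(new File(dir, filename).getPath().toLowerCase(), filedate.getTime());
                            }
                            catch(Exception ex)
                            {
                                // Header and summary lines don't start with a date; skip them
                            }
                        }
                    }
                    process.waitFor();

                    Long filetime = dirCache.get(key);
                    if(filetime != null)
                        return filetime;
                }
                catch(IOException ex)
                {
                    // Couldn't run dir; fall back to File.lastModified() below
                }
                catch(InterruptedException ex)
                {
                    Thread.currentThread().interrupt();
                }
            }

            // this is SO SLOW on a networked drive!
            long lastModifiedDate = f.lastModified();
            dirCache.put(key, lastModifiedDate);

            return lastModifiedDate;
        }
    }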
千笙结 2024-10-12 04:36:19

Unfortunately, the way Java looks up lastModified is slow: it queries the underlying file system for each file as you request the information, and there is no bulk loading of this data in listFiles() or similar.

You could potentially invoke a more efficient native program to do this in bulk, but any such solution would be closely tied to the platform you deploy to.
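This answer predates Java 7. With java.nio.file, `Files.walkFileTree` hands each file's `BasicFileAttributes` to the visitor, so the last-modified time is read as part of the walk rather than through a separate `lastModified()` call per file. On some platforms (Windows in particular) the JDK can fill these attributes from the directory listing itself, but that is an implementation detail, so measure it against your share before relying on it. A minimal sketch; `TreeTimestamps` is a hypothetical name:

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.HashMap;
import java.util.Map;

public class TreeTimestamps {
    /** Collects last-modified times (ms since the epoch) for every regular file under root. */
    public static Map<Path, Long> collect(Path root) throws IOException {
        Map<Path, Long> times = new HashMap<>();
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                // attrs comes from the walker; no extra per-file timestamp query here
                if (attrs.isRegularFile()) {
                    times.put(root.relativize(file), attrs.lastModifiedTime().toMillis());
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip unreadable entries instead of aborting the whole walk
                return FileVisitResult.CONTINUE;
            }
        });
        return times;
    }
}
```

Walking both the source and destination trees this way gives two maps keyed by relative path, which can then be compared in memory to decide what to copy.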

被翻牌 2024-10-12 04:36:19

I imagine you are doing this over the network; otherwise there would be little point in the copy. Network directory operations are slow, bad luck. You could always just copy files below a certain size threshold, whichever makes the total operation take the least time.
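A minimal sketch of that size-threshold policy, assuming the source tree is local (so `src.length()` is cheap) and only the destination is remote. `CopyPolicy`, `shouldCopy`, and the threshold value are hypothetical; the right threshold depends on your link speed and per-file latency.

```java
import java.io.File;

public class CopyPolicy {
    /**
     * Small files are copied unconditionally, skipping the remote timestamp query;
     * larger ones are copied only if the source is newer than the destination.
     */
    public static boolean shouldCopy(File src, File dst, long thresholdBytes) {
        // Assumes src is on a local disk, so length() on it is cheap
        if (src.length() < thresholdBytes) {
            return true;
        }
        // lastModified() returns 0L for a missing dst, so a missing destination is copied
        return src.lastModified() > dst.lastModified();
    }
}
```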

I disagree with Kris here: there's nothing startlingly inefficient in the way Java does it, and in any case it really has to do it that way because you want the latest value.
