How can I use multithreading to read thousands of files in different folders and server locations?

Posted on 2025-01-26 07:23:49

I need to search for data in thousands of XML files spread across different folders and different server locations, and then list every file that contains a match.

    String[] servers = {"server1", "server2", "server3", "server4", "server5", "server6"};

    // number of folders can be up to 30
    String[] folders = {"folder1", "folder2", "folder3", "folder4", "folder5", "folder6", "folder7"};

    List<String> paths = new ArrayList<>();
    String envName = "env1";
    for (String server : servers) {
        for (String folder : folders) {
            // on Windows this yields a UNC path such as \\server1\e$\archive\env1\BACKUP\folder1
            String path = File.separator + File.separator + server + File.separator + "e$" + File.separator + "archive" + File.separator + envName + File.separator + "BACKUP" + File.separator + folder;
            paths.add(path);
        }
    }

This means 42 different file locations (6 servers × 7 folders), and the number can be higher. Each of these folders can contain around 2500 to 3000 XML files.

Once I get the files, I need to search each one for certain text and list all the files that contain it.
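The per-file check itself is not shown in the question (`checkFileDetails` is called but never defined). As a minimal sketch, assuming the search term is plain text that does not span lines and the files are UTF-8, the check could read each file line by line and stop at the first match. The class and method names here are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the question's checkFileDetails is not shown.
final class FileTextSearch {

    private FileTextSearch() {
    }

    // Returns true as soon as one line contains the search text.
    // Assumes UTF-8 content and a search term that does not span lines.
    static boolean containsText(Path file, String text) {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.contains(text)) {
                    return true; // stop early, no need to read the rest
                }
            }
            return false;
        } catch (IOException e) {
            // Wrapped so the method can be used inside lambdas and streams.
            throw new UncheckedIOException(e);
        }
    }
}
```

Stopping at the first match avoids reading the rest of a file once the answer is known, which may matter when the files sit on remote shares.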

Right now I am using the code below to read the files:

    private static void readFile(List<String> paths) {
        for (String path : paths) {
            File filePath = new File(path);
            File[] filesList = filePath.listFiles();
            int numberOfThreads = 2;
            if (null != filesList) {
                Thread[] threads = new Thread[numberOfThreads];
                // split the folder's files evenly; the last thread also takes the remainder
                final int filesPerThread = filesList.length / numberOfThreads;
                final int remainingFiles = filesList.length % numberOfThreads;
                for (int t = 0; t < numberOfThreads; t++) {
                    final int thread = t;
                    threads[t] = new Thread(() -> runThread(filesList, numberOfThreads, thread, filesPerThread, remainingFiles));
                }
                for (Thread t1 : threads) {
                    t1.start();
                }

                // wait for both threads before moving on to the next folder
                for (Thread t2 : threads) {
                    try {
                        t2.join();
                    } catch (InterruptedException ignored) {
                    }
                }
            }
        }
    }

    private static void runThread(File[] filesList, int numberOfThreads, int thread, int filesPerThread, int remainingFiles) {
        List<File> inFiles = new ArrayList<>(Arrays.asList(filesList).subList(thread * filesPerThread, (thread + 1) * filesPerThread));

        if (thread == numberOfThreads - 1 && remainingFiles > 0) {
            inFiles.addAll(Arrays.asList(filesList).subList(filesList.length - remainingFiles, filesList.length));
        }

        for (File file : inFiles) {
            // resultMap and checkFileDetails are defined elsewhere (not shown)
            resultMap.putAll(checkFileDetails(file));
        }
    }

This code takes around 10 minutes or more, depending on the number of files. Is there any other way I can use multithreading to make it faster?
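For comparison, the code above handles one folder at a time with two threads and joins them before moving to the next folder. One alternative, sketched here under assumptions and not measured against the real shares, is a single fixed thread pool shared by all folders, with one task per file. `SharedPoolSearch`, `findMatches`, and the `matcher` predicate (standing in for `checkFileDetails`) are hypothetical names.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical sketch: one shared pool for every folder, one task per file.
final class SharedPoolSearch {

    private SharedPoolSearch() {
    }

    // 'matcher' stands in for the question's checkFileDetails, which is not shown.
    static Set<String> findMatches(List<String> paths, Predicate<File> matcher, int threads)
            throws InterruptedException, ExecutionException {
        Set<String> matches = ConcurrentHashMap.newKeySet(); // safe for concurrent adds
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String path : paths) {
                File[] files = new File(path).listFiles();
                if (files == null) {
                    continue; // not a directory, or not reachable
                }
                for (File file : files) {
                    futures.add(pool.submit(() -> {
                        if (matcher.test(file)) {
                            matches.add(file.getPath());
                        }
                    }));
                }
            }
            for (Future<?> future : futures) {
                future.get(); // wait, and surface any task failure
            }
        } finally {
            pool.shutdown();
        }
        return matches;
    }
}
```

Because the work is likely dominated by network and disk I/O, a pool larger than the number of CPU cores may help, but the right size is something to measure. Note that the directory listings in this sketch still run one after another on the calling thread.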

Below is the basic structure of all the servers and folders

Edit
I also tried using ExecutorService, but it takes more time than the code using Thread[] (shown above):

    ExecutorService pool = Executors.newFixedThreadPool(10);
    List<Future<Map<String, String>>> futures = new ArrayList<>();
    if (null != folder.listFiles()) {
        Map<String, String> data = new TreeMap<>();
        Callable<Map<String, String>> callable = () -> {
            List<File> filesList = new ArrayList<>(List.of(folder.listFiles()));
            filesList.parallelStream().forEach(file -> data.putAll(checkFileDetails(file)));
            return data;
        };
        futures.add(pool.submit(callable));
    }
    pool.shutdown();
    for (Future<Map<String, String>> future : futures) {
        try {
            map.putAll(future.get());
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
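One detail in the ExecutorService attempt may be worth checking: the `TreeMap` is modified from inside a parallel stream, and `TreeMap` is not thread-safe, so concurrent `putAll` calls can lose or corrupt entries. A minimal sketch of one way to avoid the shared mutable map is to let the stream build the result with a collector. `ParallelCollect` and the `check` function (standing in for `checkFileDetails`, assumed to return non-null values) are hypothetical.

```java
import java.io.File;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: the stream builds the map, so no shared map is mutated.
final class ParallelCollect {

    private ParallelCollect() {
    }

    // 'check' stands in for checkFileDetails (not shown in the question).
    static Map<String, String> collect(File[] files, Function<File, Map<String, String>> check) {
        return Arrays.stream(files)
                .parallel()
                .map(check)
                .flatMap(m -> m.entrySet().stream())
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (first, second) -> second, // same key twice: keep the later value, as putAll would
                        TreeMap::new));
    }
}
```

The collector merges per-thread partial maps itself, so the result stays sorted by key without any explicit locking.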
