How can I use multithreading to read thousands of files across different folders and server locations?
I need to search for data in thousands of XML files spread across different folders on different servers, and then list every file that contains a match.
String[] servers = {"server1", "server2", "server3", "server4", "server5", "server6"};
// number of folders can be up to 30
String[] folders = {"folder1", "folder2", "folder3", "folder4", "folder5", "folder6", "folder7"};
List<String> paths = new ArrayList<>();
String envName = "env1";
for (String server : servers) {
    for (String folder : folders) {
        String path = File.separator + File.separator + server + File.separator + "e$" + File.separator + "archive" + File.separator + envName + File.separator + "BACKUP" + File.separator + folder;
        paths.add(path);
    }
}
That gives 42 file locations (6 servers × 7 folders), and there can be more. Each of these folders can contain around 2,500 to 3,000 XML files.
Once I have the files, I need to search each one for certain text and list all the files that contain it.
Right now I am using the code below to read the files:
private static void readFile(List<String> paths) {
    for (String path : paths) {
        File filePath = new File(path);
        File[] filesList = filePath.listFiles();
        int numberOfThreads = 2;
        // listFiles() returns null if the path is not a readable directory
        if (null != filesList) {
            Thread[] threads = new Thread[numberOfThreads];
            final int filesPerThread = filesList.length / numberOfThreads;
            final int remainingFiles = filesList.length % numberOfThreads;
            for (int t = 0; t < numberOfThreads; t++) {
                final int thread = t;
                threads[t] = new Thread(() -> runThread(filesList, numberOfThreads, thread, filesPerThread, remainingFiles));
            }
            for (Thread t1 : threads) {
                t1.start();
            }
            // Folders are processed one at a time: wait for both threads before moving on
            for (Thread t2 : threads) {
                try {
                    t2.join();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the flag instead of swallowing it
                    return;
                }
            }
        }
    }
}

// resultMap and checkFileDetails are defined elsewhere (not shown).
// resultMap must be thread-safe, because two threads call putAll on it concurrently.
private static void runThread(File[] filesList, int numberOfThreads, int thread, int filesPerThread, int remainingFiles) {
    // Each thread takes one contiguous slice of the listing
    List<File> inFiles = new ArrayList<>(Arrays.asList(filesList).subList(thread * filesPerThread, (thread + 1) * filesPerThread));
    // The last thread also takes the files left over by the integer division
    if (thread == numberOfThreads - 1 && remainingFiles > 0) {
        inFiles.addAll(Arrays.asList(filesList).subList(filesList.length - remainingFiles, filesList.length));
    }
    for (File file : inFiles) {
        resultMap.putAll(checkFileDetails(file));
    }
}
This code takes around 10 minutes or more, depending on the number of files. Is there another way I can use multithreading to make it faster?
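Before changing the threading, it may help to check where the time actually goes, since reading over UNC paths may be limited by the network or the remote disks rather than by the CPU. The following is only a sketch and not part of the original code: `FolderTiming` and `measure` are hypothetical names, and it assumes Java 8 or later. It times the directory listing and the raw file reads for a single folder separately.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not from the original post): rough timing for one folder.
final class FolderTiming {

    /** Returns {fileCount, totalBytes, listMillis, readMillis} for the given folder. */
    static long[] measure(Path folder) throws IOException {
        long t0 = System.nanoTime();
        List<Path> files;
        // Files.list is lazy, so collect inside the timed region and close the stream
        try (Stream<Path> listing = Files.list(folder)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        long t1 = System.nanoTime();

        long totalBytes = 0;
        for (Path file : files) {
            // Read only; no searching, so this isolates the I/O cost
            totalBytes += Files.readAllBytes(file).length;
        }
        long t2 = System.nanoTime();

        return new long[] {
            files.size(),
            totalBytes,
            (t1 - t0) / 1_000_000,
            (t2 - t1) / 1_000_000
        };
    }
}
```

Calling `FolderTiming.measure(Paths.get(path))` on one of the 42 paths gives a rough baseline. If the read time dominates and does not shrink when more threads are added, the limit is probably I/O, and extra threads on a single folder are unlikely to help much.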
Below is the basic structure of all the servers and folders:
[Image: basic structure of the servers and folders]
Edit
I also tried using an ExecutorService, but it takes more time than the code using Thread[] (shown above):
// folder, map and checkFileDetails are defined elsewhere (not shown)
ExecutorService pool = Executors.newFixedThreadPool(10);
List<Future<Map<String, String>>> futures = new ArrayList<>();
if (null != folder.listFiles()) {
    // TreeMap is not thread-safe; wrap it because parallelStream() calls putAll from several threads
    Map<String, String> data = Collections.synchronizedMap(new TreeMap<>());
    Callable<Map<String, String>> callable = () -> {
        List<File> filesList = new ArrayList<>(List.of(folder.listFiles()));
        // parallelStream() runs on the common ForkJoinPool (plus the calling thread), not on "pool"
        filesList.parallelStream().forEach(file -> data.putAll(checkFileDetails(file)));
        return data;
    };
    futures.add(pool.submit(callable));
}
pool.shutdown();
for (Future<Map<String, String>> future : futures) {
    try {
        map.putAll(future.get());
    } catch (InterruptedException | ExecutionException e) {
        throw new RuntimeException(e);
    }
}
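One direction that might be worth measuring, sketched here under assumptions rather than offered as a definitive fix: create a single fixed pool once and submit one task per folder, so that listings and reads on different servers overlap instead of running folder by folder. `SharedPoolSearch`, `search` and `containsText` are hypothetical names. `containsText` is a plain substring check standing in for `checkFileDetails`, whose real logic is not shown in the question, and the files are assumed to be UTF-8.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Stream;

// Hypothetical sketch (not from the original post): one pool shared by all folders.
final class SharedPoolSearch {

    // Stand-in for checkFileDetails: true if the file content contains the text.
    static boolean containsText(Path file, String needle) {
        try {
            return new String(Files.readAllBytes(file), StandardCharsets.UTF_8).contains(needle);
        } catch (IOException e) {
            return false; // assumption: an unreadable file counts as "no match"
        }
    }

    static Set<Path> search(List<Path> folders, String needle, int threads) throws InterruptedException {
        Set<Path> matches = ConcurrentHashMap.newKeySet(); // safe for concurrent add
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Path folder : folders) {
                // One task per folder; each task lists and scans its own folder
                futures.add(pool.submit(() -> {
                    try (Stream<Path> files = Files.list(folder)) {
                        files.filter(Files::isRegularFile)
                             .filter(file -> containsText(file, needle))
                             .forEach(matches::add);
                    } catch (IOException e) {
                        // assumption: a missing or unreadable folder is skipped
                    }
                }));
            }
            for (Future<?> future : futures) {
                try {
                    future.get(); // wait for every folder to finish
                } catch (ExecutionException e) {
                    throw new RuntimeException(e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return matches;
    }
}
```

With the 42 paths from the question converted via `Paths.get`, a call such as `SharedPoolSearch.search(folderPaths, "some text", 12)` would return the matching files. The thread count here is a guess. Because the work is likely I/O-bound, the useful number depends on the servers and the network, so it is worth trying a few values and timing each.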