Need an elegant way to move orphaned files out of a folder after moving paired files with the FileInfo class

Posted 2024-10-26 06:21:20

I have a folder from which I'm moving pairs of related files (xml paired with pdf). Additional files could be deposited into this folder at any time, but the utility runs every 10 minutes or so. We could use the FileSystemWatcher class but for internal reasons we don't for this utility.

I'm using the System.IO.FileInfo class to read all the files in the folder (will only be xml and pdf) during each run. Once I have the files in the FileInfo object, I iterate through the files, moving matches to a working folder. Once that is done, I want to move any files that were not paired, but are in the FileInfo object, to a failure folder.

Since I can't seem to remove items from the FileInfo object (or I am missing something), would it be easier to (1) use a string array from Directory class .GetFiles, (2) create a Dictionary from the FileInfo object and remove values from that during iteration, or (3) is there a more elegant approach using LINQ or something else?

Here is the code so far:

internal static bool CompareXMLandPDFFileNames(FileInfo[] xmlFiles, FileInfo[] pdfFiles, string xmlFilePath)
    {
        string workingFilePath = xmlFilePath + @"\WORKING";            

        if (xmlFiles.Length > 0)
        {
            foreach (var xmlFile in xmlFiles)
            {
                string xfn = xmlFile.Name; //xml file name
                string pdfName = xfn.Substring(0,xfn.IndexOf('_')) + ".pdf"; //parsed pdf file name contained in xml file name

                foreach (var pdfFile in pdfFiles)
                {
                    string pfn = pdfFile.Name; //pdf file name
                    if (pfn == pdfName)
                    {
                        //move xml and pdf files to working folder...
                        FileInfo xmlInfo = new FileInfo(xmlFilePath + xfn);
                        FileInfo pdfInfo = new FileInfo(xmlFilePath + pfn);
                        if (!File.Exists(workingFilePath + xfn))
                        {
                            xmlInfo.MoveTo(workingFilePath + xfn);                                
                        }

                        if (!File.Exists(workingFilePath + pfn))
                        {
                            pdfInfo.MoveTo(workingFilePath + pfn);
                        }                            
                    }
                }
            }

            //all files in the file objects should now be moved to working folder, if not, fix orphans...
        }

        return true;
    }

初见 2024-11-02 06:21:20

To be honest, I think the question is a bit poor. The problem is stated in a very complicated fashion. I think the workflow should be designed to be more robust and deterministic. (E.g. why not upload file pairs in zipped sets in the first place?)

(And no "Someone" most likely "must not have been here before")

Here are some random improvements:

using System;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

namespace O
{
    static class X
    {
        // Maps "<base>_<anything>.xml" to "<base>.pdf" so an xml and its pdf share one group key.
        // The dot before "xml" is escaped so that it only matches a literal ".".
        private static readonly Regex _xml2pdf = new Regex(@"(_.*)\.xml$", RegexOptions.Compiled | RegexOptions.IgnoreCase);

        internal static void MoveFileGroups(string uploadFolder)
        {
            string workingFilePath = Path.Combine(uploadFolder, "PROGRESS");

            // Group files by the derived pdf name; keep only groups with more than one member.
            var groups = new DirectoryInfo(uploadFolder)
                .GetFiles()
                .GroupBy(fi => _xml2pdf.Replace(fi.Name, ".pdf"), StringComparer.InvariantCultureIgnoreCase)
                .Where(group => group.Count() > 1);

            foreach (var group in groups)
            {
                // Move a group only if none of its files already exists in the target folder.
                if (!group.Any(fi => File.Exists(Path.Combine(workingFilePath, fi.Name))))
                    foreach (var file in group)
                        file.MoveTo(Path.Combine(workingFilePath, file.Name));
            }
        }

        public static void Main(string[] args)
        {
        }
    }
}

- Use readable names (say what you mean).
- IndexOf returns -1 if the filename contains no "_", and Substring(0, -1) then throws; arbitrary upload filenames could make the procedure fail.
- Handle filenames case-insensitively on Windows.
- Don't concatenate paths manually (you could accidentally manufacture UNC paths, and your code is less portable).
- Don't assume one xml will map to one pdf: the naming scheme implies that many xmls can map to the same pdf name. This implementation allows that (or you could detect the situation by rejecting groups.Where(g => g.Count() > 2)).
- Move groups atomically only (!): if any one of the files in a group exists in the target dir, don't move any. Otherwise you will have a race condition, where part of a group gets moved before the last file was (completely) uploaded, and that last file will never get moved because the group is no longer detected.
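The same grouping idea can also answer the original question about orphans. What follows is only a minimal sketch, assuming the naming scheme from the question ("<base>_<anything>.xml" pairs with "<base>.pdf"). The names OrphanSketch, GroupKey and FindOrphans are hypothetical and not part of the sample above. It treats a file as an orphan when its group has fewer than two members or contains no pdf.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; it works on file names and touches no files.
static class OrphanSketch
{
    // Assumed scheme: "<base>_<anything>.xml" belongs to "<base>.pdf".
    private static readonly Regex XmlSuffix = new Regex(@"_.*\.xml$", RegexOptions.IgnoreCase);

    // "abc_123.xml" becomes "abc.pdf"; names that do not match are returned unchanged.
    internal static string GroupKey(string fileName)
    {
        return XmlSuffix.Replace(fileName, ".pdf");
    }

    // Returns every name whose group is incomplete: fewer than two members,
    // or no member that is the pdf itself (name equal to the group key).
    internal static List<string> FindOrphans(IEnumerable<string> fileNames)
    {
        var ignoreCase = StringComparer.OrdinalIgnoreCase;
        return fileNames
            .GroupBy(name => GroupKey(name), ignoreCase)
            .Where(g => g.Count() < 2 || !g.Any(name => ignoreCase.Equals(name, g.Key)))
            .SelectMany(g => g)
            .ToList();
    }
}
```

In the utility you could feed it new DirectoryInfo(uploadFolder).GetFiles().Select(fi => fi.Name) and move the returned names to the failure folder. Keep in mind that a file whose partner is still uploading looks like an orphan for one run, so you may want to wait a run or check the file age before treating it as a failure.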

Other items (todo)

- Don't pass redundant parameters. You might pass a FileInfo[] instead of making the raw GetFiles() call if you want filtering.
- Do error handling, notably:
  - handle IO exceptions
  - locking errors are to be expected while uploads are in progress (test it or end up with corrupted files); you need to handle these atomically (i.e. not move any files in a group unless all could be moved; this will be somewhat tricky)
- Test your code (none of my sample was tested; it just compiled on Linux with Mono).
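One way the error-handling items above might be approached is sketched here. The names GroupMoveSketch, IsUnlocked and TryMoveGroup are hypothetical. It assumes that a file which can be opened with FileShare.None is no longer being written by an uploader, which depends on how the uploader opens its files. It is not truly atomic: it only narrows the window by probing first and rolling back on failure.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper for illustration only.
static class GroupMoveSketch
{
    // True if the file can be opened exclusively right now.
    // A missing or locked file surfaces as an IOException and yields false.
    internal static bool IsUnlocked(string path)
    {
        try
        {
            using (new FileStream(path, FileMode.Open, FileAccess.ReadWrite, FileShare.None))
            {
                return true;
            }
        }
        catch (IOException) { return false; }
        catch (UnauthorizedAccessException) { return false; }
    }

    // Moves all files of a group into targetFolder, or none of them.
    // Returns false when the group was skipped or the move had to be rolled back.
    internal static bool TryMoveGroup(IList<string> sourcePaths, string targetFolder)
    {
        Directory.CreateDirectory(targetFolder); // does nothing if the folder already exists
        List<string> targets = sourcePaths
            .Select(p => Path.Combine(targetFolder, Path.GetFileName(p)))
            .ToList();

        // Skip the whole group if any target name is taken or any source is still locked.
        if (targets.Any(t => File.Exists(t)) || !sourcePaths.All(p => IsUnlocked(p)))
            return false;

        int moved = 0;
        try
        {
            for (; moved < sourcePaths.Count; moved++)
                File.Move(sourcePaths[moved], targets[moved]);
            return true;
        }
        catch (Exception ex) when (ex is IOException || ex is UnauthorizedAccessException)
        {
            // Best-effort rollback so the group stays together for the next run.
            for (int i = 0; i < moved; i++)
            {
                try { File.Move(targets[i], sourcePaths[i]); }
                catch (IOException) { /* leave it; a real utility should log this */ }
            }
            return false;
        }
    }
}
```

A group that returns false simply stays in the upload folder and is retried on the next 10-minute run, which fits the polling design described in the question.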
