How do I parse a numbered sequence from a list of filenames?

Posted on 2024-10-09 11:42:26

I would like to automatically parse a range of numbered sequences from an already sorted List<FileData> of filenames by checking which part of the filename changes.

Here is an example (file extension has already been removed):

First filename: IMG_0000
Last filename: IMG_1000
Numbered Range I need: 0000 and 1000

Except I need to deal with every possible type of file naming convention such as:

0000 ... 9999
20080312_0000 ... 20080312_9999
IMG_0000 - Copy ... IMG_9999 - Copy
8er_green3_00001 .. 8er_green3_09999
etc.

  • I would like the entire 0-padded range e.g. 0001 not just 1
  • The sequence number is 0-padded e.g. 0001
  • The sequence number can be located anywhere e.g. IMG_0000 - Copy
  • The range can start and end with anything i.e. doesn't have to start with 1 and end with 9999
  • Numbers may appear multiple times in the filename of the sequence e.g. 20080312_0000

Whenever I get something working for 8 random test cases, the 9th test breaks everything and I end up re-starting from scratch.

I've currently been comparing only the first and last filenames (as opposed to iterating through all filenames):

void FindRange(List<FileData> files, out string startRange, out string endRange)
{
    string firstFile = files.First().ShortName;
    string lastFile = files.Last().ShortName;

    ...
}

Does anyone have any clever ideas? Perhaps something with Regex?

Comments (4)

指尖微凉心微凉 2024-10-16 11:42:26

If you're guaranteed that the filenames end with the number (e.g. _\d+) and the list is sorted, just grab the first and last elements and that's your range. If the filenames are otherwise identical, you can sort the list to put them in numerical order. Unless I'm missing something obvious here, where's the problem?
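
A minimal sketch of that idea in C#, assuming every name really does end with its number and the list is already sorted. The class and method names are hypothetical and only illustrate the approach.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TrailingNumberSketch
{
    // Returns the run of digits at the very end of a name, or null if there is none.
    public static string TrailingDigits(string name)
    {
        Match m = Regex.Match(name, @"\d+$");
        return m.Success ? m.Value : null;
    }

    // Takes the trailing digits of the first and last names of a sorted list.
    public static (string Start, string End) FindRange(IReadOnlyList<string> sortedNames)
    {
        if (sortedNames.Count == 0)
            throw new ArgumentException("No elements.", nameof(sortedNames));

        return (TrailingDigits(sortedNames[0]),
                TrailingDigits(sortedNames[sortedNames.Count - 1]));
    }
}
```

A name such as IMG_0000 - Copy does not end in digits, so TrailingDigits returns null for it; that is the kind of naming convention this shortcut cannot handle.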

数理化全能战士 2024-10-16 11:42:26

Use a regex to parse out the numbers from the filenames:

^.*?(\d+)\D*$

From these parsed strings, find the maximum length, and left-pad any that are less than the maximum length with zeros.

Sort these padded strings alphabetically. Take the first and last from this sorted list to give you your min and max numbers.
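
The three steps (parse, pad, sort) could look roughly like this in C#. This is a sketch, assuming the sequence number is the last run of digits in each name; the class name, method name, and the exact pattern `^.*?(\d+)\D*$` are choices made for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class PaddedRangeSketch
{
    // Captures the last run of digits in a name (assumed pattern).
    private static readonly Regex LastDigits = new Regex(@"^.*?(\d+)\D*$");

    public static (string Min, string Max) FindRange(IEnumerable<string> names)
    {
        // Step 1: parse the digit run out of every name that has one.
        var numbers = names
            .Select(n => LastDigits.Match(n))
            .Where(m => m.Success)
            .Select(m => m.Groups[1].Value)
            .ToList();

        if (numbers.Count == 0)
            throw new ArgumentException("No name contains digits.", nameof(names));

        // Step 2: left-pad to the longest length so string order matches numeric order.
        int width = numbers.Max(s => s.Length);
        var padded = numbers.Select(s => s.PadLeft(width, '0')).ToList();

        // Step 3: ordinal sort, then take the two ends.
        padded.Sort(StringComparer.Ordinal);
        return (padded[0], padded[padded.Count - 1]);
    }
}
```

Because every name is inspected, this variant does not rely on the input list being sorted, at the cost of one pass and one sort over all names.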

物价感观 2024-10-16 11:42:26

Firstly, I will assume that the numbers are always zero-padded so that they are the same length. If not then bigger headaches lie ahead.

Secondly, assume that the file names are exactly the same apart from the increment number component.

If these assumptions are true then the algorithm should be to look at each character in the first and last filenames to determine which same-positioned characters do not match.

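// firstFile and lastFile are the first and last names from the question;
// filenames is assumed to hold every name, all of the same length.
var start = String.Empty;
var end = String.Empty;

for (var index = 0; index < firstFile.Length; index++)
{
    char c = firstFile[index];

    // Keep this position if any filename differs from the first one here.
    if (filenames.Any(filename => filename[index] != c))
    {
        start += firstFile[index];
        end += lastFile[index];
    }
}
// convert to int if required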

edit: Changed to check every filename until a difference is found. Not as efficient as it could be but very simple and straightforward.

全部不再 2024-10-16 11:42:26

Here is my solution. It works with all of the examples that you have provided and it assumes the input array to be sorted.

Note that it doesn't look exclusively for numbers; it looks for a consistent sequence of characters that might differ across all of the strings. So if you provide it with {"0000", "0001", "0002"} it will hand back "0" and "2" as the start and end strings, since that's the only part of the strings that differs. If you give it {"0000", "0010", "0100"}, it will give you back "00" and "10".

But if you give it {"0000", "0101"}, it will whine since the differing parts of the string are not contiguous. If you would like this behavior modified so it will return everything from the first differing character to the last, that's fine; I can make that change. But if you are feeding it a ton of filenames that will have sequential changes to the number region, this should not be a problem.

public static class RangeFinder
{
    public static void FindRange(IEnumerable<string> strings,
        out string startRange, out string endRange)
    {
        using (var e = strings.GetEnumerator()) {
            if (!e.MoveNext())
                throw new ArgumentException("No elements.", nameof(strings));

            if (e.Current == null)
                throw new ArgumentException(
                    "Null element encountered at index 0.", nameof(strings));

            var template = e.Current;

            // If an element in here is true, it means that index differs.
            var matchMatrix = new bool[template.Length];

            int index = 1;

            string last = null;
            while (e.MoveNext()) {
                if (e.Current == null)
                    throw new ArgumentException(
                        "Null element encountered at index " + index + ".",
                        nameof(strings));

                last = e.Current;
                if (last.Length != template.Length)
                    throw new ArgumentException(
                        "Element at index " + index + " has incorrect length.",
                        nameof(strings));

                for (int i = 0; i < template.Length; i++)
                    if (last[i] != template[i])
                        matchMatrix[i] = true;

                index++; // keep the index reported in error messages in step
            }

            // Verify the matrix:
            // * There must be at least one true value.
            // * All true values must be consecutive.
            int start = -1;
            int end = -1;
            for (int i = 0; i < matchMatrix.Length; i++) {
                if (matchMatrix[i]) {
                    if (end != -1)
                        throw new ArgumentException(
                            "Inconsistent match matrix; no usable pattern discovered.",
                            nameof(strings));

                    if (start == -1)
                        start = i;
                } else {
                    if (start != -1 && end == -1)
                        end = i;
                }
            }

            if (start == -1)
                throw new ArgumentException(
                    "Strings did not vary; no usable pattern discovered.",
                    nameof(strings));

            if (end == -1)
                end = matchMatrix.Length;

            startRange = template.Substring(start, end - start);
            endRange = last.Substring(start, end - start);
        }
    }
}
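
One possible variation on the behavior described above, where {"0000", "0001", "0002"} yields "0" and "2": take everything from the first differing character to the last, then widen that span over any neighbouring digits so that shared leading zeros are kept. The sketch below compares only the first and last names, as the question does; the class and method names are hypothetical, and it assumes both names have the same length and that the number is not glued directly onto another digit run such as a date.

```csharp
using System;

public static class DigitRunSketch
{
    public static (string Start, string End) FindRange(string first, string last)
    {
        if (first.Length != last.Length)
            throw new ArgumentException("Names must have the same length.", nameof(last));

        // Locate the first and last positions where the two names differ.
        int lo = -1, hi = -1;
        for (int i = 0; i < first.Length; i++)
        {
            if (first[i] != last[i])
            {
                if (lo == -1) lo = i;
                hi = i;
            }
        }

        if (lo == -1)
            throw new ArgumentException("Names are identical.", nameof(last));

        // Grow the span outward while the neighbouring characters are digits,
        // so shared leading zeros and shared trailing digits stay in the result.
        while (lo > 0 && IsDigit(first[lo - 1])) lo--;
        while (hi < first.Length - 1 && IsDigit(first[hi + 1])) hi++;

        int length = hi - lo + 1;
        return (first.Substring(lo, length), last.Substring(lo, length));
    }

    private static bool IsDigit(char c) => c >= '0' && c <= '9';
}
```

With IMG_0000 and IMG_1000 only one character differs, but the widening step extends the span to the full four digits, so the result is 0000 and 1000 rather than 0 and 1.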