C#: How can a regex distinguish between two variants of a string?
This is tough to explain well enough to ask the question, but I'll try:
I have two possible forms of user input:
S01E05 or 0105 (two different input strings)
which both translate to season 01, episode 05.
But if the user enters it backwards, E05S01 or 0501, I need to return the same result: Season 01, Episode 05.
The control for this would be the user defining the format of the original file name with something like this:
"SssEee": an uppercase 'S' denotes that the following lowercase 's' characters belong to the season, and an uppercase 'E' denotes that the following lowercase 'e' characters belong to the episode. So if the user defines the format as EeeSss, my function should still return the same result, since it knows which digits belong to the season and which to the episode.
I don't have anything working quite yet to share, but what I was toying with is a loop that builds the regex pattern. The function, so far, accepts the user format and the file name:
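// The method name is missing in the original post; GetSeasonEpisode is a hypothetical placeholder.
public static int GetSeasonEpisode(string userFormat, string fileName)
{
    throw new NotImplementedException(); // not written yet
}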
The userFormat would be a string that looks something like this:
t.t.t.SssEee
or even
t.SssEee
where t stands for title, and the rest you already know.
The file name might look like this:
battlestar.galactica.S01E05.mkv
I've got a function that extracts the title from the file name by using the userFormat to build the regex string:
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static string GetTitle(string userFormat, string fileName)
{
    string pattern = "^";
    foreach (char positionChar in userFormat)
    {
        //build the regex pattern
        if (positionChar == 't')
        {
            pattern += @"\w+";
        }
        else if (positionChar == '#')
        {
            pattern += @"\d+";
        }
        else if (positionChar == ' ')
        {
            pattern += @"\s+";
        }
        else
        {
            // Everything else (including S/s/E/e) is matched literally; escaping stops
            // '.' from acting as a regex wildcard.
            pattern += Regex.Escape(positionChar.ToString());
        }
    }
    //pulls out the title with or without the delimiter
    Match title = Regex.Match(fileName, pattern, RegexOptions.IgnoreCase);
    if (!title.Success)
    {
        return string.Empty;
    }
    //remove the delimiters and join the words with single spaces (no trailing space)
    string[] parts = title.Value.Split(@"\/.-<>".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    string fileTitle = string.Join(" ", parts);
    return CultureInfo.CurrentCulture.TextInfo.ToTitleCase(fileTitle);
}
But I'm kind of stumped on how to extract the episode and season numbers. In my head, I'm thinking the process would look something like this:
- Look through the userFormat string to find the uppercase S
- Determine how many lowercase 's' characters follow the uppercase S
- Build a regular expression that describes this
- Search through the file name and find that pattern
- Extract the number from that pattern
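The steps above can be sketched in C# roughly as follows. This is a minimal, hypothetical sketch: the class SeasonEpisodeParser and its BuildPattern and Parse methods are illustrative names, not part of the question. It assumes the format conventions implied by Ex 1-3 below: 't' is one title word, uppercase 'S' and 'E' are markers that consume no characters, each run of lowercase 's' or 'e' becomes a named group of that many digits, '?' matches any single character, and everything else is matched literally.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class SeasonEpisodeParser
{
    // Hypothetical helper: turns a user format such as "t.t.?ss?ee" or
    // "t.t.EeeSss" into a regex with named groups for season and episode.
    // Assumed rules (from Ex 1-3 in the question):
    //   t      -> one title word (\w+)
    //   S, E   -> markers only; they consume no characters
    //   s...s  -> a run of season digits; e...e -> a run of episode digits
    //   ?      -> any single character (e.g. the literal 'S' or 'E' in S01E05)
    //   other  -> matched literally
    public static string BuildPattern(string userFormat)
    {
        var pattern = new StringBuilder("^");
        int i = 0;
        while (i < userFormat.Length)
        {
            char c = userFormat[i];
            if (c == 's' || c == 'e')
            {
                // Count how many identical lowercase letters are in this run.
                int start = i;
                while (i < userFormat.Length && userFormat[i] == c) i++;
                string group = c == 's' ? "season" : "episode";
                pattern.Append($@"(?<{group}>\d{{{i - start}}})");
                continue;
            }
            if (c == 't') pattern.Append(@"\w+");
            else if (c == '?') pattern.Append('.');
            else if (c != 'S' && c != 'E') pattern.Append(Regex.Escape(c.ToString()));
            i++;
        }
        return pattern.ToString();
    }

    // Returns the season and episode numbers found in the file name.
    public static (int Season, int Episode) Parse(string userFormat, string fileName)
    {
        Match m = Regex.Match(fileName, BuildPattern(userFormat));
        if (!m.Success || !m.Groups["season"].Success || !m.Groups["episode"].Success)
            throw new FormatException($"'{fileName}' does not match '{userFormat}'.");
        return (int.Parse(m.Groups["season"].Value), int.Parse(m.Groups["episode"].Value));
    }
}
```

For Ex 1 the generated pattern is `^\w+\.\w+\..(?<season>\d{2}).(?<episode>\d{2})`, which yields season 1, episode 5 for battlestar.galactica.S01E05. Because the pattern is anchored only at the start, a trailing .mkv extension is ignored.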
Sounds simple enough, but I'm having a hard time putting it into action. The complication is that the format in the file name could be S01E05 or simply 0105. The user identifies which scenario applies when they define the format.
Ex 1. The file name is battlestar.galactica.S01E05
The user format submitted will be t.t.?ss?ee
Ex 2. The file name is battlestar.galactica.0105
The user format submitted will be t.t.SssEee
Ex 3. The file name is battlestar.galactica.0501
The user format submitted will be t.t.EeeSss
Sorry for the book... The concept is simple: the regex function should be dynamic, letting the user define the format of a file name so that my method can generate the expression and use it to extract information from the file name. Something tells me this is simpler than it seems... but I'm at a loss. lol... Any suggestions?
Comments (2)
So if I read this right, you know where the Season/Episode number is in the string because the user has told you. That is, you have
t.t.<number>.more.stuff
And <number> can take one of these forms: S01E05, E05S01, 0105, or 0501.
Or did you say that the user can define how many digits will be used for season and episode? That is, could it be S01E123?
I'm not sure you need a regex for this. Since you know the format, and fields appear to be separated by periods (I assume there can't be periods inside the individual fields), you should be able to use String.Split to extract the pieces, and you know from the user's format where the Season/Episode is in the resulting array. So you now have a string that takes one of the forms above. You have the user's format definition and the Season/Episode number, so you should be able to write a loop that steps through the two strings together and extracts the necessary information, or issues an error.
Note that you'll probably have to do some checking to make sure the lengths are correct. That is, a simple loop like that would accept S01E1 when the user's format is SssEee. You can add more error handling, depending on how worried you are about bad input. But I think this gives you the gist of the idea.
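The split-and-walk idea from this answer, including the length checks, might look something like this. It is a hypothetical sketch, not the answerer's code: the class name LockstepParser is made up, it assumes '.' never appears inside a field, and it treats uppercase 'S'/'E' in the format as markers that are consumed only when the file name actually spells them out, so SssEee accepts both S01E05 and 0105. '?' matches any single character, and lowercase 's'/'e' positions must be digits.

```csharp
using System;

public static class LockstepParser
{
    // Hypothetical sketch: split format and file name on '.', find the field the
    // format marks as season/episode, then walk both strings together.
    public static (int Season, int Episode) Parse(string userFormat, string fileName)
    {
        string[] formatParts = userFormat.Split('.');
        string[] nameParts = fileName.Split('.');

        // The user's format tells us which field holds the season/episode.
        int index = Array.FindIndex(formatParts, p => p.IndexOf('s') >= 0 || p.IndexOf('e') >= 0);
        if (index < 0 || index >= nameParts.Length)
            throw new FormatException("No season/episode field found.");

        string format = formatParts[index];
        string value = nameParts[index];
        string season = "", episode = "";
        int j = 0; // current position in the file-name field

        foreach (char f in format)
        {
            if (f == 'S' || f == 'E')
            {
                // Marker: consume it only if the file name spells it out (S01E05 vs 0105).
                if (j < value.Length && char.ToUpperInvariant(value[j]) == f) j++;
                continue;
            }
            if (j >= value.Length)
                throw new FormatException($"'{value}' is shorter than '{format}'.");
            char v = value[j++];
            if (f == 's' || f == 'e')
            {
                if (!char.IsDigit(v))
                    throw new FormatException($"Expected a digit in '{value}'.");
                if (f == 's') season += v; else episode += v;
            }
            else if (f != '?' && f != v)
            {
                throw new FormatException($"Expected '{f}' in '{value}'.");
            }
        }

        // Leftover characters or a missing number also mean the lengths don't fit.
        if (j != value.Length || season.Length == 0 || episode.Length == 0)
            throw new FormatException($"'{value}' does not fit '{format}'.");
        return (int.Parse(season), int.Parse(episode));
    }
}
```

Because the walk fails as soon as the file-name field runs out before the format does, or has characters left over afterwards, S01E1 is rejected for SssEee.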
I have to think that's going to be a whole lot easier than trying to dynamically build regular expressions.
After @Sinaesthetic answered my question we can reduce his original post to:
The challenge is to receive any of these inputs: S01E05, E05S01, 0105, 0501
and transform any of these inputs into: S01E05
At this point title and file format are irrelevant, they just get tacked on to the ends.
Based on that, the following code will always result in 'Battlestar.Galactica.S01E05.mkv'
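A minimal sketch of such a normalization, written here for illustration; it is not the answerer's code, and EpisodeNormalizer, Normalize and BuildFileName are hypothetical names. It assumes the bare-digit forms always use two digits each and that the user format tells which pair comes first; the lettered forms are read in either order, case-insensitively.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EpisodeNormalizer
{
    // Hypothetical helper: rewrites S01E05, E05S01, 0105 or 0501 as "S01E05".
    public static string Normalize(string token, string userFormat)
    {
        int season, episode;
        // Lettered forms carry their own labels, in either order (case-insensitive).
        Match m = Regex.Match(token, @"^(?:S(?<s>\d+)E(?<e>\d+)|E(?<e>\d+)S(?<s>\d+))$",
                              RegexOptions.IgnoreCase);
        if (m.Success)
        {
            season = int.Parse(m.Groups["s"].Value);
            episode = int.Parse(m.Groups["e"].Value);
        }
        else if (Regex.IsMatch(token, @"^\d{4}$"))
        {
            // Bare digits: assume two digits each; the user format ("SssEee" vs
            // "EeeSss") decides which pair comes first.
            int s = userFormat.IndexOf('s'), e = userFormat.IndexOf('e');
            if (s < 0 || e < 0)
                throw new FormatException($"Format '{userFormat}' names no season/episode.");
            string first = token.Substring(0, 2), second = token.Substring(2, 2);
            season = int.Parse(e < s ? second : first);
            episode = int.Parse(e < s ? first : second);
        }
        else
        {
            throw new FormatException($"Unrecognized season/episode token '{token}'.");
        }
        return $"S{season:D2}E{episode:D2}";
    }

    // Title and extension are just tacked on to the ends.
    public static string BuildFileName(string title, string token, string userFormat, string extension)
        => $"{title}.{Normalize(token, userFormat)}.{extension}";
}
```

For example, `BuildFileName("Battlestar.Galactica", "0501", "t.t.EeeSss", "mkv")` returns `"Battlestar.Galactica.S01E05.mkv"`.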