How to remove illegal characters from path and filenames?
I need a robust and simple way to remove illegal path and file characters from a simple string. I've used the code below, but it doesn't seem to do anything. What am I missing?
    using System;
    using System.IO;

    namespace ConsoleApplication1
    {
        class Program
        {
            static void Main(string[] args)
            {
                string illegal = "\"M<>\"\\a/ry/ h**ad:>> a\\/:*?\"<>| li*tt|le|| la\"mb.?";
                illegal = illegal.Trim(Path.GetInvalidFileNameChars());
                illegal = illegal.Trim(Path.GetInvalidPathChars());
                Console.WriteLine(illegal);
                Console.ReadLine();
            }
        }
    }
The original question asked to "remove illegal characters":
You may instead want to replace them:
This answer was on another thread by Ceres; I really like it because it is neat and simple.
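The code for this answer did not survive on this page. A minimal sketch of both variants, assuming the approach is to split on the invalid characters and rejoin the pieces (the class and method names are hypothetical):

```csharp
using System;
using System.IO;

public static class SplitSanitizer
{
    // Remove: split on every invalid file name character, then glue the pieces together.
    public static string Remove(string name) =>
        string.Concat(name.Split(Path.GetInvalidFileNameChars()));

    // Replace: same split, but join the pieces with a replacement string.
    public static string Replace(string name, string replacement = "_") =>
        string.Join(replacement, name.Split(Path.GetInvalidFileNameChars()));
}
```

Since `/` is an invalid file name character on both Windows and Unix-like systems, `SplitSanitizer.Remove("a/b")` gives `ab` and `SplitSanitizer.Replace("a/b")` gives `a_b`.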
Try something like this instead:
But I have to agree with the comments: I'd probably try to deal with the source of the illegal paths, rather than try to mangle an illegal path into a legitimate but probably unintended one.
Edit: Or a potentially "better" solution, using regexes.
Still, the question has to be asked: why are you doing this in the first place?
I use Linq to clean up filenames. You can easily extend this to check for valid paths as well.
Update
Some comments indicate this method is not working for them, so I've included a link to a DotNetFiddle snippet so you can validate the method.
https://dotnetfiddle.net/nw1SWY
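The snippet itself is missing from this page. One way a Linq-based cleanup could look, assuming it folds over the invalid characters with `Aggregate` (the names `LinqCleaner` and `CleanFileName` are illustrative, not taken from the answer):

```csharp
using System.IO;
using System.Linq;

public static class LinqCleaner
{
    // Start with the original name and strip one invalid character per fold step.
    public static string CleanFileName(string fileName) =>
        Path.GetInvalidFileNameChars()
            .Aggregate(fileName, (current, c) => current.Replace(c.ToString(), string.Empty));
}
```

This builds a new string for each invalid character that actually occurs, which is fine for occasional use but may matter in a tight loop.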
You can remove illegal chars using Linq like this:
EDIT
This is how it looks with the required edit mentioned in the comments:
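Neither version of the code is preserved here, and the edit requested in the comments is unknown. As a rough sketch of the general idea only, filtering the characters with `Where` might look like this (names are hypothetical):

```csharp
using System;
using System.IO;
using System.Linq;

public static class LinqFilter
{
    public static string RemoveInvalid(string fileName)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        // Keep only the characters that are not in the invalid set.
        return new string(fileName.Where(ch => Array.IndexOf(invalid, ch) < 0).ToArray());
    }
}
```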
For file names:
For full paths:
Note that if you intend to use this as a security feature, a more robust approach would be to expand all paths and then verify that the user-supplied path is indeed a child of a directory the user should have access to.
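The security note can be sketched as a containment check. This is an illustration under stated assumptions, not the answer's own code: `PathGuard` and `IsUnderRoot` are hypothetical names, symbolic links are not resolved, and the comparison is ordinal, which is deliberately strict (on a case-insensitive file system you may prefer `OrdinalIgnoreCase`).

```csharp
using System;
using System.IO;

public static class PathGuard
{
    // Returns true when candidate, once fully expanded, lies inside root.
    public static bool IsUnderRoot(string root, string candidate)
    {
        string fullRoot = Path.GetFullPath(root);
        string separator = Path.DirectorySeparatorChar.ToString();
        if (!fullRoot.EndsWith(separator, StringComparison.Ordinal))
            fullRoot += separator; // so "/data/up" does not match "/data/uploads"

        // Combine first so that relative input such as "../secret" is resolved against root.
        string fullCandidate = Path.GetFullPath(Path.Combine(fullRoot, candidate));
        return fullCandidate.StartsWith(fullRoot, StringComparison.Ordinal);
    }
}
```

The important step is expanding the path with `Path.GetFullPath` before comparing, so that `..` segments cannot escape the allowed directory.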
These are all great solutions, but they all rely on Path.GetInvalidFileNameChars, which may not be as reliable as you'd think. Notice the remark in the MSDN documentation on Path.GetInvalidFileNameChars: the returned array is not guaranteed to contain the complete set of characters that are invalid in file and directory names, and the full set can vary by file system.
It's not any better with the Path.GetInvalidPathChars method. It contains the exact same remark.
The best way to remove illegal characters from user input is to replace them using the Regex class, either in a method in the code-behind or by validating on the client side with a RegularExpression control.
OR
For starters, Trim only removes characters from the beginning or end of the string. Secondly, you should evaluate whether you really want to remove the offending characters, or fail fast and let the user know their filename is invalid. My choice is the latter, but my answer should at least show you how to do things the right AND wrong way:
Here is a StackOverflow question showing how to check whether a given string is a valid file name. Note that you can use the regex from that question to remove characters with a regular expression replacement (if you really need to do this).
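To make both points concrete, here is a small sketch (hypothetical names, not the answer's original code) showing what `Trim` actually does and what a fail-fast validity check could look like:

```csharp
using System;
using System.IO;

public static class FileNameValidator
{
    // Fail-fast check: true only when no invalid character appears anywhere in the name.
    public static bool IsValid(string fileName) =>
        !string.IsNullOrWhiteSpace(fileName) &&
        fileName.IndexOfAny(Path.GetInvalidFileNameChars()) < 0;

    // What the question's code does: only leading and trailing characters are stripped.
    public static string TrimOnly(string fileName) =>
        fileName.Trim(Path.GetInvalidFileNameChars());
}
```

For example, `FileNameValidator.TrimOnly("/a/b/")` returns `a/b`: the outer slashes are gone but the inner one is untouched, which is why the original code seems to do nothing.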
I use regular expressions to achieve this. First, I dynamically build the regex.
Then I just call removeInvalidChars.Replace to do the find and replace. This can obviously be extended to cover path chars as well.
I absolutely prefer Jeff Yates's idea. It will work perfectly if you modify it slightly:
The improvement is just to escape the automatically generated regex.
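Neither regex snippet is preserved on this page. A sketch of the combined idea, assuming the pattern is a character class built from `Path.GetInvalidFileNameChars()` and passed through `Regex.Escape` (the class name and the `Clean` method are illustrative):

```csharp
using System.IO;
using System.Text.RegularExpressions;

public static class RegexSanitizer
{
    // Character class built from the platform's invalid file name characters.
    // Regex.Escape protects characters such as '\' that are special in a pattern.
    private static readonly Regex RemoveInvalidChars = new Regex(
        "[" + Regex.Escape(new string(Path.GetInvalidFileNameChars())) + "]",
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // Removes invalid characters by default, or substitutes the given replacement.
    public static string Clean(string fileName, string replacement = "") =>
        RemoveInvalidChars.Replace(fileName, replacement);
}
```

Without the escape, a backslash in the invalid set would be read as the start of an escape sequence inside the character class, so the pattern could match the wrong characters or fail to parse.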
Here's a code snippet that should help for .NET 3 and higher.
Most solutions above combine the illegal chars for both path and filename, which is wrong (even when both calls currently return the same set of chars). I would first split the path+filename into path and filename, then apply the appropriate set to each of them, and then combine the two again.
wvd_vegt
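The answer gives no code, so the following is only a sketch of the split, clean, and recombine idea under one assumption: the last directory separator marks the boundary between path and file name (names are hypothetical).

```csharp
using System;
using System.IO;

public static class PathAndNameSanitizer
{
    public static string Clean(string fullPath)
    {
        // Split manually at the last separator; Path.GetDirectoryName can throw
        // on invalid input on .NET Framework.
        int cut = fullPath.LastIndexOfAny(
            new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar });
        string directory = cut < 0 ? string.Empty : fullPath.Substring(0, cut + 1);
        string fileName = fullPath.Substring(cut + 1);

        // Apply the path set to the directory part and the file name set to the name part.
        directory = string.Concat(directory.Split(Path.GetInvalidPathChars()));
        fileName = string.Concat(fileName.Split(Path.GetInvalidFileNameChars()));
        return directory + fileName;
    }
}
```

A limitation of this split is that a stray separator inside what was meant to be the file name is treated as a directory boundary rather than as an illegal character.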
If you remove the invalid characters or replace them with a single character, you can get collisions:
Here is a simple method to avoid this:
The result:
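The answer's method and its output are missing here. As an illustration of one collision-free scheme (not necessarily the one the author used), each invalid character can be written as `%` followed by its four-digit hex code, with `%` itself escaped the same way. All names below are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;

public static class CollisionFreeEncoder
{
    public static string Encode(string name)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        var sb = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            // '%' is escaped too, so a literal "%002F" in the input
            // cannot be confused with an encoded '/'.
            if (c == '%' || Array.IndexOf(invalid, c) >= 0)
                sb.Append('%').Append(((int)c).ToString("X4"));
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

With plain replacement, `a/b` and `a_b` both end up as `a_b`. With this encoding, `a/b` becomes `a%002Fb` while `a_b` stays unchanged, and the fixed-width codes mean the original name can be recovered.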
This seems to be O(n) and does not spend too much memory on strings:
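The original code is not shown. A single-pass version along these lines, assuming a `StringBuilder` is used to collect the valid characters, might be (names are illustrative):

```csharp
using System;
using System.IO;
using System.Text;

public static class SinglePassCleaner
{
    private static readonly char[] Invalid = Path.GetInvalidFileNameChars();

    public static string Clean(string name)
    {
        // One pass over the input; only one result string is allocated at the end.
        var sb = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            if (Array.IndexOf(Invalid, c) < 0)
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Each character is checked against a fixed-size set, so the work grows linearly with the length of the input.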
Throw an exception.
File names cannot contain characters from Path.GetInvalidPathChars(), the + and # symbols, or certain other specific names. We combined all of the checks into one class:
The GetValidFileName method replaces all invalid data with _.
If you have to use the method in many places in a project, you could also make it an extension method and call it on strings anywhere in the project.
You can call the above extension method as:
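The extension method and its call site are missing from this page. A minimal sketch of what such an extension could look like (the name `ToSafeFileName` and the default replacement are assumptions):

```csharp
using System;
using System.IO;

public static class StringExtensions
{
    // Extension method: callable on any string as value.ToSafeFileName().
    public static string ToSafeFileName(this string value, char replacement = '_')
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        char[] chars = value.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
        {
            if (Array.IndexOf(invalid, chars[i]) >= 0)
                chars[i] = replacement;
        }
        return new string(chars);
    }
}
```

With this in scope, `"report/2024.txt".ToSafeFileName()` returns `report_2024.txt`.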
I wrote this monster for fun; it lets you round-trip:
I think it is much easier to validate using a regex and specify which characters are allowed, instead of trying to check for all bad characters.
See these links:
http://www.c-sharpcorner.com/UploadFile/prasad_1/RegExpPSD12062005021717AM/RegExpPSD.aspx
http://www.windowsdevcenter.com/pub/a/oreilly/windows/news/csharp_0101.html
Also, do a search for "regular expression editor"s; they help a lot. Some of them will even output the C# code for you.
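As an illustration of the allow-list idea, the sketch below accepts only ASCII letters, digits, spaces, dots, underscores and hyphens. The permitted set is an assumption chosen for the example, so adjust it to your own requirements; the names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class AllowListSanitizer
{
    // Matches anything that is NOT an ASCII letter, digit, space, dot, underscore or hyphen.
    private static readonly Regex Disallowed = new Regex(@"[^A-Za-z0-9 ._-]");

    public static bool IsValid(string name) =>
        name.Length > 0 && !Disallowed.IsMatch(name);

    public static string Clean(string name, string replacement = "_") =>
        Disallowed.Replace(name, replacement);
}
```

An allow list does not depend on what `Path.GetInvalidFileNameChars()` returns on the current platform, at the cost of rejecting some names that would be legal, such as those with non-ASCII letters.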
Scanning over the answers here, they all** seem to involve using a char array of invalid filename characters.
Granted, this may be micro-optimising, but for the benefit of anyone who might want to check a large number of values for being valid filenames, it's worth noting that building a hashset of the invalid chars will bring notably better performance.
I have been very surprised (shocked) in the past at just how quickly a hashset (or dictionary) outperforms iterating over a list. With strings, the crossover is at a ridiculously low number (about 5-7 items, from memory). With most other simple data (object references, numbers etc.) the magic crossover seems to be around 20 items.
There are around 40 invalid characters in the Path.GetInvalidFileNameChars() "list" on Windows. I did a search today and there's quite a good benchmark here on StackOverflow showing that the hashset takes a little over half the time of an array/list for 40 items: https://stackoverflow.com/a/10762995/949129
Here's the helper class I use for sanitising paths. I forget now why I had the fancy replacement option in it, but it's there as a cute bonus.
There's an additional bonus method, "IsValidLocalPath", too :)
(** those which don't use regular expressions)
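The helper class itself is not preserved here. The sketch below only illustrates the hashset lookup idea and is not a reconstruction of the author's class; it omits `IsValidLocalPath`, and all names are hypothetical. Whether it is faster in your scenario is worth measuring.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class HashSetSanitizer
{
    // Built once; membership tests are hash lookups rather than array scans.
    private static readonly HashSet<char> Invalid =
        new HashSet<char>(Path.GetInvalidFileNameChars());

    public static bool IsValidFileName(string name)
    {
        if (string.IsNullOrEmpty(name)) return false;
        foreach (char c in name)
        {
            if (Invalid.Contains(c)) return false;
        }
        return true;
    }

    // Removes invalid characters, or substitutes them when a replacement is given.
    public static string Sanitize(string name, char? replacement = null)
    {
        var sb = new StringBuilder(name.Length);
        foreach (char c in name)
        {
            if (!Invalid.Contains(c)) sb.Append(c);
            else if (replacement.HasValue) sb.Append(replacement.Value);
        }
        return sb.ToString();
    }
}
```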
Here is my small contribution: a method that replaces within the same string without creating new strings or StringBuilders. It's fast, easy to understand, and a good alternative to the other approaches mentioned in this post.
You can call it like this:
and it returns:
It's worth noting that this method always replaces invalid chars with a given value; it does not remove them. If you want to remove invalid chars, this alternative will do the trick:
BENCHMARK
If performance is what you are after, I executed timed test runs with most of the methods found in this post. Some of these methods don't replace with a given char, since the OP was asking to clean the string. I added tests that replace with a given char, and others that replace with an empty char for scenarios that only need the unwanted chars removed. The code used for this benchmark is at the end, so you can run your own tests.
Note: Methods Test1 and Test2 are both proposed in this post.
First Run
Replacing with '_', 1000000 iterations
Results:
Second Run
Removing invalid chars, 1000000 iterations
Note: Test1 will not remove, only replace.
Results:
BENCHMARK RESULTS
Methods Test1, Test2 and Test5 are the fastest. Method Test8 is the slowest.
CODE
Here's the complete code of the benchmark:
You can clearly use a method for this.
A one-liner to clean a string of any illegal chars for Windows file naming:
I've rolled my own method, which seems to be a lot faster than the others posted here (especially the regex, which is so sloooooow), but I haven't tested all of the posted methods.
https://dotnetfiddle.net/haIXiY
The first method (mine) and the second (also mine, but an older one) also do an added check on backslashes, so the benchmark is not perfect, but it's just to give you an idea anyway.
Results on my laptop (for 100 000 iterations):
The fastest method:
The method doesn't compile "as is" due to the InvalidCharacters property; check the fiddle for the full code.
This will do what you want, and avoid collisions:
I think the question has not been fully answered yet...
The answers only describe cleaning a filename OR a path... not both. Here is my solution:
I created an extension method that combines several suggestions:
Source:
Here is a function that replaces all illegal characters in a file name with a replacement character:
For example, the underscore can be used as the replacement character:
Or you can just do