How do I sanitize a string for use as a filename?

Posted 2024-07-23 12:24:30

I've got a routine that converts a file into a different format and saves it. The original data files were numbered, but my routine names each output file after an internal name found in the original.

I tried to batch-run it on a whole directory, and it worked fine until I hit one file whose internal name had a slash in it. Oops! And if it happens here, it could easily happen with other files too. Is there an RTL (or WinAPI) routine somewhere that will sanitize a string and remove invalid symbols so it's safe to use as a filename?

遥远的她 2024-07-30 12:24:30

You can use the PathGetCharType function, the PathCleanupSpec function, or the following trick:

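  // Requires: uses Winapi.Windows, System.SysUtils;
  // Returns False as soon as MoveFile reports ERROR_ALREADY_EXISTS for the
  // remaining path, or GetFileAttributes reports ERROR_INVALID_NAME for its
  // last component.
  function IsValidFilePath(const FileName: String): Boolean;
  var
    S: String;
    I: Integer;
  begin
    Result := False;
    S := FileName;
    repeat
      // I is the position of the last \ or / (0 if there is none)
      I := LastDelimiter('\/', S);
      // Probe the remaining path; only the resulting error code is used
      MoveFile(nil, PChar(S));
      if (GetLastError = ERROR_ALREADY_EXISTS) or
         (
           // Probe the last component on its own
           (GetFileAttributes(PChar(Copy(S, I + 1, MaxInt))) = INVALID_FILE_ATTRIBUTES)
           and
           (GetLastError = ERROR_INVALID_NAME)
         ) then
        Exit;
      // Drop the last component and check the parent next
      if I > 0 then
        S := Copy(S, 1, I - 1);
    until I = 0;
    Result := True;
  end;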

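This code splits the string at each \ or / and checks every component, starting with the last one. It uses MoveFile and GetFileAttributes as probes: the path is rejected if MoveFile leaves ERROR_ALREADY_EXISTS as the last error, or if GetFileAttributes fails with ERROR_INVALID_NAME. This is how it detects invalid characters and reserved device names such as COM1 (plain "COM" is not a reserved name).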


PathCleanupSpec is declared in the JEDI Windows API library, in Win32API/JwaShlObj.pas.
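PathGetCharType is only named above, not shown. Here is a minimal sketch of how it could drive a sanitizer, assuming that keeping only the characters it flags as GCT_LFNCHAR ("valid in a long file name") is acceptable for your case. The function is imported directly from shlwapi.dll so the sketch does not depend on which Delphi unit declares it; the GCT_LFNCHAR value ($0001) is taken from the Windows SDK header shlwapi.h, and SanitizeWithCharType is a hypothetical name. Windows only.

```pascal
program PathGetCharTypeSketch;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows;

const
  // "Valid in a long file name" flag, as defined in shlwapi.h
  GCT_LFNCHAR = $0001;

// Imported directly; the Unicode variant matches Delphi's UTF-16 Char
function PathGetCharTypeW(ch: WideChar): UINT; stdcall;
  external 'shlwapi.dll' name 'PathGetCharTypeW';

// Hypothetical helper: keeps every character that PathGetCharType flags as
// GCT_LFNCHAR and replaces everything else (wildcards, separators, invalid
// characters) with Replacement. Pass a single name, not a full path.
function SanitizeWithCharType(const Name: string; Replacement: Char = '_'): string;
var
  I: Integer;
begin
  Result := Name;
  for I := 1 to Length(Result) do
    if (PathGetCharTypeW(Result[I]) and GCT_LFNCHAR) = 0 then
      Result[I] := Replacement;
end;

begin
  Writeln(SanitizeWithCharType('a<b>c|d.txt'));
end.
```

Which characters count as GCT_LFNCHAR is decided by the shell, so if you need a hard guarantee it is worth checking the result against your own rules as well.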

贵在坚持 2024-07-30 12:24:30

Regarding whether there is any API function to sanitize a file name (or even check its validity): there seems to be none. Quoting from the comment on the PathSearchAndQualify() function:

There does not appear to be any Windows API that will validate a path entered by the user; this is left as an ad hoc exercise for each application.

So all you can do is consult the rules for file name validity in File Names, Paths, and Namespaces (Windows):

  • Use almost any character in the current code page for a name, including Unicode characters and characters in the extended character set (128–255), except for the following:

    • The following reserved characters are not allowed:
      < > : " / \ | ? *
    • Characters whose integer representations are in the range from zero through 31 are not allowed.
    • Any other character that the target file system does not allow.
  • Do not use the following reserved device names for the name of a file: CON, PRN, AUX, NUL, COM1..COM9, LPT1..LPT9.
    Also avoid these names followed immediately by an extension; for example, NUL.txt is not recommended.
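The rules quoted above translate fairly directly into a check. This is a minimal sketch, not an official API: FollowsWindowsNameRules and IsReservedDeviceName are hypothetical names, and the rule about "any other character that the target file system does not allow" cannot be checked without knowing the file system, so it is left out.

```pascal
program CheckWindowsNameRules;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// True if Name is a reserved device name on its own ("NUL") or followed
// by an extension ("NUL.txt"); the comparison ignores case.
function IsReservedDeviceName(const Name: string): Boolean;
const
  Reserved: array[0..21] of string = (
    'CON', 'PRN', 'AUX', 'NUL',
    'COM1', 'COM2', 'COM3', 'COM4', 'COM5', 'COM6', 'COM7', 'COM8', 'COM9',
    'LPT1', 'LPT2', 'LPT3', 'LPT4', 'LPT5', 'LPT6', 'LPT7', 'LPT8', 'LPT9');
var
  Base: string;
  DotPos, I: Integer;
begin
  // Only the part before the first dot matters
  Base := Name;
  DotPos := Pos('.', Base);
  if DotPos > 0 then
    Base := Copy(Base, 1, DotPos - 1);
  for I := Low(Reserved) to High(Reserved) do
    if SameText(Base, Reserved[I]) then
      Exit(True);
  Result := False;
end;

// Hypothetical check for the quoted rules: no reserved characters, no
// characters #0..#31, and no reserved device names.
function FollowsWindowsNameRules(const Name: string): Boolean;
var
  C: Char;
begin
  if Name = '' then
    Exit(False);
  for C in Name do
    if (Ord(C) <= 31) or
       CharInSet(C, ['<', '>', ':', '"', '/', '\', '|', '?', '*']) then
      Exit(False);
  Result := not IsReservedDeviceName(Name);
end;

begin
  Writeln(FollowsWindowsNameRules('report.txt'));
  Writeln(FollowsWindowsNameRules('a/b.txt'));
  Writeln(FollowsWindowsNameRules('nul.txt'));
end.
```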

If you know that your program will only ever write to NTFS file systems, you can probably be sure that there are no other characters the file system disallows. After all invalid characters have been removed (or replaced, for example with underscores), you would then only have to check that the name is not too long. Keep in mind that the MAX_PATH constant (260) limits the whole path, not just the file name.
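The replace-then-check-length approach could look like the following sketch. It assumes replacement with underscores and a per-name limit of 255 characters (the usual NTFS limit for a single name); SanitizeFileName, FitsInMaxPath, MaxNameLength and MaxPathLength are hypothetical names. Because MAX_PATH counts the full path including the terminating null character, the length check looks at the directory and the name together.

```pascal
program SanitizeAndCheckLength;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  MaxNameLength = 255; // assumption: per-name limit on NTFS
  MaxPathLength = 260; // MAX_PATH, counting the terminating #0

// Hypothetical helper: replaces reserved characters and #0..#31 with
// Replacement, then truncates the result to MaxNameLength characters.
function SanitizeFileName(const Name: string; Replacement: Char = '_'): string;
var
  I: Integer;
begin
  Result := Name;
  for I := Low(Result) to High(Result) do
    if (Ord(Result[I]) <= 31) or
       CharInSet(Result[I], ['<', '>', ':', '"', '/', '\', '|', '?', '*']) then
      Result[I] := Replacement;
  // Note: this simple truncation can split a UTF-16 surrogate pair
  if Length(Result) > MaxNameLength then
    SetLength(Result, MaxNameLength);
end;

// Hypothetical check: True if Dir + Name leaves room for the #0 in MAX_PATH
function FitsInMaxPath(const Dir, Name: string): Boolean;
begin
  Result := Length(IncludeTrailingPathDelimiter(Dir) + Name) < MaxPathLength;
end;

begin
  Writeln(SanitizeFileName('Q3 report: draft/final?')); // Q3 report_ draft_final_
  Writeln(FitsInMaxPath('C:\out', 'report.txt'));
end.
```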

A program should also make sure that sanitizing file names has not led to name conflicts; otherwise it will silently overwrite other files that ended up with the same name.
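One way to avoid the silent overwrite is to remember every name handed out during the batch and add a numeric suffix on a clash. MakeUniqueName and the " (2)" suffix format are hypothetical choices. The sketch only tracks names produced in the current run; if the target directory may already contain files, you would also test FileExists before accepting a name. The list is case-insensitive because Windows treats file names case-insensitively by default.

```pascal
program UniqueNames;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes;

// Hypothetical helper: returns Name unchanged if it has not been used yet,
// otherwise inserts " (2)", " (3)", ... before the extension.
// Used holds every name handed out so far and is updated here.
function MakeUniqueName(const Name: string; Used: TStringList): string;
var
  Base, Ext: string;
  N: Integer;
begin
  Result := Name;
  Ext := ExtractFileExt(Name);
  Base := Copy(Name, 1, Length(Name) - Length(Ext));
  N := 1;
  while Used.IndexOf(Result) >= 0 do
  begin
    Inc(N);
    Result := Format('%s (%d)%s', [Base, N, Ext]);
  end;
  Used.Add(Result);
end;

var
  Used: TStringList;
begin
  Used := TStringList.Create;
  try
    Used.CaseSensitive := False; // match how Windows compares names
    Writeln(MakeUniqueName('a_b.txt', Used)); // a_b.txt
    Writeln(MakeUniqueName('A_B.txt', Used)); // A_B (2).txt
    Writeln(MakeUniqueName('a_b.txt', Used)); // a_b (3).txt
  finally
    Used.Free;
  end;
end.
```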

她如夕阳 2024-07-30 12:24:30
{
  CleanFileName
  ---------------------------------------------------------------------------

  Given an input string strip any chars that would result
  in an invalid file name.  This should just be passed the
  filename not the entire path because the slashes will be
  stripped.  The function ensures that the resulting string
  does not have multiple spaces together and does not start
  or end with a space.  If the entire string is removed the
  result would not be a valid file name so an error is raised.

}

// Requires: uses System.SysUtils (Trim, Exception) and System.StrUtils (ReplaceStr).
function CleanFileName(const InputString: string): string;
var
  i: integer;
  ResultWithSpaces: string;
begin

  ResultWithSpaces := InputString;

  for i := 1 to Length(ResultWithSpaces) do
  begin
    // These chars are invalid in file names; spaces, tabs and line breaks
    // are included so that runs of them collapse into a single space.
    case ResultWithSpaces[i] of 
      '/', '\', ':', '*', '?', '"', '<', '>', '|', ' ', #$D, #$A, #9:
        // Use a * to indicate a duplicate space so we can remove
        // them at the end.
        {$WARNINGS OFF} // W1047 Unsafe code 'String index to var param'
        if (i > 1) and
          ((ResultWithSpaces[i - 1] = ' ') or (ResultWithSpaces[i - 1] = '*')) then
          ResultWithSpaces[i] := '*'
        else
          ResultWithSpaces[i] := ' ';

        {$WARNINGS ON}
    end;
  end;

  // A * indicates duplicate spaces.  Remove them.
  result := ReplaceStr(ResultWithSpaces, '*', '');

  // Also trim any leading or trailing spaces
  result := Trim(Result);

  if result = '' then
  begin
    raise(Exception.Create('Resulting FileName was empty Input string was: '
      + InputString));
  end;
end;
江城子 2024-07-30 12:24:30
// Works on all platforms (Windows and POSIX); requires: uses System.IOUtils;
function ReplaceInvalidFileNameChars(const aFileName: string; const aReplaceWith: Char = '_'): string;
var
  i: integer;
begin
  Result := aFileName;
  for i := Low(Result) to High(Result) do
  begin
    // TPath.IsValidFileNameChar applies the rules of the current platform
    if not TPath.IsValidFileNameChar(Result[i]) then
      Result[i] := aReplaceWith;
  end;
end;
后来的我们 2024-07-30 12:24:30

For anyone else reading this who wants to use PathCleanupSpec, I wrote this test routine, which seems to work; there is a definite lack of examples on the 'net.
You need to add ShlObj to your uses clause (not sure when PathCleanupSpec was added, but I tested this in Delphi 2010).
You will also need to check for Windows XP SP2 or higher.

procedure TMainForm.btnTestClick(Sender: TObject);
var
  Path: array [0..MAX_PATH - 1] of WideChar;
  Filename: array[0..MAX_PATH - 1] of WideChar;
  ReturnValue: integer;
  DebugString: string;

begin
  // PathCleanupSpec cleans the Filename buffer in place, relative to Path
  StringToWideChar('a*dodgy%\filename.$&^abc',FileName, MAX_PATH);
  StringToWideChar('C:\',Path, MAX_PATH);
  ReturnValue:= PathCleanupSpec(Path,Filename);
  DebugString:= ('Cleaned up filename:'+Filename+#13+#10);
  if (ReturnValue and $80000000)=$80000000 then // PCS_FATAL
    DebugString:= DebugString+'Fatal result. The cleaned path is not a valid file name'+#13+#10;
  if (ReturnValue and $00000001)=$00000001 then // PCS_REPLACEDCHAR
    DebugString:= DebugString+'Replaced one or more invalid characters'+#13+#10;
  if (ReturnValue and $00000002)=$00000002 then // PCS_REMOVEDCHAR
    DebugString:= DebugString+'Removed one or more invalid characters'+#13+#10;
  if (ReturnValue and $00000004)=$00000004 then // PCS_TRUNCATED
    DebugString:= DebugString+'The returned path is truncated'+#13+#10;
  if (ReturnValue and $00000008)=$00000008 then // PCS_PATHTOOLONG
    DebugString:= DebugString+'The input path specified at pszDir is too long to allow the formation of a valid file name from pszSpec'+#13+#10;
  ShowMessage(DebugString);
end;
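The test routine above can be turned into a reusable function. This is a sketch under a few assumptions: CleanupSpec is a hypothetical name, only the fatal bit ($80000000, as in the routine above) is checked, and the Windows XP SP2 requirement is not tested here. As in that routine, PathCleanupSpec comes from the ShlObj unit and works on a MAX_PATH-sized WideChar buffer that it changes in place.

```pascal
program PathCleanupSpecWrapper;

{$APPTYPE CONSOLE}

uses
  Winapi.Windows, Winapi.ShlObj, System.SysUtils;

const
  // PCS_FATAL: the cleaned name is not a valid file name
  CleanupFatal = $80000000;

// Hypothetical wrapper: returns Name cleaned up for use in directory Dir,
// or raises if PathCleanupSpec cannot produce a valid file name.
function CleanupSpec(const Dir, Name: string): string;
var
  Spec: array[0..MAX_PATH - 1] of WideChar;
  Flags: Integer;
begin
  // Copy at most MAX_PATH - 1 characters so the buffer stays #0-terminated
  StrPLCopy(PWideChar(@Spec[0]), Name, MAX_PATH - 1);
  Flags := PathCleanupSpec(PWideChar(Dir), PWideChar(@Spec[0]));
  if (Cardinal(Flags) and CleanupFatal) <> 0 then
    raise Exception.CreateFmt('Cannot build a valid file name from "%s"', [Name]);
  // Spec now holds the cleaned, #0-terminated name
  Result := PWideChar(@Spec[0]);
end;

begin
  try
    Writeln(CleanupSpec('C:\', 'Q3 report: draft/final?'));
  except
    on E: Exception do
      Writeln(E.Message);
  end;
end.
```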
囚你心 2024-07-30 12:24:30

Well, the easy thing is to use a regex and your favourite language's version of gsub to replace anything that's not a "word character." This character class would be "\w" in most languages with Perl-like regexes, or "[A-Za-z0-9]" as a simple option otherwise.

Particularly, in contrast to some of the examples in other answers, you don't want to look for invalid characters to remove, but for valid characters to keep. If you look for invalid characters, you're always vulnerable to the introduction of new ones; if you look only for valid ones, you may be slightly overzealous (replacing a character you didn't really need to), but at least you will never let an invalid character through.

Now, if you want to make the new version as much like the old as possible, you might consider replacement. Instead of deleting, you can substitute a character or characters you know to be ok. But doing that is an interesting enough problem that it's probably a good topic for another question.
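In Delphi, the whitelist approach can be written with TRegEx from System.RegularExpressions. This sketch spells out the ASCII class [A-Za-z0-9_] (the ASCII equivalent of \w) instead of relying on how \w treats non-ASCII letters, and it uses replacement rather than deletion: every run of other characters becomes a single underscore. WhitelistFileName is a hypothetical name.

```pascal
program WhitelistFileNameSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

// Hypothetical helper: keeps ASCII letters, digits and underscore and
// replaces each run of any other characters with a single underscore.
function WhitelistFileName(const Name: string): string;
begin
  Result := TRegEx.Replace(Name, '[^A-Za-z0-9_]+', '_');
end;

begin
  Writeln(WhitelistFileName('Q3 report: draft/final?')); // Q3_report_draft_final_
end.
```

Because the dot is not in the whitelist, apply this to the base name and append the extension afterwards.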

很酷又爱笑 2024-07-30 12:24:30

Try this on a modern Delphi:

 uses System.IOUtils;
 ...
 // UseWildcards = False, so * and ? count as invalid characters
 result := TPath.HasValidFileNameChars(FileName, False);

It also allows German umlauts and other characters such as -, _ and . in a file name.
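HasValidFileNameChars only reports whether a name is acceptable; it does not fix it. A minimal sketch that checks first and only then replaces each character returned by TPath.GetInvalidFileNameChars (also in System.IOUtils) might look like this. EnsureValidFileNameChars is a hypothetical name, and the set of invalid characters is platform-specific, so the same input can give different results on Windows and POSIX targets.

```pascal
program IOUtilsCheckThenFix;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils;

// Hypothetical helper: returns Name unchanged if TPath accepts it;
// otherwise replaces every character from GetInvalidFileNameChars.
function EnsureValidFileNameChars(const Name: string; Replacement: Char = '_'): string;
var
  Bad: Char;
begin
  Result := Name;
  // False: the wildcards * and ? are not accepted
  if TPath.HasValidFileNameChars(Result, False) then
    Exit;
  for Bad in TPath.GetInvalidFileNameChars do
    Result := StringReplace(Result, Bad, Replacement, [rfReplaceAll]);
end;

begin
  Writeln(EnsureValidFileNameChars('Grüße-1.txt'));
  Writeln(EnsureValidFileNameChars('a:b?c'));
end.
```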

萌辣 2024-07-30 12:24:30

Use this function; it works fine for me.
Its purpose is to produce a clean name for ONE level of directory (a single path component; dots and slashes are replaced too).

uses ShlObj, SysUtils;

function  CleanDirName(DirFileName : String) : String;
var
  CheckStr : String;
  Path: array [0..MAX_PATH - 1] of WideChar;
  Filename: array[0..MAX_PATH - 1] of WideChar;
  ReturnValue: integer;

begin
  //--     The following are considered invalid characters in all names.
  //--     \ / : * ? " < > |

  CheckStr := Trim(DirFileName);
  CheckStr := StringReplace(CheckStr,'/','-',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'\','-',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'.','-',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,':',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'?',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'<',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'>',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'|',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'!',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'~',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'+',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'=',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,')',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'(',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'*',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'&',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'^',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'%',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'$',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'#',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'@',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'{',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'}',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,'"',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,';',' ',[rfReplaceAll, rfIgnoreCase]);
  CheckStr := StringReplace(CheckStr,',',' ',[rfReplaceAll, rfIgnoreCase]);

  // Single quotes (apostrophes) are removed entirely
  CheckStr := StringReplace(CheckStr,'''','',[rfReplaceAll, rfIgnoreCase]);

  StringToWideChar(CheckStr,FileName, MAX_PATH);
  StringToWideChar('C:\',Path, MAX_PATH);
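  ReturnValue:= PathCleanupSpec(Path,Filename); // PCS_* flags are not checked here

  // Filename is a static WideChar array, so convert it to a string first;
  // then collapse the runs of spaces left behind by the replacements above
  Result := PWideChar(@Filename[0]);
  while Pos('  ', Result) > 0 do
    Result := StringReplace(Result,'  ',' ',[rfReplaceAll]);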
end;