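How to make a Ruby string safe for a filesystem?
I have user entries as filenames. Of course this is not a good idea, so I want to drop everything except [a-z], [A-Z], [0-9], _ and -.
For instance:
my§document$is°° very&interesting___thisIs%nice445.doc.pdf
should become
my_document_is_____very_interesting___thisIs_nice445_doc.pdf
and then ideally
my_document_is_very_interesting_thisIs_nice445_doc.pdf
Is there a nice and elegant way to do this?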
I'd like to suggest a solution that differs from the old one. Note that the old one uses the deprecated returning. It is also specific to Rails, and you didn't explicitly mention Rails in your question (only as a tag). Also, the existing solution fails to encode .doc.pdf into _doc.pdf, as you requested. And, of course, it doesn't collapse the underscores into one. Here's my solution:
You haven't specified all the details about the conversion. Thus, I'm making the following assumptions:
Any sequence of characters other than A–Z, a–z, 0–9 and - should be collapsed into a single _ (i.e. underscore is itself regarded as a disallowed character, and the string '$%__°#' would become '_' rather than '___' from the parts '$%', '__' and '°#')
The complicated part is splitting the filename into the main part and the extension. With a regular expression, I search for the last period that is followed by something other than a period, so that no later period in the string matches the same criteria. It must, however, be preceded by some character, to make sure it is not the first character in the string.
My results from testing the function are, I think, what you requested. I hope this is nice and elegant enough.
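
The approach described in this answer can be sketched roughly as follows. This is a reconstruction from the description and the stated assumptions, not the answer's original code; the method name sanitize_filename and the exact regular expressions are assumptions made for illustration.

```ruby
# Hypothetical helper: a sketch of the approach described above,
# not the answer's original code.
def sanitize_filename(filename)
  # Split at the last period that is preceded by some character and
  # followed by a non-period, provided no later period meets the same
  # condition. At most one such period can exist.
  main, ext = filename.split(/(?<=.)\.(?=[^.])(?!.*\.[^.])/m)

  # Collapse every run of disallowed characters (underscores included)
  # into a single underscore, separately in each part, then rejoin.
  [main, ext].compact.map { |part| part.gsub(/[^A-Za-z0-9\-]+/, "_") }.join(".")
end
```

With the example from the question, this sketch should return my_document_is_very_interesting_thisIs_nice445_doc.pdf. A leading period is not treated as the start of an extension, so a name like .bashrc is sanitized as a whole.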
From http://web.archive.org/web/20110529023841/http://devblog.muziboo.com/2008/06/17/attachment-fu-sanitize-filename-regex-and-unicode-gotcha/:
In Rails you might also be able to use ActiveStorage::Filename#sanitized:
If you use Rails you can also use String#parameterize. It is not specifically intended for this purpose, but the result is satisfactory.
For Rails I found myself wanting to keep any file extensions but using parameterize for the remainder of the characters:
For implementation details and ideas, see the source: https://github.com/rails/rails/blob/master/activesupport/lib/active_support/inflector/transliterate.rb
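
The idea of keeping the extensions while slugging everything else can be illustrated without Rails. The sketch below is plain Ruby and does not call ActiveSupport: simple_parameterize is a hypothetical, simplified stand-in for String#parameterize (it does no transliteration), and parameterize_filename is likewise an illustrative name.

```ruby
# Hypothetical stand-in for ActiveSupport's String#parameterize. It skips
# transliteration, so non-ASCII characters are simply treated as separators.
def simple_parameterize(text)
  text.downcase
      .gsub(/[^a-z0-9\-_]+/, "-") # each run of other characters becomes one dash
      .gsub(/-{2,}/, "-")         # squeeze repeated dashes
      .gsub(/\A-+|-+\z/, "")      # trim leading and trailing dashes
end

# Hypothetical helper: slug each dot-separated piece and rejoin with dots,
# so extensions such as ".doc.pdf" survive.
def parameterize_filename(filename)
  filename.split(".").map { |piece| simple_parameterize(piece) }.join(".")
end
```

Unlike the behaviour requested in the question, this keeps every period as a separator and leaves existing underscores alone. In a Rails application the stand-in would presumably be replaced by the real parameterize.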
If your goal is just to generate a filename that is "safe" to use on all operating systems (and not to remove any and all non-ASCII characters), then I would recommend the zaru gem. It doesn't do everything the original question specifies, but the filename it produces should be safe to use (and it leaves any filename-safe Unicode characters untouched):
There is a library that may be helpful, especially if you're interested in replacing weird Unicode characters with ASCII: unidecode.
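
For the narrower case of accented Latin letters, a rough approximation is possible with Ruby's built-in Unicode normalization. This sketch is not unidecode and covers far less: it only removes combining marks after decomposition, so characters with no decomposition (for example ß or ø) and non-Latin scripts pass through unchanged. The method name strip_accents is an assumption.

```ruby
# Hypothetical helper: decompose accented characters (NFD), then drop the
# combining marks that the decomposition produces.
def strip_accents(text)
  text.unicode_normalize(:nfd).gsub(/\p{Mn}/, "")
end
```

The result could then be passed to whichever sanitizing step is in use, so that é ends up as e rather than as an underscore.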