Is there a convenient way to map a file URI to os.path?
A subsystem that I have no control over insists on providing filesystem paths in the form of a URI. Is there a Python module/function that can convert such a path into the form the filesystem expects, in a platform-independent manner?
The solution from @Jakob Bowyer doesn't convert URL-encoded characters to regular UTF-8 characters. For that you need to use urllib.parse.unquote.
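A short sketch of the difference, using a made-up URI: urlparse leaves the percent-escapes in the path untouched, and unquote decodes them (as UTF-8 by default).

```python
from urllib.parse import unquote, urlparse

# Hypothetical URI with an encoded space (%20) and an encoded "é" (%C3%A9).
uri = "file:///home/user/some%20file%20%C3%A9.txt"

raw_path = urlparse(uri).path     # urlparse does not decode percent-escapes
decoded_path = unquote(raw_path)  # unquote decodes them, as UTF-8 by default

print(raw_path)      # /home/user/some%20file%20%C3%A9.txt
print(decoded_path)  # /home/user/some file é.txt
```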
Use urllib.parse.urlparse to get the path from the URI:
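A minimal sketch (not necessarily the original code) of what urlparse returns for a local file URI; the URI is a made-up example. Note that the path component is still percent-encoded.

```python
from urllib.parse import urlparse

parsed = urlparse("file:///home/user/some%20file.txt")

print(parsed.scheme)  # file
print(parsed.netloc)  # empty string: the URI names no host
print(parsed.path)    # /home/user/some%20file.txt (still percent-encoded)
```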
Of all the answers so far, I found none that catches edge cases, doesn't require branching, is compatible with both Python 2 and 3, and is cross-platform.
In short, this does the job using only the standard library:
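A Python 3-only sketch of the approach described below (urlparse, prefix the host, unquote, then os.path.abspath). It is not the answer's original code, which also supported Python 2. Two details are assumptions based on the Wikipedia excerpt: a non-local host is turned into a UNC-style "//host" prefix, and "localhost" is treated like an omitted host. The expected results in the comments are for POSIX; how abspath behaves on Windows for these inputs is not checked here.

```python
import os.path
from urllib.parse import unquote, urlparse


def uri_to_path(uri):
    """Convert a file URI to a filesystem path (sketch, Python 3 only)."""
    parsed = urlparse(uri)
    # Decode percent-encoding, e.g. %20 -> " " and %24 -> "$".
    path = unquote(parsed.path)
    # Assumption: a non-local host becomes a UNC-style "//host" prefix;
    # an omitted host or "localhost" means the local machine.
    if parsed.netloc and parsed.netloc.lower() != "localhost":
        path = "//" + parsed.netloc + path
    # abspath normalises the path and collapses redundant separators.
    return os.path.abspath(path)


if os.name == "posix":
    print(uri_to_path("file:///tmp/some%20file.txt"))      # /tmp/some file.txt
    print(uri_to_path("file://localhost/tmp//x%24y.txt"))  # /tmp/x$y.txt
```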
The tricky bit (I found) was when working in Windows with paths specifying a host. This is a non-issue outside of Windows: network locations in *NIX can only be reached via paths after being mounted to the root of the filesystem.
From Wikipedia:
A file URI takes the form of file://host/path, where host is the fully qualified domain name of the system on which the path is accessible [...]. If host is omitted, it is taken to be "localhost".
With that in mind, I make it a rule to ALWAYS prefix the path with the netloc provided by urlparse before passing it to os.path.abspath, which is necessary because it removes any resulting redundant slashes (os.path.normpath, which also claims to fix the slashes, can get a little over-zealous on Windows, hence the use of abspath).
The other crucial component in the conversion is using unquote to decode the URL percent-encoding, which your filesystem won't otherwise understand. Again, this might be a bigger issue on Windows, which allows things like $ and spaces in paths; these will have been percent-encoded in the file URI.
For a demo:
Results (WINDOWS):
Results (*NIX):
To convert a file URI to a path with Python (this is specific to Python 3; I can make a Python 2 version if someone really wants it):
1. Parse the URI with urllib.parse.urlparse.
2. Unquote the path component of the parsed URI with urllib.parse.unquote.
3. Then:
   a. If the path is a Windows path and starts with /, strip the first character of the unquoted path component (the path component of file:///C:/some/file.txt is /C:/some/file.txt, which pathlib.PureWindowsPath does not interpret as equivalent to C:\some\file.txt).
   b. Otherwise, just use the unquoted path component as is.
Here is a function that does this:
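A sketch that follows the steps above (a reconstruction, not necessarily the original function). The path_class parameter is an assumption, added so Windows and POSIX behaviour can be tried on one machine; it defaults to PurePath, which picks the flavour of the running OS. As noted below, the URI's authority (host) is ignored.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath
from urllib.parse import unquote, urlparse


def file_uri_to_path(file_uri, path_class=PurePath):
    """Convert a file URI to a pure path of the given flavour (sketch).

    path_class is a hypothetical parameter: PurePath (the default) picks
    the flavour of the running OS; pass PureWindowsPath or PurePosixPath
    to force one. The URI's authority (host) is NOT validated or used.
    """
    # Step 1: parse the URI.
    parsed = urlparse(file_uri)
    if parsed.scheme != "file":
        raise ValueError(f"not a file URI: {file_uri!r}")
    # Step 2: unquote the path component ("%20" -> " ", ...).
    path = unquote(parsed.path)
    # Step 3a: for Windows paths, "/C:/some/file.txt" -> "C:/some/file.txt".
    if isinstance(path_class(), PureWindowsPath) and path.startswith("/"):
        path = path[1:]
    # Step 3b: otherwise the unquoted path is used as is.
    return path_class(path)


print(repr(file_uri_to_path("file:///C:/some/file.txt", PureWindowsPath)))
# PureWindowsPath('C:/some/file.txt')
print(repr(file_uri_to_path("file:/C:/some/file%20name.txt", PureWindowsPath)))
# PureWindowsPath('C:/some/file name.txt')  (no authority section)
print(repr(file_uri_to_path("file:///home/user/some%20file.txt", PurePosixPath)))
# PurePosixPath('/home/user/some file.txt')
```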
Usage examples (run on Linux):
This function works for Windows and POSIX file URIs, and it handles file URIs without an authority section. It does NOT, however, validate the URI's authority, so the following will not be honoured:
IETF RFC 8089: The "file" URI Scheme / 2. Syntax
Validation (pytest) for the function:
This contribution is licensed (in addition to any other licenses which may apply) under the Zero-Clause BSD License (0BSD):
Permission to use, copy, modify, and/or distribute this software for any
purpose with or without fee is hereby granted.
THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
To the extent possible under law, Iwan Aucamp has waived all copyright and related or neighboring rights to this stackexchange contribution. This work is published from: Norway.
Starting from Python 3.13, you can use the pathlib.Path.from_uri() method, a new constructor that creates a pathlib.Path object from a 'file' URI (file://). For example:
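A minimal sketch, assuming Python 3.13+ on a POSIX system (the guard skips the calls otherwise). Path.from_uri() decodes percent-escapes and rejects URIs that do not start with "file:" by raising ValueError. The URI is a made-up example; on Windows one such as file:///C:/some%20file.txt would be used instead.

```python
import sys
from pathlib import Path

# Path.from_uri() exists only on Python 3.13+. The URI below is a POSIX one.
if sys.version_info >= (3, 13) and sys.platform != "win32":
    p = Path.from_uri("file:///home/user/some%20file.txt")
    print(repr(p))  # PosixPath('/home/user/some file.txt')

    # URIs that do not start with "file:" are rejected.
    try:
        Path.from_uri("https://example.com/some%20file.txt")
    except ValueError as exc:
        print("rejected:", exc)
```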
The solution from @colton7909 is mostly correct and helped me get to this answer, but it has some import errors with Python 3. Also, I think this is a better way to deal with the 'file://' part of the URL than simply chopping off the first 7 characters. So I feel this is the most idiomatic way to do this using the standard library:
This example should produce the string '/home/user/some file.txt'.
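A sketch of that idea (not necessarily the answer's exact code): urlparse strips the scheme and authority, and urllib.request.url2pathname decodes the percent-escapes. On POSIX it yields '/home/user/some file.txt'; on Windows, url2pathname also converts slashes to backslashes and turns /C:/dir/file into C:\dir\file.

```python
from urllib.parse import urlparse
from urllib.request import url2pathname

uri = "file:///home/user/some%20file.txt"

# Let urlparse drop the "file://" scheme and (empty) authority instead of
# slicing off a fixed number of characters.
url_path = urlparse(uri).path  # '/home/user/some%20file.txt'

# url2pathname decodes the percent-escapes (%20 -> " ").
path = url2pathname(url_path)
print(path)  # /home/user/some file.txt (on POSIX)
```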