Normalizing (WebDAV) Unicode paths

Posted on 2024-08-26 06:59:58

I'm working on a WebDAV implementation for PHP. To make it easier for Windows and other operating systems to work together, I need to jump through some character encoding hoops.

Windows uses ISO-8859-1 in its HTTP requests, while most other clients encode anything beyond ASCII as UTF-8.

My first approach was to ignore this altogether, but I quickly ran into issues when returning URLs. I then figured it's probably best to normalize all URLs.

Take ü as an example. OS X sends it over the wire as:

u%CC%88 (the letter u followed by the combining diaeresis, codepoint U+0308)

Windows sends this as:

%FC (latin1)

But after running utf8_encode on %FC, I get:

%C3%BC (this is codepoint U+00FC)

Should I treat %C3%BC and u%CC%88 as the same thing? If so, how? Leaving the path untouched seems to work OK for Windows: it somehow understands that it's a Unicode character, but updating the same file throws an error (for no apparent reason).

I'd be happy to provide more information.

Comments (2)

蹲在坟头点根烟 2024-09-02 06:59:58

The Mac stores Unicode characters "decomposed", that is, as "u" + ¨ (diaeresis) instead of "ü". Normalizer can take care of that. If you don't have Normalizer, try iconv('UTF8-MAC', 'UTF8', $str)
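To illustrate the normalization step with the question's own example, here is a minimal sketch. It is written in Python rather than PHP purely for illustration; `normalize_path` is a hypothetical helper name, and the sketch assumes the incoming path is already valid percent-encoded UTF-8. In PHP, the Normalizer class mentioned above (from the intl extension) should play the role that `unicodedata.normalize` plays here.

```python
import unicodedata
from urllib.parse import quote, unquote


def normalize_path(raw_path: str) -> str:
    """Return a percent-encoded path whose text is in NFC (composed) form.

    Hypothetical helper: assumes raw_path is percent-encoded UTF-8 and
    raises UnicodeDecodeError if the decoded bytes are not valid UTF-8.
    """
    # Percent-decode and interpret the resulting bytes as UTF-8.
    decoded = unquote(raw_path, encoding="utf-8", errors="strict")
    # Compose "u" + U+0308 (combining diaeresis) into the single codepoint U+00FC.
    composed = unicodedata.normalize("NFC", decoded)
    # Re-encode for use in a URL, leaving path separators untouched.
    return quote(composed, safe="/")


# The decomposed form sent by OS X and the composed form end up identical.
print(normalize_path("u%CC%88"))
print(normalize_path("%C3%BC"))
```

With this, both spellings of ü can be compared or stored under the same key. Whether to normalize to the composed (NFC) or decomposed (NFD) form is a design choice; NFC is used here only because it matches what utf8_encode produced in the question.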

浅浅淡淡 2024-09-02 06:59:58

I hate answering my own questions, but here goes.

I ended up not bothering. I did extensive research on how various operating systems encode paths and handle encodings. It turns out that in most cases other OSes handle paths in other normalization forms just fine. Windows was a bit shitty about it, but it works.

Whenever I receive a path that isn't UTF-8 at all, I try to detect the encoding and convert it to UTF-8.
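The answer doesn't say how the detection is done, so the following is only a sketch of one possible approach, again in Python for illustration. `path_to_text` is a hypothetical helper, and the fallback to ISO-8859-1 is an assumption taken from the question's statement that Windows sends Latin-1; a real implementation might check for other encodings as well.

```python
from urllib.parse import unquote_to_bytes


def path_to_text(raw_path: str) -> str:
    """Percent-decode a request path and return it as Unicode text.

    Hypothetical helper: tries UTF-8 first and falls back to ISO-8859-1
    (an assumption based on the Windows behaviour described in the question).
    """
    raw = unquote_to_bytes(raw_path)
    try:
        # Strict decoding fails on byte sequences that are not valid UTF-8,
        # such as the lone 0xFC byte that Windows sends for "ü".
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a codepoint, so this never fails.
        return raw.decode("iso-8859-1")


# Both the Latin-1 and the UTF-8 spelling decode to the same character.
print(path_to_text("%FC") == path_to_text("%C3%BC"))
```

Because ISO-8859-1 accepts any byte sequence, the fallback acts as a catch-all and can silently mis-decode paths that were sent in some other legacy encoding. The decoded text can then be re-encoded as UTF-8 when building response URLs.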
