Normalizing (WebDAV) Unicode paths
I'm working on a WebDAV implementation for PHP. To make it easier for Windows and other operating systems to work together, I need to jump through some character encoding hoops.
Windows uses ISO-8859-1 in its HTTP requests, while most other clients encode anything beyond ASCII as UTF-8.
My first approach was to ignore this altogether, but I quickly ran into issues when returning URLs. I then figured it's probably best to normalize all URLs.
Using ü as an example, OS X will send this over the wire as
u%CC%88 (a literal "u" followed by codepoint U+0308, the combining diaeresis)
Windows sends this as:
%FC (latin1)
But doing a utf8_encode on %FC, I get:
%C3%BC (this is the precomposed codepoint U+00FC)
Should I treat %C3%BC and u%CC%88 as the same thing? If so, how? Not touching it seems to work OK for Windows: it somehow understands that it's a Unicode character, but updating the same file then throws an error (for no apparent reason).
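For reference, the two wire forms really are canonically equivalent; a small sketch (assuming the intl extension's Normalizer class is available) shows that normalizing both to NFC makes them compare equal:

```php
<?php
// The two wire forms of "ü" from the question:
$nfc = "\xC3\xBC";    // %C3%BC  -- precomposed U+00FC (what utf8_encode produces)
$nfd = "u\xCC\x88";   // u%CC%88 -- "u" + combining diaeresis U+0308 (what OS X sends)

// As raw bytes they differ, but after Unicode normalization (here to
// Form C, the composed form) they are the same string.
$same = Normalizer::normalize($nfc, Normalizer::FORM_C)
     === Normalizer::normalize($nfd, Normalizer::FORM_C);
var_dump($same); // bool(true)
```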
I'd be happy to provide more information.
Mac stores Unicode chars as "decomposed", that is, "u" + ¨ (diaeresis) instead of "ü". Normalizer can take care of that. If you don't have Normalizer, try
iconv('UTF8-MAC', 'UTF8', $str)
I hate answering my own questions, but here goes.
I ended up not bothering. I did extensive research on how the various operating systems encode paths and handle encodings. It turns out that in most cases other OSes handle paths in other normalization forms just fine. Windows handled it a bit poorly, but it works.
Whenever I receive a path that isn't UTF-8 at all, I try to detect the encoding and convert it to UTF-8.
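The detect-and-convert step might look like this. The helper name is mine, not from the answer, and the check is a heuristic (a few ISO-8859-1 byte sequences also happen to validate as UTF-8); it assumes ext-mbstring is available:

```php
<?php
// Hypothetical helper: if the raw path bytes already validate as UTF-8,
// keep them as-is; otherwise assume the client sent ISO-8859-1 (as
// Windows does) and convert. Requires the mbstring extension.
function pathToUtf8(string $path): string
{
    if (mb_check_encoding($path, 'UTF-8')) {
        return $path;
    }
    return mb_convert_encoding($path, 'UTF-8', 'ISO-8859-1');
}

var_dump(bin2hex(pathToUtf8("\xFC")));     // latin1 ü is converted  -> "c3bc"
var_dump(bin2hex(pathToUtf8("\xC3\xBC"))); // valid UTF-8, unchanged -> "c3bc"
```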