Preserving escaped characters when constructing a URI in Java
The documentation for java.net.URI specifies that:
"For any URI u that ... and that does not encode characters except those that must be quoted, the following identities also hold..."
But what about URIs that do encode characters that don't need to be quoted?
URI test1 = new URI("http://foo.bar.baz/%E2%82%AC123");
URI test2 = new URI(test1.getScheme(), test1.getUserInfo(), test1.getHost(), test1.getPort(), test1.getPath(), test1.getQuery(), test1.getFragment());
assert test1.equals(test2); // blows up
This fails because test2 comes out as http://foo.bar.baz/€123, with the escaped characters un-escaped.
My question, then, is: how can I construct a URI equal to test1, preserving the escaped characters, out of its components? Using getRawPath() instead of getPath() does not help, because then the escape characters themselves get escaped (each % becomes %25), and you end up with http://foo.bar.baz/%25E2%2582%25AC123.
Additional notes:
- Don't ask why I need to preserve escaped characters that in theory don't need to be escaped -- trust me, you don't want to know.
- In reality I don't want to preserve all of the original URL, just most of it, possibly replacing the host, port, protocol, or even parts of the path, so new URI(test1.toString()) is not the answer. Maybe the answer is to do everything with strings and replicate the URI class's ability to parse and construct URIs in my own code, but that seems daft.
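For what it's worth, the "do everything with strings" route may need less code than a full parser, because java.net.URI already exposes the raw (still-escaped) form of each component. The sketch below is only an illustration: the class RawUriRebuild and the method withHost are made-up names, and it assumes a hierarchical URI and a replacement host that is already in valid URI syntax (for example, an IPv6 literal would need its brackets).

```java
import java.net.URI;

final class RawUriRebuild {
    /**
     * Hypothetical helper: returns a copy of original with a different host,
     * gluing the raw components back together so existing escapes are kept.
     */
    static URI withHost(URI original, String newHost) {
        StringBuilder sb = new StringBuilder();
        if (original.getScheme() != null) {
            sb.append(original.getScheme()).append(':');
        }
        if (newHost != null) {
            sb.append("//");
            if (original.getRawUserInfo() != null) {
                sb.append(original.getRawUserInfo()).append('@');
            }
            sb.append(newHost);
            if (original.getPort() != -1) {
                sb.append(':').append(original.getPort());
            }
        }
        if (original.getRawPath() != null) {
            sb.append(original.getRawPath());
        }
        if (original.getRawQuery() != null) {
            sb.append('?').append(original.getRawQuery());
        }
        if (original.getRawFragment() != null) {
            sb.append('#').append(original.getRawFragment());
        }
        // The single-argument form parses the string as-is and does not re-quote '%'.
        return URI.create(sb.toString());
    }

    public static void main(String[] args) {
        URI test1 = URI.create("http://foo.bar.baz/%E2%82%AC123");
        System.out.println(withHost(test1, "example.org"));
        System.out.println(withHost(test1, "foo.bar.baz").equals(test1));
    }
}
```

Replacing the port, scheme, or part of the path would follow the same pattern: swap in the new piece while appending, and leave the other raw components alone.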
Updated to add:
Note that the same issue exists with query parameters etc. -- it's not just the path.
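To illustrate why the query case can matter even more than the path case, here is a small sketch. The URL and the names QueryEscapeDemo and rebuildDecoded are invented for the example. An escaped ampersand inside a parameter value is decoded by getQuery(), and the seven-argument constructor does not re-escape it because & is a legal URI character, so the rebuilt URI appears to have a different set of parameters.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class QueryEscapeDemo {
    /** Rebuilds a URI from its decoded components, as in the question. */
    static URI rebuildDecoded(URI u) {
        try {
            return new URI(u.getScheme(), u.getUserInfo(), u.getHost(), u.getPort(),
                           u.getPath(), u.getQuery(), u.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical URL: the value of q is "a&b", with the ampersand escaped.
        URI original = URI.create("http://foo.bar.baz/search?q=a%26b&lang=en");
        System.out.println(original.getRawQuery()); // escapes kept
        System.out.println(original.getQuery());    // escapes decoded
        System.out.println(rebuildDecoded(original));
    }
}
```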
Comments (1)
I think this hack will work for you:
}
I add an extra step that calls toASCIIString().
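The code for this answer is not shown in full here, so the following is only a guess at what a toASCIIString() step might look like, not the answerer's actual code. The names AsciiRoundTrip and rebuildAscii are made up. The idea is to rebuild from the decoded components and then let toASCIIString() percent-encode any non-ASCII characters again.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class AsciiRoundTrip {
    /** Rebuilds u from decoded components, then re-escapes non-ASCII characters. */
    static URI rebuildAscii(URI u) {
        try {
            URI decoded = new URI(u.getScheme(), u.getUserInfo(), u.getHost(), u.getPort(),
                                  u.getPath(), u.getQuery(), u.getFragment());
            // toASCIIString() encodes characters outside US-ASCII as UTF-8 escapes.
            return URI.create(decoded.toASCIIString());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI test1 = URI.create("http://foo.bar.baz/%E2%82%AC123");
        System.out.println(rebuildAscii(test1).equals(test1));

        // Limitation: an escape that hides an ASCII character is not restored.
        URI ascii = URI.create("http://foo.bar.baz/%41bc");
        System.out.println(rebuildAscii(ascii));
    }
}
```

Note the limitation: this only restores escapes for non-ASCII characters. Something like %41 or %26 is decoded by the getters and stays decoded, so this approach would not cover the query-parameter case in general.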