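HTTP URL Address Encoding in Java
My Java standalone application gets a URL (which points to a file) from the user, and I need to hit it and download it. The problem I am facing is that I am not able to encode the HTTP URL address properly...
Example:
URL: http://search.barnesandnoble.com/booksearch/first book.pdf
java.net.URLEncoder.encode(url.toString(), "ISO-8859-1");
returns me:
http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst+book.pdf
But what I want is
http://search.barnesandnoble.com/booksearch/first%20book.pdf
(space replaced by %20)
I guess URLEncoder is not designed to encode HTTP URLs... The JavaDoc says "Utility class for HTML form encoding"... Is there any other way to do this?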
The java.net.URI class can help; the documentation of URL recommends using URI to manage the encoding and decoding of URLs.
Use one of the constructors that takes more than one argument; the single-argument constructor of URI does NOT escape illegal characters.
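Here is a minimal sketch of the multi-argument approach. It uses the four-argument URI(scheme, host, path, fragment) constructor with the host and path from the question. The class and helper names are made up for illustration, and checked exceptions are wrapped only to keep the sketch short:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class MultiArgUriExample {
    // Builds the URL from separate components so that URI quotes illegal
    // characters (here: the space in the path) instead of rejecting them.
    static URL toUrl(String host, String path) {
        try {
            // URI(scheme, host, path, fragment): the path is quoted, '/' is kept.
            URI uri = new URI("http", host, path, null);
            return uri.toURL();
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URL url = toUrl("search.barnesandnoble.com", "/booksearch/first book.pdf");
        System.out.println(url); // http://search.barnesandnoble.com/booksearch/first%20book.pdf

        try {
            // The single-argument constructor parses but does not quote, so this throws.
            new URI("http://search.barnesandnoble.com/booksearch/first book.pdf");
        } catch (URISyntaxException e) {
            System.out.println("single-arg constructor rejected it: " + e.getMessage());
        }
    }
}
```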
Only illegal characters get escaped by the code above; it does NOT escape non-ASCII characters (see fatih's comment).
The toASCIIString method can be used to get a String containing only US-ASCII characters.
For a URL with a query like http://www.google.com/ig/api?weather=São Paulo, use the 5-parameter version of the constructor.
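The sketch below combines both points under an illustrative class name. It passes the query through the five-argument URI(scheme, authority, path, query, fragment) constructor and compares toString() with toASCIIString(). The city is written as a Unicode escape so the source file's encoding does not matter:

```java
import java.net.URI;
import java.net.URISyntaxException;

class AsciiQueryExample {
    // Five-argument constructor: URI(scheme, authority, path, query, fragment).
    // Illegal characters in the query (such as the space) are quoted.
    static URI weatherUri(String city) {
        try {
            return new URI("http", "www.google.com", "/ig/api", "weather=" + city, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI uri = weatherUri("S\u00e3o Paulo"); // "São Paulo"
        // toString() quotes the space but leaves the non-ASCII 'ã' as it is.
        System.out.println(uri.toString());
        // toASCIIString() also percent-encodes non-ASCII characters as UTF-8 bytes.
        System.out.println(uri.toASCIIString()); // http://www.google.com/ig/api?weather=S%C3%A3o%20Paulo
    }
}
```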
Please be warned that most of the answers above are INCORRECT.
The URLEncoder class, despite its name, is NOT what is needed here. It's unfortunate that Sun named this class so confusingly. URLEncoder is meant for passing data as parameters, not for encoding the URL itself.
In other words, "http://search.barnesandnoble.com/booksearch/first book.pdf" is the URL. Parameters would be, for example, "http://search.barnesandnoble.com/booksearch/first book.pdf?parameter1=this&param2=that". The parameters are what you would use URLEncoder for.
The following two examples highlight the differences between the two.
The following produces the wrong parameters, according to the HTTP standard. Note that the ampersand (&) and plus (+) are encoded incorrectly.
The following will produce the correct parameters, with the query properly encoded. Note the spaces, ampersands, and plus signs.
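Here is a sketch of the contrast described above. The host, path, parameter name q and value are illustrative assumptions, not the answer's original code, and URLEncoder.encode(String, Charset) needs Java 10 or later. In both versions the path goes through URI. In the first, the raw query also goes through URI. In the second, only the parameter value is form-encoded with URLEncoder and then appended:

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class QueryParamExample {
    private static final String PATH = "/help/me/book name+me/";

    // Wrong: URI only quotes characters that are illegal in a URI. '&' and '+'
    // are legal, so they pass through unchanged, and a server will read them
    // as a parameter separator and a space.
    static String rawQuery(String value) {
        try {
            return new URI("http", "www.google.com", PATH, "q=" + value, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Right: the parameter value is form-encoded, so '+', '&' and ' '
    // become %2B, %26 and '+', and the server decodes the original value.
    static String encodedQuery(String value) {
        try {
            String base = new URI("http", "www.google.com", PATH, null).toASCIIString();
            return base + "?q=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String value = "1+1 & x y";
        System.out.println(rawQuery(value));     // http://www.google.com/help/me/book%20name+me/?q=1+1%20&%20x%20y
        System.out.println(encodedQuery(value)); // http://www.google.com/help/me/book%20name+me/?q=1%2B1+%26+x+y
    }
}
```

Encoding the value exactly once, then appending it, avoids handing an already-encoded query back to URI, which would quote the % signs a second time.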
I'm going to add one suggestion here aimed at Android users. You can do this without having to pull in any external libraries. Also, all the character search/replace solutions suggested in some of the answers above are perilous and should be avoided.
Give this a try:
You can see that in this particular URL, I need to have those spaces encoded so that I can use it for a request.
This takes advantage of a couple of features available to you in Android classes. First, the URL class can break a URL into its proper components, so there is no need for you to do any string search/replace work. Second, this approach takes advantage of the URI class's ability to escape components properly when you construct a URI from components rather than from a single string.
The beauty of this approach is that you can take any valid URL string and have it work without needing any special knowledge of it yourself.
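Here is a sketch of the URL-then-URI technique described above, with the question's URL as input (the class and method names are illustrative). It uses only java.net, so it runs on plain Java as well as Android. The new URL(String) constructor is deprecated in recent JDKs but still works:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class UrlComponentsExample {
    // java.net.URL splits the raw string without validating its escaping;
    // the seven-argument URI constructor then quotes each component separately.
    static URL escape(String raw) {
        try {
            URL url = new URL(raw);
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                    url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            return uri.toURL();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(escape("http://search.barnesandnoble.com/booksearch/first book.pdf"));
        // http://search.barnesandnoble.com/booksearch/first%20book.pdf
    }
}
```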
Here is a solution I developed; it is much more stable than any other:
If you have a URL, you can pass url.toString() into this method. First decode, to avoid double encoding (for example, encoding a space results in %20 and encoding a percent sign results in %25, so double encoding will turn a space into %2520). Then, use the URI as explained above, adding in all the parts of the URL (so that you don't drop the query parameters).
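Here is a sketch of the decode-then-encode idea, assuming Java 10+ for the Charset overload of URLDecoder.decode (the names are illustrative). It shows that an input that is already escaped and one that is not both end up with a single %20. The approach has two caveats. URLDecoder applies form rules, so a literal '+' in the input also becomes a space. A stray '%' that is not followed by two hex digits makes decode throw IllegalArgumentException.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class DecodeThenEncodeExample {
    static URL normalize(String raw) {
        try {
            // Decode first so an existing "%20" is not re-encoded into "%2520".
            String decoded = URLDecoder.decode(raw, StandardCharsets.UTF_8);
            URL url = new URL(decoded);
            // Re-encode every component (keeping the query) via the 7-argument URI constructor.
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                    url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            return uri.toURL();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://search.barnesandnoble.com/booksearch/first book.pdf"));
        System.out.println(normalize("http://search.barnesandnoble.com/booksearch/first%20book.pdf"));
        // both print http://search.barnesandnoble.com/booksearch/first%20book.pdf
    }
}
```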
Yeah, URL encoding is going to encode that string so that it can be passed properly in a URL to a final destination. For example, you could not have http://stackoverflow.com?url=http://yyy.com. URL-encoding the parameter would fix that parameter value.
So I have two choices for you:
1. Do you have access to the path separately from the domain? If so, you may be able to simply URL-encode the path. However, if this is not the case, then option 2 may be for you.
2. Get commons-httpclient-3.1. It has a class URIUtil:
System.out.println(URIUtil.encodePath("http://example.com/x y", "ISO-8859-1"));
This will output exactly what you are looking for, as it only encodes the path part of the URI.
FYI, you'll need commons-codec and commons-logging for this method to work at runtime.
If anybody doesn't want to add a dependency to their project, these functions may be helpful.
We pass the 'path' part of our URL in here. You probably don't want to pass the full URL in as a parameter (query strings need different escapes, etc.).
And tests:
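The answer's own functions were not preserved. Below is a hypothetical dependency-free path encoder written along the same lines, not the original code. It percent-encodes the UTF-8 bytes of a path and leaves RFC 3986 "pchar" characters and '/' as they are. Because '%' itself is always encoded, the input must be an unencoded path, not a full URL or a query string:

```java
import java.nio.charset.StandardCharsets;

class PathPercentEncoder {
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();
    // Left as-is: RFC 3986 unreserved and sub-delims, ':' and '@' (the pchar set),
    // plus '/' because a whole path, not a single segment, is passed in.
    private static final String SAFE_PUNCT = "-._~!$&'()*+,;=:@/";

    private static boolean isSafe(int b) {
        return (b >= 'a' && b <= 'z') || (b >= 'A' && b <= 'Z')
                || (b >= '0' && b <= '9') || SAFE_PUNCT.indexOf(b) >= 0;
    }

    // Percent-encodes every other byte of the UTF-8 representation as %XX.
    static String encodePath(String path) {
        StringBuilder out = new StringBuilder();
        for (byte raw : path.getBytes(StandardCharsets.UTF_8)) {
            int b = raw & 0xFF; // treat the byte as unsigned
            if (isSafe(b)) {
                out.append((char) b);
            } else {
                out.append('%').append(HEX[b >> 4]).append(HEX[b & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encodePath("/booksearch/first book.pdf")); // /booksearch/first%20book.pdf
    }
}
```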
Unfortunately, org.apache.commons.httpclient.util.URIUtil is deprecated, and the replacement org.apache.commons.codec.net.URLCodec does encoding suitable for form posts, not for actual URLs. So I had to write my own function, which encodes a single component (it is not suitable for entire query strings that contain ?s and &s).
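The function from this answer was not preserved. A common substitute for encoding a single component is shown below. It is a swapped-in technique, not the answerer's code, and it assumes Java 10+ for the Charset overload. It form-encodes with URLEncoder and then turns '+' into %20. Because URLEncoder already encodes a literal '+' as %2B, the replacement only touches spaces. URLEncoder also leaves '*' unencoded and encodes '~' as %7E, both of which are acceptable in a path segment or query value.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class ComponentEncoder {
    // Encodes ONE component: a single path segment or a single query value.
    // '/', '?', '&' and '=' are all escaped, so never pass a whole URL here.
    static String encodeComponent(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println("http://search.barnesandnoble.com/booksearch/"
                + encodeComponent("first book.pdf")); // .../booksearch/first%20book.pdf
    }
}
```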
URLEncoding can encode HTTP URLs just fine, as you've unfortunately discovered. The string you passed in, "http://search.barnesandnoble.com/booksearch/first book.pdf", was correctly and completely encoded into a URL-encoded form. You could pass that entire long string of gobbledygook you got back as a parameter in a URL, and it could be decoded back into exactly the string you passed in.
It sounds like you want to do something a little different from passing the entire URL as a parameter. From what I gather, you're trying to create a search URL that looks like "http://search.barnesandnoble.com/booksearch/whateverTheUserPassesIn". The only thing you need to encode is the "whateverTheUserPassesIn" bit, so perhaps all you need to do is something like this:
That should produce something rather more valid for you.
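Here is a sketch of both points in this answer (class and method names are illustrative; Java 10+ for the Charset overloads). The whole URL round-trips through URLEncoder and URLDecoder unchanged, and only the user-supplied part is encoded onto a fixed base. Note that form encoding turns the space into '+'. A server reads that as a space only where it form-decodes, typically in the query string. In a path, '+' is a literal plus, which is why the question asks for %20.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class EncodeUserPartExample {
    static final String BASE = "http://search.barnesandnoble.com/booksearch/";

    // Encodes a whole URL so it can travel as a single parameter value.
    static String asParameter(String wholeUrl) {
        return URLEncoder.encode(wholeUrl, StandardCharsets.UTF_8);
    }

    // Encodes only the user-supplied part and appends it to the fixed base.
    static String searchUrl(String userInput) {
        return BASE + URLEncoder.encode(userInput, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String whole = BASE + "first book.pdf";
        String param = asParameter(whole);
        System.out.println(param); // http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst+book.pdf
        System.out.println(URLDecoder.decode(param, StandardCharsets.UTF_8).equals(whole)); // true
        System.out.println(searchUrl("first book.pdf")); // .../booksearch/first+book.pdf
    }
}
```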
There is still a problem if you have an encoded "/" (%2F) in your URL.
RFC 3986, Section 2.2 says: "If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed."
But there is an issue with Tomcat:
So if you have a URL containing the %2F character, Tomcat returns: "400 Invalid URI: noSlash"
You can switch off the bugfix in the Tomcat startup script:
I read the previous answers to write my own method, because I could not get anything working properly with their solutions. It looks good to me, but if you find a URL that does not work with it, please let me know.
Maybe try UriUtils in org.springframework.web.util.
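You can also use Guava and its URL escapers: UrlEscapers.urlFragmentEscaper().escape(relativePath)
(The fragment escaper leaves "/" unescaped, so it can be applied to a whole relative path; urlPathSegmentEscaper() would escape "/" as well.)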
I agree with Matt. I've never seen it explained well in tutorials, but encoding the URL path is one matter, and encoding the parameters appended to the URL (the query part, after the "?" symbol) is a very different one. They use similar encodings, but not the same one.
This matters especially for the space character. The URL path needs it encoded as %20, whereas the query part allows %20 and also the "+" sign. The best idea is to test it ourselves against our Web server, using a Web browser.
In both cases, I would ALWAYS encode COMPONENT BY COMPONENT, never the whole string. URLEncoder allows that for the query part. For the path part you can use the class URI, although in this case it asks for the entire string, not a single component.
Anyway, I believe the best way to avoid these problems is to use a personal, non-conflicting design. How? For example, I would never name directories or parameters using characters other than a-z, A-Z, 0-9 and _. That way, the only need is to encode the value of every parameter, since it may come from user input and the characters used are unknown.
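The path/query asymmetry around spaces can be seen directly in the JDK's two decoders. In this sketch (illustrative names), URI.getPath() decodes only %XX escapes and leaves '+' alone. URLDecoder applies form rules and turns '+' into a space:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

class PathVsQueryExample {
    // Path decoding: only %XX escapes are decoded; '+' stays a literal plus.
    static String decodePath(String rawUri) {
        return URI.create(rawUri).getPath();
    }

    // Form (query) decoding: '+' means space, and %XX escapes are decoded too.
    static String decodeQueryValue(String raw) {
        return URLDecoder.decode(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodePath("http://example.com/a+b"));   // /a+b
        System.out.println(decodePath("http://example.com/a%20b")); // /a b
        System.out.println(decodeQueryValue("a+b"));                // a b
        System.out.println(decodeQueryValue("a%20b"));              // a b
    }
}
```

This is why %20 is the safe choice in a path, while either form works in a form-decoded query.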
I took the content above and changed it around a bit. I like positive logic first, and I thought a HashSet might give better performance than some other options, such as searching through a String. I'm not sure the autoboxing penalty is worth it, but since Character.valueOf caches ASCII characters, the cost of boxing should be low.
In addition to Carlos Heuberger's reply:
if a port other than the default (80) is needed, the 7-parameter constructor should be used:
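Here is a small sketch of that seven-argument constructor, URI(scheme, userInfo, host, port, path, query, fragment). Host, port and path are illustrative, and passing -1 as the port means "no explicit port":

```java
import java.net.URI;
import java.net.URISyntaxException;

class PortExample {
    // The path is quoted like in the shorter constructors; the port is kept.
    static String withPort(String host, int port, String path) {
        try {
            return new URI("http", null, host, port, path, null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(withPort("localhost", 8080, "/booksearch/first book.pdf"));
        // http://localhost:8080/booksearch/first%20book.pdf
    }
}
```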
Use the following standard Java solution (it passes around 100 of the test cases provided by Web Platform Tests):
0. Test whether the URL is already encoded.
1. Split the URL into its structural parts. Use java.net.URL for this.
2. Encode each structural part properly!
3. Use IDN.toASCII(putDomainNameHere) to Punycode-encode the host name!
4. Use java.net.URI.toASCIIString() to percent-encode NFC-normalized Unicode (NFKC would be better!).
Find more here: https://stackoverflow.com/a/49796882/1485527
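Here is a compact sketch of steps 1 to 4 of the list above. Step 0, detecting an already-encoded URL, is left out, and the host bücher.example and the class name are illustrative. Steps 1 and 2 reuse the URL-then-URI split, step 3 applies java.net.IDN to the host only, and step 4 calls toASCIIString():

```java
import java.net.IDN;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class IdnExample {
    static String toAscii(String raw) {
        try {
            URL url = new URL(raw);                    // 1. split into structural parts
            String host = IDN.toASCII(url.getHost());  // 3. Punycode the host name
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), host,
                    url.getPort(), url.getPath(), url.getQuery(), url.getRef()); // 2. quote each part
            return uri.toASCIIString();                // 4. percent-encode remaining non-ASCII as UTF-8
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toAscii("http://b\u00fccher.example/first book.pdf"));
        // http://xn--bcher-kva.example/first%20book.pdf
    }
}
```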
If you are using Spring, you can try
org.springframework.web.util.UriUtils#encodePath
I've created a new project to help construct HTTP URLs. The library will automatically URL encode path segments and query parameters.
You can view the source and download a binary at https://github.com/Widen/urlbuilder
The example URL in this question:
produces
http://search.barnesandnoble.com/booksearch/first%20book.pdf
I had the same problem. I solved it by using:
It encodes the string but skips ":" and "/".
I develop a library that serves this purpose: galimatias. It parses URLs the same way web browsers do. That is, if a URL works in a browser, it will be correctly parsed by galimatias.
In this case:
will give you:
http://search.barnesandnoble.com/booksearch/first%20book.pdf
Of course, this is the simplest case, but it'll work with anything, way beyond java.net.URI.
You can check it out at: https://github.com/smola/galimatias
I use this:
Add this dependency:
You can use a function like this. Complete it and modify it to your needs:
Example of use:
The result is: http://www.growup.com/folder/int%C3%A9rieur-%C3%A0_vendre?o=4
How about:
public String UrlEncode(String in_) {
}