Java 中的 HTTP URL 地址编码

发布于 2024-07-17 05:14:07 字数 583 浏览 4 评论 0 原文

我的 Java 独立应用程序从用户那里获取一个 URL(指向一个文件),我需要点击它并下载它。 我面临的问题是我无法正确编码 HTTP URL 地址...

示例:

URL:  http://search.barnesandnoble.com/booksearch/first book.pdf

java.net.URLEncoder.encode(url.toString(), "ISO-8859-1");

返回我:

http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst+book.pdf

但是,我想要的是

http://search.barnesandnoble.com/booksearch/first%20book.pdf

(空格替换为 %20)

我猜 URLEncoder 是不是设计来编码 HTTP URL...JavaDoc 说“用于 HTML 表单编码的实用程序类”...还有其他方法可以做到这一点吗?

My Java standalone application gets a URL (which points to a file) from the user and I need to hit it and download it. The problem I am facing is that I am not able to encode the HTTP URL address properly...

Example:

URL:  http://search.barnesandnoble.com/booksearch/first book.pdf

java.net.URLEncoder.encode(url.toString(), "ISO-8859-1");

returns me:

http%3A%2F%2Fsearch.barnesandnoble.com%2Fbooksearch%2Ffirst+book.pdf

But, what I want is

http://search.barnesandnoble.com/booksearch/first%20book.pdf

(space replaced by %20)

I guess URLEncoder is not designed to encode HTTP URLs... The JavaDoc says "Utility class for HTML form encoding"... Is there any other way to do this?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(24

疯狂的代价 2024-07-24 05:14:07

java.net.URI 类可以提供帮助; 在您找到的 URL 文档中

请注意,URI 类在某些情况下确实会执行其组件字段的转义。 管理 URL 编码和解码的推荐方法是使用 URI

具有多个参数的构造函数之一,例如:

URI uri = new URI(
    "http", 
    "search.barnesandnoble.com", 
    "/booksearch/first book.pdf",
    null);
URL url = uri.toURL();
//or String request = uri.toString();

(URI 的单参数构造函数不会转义非法字符)


上面的代码仅转义非法字符 - 它不会转义非 ASCII 字符(请参阅 fatih 的评论)。
toASCIIString 方法可用于获取仅包含 US-ASCII 字符的字符串:

URI uri = new URI(
    "http", 
    "search.barnesandnoble.com", 
    "/booksearch/é",
    null);
String request = uri.toASCIIString();

对于具有类似 http://www.google.com/ig/api?weather= 的查询的 URL圣保罗,使用构造函数的 5 参数版本:

URI uri = new URI(
        "http", 
        "www.google.com", 
        "/ig/api",
        "weather=São Paulo",
        null);
String request = uri.toASCIIString();

The java.net.URI class can help; in the documentation of URL you find

Note, the URI class does perform escaping of its component fields in certain circumstances. The recommended way to manage the encoding and decoding of URLs is to use an URI

Use one of the constructors with more than one argument, like:

URI uri = new URI(
    "http", 
    "search.barnesandnoble.com", 
    "/booksearch/first book.pdf",
    null);
URL url = uri.toURL();
//or String request = uri.toString();

(the single-argument constructor of URI does NOT escape illegal characters)


Only illegal characters get escaped by above code - it does NOT escape non-ASCII characters (see fatih's comment).
The toASCIIString method can be used to get a String only with US-ASCII characters:

URI uri = new URI(
    "http", 
    "search.barnesandnoble.com", 
    "/booksearch/é",
    null);
String request = uri.toASCIIString();

For an URL with a query like http://www.google.com/ig/api?weather=São Paulo, use the 5-parameter version of the constructor:

URI uri = new URI(
        "http", 
        "www.google.com", 
        "/ig/api",
        "weather=São Paulo",
        null);
String request = uri.toASCIIString();
如果没有你 2024-07-24 05:14:07

请注意,上面的大多数答案都是不正确的。

尽管有名称,URLEncoder 类并不是此处所需要的。 不幸的是,Sun 把这个类命名得这么烦人。 URLEncoder 用于将数据作为参数传递,而不是用于对 URL 本身进行编码。

换句话说,“http://search.barnesandnoble.com/booksearch/first book.pdf” 是 URL。 例如,参数可以是“http://search.barnesandnoble.com/booksearch/first book.pdf?parameter1=this¶m2=that”。 这些参数是您将使用 URLEncoder 的参数。

以下两个示例突出了两者之间的差异。

根据 HTTP 标准,以下内容会产生错误的参数。 请注意,与号 (&) 和加号 (+) 的编码不正确。

uri = new URI("http", null, "www.google.com", 80, 
"/help/me/book name+me/", "MY CRZY QUERY! +&+ :)", null);

// URI: http://www.google.com:80/help/me/book%20name+me/?MY%20CRZY%20QUERY!%20+&+%20:)

以下将产生正确的参数,并对查询进行正确编码。 请注意空格、与号和加号。

uri = new URI("http", null, "www.google.com", 80, "/help/me/book name+me/", URLEncoder.encode("MY CRZY QUERY! +&+ :)", "UTF-8"), null);

// URI: http://www.google.com:80/help/me/book%20name+me/?MY+CRZY+QUERY%2521+%252B%2526%252B+%253A%2529

Please be warned that most of the answers above are INCORRECT.

The URLEncoder class, despite is name, is NOT what needs to be here. It's unfortunate that Sun named this class so annoyingly. URLEncoder is meant for passing data as parameters, not for encoding the URL itself.

In other words, "http://search.barnesandnoble.com/booksearch/first book.pdf" is the URL. Parameters would be, for example, "http://search.barnesandnoble.com/booksearch/first book.pdf?parameter1=this¶m2=that". The parameters are what you would use URLEncoder for.

The following two examples highlights the differences between the two.

The following produces the wrong parameters, according to the HTTP standard. Note the ampersand (&) and plus (+) are encoded incorrectly.

uri = new URI("http", null, "www.google.com", 80, 
"/help/me/book name+me/", "MY CRZY QUERY! +&+ :)", null);

// URI: http://www.google.com:80/help/me/book%20name+me/?MY%20CRZY%20QUERY!%20+&+%20:)

The following will produce the correct parameters, with the query properly encoded. Note the spaces, ampersands, and plus marks.

uri = new URI("http", null, "www.google.com", 80, "/help/me/book name+me/", URLEncoder.encode("MY CRZY QUERY! +&+ :)", "UTF-8"), null);

// URI: http://www.google.com:80/help/me/book%20name+me/?MY+CRZY+QUERY%2521+%252B%2526%252B+%253A%2529
雨落星ぅ辰 2024-07-24 05:14:07

我将在这里针对 Android 用户添加一项建议。 您可以这样做,从而避免获取任何外部库。 此外,上面一些答案中建议的所有搜索/替换字符解决方案都是危险的,应该避免。

尝试一下:

String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();

您可以看到,在这个特定的 URL 中,我需要对这些空格进行编码,以便我可以将其用于请求。

这利用了 Android 类中提供的几个功能。 首先,URL 类可以将 url 分解为其适当的组件,因此您无需执行任何字符串搜索/替换工作。 其次,当您通过组件而不是从单个字符串构造 URI 时,此方法利用了正确转义组件的 URI 类功能。

这种方法的优点在于,您可以获取任何有效的 url 字符串并让它工作,而无需您自己具备任何特殊知识。

I'm going to add one suggestion here aimed at Android users. You can do this which avoids having to get any external libraries. Also, all the search/replace characters solutions suggested in some of the answers above are perilous and should be avoided.

Give this a try:

String urlStr = "http://abc.dev.domain.com/0007AC/ads/800x480 15sec h.264.mp4";
URL url = new URL(urlStr);
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
url = uri.toURL();

You can see that in this particular URL, I need to have those spaces encoded so that I can use it for a request.

This takes advantage of a couple features available to you in Android classes. First, the URL class can break a url into its proper components so there is no need for you to do any string search/replace work. Secondly, this approach takes advantage of the URI class feature of properly escaping components when you construct a URI via components rather than from a single string.

The beauty of this approach is that you can take any valid url string and have it work without needing any special knowledge of it yourself.

不羁少年 2024-07-24 05:14:07

我开发的解决方案比任何其他解决方案都稳定得多:

public class URLParamEncoder {

    public static String encode(String input) {
        StringBuilder resultStr = new StringBuilder();
        for (char ch : input.toCharArray()) {
            if (isUnsafe(ch)) {
                resultStr.append('%');
                resultStr.append(toHex(ch / 16));
                resultStr.append(toHex(ch % 16));
            } else {
                resultStr.append(ch);
            }
        }
        return resultStr.toString();
    }

    private static char toHex(int ch) {
        return (char) (ch < 10 ? '0' + ch : 'A' + ch - 10);
    }

    private static boolean isUnsafe(char ch) {
        if (ch > 128 || ch < 0)
            return true;
        return " %
amp;+,/:;=?@<>#%".indexOf(ch) >= 0;
    }

}

a solution i developed and much more stable than any other:

public class URLParamEncoder {

    public static String encode(String input) {
        StringBuilder resultStr = new StringBuilder();
        for (char ch : input.toCharArray()) {
            if (isUnsafe(ch)) {
                resultStr.append('%');
                resultStr.append(toHex(ch / 16));
                resultStr.append(toHex(ch % 16));
            } else {
                resultStr.append(ch);
            }
        }
        return resultStr.toString();
    }

    private static char toHex(int ch) {
        return (char) (ch < 10 ? '0' + ch : 'A' + ch - 10);
    }

    private static boolean isUnsafe(char ch) {
        if (ch > 128 || ch < 0)
            return true;
        return " %
amp;+,/:;=?@<>#%".indexOf(ch) >= 0;
    }

}
浮光之海 2024-07-24 05:14:07

如果您有 URL,则可以将 url.toString() 传递到此方法中。 首先解码,以避免双重编码(例如,对空格进行编码会导致%20,对百分号进行编码会导致%25,因此双重编码会将空格变成%2520)。 然后,按照上面的说明使用 URI,添加 URL 的所有部分(这样就不会删除查询参数)。

public URL convertToURLEscapingIllegalCharacters(String string){
    try {
        String decodedURL = URLDecoder.decode(string, "UTF-8");
        URL url = new URL(decodedURL);
        URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef()); 
        return uri.toURL(); 
    } catch (Exception ex) {
        ex.printStackTrace();
        return null;
    }
}

If you have a URL, you can pass url.toString() into this method. First decode, to avoid double encoding (for example, encoding a space results in %20 and encoding a percent sign results in %25, so double encoding will turn a space into %2520). Then, use the URI as explained above, adding in all the parts of the URL (so that you don't drop the query parameters).

public URL convertToURLEscapingIllegalCharacters(String string){
    try {
        String decodedURL = URLDecoder.decode(string, "UTF-8");
        URL url = new URL(decodedURL);
        URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef()); 
        return uri.toURL(); 
    } catch (Exception ex) {
        ex.printStackTrace();
        return null;
    }
}
眼波传意 2024-07-24 05:14:07

是的,URL 编码将对该字符串进行编码,以便它可以在 url 中正确传递到最终目的地。 例如,您不能 http://stackoverflow.com?url=http://yyy.com。 对参数进行 UrlEncoding 将修复该参数值。

所以我有两个选择给你:

  1. 你可以访问与域分开的路径吗? 如果是这样,您可以简单地对路径进行 UrlEncode。 但是,如果情况并非如此,那么选项 2 可能适合您。

  2. 获取 commons-httpclient-3.1。 它有一个 URIUtil 类:

    System.out.println(URIUtil.encodePath("http://example.com/x y" , "ISO-8859-1"));

这将准确输出您正在寻找的内容,因为它只会对 URI 的路径部分进行编码。

仅供参考,您需要 commons-codec 和 commons-logging 才能使此方法在运行时工作。

Yeah URL encoding is going to encode that string so that it would be passed properly in a url to a final destination. For example you could not have http://stackoverflow.com?url=http://yyy.com. UrlEncoding the parameter would fix that parameter value.

So i have two choices for you:

  1. Do you have access to the path separate from the domain? If so you may be able to simply UrlEncode the path. However, if this is not the case then option 2 may be for you.

  2. Get commons-httpclient-3.1. This has a class URIUtil:

    System.out.println(URIUtil.encodePath("http://example.com/x y", "ISO-8859-1"));

This will output exactly what you are looking for, as it will only encode the path part of the URI.

FYI, you'll need commons-codec and commons-logging for this method to work at runtime.

抠脚大汉 2024-07-24 05:14:07

如果有人不想向他们的项目添加依赖项,这些功能可能会有所帮助。

我们将 URL 的“路径”部分传递到此处。 您可能不想将完整的 URL 作为参数传递(查询字符串需要不同的转义等)。

/**
 * Percent-encodes a string so it's suitable for use in a URL Path (not a query string / form encode, which uses + for spaces, etc)
 */
public static String percentEncode(String encodeMe) {
    if (encodeMe == null) {
        return "";
    }
    String encoded = encodeMe.replace("%", "%25");
    encoded = encoded.replace(" ", "%20");
    encoded = encoded.replace("!", "%21");
    encoded = encoded.replace("#", "%23");
    encoded = encoded.replace("$", "%24");
    encoded = encoded.replace("&", "%26");
    encoded = encoded.replace("'", "%27");
    encoded = encoded.replace("(", "%28");
    encoded = encoded.replace(")", "%29");
    encoded = encoded.replace("*", "%2A");
    encoded = encoded.replace("+", "%2B");
    encoded = encoded.replace(",", "%2C");
    encoded = encoded.replace("/", "%2F");
    encoded = encoded.replace(":", "%3A");
    encoded = encoded.replace(";", "%3B");
    encoded = encoded.replace("=", "%3D");
    encoded = encoded.replace("?", "%3F");
    encoded = encoded.replace("@", "%40");
    encoded = encoded.replace("[", "%5B");
    encoded = encoded.replace("]", "%5D");
    return encoded;
}

/**
 * Percent-decodes a string, such as used in a URL Path (not a query string / form encode, which uses + for spaces, etc)
 */
public static String percentDecode(String encodeMe) {
    if (encodeMe == null) {
        return "";
    }
    String decoded = encodeMe.replace("%21", "!");
    decoded = decoded.replace("%20", " ");
    decoded = decoded.replace("%23", "#");
    decoded = decoded.replace("%24", "$");
    decoded = decoded.replace("%26", "&");
    decoded = decoded.replace("%27", "'");
    decoded = decoded.replace("%28", "(");
    decoded = decoded.replace("%29", ")");
    decoded = decoded.replace("%2A", "*");
    decoded = decoded.replace("%2B", "+");
    decoded = decoded.replace("%2C", ",");
    decoded = decoded.replace("%2F", "/");
    decoded = decoded.replace("%3A", ":");
    decoded = decoded.replace("%3B", ";");
    decoded = decoded.replace("%3D", "=");
    decoded = decoded.replace("%3F", "?");
    decoded = decoded.replace("%40", "@");
    decoded = decoded.replace("%5B", "[");
    decoded = decoded.replace("%5D", "]");
    decoded = decoded.replace("%25", "%");
    return decoded;
}

和测试:

@Test
public void testPercentEncode_Decode() {
    assertEquals("", percentDecode(percentEncode(null)));
    assertEquals("", percentDecode(percentEncode("")));

    assertEquals("!", percentDecode(percentEncode("!")));
    assertEquals("#", percentDecode(percentEncode("#")));
    assertEquals("$", percentDecode(percentEncode("$")));
    assertEquals("@", percentDecode(percentEncode("@")));
    assertEquals("&", percentDecode(percentEncode("&")));
    assertEquals("'", percentDecode(percentEncode("'")));
    assertEquals("(", percentDecode(percentEncode("(")));
    assertEquals(")", percentDecode(percentEncode(")")));
    assertEquals("*", percentDecode(percentEncode("*")));
    assertEquals("+", percentDecode(percentEncode("+")));
    assertEquals(",", percentDecode(percentEncode(",")));
    assertEquals("/", percentDecode(percentEncode("/")));
    assertEquals(":", percentDecode(percentEncode(":")));
    assertEquals(";", percentDecode(percentEncode(";")));

    assertEquals("=", percentDecode(percentEncode("=")));
    assertEquals("?", percentDecode(percentEncode("?")));
    assertEquals("@", percentDecode(percentEncode("@")));
    assertEquals("[", percentDecode(percentEncode("[")));
    assertEquals("]", percentDecode(percentEncode("]")));
    assertEquals(" ", percentDecode(percentEncode(" ")));

    // Get a little complex
    assertEquals("[]]", percentDecode(percentEncode("[]]")));
    assertEquals("a=d%*", percentDecode(percentEncode("a=d%*")));
    assertEquals(")  (", percentDecode(percentEncode(")  (")));
    assertEquals("%21%20%2A%20%27%20%28%20%25%20%29%20%3B%20%3A%20%40%20%26%20%3D%20%2B%20%24%20%2C%20%2F%20%3F%20%23%20%5B%20%5D%20%25",
                    percentEncode("! * ' ( % ) ; : @ & = + $ , / ? # [ ] %"));
    assertEquals("! * ' ( % ) ; : @ & = + $ , / ? # [ ] %", percentDecode(
                    "%21%20%2A%20%27%20%28%20%25%20%29%20%3B%20%3A%20%40%20%26%20%3D%20%2B%20%24%20%2C%20%2F%20%3F%20%23%20%5B%20%5D%20%25"));

    assertEquals("%23456", percentDecode(percentEncode("%23456")));

}

If anybody doesn't want to add a dependency to their project, these functions may be helpful.

We pass the 'path' part of our URL into here. You probably don't want to pass the full URL in as a parameter (query strings need different escapes, etc).

/**
 * Percent-encodes a string so it's suitable for use in a URL Path (not a query string / form encode, which uses + for spaces, etc)
 */
public static String percentEncode(String encodeMe) {
    if (encodeMe == null) {
        return "";
    }
    String encoded = encodeMe.replace("%", "%25");
    encoded = encoded.replace(" ", "%20");
    encoded = encoded.replace("!", "%21");
    encoded = encoded.replace("#", "%23");
    encoded = encoded.replace("$", "%24");
    encoded = encoded.replace("&", "%26");
    encoded = encoded.replace("'", "%27");
    encoded = encoded.replace("(", "%28");
    encoded = encoded.replace(")", "%29");
    encoded = encoded.replace("*", "%2A");
    encoded = encoded.replace("+", "%2B");
    encoded = encoded.replace(",", "%2C");
    encoded = encoded.replace("/", "%2F");
    encoded = encoded.replace(":", "%3A");
    encoded = encoded.replace(";", "%3B");
    encoded = encoded.replace("=", "%3D");
    encoded = encoded.replace("?", "%3F");
    encoded = encoded.replace("@", "%40");
    encoded = encoded.replace("[", "%5B");
    encoded = encoded.replace("]", "%5D");
    return encoded;
}

/**
 * Percent-decodes a string, such as used in a URL Path (not a query string / form encode, which uses + for spaces, etc)
 */
public static String percentDecode(String encodeMe) {
    if (encodeMe == null) {
        return "";
    }
    String decoded = encodeMe.replace("%21", "!");
    decoded = decoded.replace("%20", " ");
    decoded = decoded.replace("%23", "#");
    decoded = decoded.replace("%24", "$");
    decoded = decoded.replace("%26", "&");
    decoded = decoded.replace("%27", "'");
    decoded = decoded.replace("%28", "(");
    decoded = decoded.replace("%29", ")");
    decoded = decoded.replace("%2A", "*");
    decoded = decoded.replace("%2B", "+");
    decoded = decoded.replace("%2C", ",");
    decoded = decoded.replace("%2F", "/");
    decoded = decoded.replace("%3A", ":");
    decoded = decoded.replace("%3B", ";");
    decoded = decoded.replace("%3D", "=");
    decoded = decoded.replace("%3F", "?");
    decoded = decoded.replace("%40", "@");
    decoded = decoded.replace("%5B", "[");
    decoded = decoded.replace("%5D", "]");
    decoded = decoded.replace("%25", "%");
    return decoded;
}

And tests:

@Test
public void testPercentEncode_Decode() {
    assertEquals("", percentDecode(percentEncode(null)));
    assertEquals("", percentDecode(percentEncode("")));

    assertEquals("!", percentDecode(percentEncode("!")));
    assertEquals("#", percentDecode(percentEncode("#")));
    assertEquals("$", percentDecode(percentEncode("$")));
    assertEquals("@", percentDecode(percentEncode("@")));
    assertEquals("&", percentDecode(percentEncode("&")));
    assertEquals("'", percentDecode(percentEncode("'")));
    assertEquals("(", percentDecode(percentEncode("(")));
    assertEquals(")", percentDecode(percentEncode(")")));
    assertEquals("*", percentDecode(percentEncode("*")));
    assertEquals("+", percentDecode(percentEncode("+")));
    assertEquals(",", percentDecode(percentEncode(",")));
    assertEquals("/", percentDecode(percentEncode("/")));
    assertEquals(":", percentDecode(percentEncode(":")));
    assertEquals(";", percentDecode(percentEncode(";")));

    assertEquals("=", percentDecode(percentEncode("=")));
    assertEquals("?", percentDecode(percentEncode("?")));
    assertEquals("@", percentDecode(percentEncode("@")));
    assertEquals("[", percentDecode(percentEncode("[")));
    assertEquals("]", percentDecode(percentEncode("]")));
    assertEquals(" ", percentDecode(percentEncode(" ")));

    // Get a little complex
    assertEquals("[]]", percentDecode(percentEncode("[]]")));
    assertEquals("a=d%*", percentDecode(percentEncode("a=d%*")));
    assertEquals(")  (", percentDecode(percentEncode(")  (")));
    assertEquals("%21%20%2A%20%27%20%28%20%25%20%29%20%3B%20%3A%20%40%20%26%20%3D%20%2B%20%24%20%2C%20%2F%20%3F%20%23%20%5B%20%5D%20%25",
                    percentEncode("! * ' ( % ) ; : @ & = + $ , / ? # [ ] %"));
    assertEquals("! * ' ( % ) ; : @ & = + $ , / ? # [ ] %", percentDecode(
                    "%21%20%2A%20%27%20%28%20%25%20%29%20%3B%20%3A%20%40%20%26%20%3D%20%2B%20%24%20%2C%20%2F%20%3F%20%23%20%5B%20%5D%20%25"));

    assertEquals("%23456", percentDecode(percentEncode("%23456")));

}
携君以终年 2024-07-24 05:14:07

不幸的是,org.apache.commons.httpclient.util.URIUtil已被弃用,并且替换org.apache.commons.codec.net.URLCodec进行了适合表单帖子的编码,而不是在实际的 URL 中。 所以我必须编写自己的函数,它执行单个组件(不适合具有 ? 和 & 的整个查询字符串)

public static String encodeURLComponent(final String s)
{
  if (s == null)
  {
    return "";
  }

  final StringBuilder sb = new StringBuilder();

  try
  {
    for (int i = 0; i < s.length(); i++)
    {
      final char c = s.charAt(i);

      if (((c >= 'A') && (c <= 'Z')) || ((c >= 'a') && (c <= 'z')) ||
          ((c >= '0') && (c <= '9')) ||
          (c == '-') ||  (c == '.')  || (c == '_') || (c == '~'))
      {
        sb.append(c);
      }
      else
      {
        final byte[] bytes = ("" + c).getBytes("UTF-8");

        for (byte b : bytes)
        {
          sb.append('%');

          int upper = (((int) b) >> 4) & 0xf;
          sb.append(Integer.toHexString(upper).toUpperCase(Locale.US));

          int lower = ((int) b) & 0xf;
          sb.append(Integer.toHexString(lower).toUpperCase(Locale.US));
        }
      }
    }

    return sb.toString();
  }
  catch (UnsupportedEncodingException uee)
  {
    throw new RuntimeException("UTF-8 unsupported!?", uee);
  }
}

Unfortunately, org.apache.commons.httpclient.util.URIUtil is deprecated, and the replacement org.apache.commons.codec.net.URLCodec does coding suitable for form posts, not in actual URL's. So I had to write my own function, which does a single component (not suitable for entire query strings that have ?'s and &'s)

public static String encodeURLComponent(final String s)
{
  if (s == null)
  {
    return "";
  }

  final StringBuilder sb = new StringBuilder();

  try
  {
    for (int i = 0; i < s.length(); i++)
    {
      final char c = s.charAt(i);

      if (((c >= 'A') && (c <= 'Z')) || ((c >= 'a') && (c <= 'z')) ||
          ((c >= '0') && (c <= '9')) ||
          (c == '-') ||  (c == '.')  || (c == '_') || (c == '~'))
      {
        sb.append(c);
      }
      else
      {
        final byte[] bytes = ("" + c).getBytes("UTF-8");

        for (byte b : bytes)
        {
          sb.append('%');

          int upper = (((int) b) >> 4) & 0xf;
          sb.append(Integer.toHexString(upper).toUpperCase(Locale.US));

          int lower = ((int) b) & 0xf;
          sb.append(Integer.toHexString(lower).toUpperCase(Locale.US));
        }
      }
    }

    return sb.toString();
  }
  catch (UnsupportedEncodingException uee)
  {
    throw new RuntimeException("UTF-8 unsupported!?", uee);
  }
}
梦断已成空 2024-07-24 05:14:07

正如您不幸发现的那样,URLEncoding 可以很好地对 HTTP URL 进行编码。 您传入的字符串“http://search.barnesandnoble.com/booksearch/first book .pdf”,已正确且完整地编码为 URL 编码形式。 您可以将返回的整个长字符串作为 URL 中的参数传递,并且可以将其解码回您传入的字符串。

听起来您想做的事情与传递整个 URL 略有不同作为参数。 据我所知,您正在尝试创建一个类似于“http://search.barnesandnoble”的搜索网址.com/booksearch/whateverTheUserPassesIn”。 您唯一需要编码的是“whateverTheUserPassesIn”位,因此也许您需要做的就是这样:

String url = "http://search.barnesandnoble.com/booksearch/" + 
       URLEncoder.encode(userInput,"UTF-8");

这应该会产生对您来说更有效的东西。

URLEncoding can encode HTTP URLs just fine, as you've unfortunately discovered. The string you passed in, "http://search.barnesandnoble.com/booksearch/first book.pdf", was correctly and completely encoded into a URL-encoded form. You could pass that entire long string of gobbledigook that you got back as a parameter in a URL, and it could be decoded back into exactly the string you passed in.

It sounds like you want to do something a little different than passing the entire URL as a parameter. From what I gather, you're trying to create a search URL that looks like "http://search.barnesandnoble.com/booksearch/whateverTheUserPassesIn". The only thing that you need to encode is the "whateverTheUserPassesIn" bit, so perhaps all you need to do is something like this:

String url = "http://search.barnesandnoble.com/booksearch/" + 
       URLEncoder.encode(userInput,"UTF-8");

That should produce something rather more valid for you.

吻安 2024-07-24 05:14:07

如果您的 URL 中包含编码的“/”(%2F),则仍然存在问题。

RFC 3986 - 第 2.2 节规定:“如果 URI 组件的数据与保留字符作为分隔符的用途发生冲突,则必须在形成 URI 之前对冲突数据进行百分比编码。” (RFC 3986 - 第 2.2 节)

但是 Tomcat 存在一个问题:

http://tomcat.apache.org/security-6.html - Apache Tomcat 6.0.10 中已修复

重要提示:目录遍历 CVE-2007-0450

Tomcat 允许“\”、“%2F”和“%5C”
[...]。

以下 Java 系统属性
已添加到Tomcat以提供
对处理的额外控制
URL 中的路径分隔符(两个选项
默认为 false):

  • org.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH:
    真|假
  • org.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH:
    真|假

由于无法保证
所有 URL 均由 Tomcat 处理
它们位于代理服务器、Tomcat 中
应始终受到保护,就好像没有一样
代理限制上下文访问是
使用过。

影响:6.0.0-6.0.9

因此,如果您的 URL 包含 %2F 字符,Tomcat 将返回:“400 Invalid URI: noSlash”

您可以在 Tomcat 启动脚本中切换错误修复:

set JAVA_OPTS=%JAVA_OPTS% %LOGGING_CONFIG%   -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true 

There is still a problem if you have got an encoded "/" (%2F) in your URL.

RFC 3986 - Section 2.2 says: "If data for a URI component would conflict with a reserved character's purpose as a delimiter, then the conflicting data must be percent-encoded before the URI is formed." (RFC 3986 - Section 2.2)

But there is an Issue with Tomcat:

http://tomcat.apache.org/security-6.html - Fixed in Apache Tomcat 6.0.10

important: Directory traversal CVE-2007-0450

Tomcat permits '\', '%2F' and '%5C'
[...] .

The following Java system properties
have been added to Tomcat to provide
additional control of the handling of
path delimiters in URLs (both options
default to false):

  • org.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH:
    true|false
  • org.apache.catalina.connector.CoyoteAdapter.ALLOW_BACKSLASH:
    true|false

Due to the impossibility to guarantee
that all URLs are handled by Tomcat as
they are in proxy servers, Tomcat
should always be secured as if no
proxy restricting context access was
used.

Affects: 6.0.0-6.0.9

So if you have got an URL with the %2F character, Tomcat returns: "400 Invalid URI: noSlash"

You can switch of the bugfix in the Tomcat startup script:

set JAVA_OPTS=%JAVA_OPTS% %LOGGING_CONFIG%   -Dorg.apache.tomcat.util.buf.UDecoder.ALLOW_ENCODED_SLASH=true 
挽梦忆笙歌 2024-07-24 05:14:07

我阅读了以前的答案来编写自己的方法,因为使用以前的答案的解决方案无法让某些东西正常工作,它看起来对我来说很好,但如果您找到不适用于此的 URL,请告诉我。

public static URL convertToURLEscapingIllegalCharacters(String toEscape) throws MalformedURLException, URISyntaxException {
            URL url = new URL(toEscape);
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            //if a % is included in the toEscape string, it will be re-encoded to %25 and we don't want re-encoding, just encoding
            return new URL(uri.toString().replace("%25", "%"));
}

I read the previous answers to write my own method because I could not have something properly working using the solution of the previous answers, it looks good for me but if you can find URL that does not work with this, please let me know.

public static URL convertToURLEscapingIllegalCharacters(String toEscape) throws MalformedURLException, URISyntaxException {
            URL url = new URL(toEscape);
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            //if a % is included in the toEscape string, it will be re-encoded to %25 and we don't want re-encoding, just encoding
            return new URL(uri.toString().replace("%25", "%"));
}
几度春秋 2024-07-24 05:14:07

也许可以尝试 UriUtils 在 org.springframework.web.util 中

UriUtils.encodeUri(input, "UTF-8")

Maybe can try UriUtils in org.springframework.web.util

UriUtils.encodeUri(input, "UTF-8")
你另情深 2024-07-24 05:14:07

您还可以使用 GUAVA 和路径转义器:
UrlEscapers.urlFragmentEscaper().escape(relativePath)

You can also use GUAVA and path escaper:
UrlEscapers.urlFragmentEscaper().escape(relativePath)

我ぃ本無心為│何有愛 2024-07-24 05:14:07

我同意马特的观点。 事实上,我从未在教程中看到它得到很好的解释,但一件事是如何对 URL 路径进行编码,另一件事是如何对附加到 URL 的参数进行编码(查询部分,在“?”后面) “ 象征)。 它们使用相似的编码,但不相同。

专门用于空白字符的编码。 URL 路径需要将其编码为 %20,而查询部分允许 %20 以及“+”号。 最好的想法是使用 Web 浏览器针对我们的 Web 服务器进行测试。

对于这两种情况,我总是会编码COMPONENT BY COMPONENT,而不是整个字符串。 事实上,URLEncoder 允许在查询部分这样做。 对于路径部分,您可以使用类 URI,尽管在这种情况下它要求整个字符串,而不是单个组件。

无论如何,我相信避免这些问题的最好方法是使用个人的非冲突设计。如何? 例如,我绝不会使用 aZ、AZ、0-9 和 _ 以外的其他字符来命名目录或参数。 这样,唯一需要的是对每个参数的值进行编码,因为它可能来自用户输入并且使用的字符是未知的。

I agree with Matt. Indeed, I've never seen it well explained in tutorials, but one matter is how to encode the URL path, and a very different one is how to encode the parameters which are appended to the URL (the query part, behind the "?" symbol). They use similar encoding, but not the same.

Specially for the encoding of the white space character. The URL path needs it to be encoded as %20, whereas the query part allows %20 and also the "+" sign. The best idea is to test it by ourselves against our Web server, using a Web browser.

For both cases, I ALWAYS would encode COMPONENT BY COMPONENT, never the whole string. Indeed URLEncoder allows that for the query part. For the path part you can use the class URI, although in this case it asks for the entire string, not a single component.

Anyway, I believe that the best way to avoid these problems is to use a personal non-conflictive design. How? For example, I never would name directories or parameters using other characters than a-Z, A-Z, 0-9 and _ . That way, the only need is to encode the value of every parameter, since it may come from an user input and the used characters are unknown.

少女七分熟 2024-07-24 05:14:07

我把上面的内容做了一些修改。 我首先喜欢正逻辑,并且我认为 HashSet 可能比其他一些选项(例如搜索字符串)提供更好的性能。 虽然,我不确定自动装箱的惩罚是否值得,但如果编译器针对 ASCII 字符进行优化,那么装箱的成本将会很低。

/***
 * Replaces any character not specifically unreserved to an equivalent 
 * percent sequence.
 * @param s
 * @return
 */
public static String encodeURIcomponent(String s)
{
    StringBuilder o = new StringBuilder();
    for (char ch : s.toCharArray()) {
        if (isSafe(ch)) {
            o.append(ch);
        }
        else {
            o.append('%');
            o.append(toHex(ch / 16));
            o.append(toHex(ch % 16));
        }
    }
    return o.toString();
}

private static char toHex(int ch)
{
    return (char)(ch < 10 ? '0' + ch : 'A' + ch - 10);
}

// https://tools.ietf.org/html/rfc3986#section-2.3
public static final HashSet<Character> UnreservedChars = new HashSet<Character>(Arrays.asList(
        'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
        'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
        '0','1','2','3','4','5','6','7','8','9',
        '-','_','.','~'));
public static boolean isSafe(char ch)
{
    return UnreservedChars.contains(ch);
}

I took the content above and changed it around a bit. I like positive logic first, and I thought a HashSet might give better performance than some other options, like searching through a String. Although, I'm not sure if the autoboxing penalty is worth it, but if the compiler optimizes for ASCII chars, then the cost of boxing will be low.

/***
 * Replaces any character not specifically unreserved to an equivalent 
 * percent sequence.
 * @param s
 * @return
 */
public static String encodeURIcomponent(String s)
{
    StringBuilder o = new StringBuilder();
    for (char ch : s.toCharArray()) {
        if (isSafe(ch)) {
            o.append(ch);
        }
        else {
            o.append('%');
            o.append(toHex(ch / 16));
            o.append(toHex(ch % 16));
        }
    }
    return o.toString();
}

private static char toHex(int ch)
{
    return (char)(ch < 10 ? '0' + ch : 'A' + ch - 10);
}

// https://tools.ietf.org/html/rfc3986#section-2.3
public static final HashSet<Character> UnreservedChars = new HashSet<Character>(Arrays.asList(
        'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
        'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
        '0','1','2','3','4','5','6','7','8','9',
        '-','_','.','~'));
public static boolean isSafe(char ch)
{
    return UnreservedChars.contains(ch);
}
绝不服输 2024-07-24 05:14:07

除了 Carlos Heuberger 的回复之外:
如果需要与默认值 (80) 不同的值,则应使用 7 参数构造函数:

URI uri = new URI(
        "http",
        null, // this is for userInfo
        "www.google.com",
        8080, // port number as int
        "/ig/api",
        "weather=São Paulo",
        null);
String request = uri.toASCIIString();

In addition to the Carlos Heuberger's reply:
if a different than the default (80) is needed, the 7 param constructor should be used:

URI uri = new URI(
        "http",
        null, // this is for userInfo
        "www.google.com",
        8080, // port number as int
        "/ig/api",
        "weather=São Paulo",
        null);
String request = uri.toASCIIString();
独守阴晴ぅ圆缺 2024-07-24 05:14:07

使用以下标准 Java 解决方案(通过 Web 平台测试):

0. 测试 URL 是否已编码< /a>.

1. 将 URL 拆分为结构部分。 使用 java.net.URL 来实现它。

2.对每个结构部分进行正确编码!

3.使用IDN.toASCII(putDomainNameHere) Punycode 对主机名进行编码!

4. 使用java.net.URI.toASCIIString() 进行百分比编码、NFC 编码的 unicode -(最好是 NFKC!)。

在此处查找更多信息:https://stackoverflow.com/a/49796882/1485527

Use the following standard Java solution (passes around 100 of the testcases provided by Web Plattform Tests):

0. Test if URL is already encoded.

1. Split URL into structural parts. Use java.net.URL for it.

2. Encode each structural part properly!

3. Use IDN.toASCII(putDomainNameHere) to Punycode encode the host name!

4. Use java.net.URI.toASCIIString() to percent-encode, NFC encoded unicode - (better would be NFKC!).

Find more here: https://stackoverflow.com/a/49796882/1485527

寄人书 2024-07-24 05:14:07

如果你正在使用spring,你可以尝试一下
org.springframework.web.util.UriUtils#encodePath

If you are using spring, you can try
org.springframework.web.util.UriUtils#encodePath

烏雲後面有陽光 2024-07-24 05:14:07

我创建了一个新项目来帮助构建 HTTP URL。 该库将自动对路径段和查询参数进行 URL 编码。

https://github.com/Widen/urlbuilder 查看源代码并下载二进制文件

您可以在 此问题中的 URL:

new UrlBuilder("search.barnesandnoble.com", "booksearch/first book.pdf").toString()

产生

http://search.barnesandnoble.com/booksearch/first%20book.pdf

I've created a new project to help construct HTTP URLs. The library will automatically URL encode path segments and query parameters.

You can view the source and download a binary at https://github.com/Widen/urlbuilder

The example URL in this question:

new UrlBuilder("search.barnesandnoble.com", "booksearch/first book.pdf").toString()

produces

http://search.barnesandnoble.com/booksearch/first%20book.pdf

苏别ゝ 2024-07-24 05:14:07

我有同样的问题。 通过 unsing 解决了这个问题:

android.net.Uri.encode(urlString, ":/");

它对字符串进行编码,但跳过“:”和“/”。

I had the same problem. Solved this by unsing:

android.net.Uri.encode(urlString, ":/");

It encodes the string but skips ":" and "/".

小苏打饼 2024-07-24 05:14:07

我开发了一个用于此目的的库:galimatias。 它以与 Web 浏览器相同的方式解析 URL。 也就是说,如果 URL 在浏览器中有效,galimatias 将正确解析该 URL。

在这种情况下:

// Parse
io.mola.galimatias.URL.parse(
    "http://search.barnesandnoble.com/booksearch/first book.pdf"
).toString()

将为您提供:http://search.barnesandnoble.com/booksearch/first%20book.pdf。 当然,这是最简单的情况,但它适用于任何情况,远远超出 java.net.URI 的范围。

您可以在以下位置查看:https://github.com/smola/galimatias

I develop a library that serves this purpose: galimatias. It parses URL the same way web browsers do. That is, if a URL works in a browser, it will be correctly parsed by galimatias.

In this case:

// Parse
io.mola.galimatias.URL.parse(
    "http://search.barnesandnoble.com/booksearch/first book.pdf"
).toString()

Will give you: http://search.barnesandnoble.com/booksearch/first%20book.pdf. Of course this is the simplest case, but it'll work with anything, way beyond java.net.URI.

You can check it out at: https://github.com/smola/galimatias

滥情稳全场 2024-07-24 05:14:07

我用这个

org.apache.commons.text.StringEscapeUtils.escapeHtml4("my text % & < >");

添加这个依赖

 <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-text</artifactId>
        <version>1.8</version>
    </dependency>

i use this

org.apache.commons.text.StringEscapeUtils.escapeHtml4("my text % & < >");

add this dependecy

 <dependency>
        <groupId>org.apache.commons</groupId>
        <artifactId>commons-text</artifactId>
        <version>1.8</version>
    </dependency>
何以心动 2024-07-24 05:14:07

您可以使用这样的函数。 根据您的需要完成并修改它:

/**
     * Encode URL (except :, /, ?, &, =, ... characters)
     * @param url to encode
     * @param encodingCharset url encoding charset
     * @return encoded URL
     * @throws UnsupportedEncodingException
     */
    public static String encodeUrl (String url, String encodingCharset) throws UnsupportedEncodingException{
            return new URLCodec().encode(url, encodingCharset).replace("%3A", ":").replace("%2F", "/").replace("%3F", "?").replace("%3D", "=").replace("%26", "&");
    }

使用示例:

String urlToEncode = ""http://www.growup.com/folder/intérieur-à_vendre?o=4";
Utils.encodeUrl (urlToEncode , "UTF-8")

结果为: http://www.growup.com/folder/int%C3%A9rieur-%C3%A0_vendre?o=4

You can use a function like this. Complete and modify it to your need :

/**
     * Encode URL (except :, /, ?, &, =, ... characters)
     * @param url to encode
     * @param encodingCharset url encoding charset
     * @return encoded URL
     * @throws UnsupportedEncodingException
     */
    public static String encodeUrl (String url, String encodingCharset) throws UnsupportedEncodingException{
            return new URLCodec().encode(url, encodingCharset).replace("%3A", ":").replace("%2F", "/").replace("%3F", "?").replace("%3D", "=").replace("%26", "&");
    }

Example of use :

String urlToEncode = ""http://www.growup.com/folder/intérieur-à_vendre?o=4";
Utils.encodeUrl (urlToEncode , "UTF-8")

The result is : http://www.growup.com/folder/int%C3%A9rieur-%C3%A0_vendre?o=4

小…楫夜泊 2024-07-24 05:14:07

怎么样:

public String UrlEncode(String in_) {

String retVal = "";

try {
    retVal = URLEncoder.encode(in_, "UTF8");
} catch (UnsupportedEncodingException ex) {
    Log.get().exception(Log.Level.Error, "urlEncode ", ex);
}

return retVal;

}

How about:

public String UrlEncode(String in_) {

String retVal = "";

try {
    retVal = URLEncoder.encode(in_, "UTF8");
} catch (UnsupportedEncodingException ex) {
    Log.get().exception(Log.Level.Error, "urlEncode ", ex);
}

return retVal;

}

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文