Java 中的百分比编码空格问题

发布于 2024-08-26 21:55:47 字数 4039 浏览 3 评论 0原文

我正在使用 W3C 中的 URLUTF8Encoder.java 类 (www.w3.org/International/URLUTF8Encoder.java )。

目前,它将所有空格“ ”编码为加号“+”。

我在修改代码以将空格百分比编码为“%20”时遇到困难。不幸的是,我对十六进制不太熟悉。有人可以帮我吗?我需要修改这个片段......

else if (ch == ' ') { // space
                sbuf.append('+');

在以下代码中:

final static String[] hex = { "%00", "%01", "%02", "%03", "%04", "%05",
            "%06", "%07", "%08", "%09", "%0A", "%0B", "%0C", "%0D", "%0E",
            "%0F", "%10", "%11", "%12", "%13", "%14", "%15", "%16", "%17",
            "%18", "%19", "%1A", "%1B", "%1C", "%1D", "%1E", "%1F", "%20",
            "%21", "%22", "%23", "%24", "%25", "%26", "%27", "%28", "%29",
            "%2A", "%2B", "%2C", "%2D", "%2E", "%2F", "%30", "%31", "%32",
            "%33", "%34", "%35", "%36", "%37", "%38", "%39", "%3A", "%3B",
            "%3C", "%3D", "%3E", "%3F", "%40", "%41", "%42", "%43", "%44",
            "%45", "%46", "%47", "%48", "%49", "%4A", "%4B", "%4C", "%4D",
            "%4E", "%4F", "%50", "%51", "%52", "%53", "%54", "%55", "%56",
            "%57", "%58", "%59", "%5A", "%5B", "%5C", "%5D", "%5E", "%5F",
            "%60", "%61", "%62", "%63", "%64", "%65", "%66", "%67", "%68",
            "%69", "%6A", "%6B", "%6C", "%6D", "%6E", "%6F", "%70", "%71",
            "%72", "%73", "%74", "%75", "%76", "%77", "%78", "%79", "%7A",
            "%7B", "%7C", "%7D", "%7E", "%7F", "%80", "%81", "%82", "%83",
            "%84", "%85", "%86", "%87", "%88", "%89", "%8A", "%8B", "%8C",
            "%8D", "%8E", "%8F", "%90", "%91", "%92", "%93", "%94", "%95",
            "%96", "%97", "%98", "%99", "%9A", "%9B", "%9C", "%9D", "%9E",
            "%9F", "%A0", "%A1", "%A2", "%A3", "%A4", "%A5", "%A6", "%A7",
            "%A8", "%A9", "%AA", "%AB", "%AC", "%AD", "%AE", "%AF", "%B0",
            "%B1", "%B2", "%B3", "%B4", "%B5", "%B6", "%B7", "%B8", "%B9",
            "%BA", "%BB", "%BC", "%BD", "%BE", "%BF", "%C0", "%C1", "%C2",
            "%C3", "%C4", "%C5", "%C6", "%C7", "%C8", "%C9", "%CA", "%CB",
            "%CC", "%CD", "%CE", "%CF", "%D0", "%D1", "%D2", "%D3", "%D4",
            "%D5", "%D6", "%D7", "%D8", "%D9", "%DA", "%DB", "%DC", "%DD",
            "%DE", "%DF", "%E0", "%E1", "%E2", "%E3", "%E4", "%E5", "%E6",
            "%E7", "%E8", "%E9", "%EA", "%EB", "%EC", "%ED", "%EE", "%EF",
            "%F0", "%F1", "%F2", "%F3", "%F4", "%F5", "%F6", "%F7", "%F8",
            "%F9", "%FA", "%FB", "%FC", "%FD", "%FE", "%FF" };

public static String encode(String s) {
        StringBuffer sbuf = new StringBuffer();
        int len = s.length();
        for (int i = 0; i < len; i++) {
            int ch = s.charAt(i);
            if ('A' <= ch && ch <= 'Z') { // 'A'..'Z'
                sbuf.append((char) ch);
            } else if ('a' <= ch && ch <= 'z') { // 'a'..'z'
                sbuf.append((char) ch);
            } else if ('0' <= ch && ch <= '9') { // '0'..'9'
                sbuf.append((char) ch);
            } else if (ch == ' ') { // space
                sbuf.append('+');
            } else if (ch == '-'
                    || ch == '_' // unreserved
                    || ch == '.' || ch == '!' || ch == '~' || ch == '*'
                    || ch == '\'' || ch == '(' || ch == ')') {
                sbuf.append((char) ch);
            } else if (ch <= 0x007f) { // other ASCII
                sbuf.append(hex[ch]);
            } else if (ch <= 0x07FF) { // non-ASCII <= 0x7FF
                sbuf.append(hex[0xc0 | (ch >> 6)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            } else { // 0x7FF < ch <= 0xFFFF
                sbuf.append(hex[0xe0 | (ch >> 12)]);
                sbuf.append(hex[0x80 | ((ch >> 6) & 0x3F)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            }
        }
        return sbuf.toString();
    }

谢谢!

I am using the URLUTF8Encoder.java class from W3C (www.w3.org/International/URLUTF8Encoder.java).

Currently, it will encode any blank spaces ' ' into plus signs '+'.

I am having difficulty modifying the code to percent-encode the blank space into '%20'. Unfortunately, I am not too familiar with hex. Can anyone help me out? I need to modify this snippet...

else if (ch == ' ') { // space
                sbuf.append('+');

in the following code:

final static String[] hex = { "%00", "%01", "%02", "%03", "%04", "%05",
            "%06", "%07", "%08", "%09", "%0A", "%0B", "%0C", "%0D", "%0E",
            "%0F", "%10", "%11", "%12", "%13", "%14", "%15", "%16", "%17",
            "%18", "%19", "%1A", "%1B", "%1C", "%1D", "%1E", "%1F", "%20",
            "%21", "%22", "%23", "%24", "%25", "%26", "%27", "%28", "%29",
            "%2A", "%2B", "%2C", "%2D", "%2E", "%2F", "%30", "%31", "%32",
            "%33", "%34", "%35", "%36", "%37", "%38", "%39", "%3A", "%3B",
            "%3C", "%3D", "%3E", "%3F", "%40", "%41", "%42", "%43", "%44",
            "%45", "%46", "%47", "%48", "%49", "%4A", "%4B", "%4C", "%4D",
            "%4E", "%4F", "%50", "%51", "%52", "%53", "%54", "%55", "%56",
            "%57", "%58", "%59", "%5A", "%5B", "%5C", "%5D", "%5E", "%5F",
            "%60", "%61", "%62", "%63", "%64", "%65", "%66", "%67", "%68",
            "%69", "%6A", "%6B", "%6C", "%6D", "%6E", "%6F", "%70", "%71",
            "%72", "%73", "%74", "%75", "%76", "%77", "%78", "%79", "%7A",
            "%7B", "%7C", "%7D", "%7E", "%7F", "%80", "%81", "%82", "%83",
            "%84", "%85", "%86", "%87", "%88", "%89", "%8A", "%8B", "%8C",
            "%8D", "%8E", "%8F", "%90", "%91", "%92", "%93", "%94", "%95",
            "%96", "%97", "%98", "%99", "%9A", "%9B", "%9C", "%9D", "%9E",
            "%9F", "%A0", "%A1", "%A2", "%A3", "%A4", "%A5", "%A6", "%A7",
            "%A8", "%A9", "%AA", "%AB", "%AC", "%AD", "%AE", "%AF", "%B0",
            "%B1", "%B2", "%B3", "%B4", "%B5", "%B6", "%B7", "%B8", "%B9",
            "%BA", "%BB", "%BC", "%BD", "%BE", "%BF", "%C0", "%C1", "%C2",
            "%C3", "%C4", "%C5", "%C6", "%C7", "%C8", "%C9", "%CA", "%CB",
            "%CC", "%CD", "%CE", "%CF", "%D0", "%D1", "%D2", "%D3", "%D4",
            "%D5", "%D6", "%D7", "%D8", "%D9", "%DA", "%DB", "%DC", "%DD",
            "%DE", "%DF", "%E0", "%E1", "%E2", "%E3", "%E4", "%E5", "%E6",
            "%E7", "%E8", "%E9", "%EA", "%EB", "%EC", "%ED", "%EE", "%EF",
            "%F0", "%F1", "%F2", "%F3", "%F4", "%F5", "%F6", "%F7", "%F8",
            "%F9", "%FA", "%FB", "%FC", "%FD", "%FE", "%FF" };

public static String encode(String s) {
        StringBuffer sbuf = new StringBuffer();
        int len = s.length();
        for (int i = 0; i < len; i++) {
            int ch = s.charAt(i);
            if ('A' <= ch && ch <= 'Z') { // 'A'..'Z'
                sbuf.append((char) ch);
            } else if ('a' <= ch && ch <= 'z') { // 'a'..'z'
                sbuf.append((char) ch);
            } else if ('0' <= ch && ch <= '9') { // '0'..'9'
                sbuf.append((char) ch);
            } else if (ch == ' ') { // space
                sbuf.append('+');
            } else if (ch == '-'
                    || ch == '_' // unreserved
                    || ch == '.' || ch == '!' || ch == '~' || ch == '*'
                    || ch == '\'' || ch == '(' || ch == ')') {
                sbuf.append((char) ch);
            } else if (ch <= 0x007f) { // other ASCII
                sbuf.append(hex[ch]);
            } else if (ch <= 0x07FF) { // non-ASCII <= 0x7FF
                sbuf.append(hex[0xc0 | (ch >> 6)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            } else { // 0x7FF < ch <= 0xFFFF
                sbuf.append(hex[0xe0 | (ch >> 12)]);
                sbuf.append(hex[0x80 | ((ch >> 6) & 0x3F)]);
                sbuf.append(hex[0x80 | (ch & 0x3F)]);
            }
        }
        return sbuf.toString();
    }

Thanks!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

茶花眉 2024-09-02 21:55:47

您可能想查看 Apache Common 的编解码器包,它可能更强大:http://commons.apache。 org/codec/ - 您使用的包大约已有 14 年历史,仅编码为一种编码类型 (www-url-form-encoded) - 需要将空格编码为“+”。如果您尝试执行标准 URL 编码(需要空格为 %20),则需要完全使用不同的包。

You might want to check out Apache Common's codec package, it's probably a lot more robust : http://commons.apache.org/codec/ - The package you're using is about 14 years old and only encodes into one type of encoding (www-url-form-encoded) - which REQUIRES spaces to be encoded as '+'. If you're trying do do standard URL encoding (which wants spaces as %20), you'll need to use a different package entirely.

咿呀咿呀哟 2024-09-02 21:55:47

为什么使用此类而不是 API 方法?

java.net.URLEncoder.encode("your string", "utf-8");

为什么空格被编码为 + 字符会出现问题?这正是 URL 安全字符编码的工作原理。

Why are you using this class instead of the API method?

java.net.URLEncoder.encode("your string", "utf-8");

And why is it a problem that spaces are encoded as + characters? That is exactly how URL safe character encoding is supposed to work.

半山落雨半山空 2024-09-02 21:55:47

我不会问你为什么这样做,直接回答你的问题。请阅读其他答案以确定您是否确实想要修改此代码。如果您只是删除代码:

else if (ch == ' ') { // space
   sbuf.append('+');
} 

它将执行您想要的操作,因为空格字符将由代码处理:

else if (ch <= 0x007f) { // other ASCII
   sbuf.append(hex[ch]);
} 

I won't ask why you're doing this, and just answer your question directly. Please read other answers to determine if you really want to be modifying this code. If you just remove the code:

else if (ch == ' ') { // space
   sbuf.append('+');
} 

It will do what you want, because the space character will be taken care of by the code:

else if (ch <= 0x007f) { // other ASCII
   sbuf.append(hex[ch]);
} 
岁吢 2024-09-02 21:55:47

只需这样做:

String str = "Hello World+You";
String encodedStr = URLEncoder.encode(str, "UTF-8");
encodedStr = encodedStr.replace("+", "%20");
System.out.println("Encoded String: " + encodedStr);

Just do this:

String str = "Hello World+You";
String encodedStr = URLEncoder.encode(str, "UTF-8");
encodedStr = encodedStr.replace("+", "%20");
System.out.println("Encoded String: " + encodedStr);
请你别敷衍 2024-09-02 21:55:47

您可以使用内置的 java.lang. net.URI 类,通常通过其静态构建器作为 URI.create("http://example.com/search?param=42") 使用,但是如果参数包含文字空间,您可以将其用作:

URI uri = new URI("http", // scheme
    null,                 // user authentication info
    "example.com",        // domain
    -1,                   // port (use -1 for default port 80)
    "/search",            // path
    "param=four and two", // one or more parameters
    null);                // fragment (appended with the # char)
System.out.println(uri)
// OUTPUT:
// http://example.com/search?param=four%20and%20two

如果你查看这个特定的URI 构造函数 你会看到 < code>-1 可用于指定默认端口(80);显式传递 80 作为构造函数值将创建一个类似于 http://example.com:80/search?param=four%20and%20two 的 URL,您可能会这样做想要。

相同的构造函数可用于仅构建 URL 的查询部分,您可以将其附加到现有字符串:

URI uri2 = new URI(null, null, null, -1, null, "param=four and two", null);
System.out.println(uri2)
// OUTPUT:
// ?param=four%20and%20two

可能值得一提的是 URI 与 URL 不同file:/// 是有效的 URI,但不是有效的 URL。

You can use the built-in java.net.URI class, which is normally used via it's static builder as URI.create("http://example.com/search?param=42") but in case when a parameter contains literal space you can use it as:

URI uri = new URI("http", // scheme
    null,                 // user authentication info
    "example.com",        // domain
    -1,                   // port (use -1 for default port 80)
    "/search",            // path
    "param=four and two", // one or more parameters
    null);                // fragment (appended with the # char)
System.out.println(uri)
// OUTPUT:
// http://example.com/search?param=four%20and%20two

If you look inside this particular URI constructor you'll see that -1 can be used to specify the default port (80); explicitly passing 80 as constructor value will create a URL like http://example.com:80/search?param=four%20and%20two which you probably do not want.

The same constructor can be used to build only the query part of the URL which you can append to an existing string:

URI uri2 = new URI(null, null, null, -1, null, "param=four and two", null);
System.out.println(uri2)
// OUTPUT:
// ?param=four%20and%20two

Might be worth mentioning that a URI is not the same as a URL: file:/// is a valid URI but not a valid URL.

鱼窥荷 2024-09-02 21:55:47

它工作正常;它应该与 + 一起使用,就像与 %20 一起使用一样。

也许尝试java.net.URLEncoder("url", "UTF-8")

It's working correctly; it should work with + as well as it would with %20.

Maybe try java.net.URLEncoder("url", "UTF-8")?

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文