Using Crockford's base32 for IDs in URLs?

Posted 2024-12-13 19:44:08

I'd like to encode some IDs in Crockford's base32 for use in URLs. I'm using the base32 npm module.

So, for example, if the user types in http://domain/page/4A2A I'd like it to map to the same underlying ID as http://domain/page/4a2a

This is because I want human-friendly URLs, where the user doesn't have to worry about the difference between upper- and lower-case letters, or between "l" and "1" - they just get the page they expect.

But I'm struggling to implement this, basically because I'm too dim to understand how encoding works. First I tried:

var base32 = require('base32'); // the base32 npm module
var encoded1 = base32.encode('4a2a');
var encoded2 = base32.encode('4A2A');
console.log(encoded1, encoded2);

But they map to different underlying IDs:

6hgk4r8 6h0k4g8

OK, so maybe I need to use decode?

var base32 = require('base32');
var decoded1 = base32.decode('4a2a');
var decoded2 = base32.decode('4A2A');
console.log(decoded1, decoded2);

No, that just gives me empty strings:

"    " 

What am I doing wrong, and how can I get 4a2a and 4A2A to map to the same thing?

Comments (3)

墨小墨 2024-12-20 19:44:09

For an incoming request, you'll want to decode the ID segment of the URL path. When you create URLs, you take your identifier and encode it. So, given the URL http://domain/page/dnwnyub46m50, you take that last path segment and decode it. Example:

#> echo 'dnwnyub46m50'| base32 -d

my_id5

The library you linked to is case-insensitive, so you get the same result this way:

echo 'DNWNYUB46M50'| base32 -d

my_id5
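A minimal, dependency-free JavaScript sketch of that flow follows. It assumes the base32 npm module uses the alphabet 0123456789abcdefghjkmnpqrtuvwxyz (digits plus lowercase letters, without i, l, o and s) and packs bits most-significant first; under that assumption the encoder reproduces the question's 6hgk4r8 for '4a2a'. The helper names encodeBytes, decodeBytes, idToUrl and idFromUrl are hypothetical, not the module's API.

```javascript
// Sketch only: ASSUMES the base32 npm module's alphabet (no i, l, o, s)
// and most-significant-bit-first packing.
const ALPHABET = '0123456789abcdefghjkmnpqrtuvwxyz';

// Encode raw bytes, 5 bits per output character.
function encodeBytes(bytes) {
  let out = '';
  let buffer = 0; // bits not yet emitted
  let bits = 0;   // how many bits are in buffer
  for (const byte of bytes) {
    buffer = (buffer << 8) | byte;
    bits += 8;
    while (bits >= 5) {
      bits -= 5;
      out += ALPHABET[(buffer >>> bits) & 31];
    }
    buffer &= (1 << bits) - 1; // drop bits already emitted
  }
  if (bits > 0) out += ALPHABET[(buffer << (5 - bits)) & 31]; // zero-pad the tail
  return out;
}

// Decode case-insensitively back to raw bytes; leftover padding bits are dropped.
function decodeBytes(text) {
  const bytes = [];
  let buffer = 0;
  let bits = 0;
  for (const ch of text.toLowerCase()) {
    const value = ALPHABET.indexOf(ch);
    if (value === -1) throw new Error(`invalid base32 character: ${ch}`);
    buffer = (buffer << 5) | value;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >>> bits) & 0xff);
      buffer &= (1 << bits) - 1;
    }
  }
  return Uint8Array.from(bytes);
}

// Hypothetical helpers: encode when creating a URL, decode on an incoming request.
function idToUrl(id) {
  return 'http://domain/page/' + encodeBytes(new TextEncoder().encode(id));
}

function idFromUrl(url) {
  const segment = new URL(url).pathname.split('/').pop();
  return new TextDecoder().decode(decodeBytes(segment));
}

console.log(idToUrl('my_id5'));                          // http://domain/page/dnwnyub46m
console.log(idFromUrl('http://domain/page/DNWNYUB46M')); // my_id5
```

Encoding my_id5 on its own yields the ten-character dnwnyub46m; the twelve-character segment in the echo example also carries the newline that echo appends (my_id5 plus a line feed encodes to dnwnyub46m50).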

When dealing with any encoding scheme (Base-16/32/64), you have two basic operations: encode, which works on a raw stream of bits/bytes, and decode, which takes an encoded set of bytes and returns the original bit/byte stream. The Wikipedia page on Base32 encoding is a great resource.

When you decode a string, you get raw bytes: it may be that those bytes are not compatible with ASCII, UTF-8, or some other encoding which you are trying to work with. This is why your decoded examples look like spaces: the tools you are using do not recognize the resulting bytes as valid characters.
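Printing the decoded bytes in hex, rather than as text, makes this visible. The sketch below re-implements a case-insensitive decoder under the assumption that the base32 npm module uses the alphabet 0123456789abcdefghjkmnpqrtuvwxyz with most-significant-bit-first packing; decodeToBytes and toHex are hypothetical helpers.

```javascript
// ASSUMED alphabet of the base32 npm module (no i, l, o, s).
const ALPHABET = '0123456789abcdefghjkmnpqrtuvwxyz';

// Hypothetical decoder: 5 bits per character in, whole bytes out.
function decodeToBytes(text) {
  const bytes = [];
  let buffer = 0;
  let bits = 0;
  for (const ch of text.toLowerCase()) { // case-insensitive
    const value = ALPHABET.indexOf(ch);
    if (value === -1) throw new Error(`invalid base32 character: ${ch}`);
    buffer = (buffer << 5) | value;
    bits += 5;
    if (bits >= 8) {
      bits -= 8;
      bytes.push((buffer >>> bits) & 0xff);
      buffer &= (1 << bits) - 1;
    }
  }
  return bytes; // trailing bits that do not fill a byte are discarded
}

// Show bytes as two-digit hex values instead of trying to print them as text.
const toHex = (bytes) => bytes.map((b) => b.toString(16).padStart(2, '0')).join(' ');

console.log(toHex(decodeToBytes('4a2a'))); // 22 84
console.log(toHex(decodeToBytes('4A2A'))); // 22 84
```

Four characters carry 20 bits, so 4a2a and 4A2A both decode to the same two bytes, 0x22 and 0x84, with four leftover bits discarded. 0x22 is a double-quote character and 0x84 is a C1 control code in Latin-1 (and not valid UTF-8 on its own), so printed as text the result shows almost nothing.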

How you go about encoding identifiers depends on how your identifiers are generated. You didn't say how you were generating the underlying identifiers, so I can't make any assumptions about how you should handle the raw bytes that come out of the decoder, nor about the content of the raw bytes being passed into the encoder.

It's also important to mention that the library you linked to is not compatible with Crockford's Base32 encoding. The library excludes I, L, O, S, while Crockford's encoding excludes I, L, O, U. This would be a problem if you were trying to interoperate with another system that used a different library. If no one besides you will ever need to decode your URL fragments, then interoperability doesn't matter.
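The difference is easy to check by listing which letters each alphabet leaves out. The Crockford symbols below are his published encoding alphabet; the base32 npm alphabet is an assumption chosen to match the exclusions just described, and excludedLetters is a hypothetical helper.

```javascript
// Crockford's encoding symbols (value i is written as CROCKFORD[i]).
const CROCKFORD = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';
// ASSUMED alphabet of the base32 npm module.
const NPM_BASE32 = '0123456789abcdefghjkmnpqrtuvwxyz';

// List the letters A-Z that an alphabet does not use.
function excludedLetters(alphabet) {
  const used = new Set(alphabet.toUpperCase());
  return [...'ABCDEFGHIJKLMNOPQRSTUVWXYZ'].filter((c) => !used.has(c));
}

console.log(excludedLetters(CROCKFORD));  // [ 'I', 'L', 'O', 'U' ]
console.log(excludedLetters(NPM_BASE32)); // [ 'I', 'L', 'O', 'S' ]

// Digit values where the two alphabets disagree (ignoring case).
const mismatches = [...CROCKFORD].flatMap((c, i) =>
  c === NPM_BASE32[i].toUpperCase() ? [] : [`${i}: ${c} vs ${NPM_BASE32[i]}`]);
console.log(mismatches); // [ '25: S vs t', '26: T vs u' ]
```

Under that assumption the two agree on every digit value except 25 and 26, so the same bytes encode identically until a 5-bit group with one of those values appears.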

多像笑话 2024-12-20 19:44:09

The source of your confusion is that base64 and base32 are methods of representing numbers, whereas in your examples you are trying to encode or decode text strings.

Encoding and decoding text strings as base32 is done by first converting the string into a large number. In your first example, you are encoding "4a2a" and "4A2A": those are strings with two different numeric values, so they translate to base32 encodings with two different values, 6hgk4r8 and 6h0k4g8.
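Concretely, 'a' and 'A' are different bytes (0x61 and 0x41), so the two inputs are already different numbers before base32 is involved. The sketch reads a string's UTF-8 bytes as one big-endian integer; stringToBigInt is a hypothetical helper for illustration, not something the base32 module exposes.

```javascript
// Hypothetical helper: treat a string's UTF-8 bytes as one big-endian integer.
function stringToBigInt(text) {
  let n = 0n;
  for (const byte of new TextEncoder().encode(text)) {
    n = (n << 8n) | BigInt(byte); // append the next 8 bits
  }
  return n;
}

console.log(stringToBigInt('4a2a').toString(16)); // 34613261
console.log(stringToBigInt('4A2A').toString(16)); // 34413241
```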

When you "decode" 4a2a and 4A2A, you say you get empty strings. However, this is not true: the strings are not empty, they contain what the decoded number looks like when interpreted as a string. It looks like nothing because decoding 4a2a produces an unprintable character; it's invisible. What you want is to feed the encoder numbers, not strings.
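A minimal sketch of that approach, assuming your IDs are non-negative integers: encode them with Crockford's symbol set, and normalize whatever the user types using Crockford's decoding rules (case-insensitive, I and L read as 1, O read as 0, hyphens ignored). encodeId and decodeId are hypothetical names; this is a standalone implementation rather than the base32 npm module's API, and it leaves out Crockford's optional check symbol.

```javascript
// Crockford's Base32 symbol set: value i is encoded as CROCKFORD[i].
const CROCKFORD = '0123456789ABCDEFGHJKMNPQRSTVWXYZ';

// Encode a non-negative integer ID (BigInt so large IDs stay exact).
function encodeId(id) {
  let n = BigInt(id);
  if (n < 0n) throw new RangeError('ID must be non-negative');
  let out = '';
  do {
    out = CROCKFORD[Number(n % 32n)] + out; // least significant digit first
    n /= 32n;                               // BigInt division truncates
  } while (n > 0n);
  return out;
}

// Decode what a user typed: ignore case and hyphens, read I/L as 1 and O as 0.
function decodeId(text) {
  const normalized = text
    .toUpperCase()
    .replace(/-/g, '')
    .replace(/[IL]/g, '1')
    .replace(/O/g, '0');
  if (normalized.length === 0) return null;
  let n = 0n;
  for (const ch of normalized) {
    const value = CROCKFORD.indexOf(ch);
    if (value === -1) return null; // e.g. 'U' or punctuation: not a valid ID
    n = n * 32n + BigInt(value);
  }
  return n;
}

console.log(encodeId(141386));                    // 4A2A
console.log(decodeId('4a2a'), decodeId('4A2A'));  // 141386n 141386n
console.log(decodeId('lOh') === decodeId('10H')); // true
```

With this scheme 4A2A is simply the number 141386, so /page/4a2a and /page/4A2A resolve to the same ID, and a mistyped lOh lands on the same page as 10H. BigInt keeps IDs beyond Number.MAX_SAFE_INTEGER exact.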

峩卟喜欢 2024-12-20 19:44:09

JavaScript has parseInt(num, 32) and num.toString(32) built in, in a way that's compatible with Java and across JavaScript versions.
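A short sketch of that approach, with two caveats worth knowing. parseInt accepts upper- and lower-case digits, so 4A2A and 4a2a parse to the same number, but JavaScript's radix-32 digits are 0-9 followed by a-v, which is not Crockford's alphabet and does not read l as 1. parseInt also stops at the first invalid character instead of failing, so the hypothetical decodeId below validates the input first and rejects values beyond Number.MAX_SAFE_INTEGER.

```javascript
// JavaScript's radix-32 digits are 0-9 then a-v (not Crockford's alphabet).
const RADIX32 = /^[0-9a-v]+$/i;

// Hypothetical helpers for IDs that fit in a safe integer.
function encodeId(n) {
  return n.toString(32); // always lowercase
}

function decodeId(text) {
  if (!RADIX32.test(text)) return null; // parseInt would silently stop at the first bad digit
  const n = parseInt(text, 32);
  return Number.isSafeInteger(n) ? n : null;
}

console.log(decodeId('4A2A'), decodeId('4a2a')); // 141386 141386
console.log(encodeId(141386));                   // 4a2a
console.log(parseInt('4w', 32));                 // 4 ('w' is not a radix-32 digit)
console.log(decodeId('4w'));                     // null
```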
