String encoding of primitive types that preserves lexicographic order

Posted 2024-08-11 22:54:28

Does anyone know of a library for encoding a number of primitive types (integers, floats, strings, etc.) into strings so that the lexicographic order of the encoded strings matches the natural order of the original values?

Ideally, I'm looking for a C++ library, but other languages are fine too. Also, one can assume that the type does not need to be encoded in the string itself (that is, whether the value is an int64, a string, or a float, the encoded string does not need to carry that information; encoding the data alone is enough).

Comments (4)

微暖i 2024-08-18 22:54:28

Take a look at this paper ("Efficient Lexicographic Encoding of Numbers"), which shows how to represent any numeric type as a string such that the lexicographic order of the strings is the same as the numerical order of the underlying numbers. It copes with arbitrary-length numbers.

http://www.zanopha.com/docs/elen.pdf

溺深海 2024-08-18 22:54:28

I had the problem of converting integers and longs to strings which preserve ordering. And since I was working in Java, I only had signed types.

My algorithm was very simple:

  1. Flip the sign bit (toEncode ^ Long.MIN_VALUE for longs, since Long.MIN_VALUE is the value with only the sign bit set); otherwise negative numbers compare greater than positive numbers.
  2. Do a modified base64 encoding of the bytes. Unfortunately, the normal base64 alphabet doesn't preserve ordering: the special characters (+ and /) come after the digits, which come after the letters. This is completely backwards from ASCII. My modified encoding simply uses the ASCII ordering. (To make it clear it wasn't normal base64, I changed the special chars to - and _ with ~ as the padding. These are still usable within a URL, which was another constraint I had.)
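The two steps above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual code: the class name `OrderedLongCodec`, the method name `encode`, and the exact alphabet layout are made up here. The alphabet is simply the 64 symbols `-`, `0-9`, `A-Z`, `_`, `a-z` listed in ascending ASCII order, and the sketch XORs with `Long.MIN_VALUE` because that is the constant whose only set bit is the sign bit.

```java
final class OrderedLongCodec {
    // Hypothetical alphabet: 64 symbols in ascending ASCII order
    // ('-' < digits < upper case < '_' < lower case).
    private static final String ALPHABET =
        "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz";

    // Encodes a signed long as a fixed-width 12-character string whose
    // lexicographic order matches the numeric order of the input.
    static String encode(long value) {
        // Step 1: flip only the sign bit so that negatives sort before positives
        // when the bits are read as an unsigned big-endian number.
        long biased = value ^ Long.MIN_VALUE;

        StringBuilder out = new StringBuilder(12);
        // Step 2: emit ten full 6-bit groups, most significant first (bits 63..4).
        for (int shift = 58; shift >= 0; shift -= 6) {
            out.append(ALPHABET.charAt((int) ((biased >>> shift) & 0x3F)));
        }
        // The remaining 4 bits are left-aligned in an eleventh group.
        out.append(ALPHABET.charAt((int) ((biased & 0xF) << 2)));
        // One padding character, as base64 would add for 8 input bytes.
        out.append('~');
        return out.toString();
    }

    public static void main(String[] args) {
        long[] samples = {Long.MIN_VALUE, -1L, 0L, 1L, Long.MAX_VALUE};
        for (long sample : samples) {
            System.out.println(encode(sample) + "  <-  " + sample);
        }
    }
}
```

Because every encoded value has the same length, the trailing `~` never takes part in a comparison; it is kept only to mirror the padding described above. Decoding would reverse the two steps and is left out to keep the sketch short.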

神妖 2024-08-18 22:54:28

BTW ...
In Amazon Web Services' SimpleDB, all data are stored as strings, and its select comparators use lexicographic ordering. AWS provides utility functions to encode various types. For example, integers are encoded by knowing their range a priori and adjusting via zero-padding and offsets (e.g. to handle negative integers). You could of course give it the worst-case range.

See "Query 201: Tips and Tricks for Amazon SimpleDB Query" - http://aws.amazon.com/articles/1232

http://typica.s3.amazonaws.com/com/xerox/amazonws/sdb/DataUtils.html
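The zero-padding-plus-offset idea can be illustrated with a small sketch. This is not the AWS or typica utility itself: the class `PaddedIntCodec` and its `encode` signature are hypothetical names chosen for this example. It assumes the caller knows the inclusive range `[min, max]` up front, shifts the value so the minimum maps to zero, and pads to the width of the largest shifted value.

```java
import java.util.Locale;

final class PaddedIntCodec {
    // Encodes value as a zero-padded decimal string, given a known inclusive range.
    // Subtracting min makes every encoded number non-negative, so plain string
    // comparison agrees with numeric comparison.
    static String encode(int value, int min, int max) {
        if (min > max || value < min || value > max) {
            throw new IllegalArgumentException("value outside [min, max]");
        }
        // Use long arithmetic so the full int range cannot overflow.
        long shifted = (long) value - min;
        int width = Long.toString((long) max - min).length();
        return String.format(Locale.ROOT, "%0" + width + "d", shifted);
    }

    public static void main(String[] args) {
        System.out.println(encode(-5, -100, 100));
        System.out.println(encode(7, -100, 100));
        // Worst-case range: every int fits in 10 digits after the offset.
        System.out.println(encode(-1, Integer.MIN_VALUE, Integer.MAX_VALUE));
    }
}
```

All values that should sort together have to be encoded with the same `min` and `max`; changing the range later changes both the offset and the width, so previously stored keys would no longer compare correctly.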

下壹個目標 2024-08-18 22:54:28

Just write numeric values in a fixed column width with leading zeros, and strings as normal. So like this:

0.1 -> 0000000.1000000
123 -> 0000123.0000000
foo -> foo
X   -> X

Then you can sort as text (e.g. Unix sort without -n). How about that?
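As a rough sketch of the fixed-width idea, the numeric rows in the example above match a 15-character field with 7 decimal places. The class and method names below are made up for illustration, and the sketch assumes non-negative, finite inputs that fit in the field; negative numbers would need an extra offset or sign scheme, which the example above does not cover.

```java
import java.util.Locale;

final class FixedWidthKey {
    // Formats a non-negative number as 7 integer digits, a dot, and 7 decimals,
    // padded with leading zeros, so that text order matches numeric order.
    static String encode(double value) {
        if (!Double.isFinite(value) || Double.compare(value, 0.0) < 0) {
            throw new IllegalArgumentException("only finite, non-negative values are supported");
        }
        // Locale.ROOT keeps the decimal separator a '.' regardless of system locale.
        String key = String.format(Locale.ROOT, "%015.7f", value);
        if (key.length() != 15) {
            throw new IllegalArgumentException("value does not fit in the fixed-width field");
        }
        return key;
    }

    public static void main(String[] args) {
        System.out.println("0.1 -> " + encode(0.1));
        System.out.println("123 -> " + encode(123));
    }
}
```

Strings such as `foo` and `X` are written unchanged; in ASCII they sort after any key that starts with a digit.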
