Mapping constrained sequences to indices

Posted on 2024-07-10 03:41:15

I'm puzzling over how to map a set of sequences to consecutive integers.

All the sequences follow this rule:

A_0 = 1
A_n >= 1
A_n <= max(A_0 .. A_n-1) + 1

I'm looking for a solution that, given such a sequence, computes an integer for doing a lookup into a table and, given an index into the table, generates the sequence.

Example: for length 3, there are 5 valid sequences. A fast function for doing the following mapping (preferably in both directions) would be a good solution:

1,1,1   0
1,1,2   1
1,2,1   2
1,2,2   3
1,2,3   4

The point of the exercise is to get a packed table with a 1-1 mapping between valid sequences and cells.
The size of the set is bounded only by the number of unique sequences possible.
I don't know yet what the length of the sequence will be, but it will be a small constant (<12) known in advance.
I'll get to this sooner or later, but thought I'd throw it out for the community to have "fun" with in the meantime.

These are different valid sequences:

1,1,2,3,2,1,4
1,1,2,3,1,2,4
1,2,3,4,5,6,7
1,1,1,1,2,3,2

These are not:

1,2,2,4
2,
1,1,2,3,5
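
As a baseline for checking any faster mapping, here is a minimal brute-force sketch of the rule in Python. The names `is_valid` and `enumerate_valid` are made up for illustration; the sketch simply filters all candidate tuples, so it is only practical for short lengths.

```python
from itertools import product

def is_valid(seq):
    """Check A_0 = 1 and 1 <= A_n <= max(A_0 .. A_{n-1}) + 1 for every n > 0."""
    if not seq or seq[0] != 1:
        return False
    highest = 1
    for a in seq[1:]:
        if a < 1 or a > highest + 1:
            return False
        highest = max(highest, a)
    return True

def enumerate_valid(length):
    """All valid sequences of the given length, in lexicographic order."""
    candidates = product(range(1, length + 1), repeat=length)
    return [s for s in candidates if is_valid(s)]

# Reproduces the length-3 table: each sequence next to its index 0..4.
for index, seq in enumerate(enumerate_valid(3)):
    print(seq, index)
```

The position of a sequence in this list is exactly the index wanted in the table above, so the list can serve as a reference when testing a direct formula.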

Related to this

雄赳赳气昂昂 2024-07-17 03:41:15

There is a natural indexing of the sequences, but it is not so easy to calculate.

Let's look only at A_n for n>0, since A_0 = 1 is fixed.

Indexing is done in two parts.

Part 1:

Group sequences by the places where A_n = max(A_0 .. A_n-1) + 1. Call these places steps.

The steps hold consecutive numbers (2,3,4,5,...).
At a non-step place k we can put any number from 1 up to the current maximum, which is 1 plus the number of steps with index less than k.

Each group can be represented as a binary string where 1 is a step and 0 is a non-step. E.g. 001001010 means the group of sequences of the form 112aa3b4c, with a<=2, b<=3, c<=4. Because groups are indexed by a binary number, there is a natural indexing of groups, from 0 to 2^length - 1. Let's call the value of a group's binary representation the group order.

Part 2:

Index the sequences inside a group. Since the group defines the step positions, only the numbers at non-step positions are variable, and each varies within a defined range. That makes it easy to index a sequence inside its group, using the lexicographic order of the variable places.

It is also easy to calculate the number of sequences in one group. It is a number of the form 1^i_1 * 2^i_2 * 3^i_3 * ..., where i_m is the number of non-step places at which the current maximum is m.
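
The two parts can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, and the mask is assumed to cover A_1 onward (A_0 is skipped), with one character per place.

```python
def group_of(seq):
    """Split a valid sequence into its group mask and its variable places.

    Returns (mask, radices, digits), all describing A_1 .. A_n:
      mask    -- '1' for a step, '0' for a non-step
      radices -- how many values are allowed at each non-step place
      digits  -- the zero-based value actually chosen at each non-step place
    """
    highest = 1
    mask, radices, digits = "", [], []
    for a in seq[1:]:
        if a == highest + 1:       # step: the value is forced
            mask += "1"
            highest = a
        else:                      # non-step: any of 1 .. highest
            mask += "0"
            radices.append(highest)
            digits.append(a - 1)
    return mask, radices, digits

def index_in_group(radices, digits):
    """Mixed-radix value: lexicographic rank of the sequence inside its group."""
    rank = 0
    for radix, digit in zip(radices, digits):
        rank = rank * radix + digit
    return rank

def group_size(radices):
    """Number of sequences in the group: the product of the non-step ranges."""
    size = 1
    for radix in radices:
        size *= radix
    return size

mask, radices, digits = group_of((1, 1, 2, 3, 2, 1, 4))
print(mask, index_in_group(radices, digits), group_size(radices))
```

Because each non-step place has its own range, the index inside the group is a mixed-radix number rather than a number in one fixed base.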

Combining:

This gives a 2-part key, <Steps, Group>, which then needs to be mapped to the integers. To do that we have to find how many sequences are in groups whose order is less than some value. For that, let's first find how many sequences there are for a given length. That can be computed by passing through all groups and summing their sizes, or with a recurrence. Let T(l, n) be the number of sequences of length l (A_0 is omitted) whose first element can be at most n+1, i.e. the running maximum so far is n. Then the following holds:

T(l,n) = n*T(l-1,n) + T(l-1,n+1)
T(1,n) = n + 1

The first term counts the n non-step choices for the first element and the second term the single step choice; for l = 1 the lone element can be any of 1 .. n+1. As a check, T(2,1) = 1*2 + 3 = 5, which matches the five sequences of length 3 in the question.

Because l + n <= sequence length + 1, there are only ~sequence_length^2/2 values of T(l,n), which are easily calculated.
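
A memoized version of the recurrence might look like the sketch below. Note one assumption: it uses T(0, n) = 1 as its base case (one empty continuation), which is the same as T(1, n) = n + 1 and is what yields 5 sequences for the length-3 example in the question.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(l, n):
    """Number of valid continuations of length l when the running maximum is n."""
    if l == 0:
        return 1                   # assumed base case: the empty continuation
    # n non-step choices keep the maximum at n; the one step raises it to n + 1.
    return n * T(l - 1, n) + T(l - 1, n + 1)

# Total number of valid sequences for lengths 1..7 (A_0 included in the length).
print([T(length - 1, 1) for length in range(1, 8)])
# -> [1, 2, 5, 15, 52, 203, 877]
```

For a full sequence of length L the table size is therefore T(L - 1, 1).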

Next is to calculate the number of sequences in groups of order less than a given value. That can be done by summing T(l,n) values, one term per 1 bit of the group: each term counts the groups that agree with the given group up to that bit and have a 0 there. E.g. the number of sequences in groups with order < 1001010 binary is equal to

T(6,1) +               # groups 0xxxxxx
2^2 * 2 * T(3,2) +     # groups 1000xxx
2^2 * 3 * 3 * T(1,3)   # groups 100100x
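
One way to compute such a sum mechanically is sketched below. It rests on assumptions that are not spelled out in the answer: the mask covers A_1 onward with the first place as the most significant bit, the count is for groups of strictly smaller order, and T(0, n) = 1. The name `sequences_before_group` is made up for this sketch.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(l, n):
    """Valid continuations of length l with running maximum n (T(0, n) = 1 assumed)."""
    return 1 if l == 0 else n * T(l - 1, n) + T(l - 1, n + 1)

def sequences_before_group(mask):
    """Count sequences whose group mask is smaller than `mask` as a binary number."""
    total = 0
    prefix_size = 1    # ways to fill the non-step places seen so far
    highest = 1        # running maximum after the places seen so far
    for i, bit in enumerate(mask):
        remaining = len(mask) - i - 1
        if bit == "1":
            # Groups with the same prefix but a 0 here have smaller order:
            # `highest` choices at this place, then any continuation.
            total += prefix_size * highest * T(remaining, highest)
            highest += 1
        else:
            prefix_size *= highest
    return total

print(sequences_before_group("1001010"))
```

Adding the index of a sequence inside its own group to this count gives a single packed integer for the two-part key.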

Optimizations:

This will give a mapping, but a direct implementation of combining the key parts takes more than O(1) time. On the other hand, the Steps portion of the key is small, and by precomputing the range of Groups for each Steps value, a lookup table can reduce this to O(1).

I'm not 100% sure about the formula above, but it should be something like it.

With these remarks and the recurrence it is possible to write the functions sequence -> index and index -> sequence. But it is not so trivial :-)
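
For comparison, here is a sketch of both directions that uses the same recurrence but a simpler ordering. It is not the two-part group key described above: it ranks sequences in plain lexicographic order, which happens to reproduce the table in the question. The function names are hypothetical, and T(0, n) = 1 is assumed as the base case.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(l, n):
    """Valid continuations of length l with running maximum n (T(0, n) = 1 assumed)."""
    return 1 if l == 0 else n * T(l - 1, n) + T(l - 1, n + 1)

def sequence_to_index(seq):
    """Lexicographic rank of a valid sequence among all valid ones of its length."""
    index, highest = 0, 1
    for i in range(1, len(seq)):
        remaining = len(seq) - i - 1
        # Each smaller value at this place is a non-step and leaves
        # T(remaining, highest) continuations.
        index += (seq[i] - 1) * T(remaining, highest)
        highest = max(highest, seq[i])
    return index

def index_to_sequence(index, length):
    """Inverse of sequence_to_index for 0 <= index < T(length - 1, 1)."""
    seq, highest = [1], 1
    for i in range(1, length):
        block = T(length - i - 1, highest)
        a = min(index // block, highest) + 1   # cap at the step value highest + 1
        index -= (a - 1) * block
        seq.append(a)
        highest = max(highest, a)
    return seq

for i in range(T(2, 1)):
    print(index_to_sequence(i, 3), i)
```

Both functions do one table lookup per position, so with T precomputed the cost is linear in the sequence length.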


难忘№最初的完美 2024-07-17 03:41:15

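I think a hash without any ordering should do the job.

Since A_0 is always the same (1, per the question), I think we can treat the sequence as a number in base 12 and use its base-10 value as the lookup key. (Still not sure about this.)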
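
A minimal sketch of this idea, under the assumption that each term after A_0 is stored as the digit `term - 1` (so digits stay in 0..10 for lengths below 12). The name `base12_key` is invented for illustration.

```python
def base12_key(seq):
    """Read the terms after A_0 as base-12 digits and return the integer value."""
    key = 0
    for a in seq[1:]:
        key = key * 12 + (a - 1)
    return key

# Keys are unique for a fixed length but not consecutive,
# so they suit a hash map rather than a packed array.
table = {base12_key(s): s for s in [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 2, 2), (1, 2, 3)]}
print(sorted(table))
```

This trades the packed 1-1 table asked for in the question for a much simpler key computation.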


情徒 2024-07-17 03:41:15

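This is a Python function that can do the job for you, assuming the values are stored in a file and you pass its lines to the function:

def valid_lines(lines):
    """Yield each line that parses to a valid sequence, as a list of ints."""
    for line in lines:
        try:
            seq = [int(x) for x in line.split(",")]
        except ValueError:
            continue  # skip blank or malformed lines such as "2,"
        # A_0 must be 1; every later term must lie in 1 .. running max + 1.
        if seq[0] == 1 and all(
            1 <= a <= max(seq[:i]) + 1 for i, a in enumerate(seq) if i > 0
        ):
            yield seq

with open('/tmp/numbers.txt') as f:
    for valid_line in valid_lines(f):
        print(valid_line)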
