当前位置：文江博客话题详情

在超过一百万个字符串的列表中按相反顺序排列字符串对？

发布于 2024-12-07 17:31:18 字数 184 浏览 0 评论 0原文

最近在一些采访中被问到“如果存在于超过一百万个字符串的列表中，如何找到所有字符串的反转？

对于例如 str[1] =“abc”，我需要准确地检查“cba”，没有字谜。

方法1. 将所有字符串存储在哈希集中，从第一个字符串开始遍历，如果存在，则将其配对移动到下一个元素，

如果内存是限制，您能建议任何方法吗？

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

乱世争霸 2024-12-14 17:31:18

如果允许，您可以对字符串进行就地排序，这样当您查找字符串的反向时，您可以进行二分搜索。

回复收藏 0 原文

心碎无痕… 2024-12-14 17:31:18

您可以使用 Bloom Filter 它会告诉您字符串是否已存在于类似哈希表的结构中，但每个桶只有 0 或 1，因此使用的空间非常少。

恰好 1 000 000 位 == 125 KB

回复收藏 0 原文

半步萧音过轻尘 2024-12-14 17:31:18

您可以选择HashTable并使用桶来减少哈希冲突。我们现在需要对特定的查询字符串做的只是将其反转、散列并在 HashTable 中查找，而不是从头到尾遍历。

回复收藏 0 原文

雪化雨蝶 2024-12-14 17:31:18

这只是我的观点：

创建一个哈希

我将使用key=character

value=以该字符开头的字符串列表< /em>

现在开始一个循环，您需要从第一个字符串开始。
反转它
取第一个字符并在散列中搜索该键，
然后在该值中搜索，它包含字符串列表并在该列表中找到该字符串

回复收藏 0 原文

一影成城 2024-12-14 17:31:18

<罢工>
使用“内存作为约束”，那么我什至不会选择 HashSet（据我所知，它也会删除原始列表中的重复字符串），因为您将使用 a 的附加结构HashSet 需要一些内存。

~~排序，也不会提高内存使用率。~~

~~我将使用原始列表（已经存在，因此不会使用额外的内存）+ 3 字节整数变量来迭代列表。 3 个字节可以迭代 2^24 = 16777216 个字符串的列表~~

如果“内存作为约束”，我会进行 2 个 for 循环。我认为类似 C 的伪代码会比我的简单英语更容易理解。

注意：

从问题中提供的示例来看，它实际上不是一个列表，而是一个数组，所以我将像它是一个数组一样对该结构进行操作
问题不清楚如何配对这个“abc”，“def” 、“cba”、“abc”。我将把第一个“abc”与“cba”配对，并将“cba”与“第二个“abc”配对（问题中的意图不清楚）
我认为我们无法修改原始列表

这是最少的记忆-我能想到的消耗代码：

// "list" holds the original list (array)
for (int i = 0; i < length(list) - 1; i++) {
    for (int j = i + 1; j < length(list); j++) {
        if (list[i] == reverse(list[j])) {
            print(list[i] + " reversed is " list[j])
        }
    }
}

关于内存使用情况，这个解决方案将采用2个整数变量（通常每个4字节）+原始列表，我认为我们无法摆脱

关于CPU使用情况（实际上，基于不相关。的问题），字符串反转的次数将为：(N*(N+1))/2，其中 N 是列表的长度

With "memory as a constraint" then I wouldn't even go for a HashSet (which, afaik will also remove the duplicated strings in the original list) because you'll be using the additional structure of a HashSet which takes some memory.

~~Sorting, wouldn't improve memory usage either.~~

~~I would use the original list (which is already there, so no extra memory will be used) + a 3 byte integer variable to iterate the list. 3 bytes can iterate over a list of 2^24 = 16777216 strings~~

With "memory as a constraint" I would go for 2 for loops. I think a C-Like pseudocode will be easier to understand that my plain english.

Notes:

From the example provided in the question, it is not actually a List but an Array, so I will operate over the structure as if it was an Array
The question is not clear whether how to pair this "abc","def","cba","abc". I will be pairing the first "abc" with "cba" and also that "cba" with "the second "abc" (the intention is unclear in the question)
I assume we can't modify the original list

Here is the least memory-consumption code I can think of:

// "list" holds the original list (array)
for (int i = 0; i < length(list) - 1; i++) {
    for (int j = i + 1; j < length(list); j++) {
        if (list[i] == reverse(list[j])) {
            print(list[i] + " reversed is " list[j])
        }
    }
}

Regarding memory usage, this solution will take 2 integer variables (usually 4 bytes each) + the original list, which I assume we can not get rid of.

Regarding CPU usage (actually, not relevant based on the question), the amount of times a strings will be reversed will be: (N*(N+1))/2 where N is the length of the list

回复收藏 0 原文

帅冕 2024-12-14 17:31:18

首先，我将使用与方向无关的散列对字符串进行散列。这可能是一个简单的字符总和，尽管肯定有更好的方案可以从两端进行散列。为了“使交易更顺利”，人们可以将字符串长度附加到哈希值中，或者以其他方式将其合并到哈希中。

然后，当您将字符串分成相同的哈希组时，进行“长手”比较。

请注意，使用此方案或仅向前或向后使用方向相关哈希的方案，要做的事情是不要立即将字符串插入哈希集中，而是检查它（如果需要，使用反向哈希）首先，如果你得到一个匹配（并且随后的长比较为真），则删除已经散列的字符串并将两者配对。第二个字符串永远不会进入集合，并且，如果所有字符串最多都匹配，则哈希集中只会有 500,000 个条目，并且，如果字符串是随机的，则可能接近 250,000 个（我没有坐过）来计算概率）。

因此，您只需要遍历一组字符串即可完成整个操作。

回复收藏 0 原文

~没有更多了~