在超过一百万个字符串的列表中按相反顺序排列字符串对?

发布于 2024-12-07 17:31:18 字数 184 浏览 0 评论 0原文

最近在一些采访中被问到“如果存在于超过一百万个字符串的列表中,如何找到所有字符串的反转?

对于例如 str[1] =“abc”,我需要准确地检查“cba”,没有字谜。

方法1. 将所有字符串存储在哈希集中,从第一个字符串开始遍历,如果存在,则将其配对移动到下一个元素,

如果内存是限制,您能建议任何方法吗?

Recently asked during some interview that "How to find reverse of all strings if exists in a list of more than million strings?

For E.g. str[1] = "abc", I need to check for "cba" exactly, no anagrams.

Method 1. Store all the strings in a hashset, start traversing from the first string and check for the reversed form exists in Hashset. if yes, then pair else move to next element.

Can you suggest any method if memory is the constraint?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

乱世争霸 2024-12-14 17:31:18

如果允许,您可以对字符串进行就地排序,这样当您查找字符串的反向时,您可以进行二分搜索。

If allowed, you could in-place sort the strings so when you look up the reverse of a string you can do a binary search.

心碎无痕… 2024-12-14 17:31:18

您可以使用 Bloom Filter 它会告诉您字符串是否已存在于类似哈希表的结构中,但每个桶只有 0 或 1,因此使用的空间非常少。

恰好 1 000 000 位 == 125 KB

You can use a Bloom Filter which will tell you if a string already exists within a hash table like structure, but each bucket is only 0 or 1 so very little space is used.

Exactly 1 000 000 bits == 125 KB

半步萧音过轻尘 2024-12-14 17:31:18

您可以选择HashTable并使用桶来减少哈希冲突。我们现在需要对特定的查询字符串做的只是将其反转、散列并在 HashTable 中查找,而不是从头到尾遍历。

You can choose HashTable and use buckets to reduce hash conflict. What we now need to do for a specific query string is just reverse it, hash it and find in the HashTable instead of traverse from start to then end.

雪化雨蝶 2024-12-14 17:31:18

这只是我的观点:

创建一个哈希

我将使用key=character

value=以该字符开头的字符串列表< /em>

  • 现在开始一个循环,您需要从第一个字符串开始。
  • 反转它
  • 取第一个字符并在散列中搜索该键,
  • 然后在该值中搜索,它包含字符串列表并在该列表中找到该字符串

This is jus my opinion:

I would create a hash with

key=character

value=List of string which start with that character

  • Now start a loop inside which you need to start at the first string.
  • reverse it
  • Take the first character and search for that key in the hash
  • then in the value of that,it contains the list of strings and find the string in that list
一影成城 2024-12-14 17:31:18

<罢工>
使用“内存作为约束”,那么我什至不会选择 HashSet(据我所知,它也会删除原始列表中的重复字符串),因为您将使用 a 的附加结构HashSet 需要一些内存。

排序,也不会提高内存使用率。

我将使用原始列表(已经存在,因此不会使用额外的内存)+ 3 字节整数变量来迭代列表。 3 个字节可以迭代 2^24 = 16777216 个字符串的列表

如果“内存作为约束”,我会进行 2 个 for 循环。我认为类似 C 的伪代码会比我的简单英语更容易理解。

注意:

  1. 从问题中提供的示例来看,它实际上不是一个列表,而是一个数组,所以我将像它是一个数组一样对该结构进行操作
  2. 问题不清楚如何配对这个“abc”,“def” 、“cba”、“abc”。我将把第一个“abc”与“cba”配对,并将“cba”与“第二个“abc”配对(问题中的意图不清楚)
  3. 我认为我们无法修改原始列表

这是最少的记忆-我能想到的消耗代码:

// "list" holds the original list (array)
for (int i = 0; i < length(list) - 1; i++) {
    for (int j = i + 1; j < length(list); j++) {
        if (list[i] == reverse(list[j])) {
            print(list[i] + " reversed is " list[j])
        }
    }
}

关于内存使用情况,这个解决方案将采用2个整数变量(通常每个4字节)+原始列表,我认为我们无法摆脱

关于CPU使用情况(实际上,基于不相关 。的问题),字符串反转的次数将为:(N*(N+1))/2,其中 N 是列表的长度


With "memory as a constraint" then I wouldn't even go for a HashSet (which, afaik will also remove the duplicated strings in the original list) because you'll be using the additional structure of a HashSet which takes some memory.

Sorting, wouldn't improve memory usage either.

I would use the original list (which is already there, so no extra memory will be used) + a 3 byte integer variable to iterate the list. 3 bytes can iterate over a list of 2^24 = 16777216 strings

With "memory as a constraint" I would go for 2 for loops. I think a C-Like pseudocode will be easier to understand that my plain english.

Notes:

  1. From the example provided in the question, it is not actually a List but an Array, so I will operate over the structure as if it was an Array
  2. The question is not clear whether how to pair this "abc","def","cba","abc". I will be pairing the first "abc" with "cba" and also that "cba" with "the second "abc" (the intention is unclear in the question)
  3. I assume we can't modify the original list

Here is the least memory-consumption code I can think of:

// "list" holds the original list (array)
for (int i = 0; i < length(list) - 1; i++) {
    for (int j = i + 1; j < length(list); j++) {
        if (list[i] == reverse(list[j])) {
            print(list[i] + " reversed is " list[j])
        }
    }
}

Regarding memory usage, this solution will take 2 integer variables (usually 4 bytes each) + the original list, which I assume we can not get rid of.

Regarding CPU usage (actually, not relevant based on the question), the amount of times a strings will be reversed will be: (N*(N+1))/2 where N is the length of the list

帅冕 2024-12-14 17:31:18

首先,我将使用与方向无关的散列对字符串进行散列。这可能是一个简单的字符总和,尽管肯定有更好的方案可以从两端进行散列。为了“使交易更顺利”,人们可以将字符串长度附加到哈希值中,或者以其他方式将其合并到哈希中。

然后,当您将字符串分成相同的哈希组时,进行“长手”比较。

请注意,使用此方案或仅向前或向后使用方向相关哈希的方案,要做的事情是不要立即将字符串插入哈希集中,而是检查它(如果需要,使用反向哈希)首先,如果你得到一个匹配(并且随后的长比较为真),则删除已经散列的字符串并将两者配对。第二个字符串永远不会进入集合,并且,如果所有字符串最多都匹配,则哈希集中只会有 500,000 个条目,并且,如果字符串是随机的,则可能接近 250,000 个(我没有坐过)来计算概率)。

因此,您只需要遍历一组字符串即可完成整个操作。

First I would hash the strings using a hash that was independent of direction. This could be a simple sum of characters, though there are certainly better schemes that would hash from both ends. And to "sweeten the deal" one could append string length to the hash value, or otherwise incorporate it in the hash.

Then when you have the strings broken into identical hash groups, do the "long hand" compare.

Note that, using either this scheme or the one where you simply use a direction dependent hash forwards or backwards, the thing to do is to not immediately insert the string into the hash set, but rather check it (with the reversed hash if needed) first, and if you get a match (and the subsequent long compare is true) remove the already-hashed string and pair the two. The second string never goes into the set, and, if all the strings have matches at most you'd only ever have 500,000 entries in the hash set, and, if the strings were random, probably closer to 250,000 (I haven't sat down to figure the probabilities).

So you'd need only one pass through the set of strings to do the whole thing.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文