使用 ASCII 字符和对字符串进行二进制搜索?

发布于 2024-12-08 16:04:56 字数 407 浏览 0 评论 0原文

我正在做一个小型项目 - 使用链表的学生数据库,这是我第一学期的一部分。规范是,用户应该能够使用名称首字母(结构中的 char[4])搜索记录。

现在有两种方法来搜索缩写,一种是通过线性搜索,这确实效率很低(我实际上并不关心这一点,因为这不会是某些公司的基本内容等),另一种是通过二分搜索。

二分搜索需要排序数组,所以我在想使用字符串的 ASCII 和进行搜索是否有意义?

例如,记录 1 的首字母为“AB”,记录 2 的首字母为“CD”。两者的 ASCII 总和为 65+66 = 131 & 67+68 = 135 并且列表使用缩写(使用 strcmp)排序。

因此,当用户输入“AB”时,我只会查找数字 131,如果存在,则显示记录?

这可能是一个非常糟糕的主意,请不要对我发火并解释为什么这是一个坏主意。

I'm doing a mini project - student database using linked list, part of my first semester. The specification is that, user should be able to search a record using name initials, which is a char[4] in the structure.

Now there are two ways to search the initials, one by linear search which is indeed inefficient (I don't care actually about that, because this is not going to be some firm's basic stuff, etc.) or by binary search.

Binary search requires sorted arrays, so I was thinking if searching using the ASCII sum of the string would make any sense?

For example, record 1 has initial = "AB" and record 2 has "CD". ASCII sums of both are 65+66 = 131 & 67+68 = 135 and the list is sorted using initials (using strcmp).

So when the user inputs "AB", I'll just look for the number 131 and if present, show the record?

This might be a very bad idea, please don't flame on me and explain why it is a bad one.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

如梦 2024-12-15 16:04:56

对我来说似乎是一个好的开始。您将如何区分“TON”和“NOT”它们的总和是否相同(“碰撞”)?您是否建议采用两层方法?首先使用 ascii-sum 搜索,然后使用某种方法来解决冲突?似乎这里有一些关于散列的好信息: http://burtleburtle.net/bob/hash/index .html

Seems like a good start to me. How will you distinguish between "TON" and "NOT" Will they sum to the same value (a 'collision')? Are you suggesting a two tiered approach? First with the ascii-sum search and second with some method to sort out the collisions? Seem like there's some good info on hashing here: http://burtleburtle.net/bob/hash/index.html

戈亓 2024-12-15 16:04:56

如果我理解正确的话,那么这将是一种非常错误的搜索缩写的方式。我看到的第一个问题是:

AD = 65+68 = 133
BC = 66+67 = 133

事实证明它们确实无法区分。但是比较两个字母,甚至可能只是连接 ASCII 值有什么问题呢?

AD = 65.68 = 6568
BC = 66.67 = 6667

没怎么睡,可能写的东西都写完了。

If i have understood correctly, then it would be a very faulty way of searching for initials. The first problem i see is:

AD = 65+68 = 133
BC = 66+67 = 133

Turns out that they really aren't distinguishable. But what is wrong with comparing two letters, or even, perhaps just concatenating the ASCII value?

AD = 65.68 = 6568
BC = 66.67 = 6667

Haven't slept a lot, perhaps what i write is all off.

阳光的暖冬 2024-12-15 16:04:56

会有很多碰撞。使用可扩展哈希:

Wikipedia

算法解释

There will be many collisions. Go for Extendible Hashing:

Wikipedia

Algorithm explained

旧伤慢歌 2024-12-15 16:04:56

如果你无论如何都要构造一个排序数组,那么计算这个(有损的、有偏差的)哈希值并在排序列表中搜索它是没有意义的——在列表上进行二分搜索来查找直接缩写。

If you're going to construct a sorted array anyway, there's no point in computing this (lossy, biased) hash value and searching for that in the sorted list - it will be just as fast to do a binary search on the list for the initials directly.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文