已排序的TStringList,排序是如何工作的?

发布于 2024-10-19 09:56:06 字数 234 浏览 9 评论 0原文

我只是很好奇,因为最近我看到了 Java 中 Hashmap 的使用,并且想知道 Delphi 的排序字符串列表是否相似。

TStringList 对象是否生成一个哈希来用作每个项目的索引?如何通过 Find 函数对照字符串列表检查搜索字符串?

我经常使用 Sorted TStringLists,我只是想更多地了解发生了什么。

请假设我不知道哈希映射是如何工作的,因为我不知道:)

谢谢

I'm simply curious as lately I have been seeing the use of Hashmaps in Java and wonder if Delphi's Sorted String list is similar at all.

Does the TStringList object generate a Hash to use as an index for each item? And how does the search string get checked against the list of strings via the Find function?

I make use of Sorted TStringLists very often and I would just like to understand what is going on a little bit more.

Please assume I don't know how a hash map works, because I don't :)

Thanks

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

九公里浅绿 2024-10-26 09:56:06

我一般性地将这个问题解释为对列表和字典的概述的请求。

  • 几乎所有人都知道,列表是一个由连续整数索引的容器。
  • 哈希映射字典关联数组是一个容器,其索引可以是任何类型。字典通常是用字符串索引的。

为了便于讨论,让我们将列表称为 L ,将字典称为 D

列表具有真正的随机访问。如果您知道某个项目的索引,则可以在恒定时间内查找该项目。字典的情况并非如此,它们通常采用基于哈希的算法来实现高效的随机访问。

当您尝试查找值时,排序列表可以执行二分搜索。查找值 V 是获取索引 I 的行为,使得 L[I]=V。二分查找非常有效。如果列表未排序,则必须执行线性搜索,效率要低得多。排序列表可以使用插入排序来维护列表的顺序 - 当添加新项目时,它会被插入到正确的位置。

您可以将字典视为 对的列表。您可以迭代所有对,但更常见的是使用索引表示法来查找给定键的值:D[Key]。请注意,这与在列表中查找值不同 - 它类似于当您知道索引 I 时读取 L[I]

在旧版本的 Delphi 中,通常会哄骗字符串列表之外的字典行为。表现很糟糕。内容上几乎没有灵活性。

在现代 Delphi 中,有 TDictionary,这是一个可以容纳任何内容的通用类。该实现使用了哈希,虽然我没有亲自测试过它的性能,但我认为它是值得尊敬的。

有一些常用的算法可以最佳地使用所有这些容器:未排序的列表、排序的列表、字典。您只需使用正确的方法来解决当前的问题即可。

I'm interpreting this question, quite generally, as a request for an overview of lists and dictionaries.

  • A list, as almost everyone knows, is a container that is indexed by contiguous integers.
  • A hash map, dictionary or associative array is a container whose index can be of any type. Very commonly, a dictionary is indexed with strings.

For sake of argument let us call our lists L and our dictionaries D.

Lists have true random access. An item can be looked-up in constant time if you know its index. This is not the case for dictionaries and they usually resort to hash-based algorithms to achieve efficient random access.

A sorted list can perform binary search when you attempt to find a value. Finding a value, V, is the act of obtaining the index, I, such that L[I]=V. Binary search is very efficient. If the list is not sorted then it must perform linear search which is much less efficient. A sorted list can use insertion sort to maintain the order of the list – when a new item is added, it is inserted at the correct location.

You can think of a dictionary as a list of <Key,Value> pairs. You can iterate over all pairs, but more commonly you use index notation to look-up a value for a given key: D[Key]. Note that this is not the same operation as finding a value in a list – it is the analogue of reading L[I] when you know the index I.

In older versions of Delphi it was common to coax dictionary behaviour out of string lists. The performance was terrible. There was little flexibility in the contents.

With modern Delphi, there is TDictionary, a generic class that can hold anything. The implementation uses a hash and although I have not personally tested its performance I understand it to be respectable.

There are commonly used algorithms that optimally use all of these containers: unsorted lists, sorted lists, dictionaries. You just need to use the right one for the problem at hand.

落花浅忆 2024-10-26 09:56:06

TStringList 将字符串保存在数组中。

如果您对未排序(Sorted 属性 = false)字符串列表调用 Sort,则会执行 QuickSort 对项目进行排序。

如果将 Sorted 设置为 true,也会发生同样的情况。

如果您在未排序的字符串列表上调用 Find(或调用 find 的 IndexOf)(Sorted 属性 = false,即使您显式调用 Sort,如果 Sorted 属性不为 true,列表也被视为未排序),则执行线性搜索比较所有字符串从头开始直到找到匹配项。

如果您在排序字符串列表上调用 Find(Sorted property = true),则会执行二分搜索(请参阅 http: //en.wikipedia.org/wiki/Binary_search 了解详细信息)。

如果将字符串添加到已排序的字符串列表中,则会执行二分搜索以确定正确的插入位置,数组中所有后续元素都会移动一位并插入新字符串。

因此,字符串列表越大,插入性能就越差。如果要将大量条目插入已排序的字符串列表中,通常最好关闭排序,插入字符串,然后将 Sorted 设置回 true 以执行快速排序。

该方法的缺点是您将无法阻止重复项的插入。

编辑:如果您想要哈希映射,请使用单元 Generics.Collections 中的 TDictionary

TStringList holds the strings in an array.

If you call Sort on an otherwise unsorted (Sorted property = false) string list then a QuickSort is performed to sort the items.

The same happens if you set Sorted to true.

If you call Find (or IndexOf which calls find) on an unsorted string list (Sorted property = false, even if you explicitly called Sort the list is considered unsorted if the Sorted property isn't true) then a linear search is performed comparing all strings from the start till a match is found.

If you call Find on a sorted string list (Sorted property = true) then a binary search is performed (see http://en.wikipedia.org/wiki/Binary_search for details).

If you add a string to a sorted string list, a binary search is performed to determine the correct insertion position, all following elements in the array are shifted by one and the new string is inserted.

Because of this insertion performance gets a lot worse the larger the string list is. If you want to insert a large number of entries into a sorted string list, it's usually better to turn sorting off, insert the strings, then set Sorted back to true which performs a quick sort.

The disadvantage of that approach is that you will not be able to prevent the insertion of duplicates.

EDIT: If you want a hash map, use TDictionary from unit Generics.Collections

生生漫 2024-10-26 09:56:06

你可以看看源代码,因为Delphi 自带了。按住 Ctrl 键并单击代码中的“排序”调用。

在非 Unicode Delphi 中,它是一种简单的字母排序,而在更高版本中,它是一种稍微复杂的 Unicode 排序。您可以为自定义排序顺序提供自己的比较。不幸的是,我没有最新版本的 Delphi,所以无法确认,但我希望在幕后有一个正确的 Unicode 感知和区域设置感知字符串比较例程。 Unicode 排序/字符串比较并不是微不足道的,稍微进行一下网络搜索就会指出一些陷阱。

当您在字符串或附加到它们的对象(Objects 属性)中具有分隔文本时,通常会提供您自己的比较例程。在这些情况下,您经常希望按对象的属性或字符串中第一个字段以外的其他内容进行排序。或者它可能就像想要对字符串进行数字排序一样简单(因此“2”出现在“1”之后而不是“19”之后)

You could look at the source code, since that comes with Delphi. Ctrl-Click on the "sort" call in your code.

It's a simple alphabetical sort in non-Unicode Delphi, and a slightly more complex Unicode one in later versions. You can supply your own comparison for custom sort orders. Unfortunately I don't have a recent version of Delphi so can't confirm, but I expect that under the hood there's a proper Unicode-aware and locale-aware string comparison routine. Unicode sorting/string comparison is not trivial and a little web searching will point out some of the pitfalls.

Supplying your own comparison routine is often done when you have delimited text in the strings or objects attached to them (the Objects property). In those cases you often wat to sort by a property of the object or something other than the first field in the string. Or it might be as simple as wanting a numerical sort on the strings (so "2" comes after "1" rather than after "19")

灯下孤影 2024-10-26 09:56:06

还有一个 THashedStringList,这可能是一个选项(尤其是在较旧的 Delphi 版本中) )。

There is also a THashedStringList, which could be an option (especially in older Delphi versions).

乄_柒ぐ汐 2024-10-26 09:56:06

顺便说一句,TStringList 的 Unicode 排序例程非常慢。如果您重写 TStringList.CompareStrings 方法,那么如果字符串仅包含 Ansi 字符(如果您只使用英语,则会出现这种情况),则可以使用自定义的 Ansi 字符串比较。我使用我自己定制的 TStringList 类来执行此操作,它比排序列表的 TStringList 类快 4 倍,无论是从列表中读取字符串还是向列表中写入字符串。

BTW, the Unicode sort routines for TStringList are quite slow. If you override the TStringList.CompareStrings method then if the strings only contain Ansi characters (which if you use English exclusively they will), you can use customised Ansi string comparisons. I use my own customised TStringList class that does this and it is 4 times faster than the TStringList class for a sorted list for both reading and writing strings from/to the list.

﹉夏雨初晴づ 2024-10-26 09:56:06

Delphi 的字典类型(在支持泛型的 Delphi 版本中)是与 Delphi 附带的哈希图最接近的类型。 THashedStringList 的查找速度比在排序字符串列表中的查找速度更快。您可以在排序的字符串列表中使用二分搜索进行查找,因此它比暴力搜索更快,但不如哈希快。

哈希的一般理论是它是无序的,但查找和插入速度非常快。排序列表的插入速度相当快,直到列表的大小变大,尽管它的插入效率不如字典。

列表的一大好处是它是有序的,而哈希查找字典则不然。

Delphi's dictionary type (in generics-enabled versions of Delphi) is the closest thing to a hashmap, that ships with Delphi. THashedStringList makes lookups faster than they would be in a sorted string list. you can do lookups using a binary search in a sorted stringlist, so it's faster than brute force searches, but not as fast as a hash.

The general theory of a hash is that it is unordered, but very fast on lookup and insertion. A sorted list is reasonably fast on insertion until the size of the list gets large, although it's not as efficient as a dictionary for insertion.

The big benefit of a list is that it is ordered but a hash-lookup dictionary is not.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文