随机播放文本文件 Delphi 源代码或其他任何内容
我有一个包含 10,000 个条目的字符串列表。我有一个随机播放程序,但访问任何项目都需要花费大量时间。浏览所有 10k 项需要花费大量时间。
我想将其保存在磁盘上,然后使用另一种方法对文件进行随机播放。
有什么建议吗?
I have a stringlist with 10,000 entries. I have a shuffle routine, but accessing any of the items is taking a lot of time. Going through all 10k items takes a huge amount of time.
I want to save it do disk and then do a shuffle on the file using another method.
Any suggestions?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(4)
你的随机播放程序是如何实现的?特别是交换例程?如果您自己编写,那么:
它会非常慢。在字符串列表上使用交换方法。
这段代码在我相当普通(3 年)的计算机上花费了 78 毫秒:
How is your shuffle-routine implemented? Especially the exchange-routine? If you have written your own, along these lines:
it will be very slow. Use the exchange-method on the stringlist.
This code took 78 ms on my pretty average (3 year old) computer:
在内存中重新排列字符串列表的速度很慢,因此我会打乱索引列表作为初始优化。
我猜您选择 stringlist 是为了方便从磁盘加载和保存到磁盘。一种更快的方法是对索引进行洗牌。创建一个包含 10,000 个整数的数组,对它们进行打乱,然后使用临时字符串变量来保存交换元素,并使用打乱后的索引值从上到下重新排列字符串列表。
主要重写将提供更大的改进,但如果您的字符串不太大,这可能会有所帮助。
Rearranging a stringlist in memory is slow, so I'd shuffle an index list as an initial optimization.
I'm guessing you chose stringlist for the convenience of loading from and saving to disk. One quicker approach would be to shuffle an index. Make an array of 10,000 integers, shuffle those, then use a temporary string variable to hold the swap element and rearrange your stringlist from top to bottom using the shuffled index values.
Major rewrites will provide greater improvements, but this may help if your strings aren't too big.
一种简单的方法是生成随机数列表,对其进行排序,然后进行数据的成对交换。排序可以使用 o(n*log(n)) 算法来完成,而交换始终是 o(n) 算法,因此速度要快得多。
以防万一您没有想到这一点,请考虑保留数据原样,并仅保存额外的洗牌索引。
An easy way is to generate a list of random numbers, sort it, and then do pairwise swaps of data later. Sorting can be done as an o(n*log(n)) algorithm, whereas swapping always is an o(n) algorithm, thus much faster.
Just in case you haven't thought of it, consider to leave the data as it is, and just save an extra shuffled index.
我之前问过一个关于创建打乱范围的问题 - 我想要一个能够迭代返回打乱数字列表的函数,而不是生成数字列表然后打乱它们,而不需要 O(n) 内存成本:
使用 PRNG 而不是随机生成随机排列的范围
如果您为磁盘上的文件创建某种索引,那么您可以创建一个打乱版本而无需支付内存成本,这对于非常大的文件来说可能很重要。对于索引,我建议使用一些简单的方法,例如每行开头的位置(作为 32 或 64 位整数)的扁平流。这样,要从文本文件中提取第 N 行,您可以简单地在索引流中查找到 N*4(或 N*8 对于 64 位索引)以发现行开头的偏移量,然后查找文本文件流中的该位置并读出一行。
使用这种方法,您可以打乱非常大的文件,而无需支付内存成本。当然,洗牌意味着从源文件中随机提取行,这不会像内存中排序那样有效,除非文件非常小(几乎在第一次访问时适合缓存)或非常大(在这种情况下内存抖动)会比随机搜索更糟糕),或者如果您不使用机械硬盘驱动器(例如SSD)。
对于你的情况来说,10K确实不是一个大数字。大约 1000 万行的内容,可能会进入几 GB 的文本(当然取决于行长度),将更具挑战性,这就是在 32 位中需要这种方法(或类似的方法)的地方。
I asked a question before about creating a shuffled range - rather than generating a list of numbers and then shuffling them, I wanted a function that was able to iteratively return a list of shuffled numbers, without the O(n) memory cost:
Generating shuffled range using a PRNG rather than shuffling
If you create some kind of index for your file on disk, then you can create a shuffled version without paying the memory cost, which can be important for very large files. For an index, I suggest something simple, like a flat stream of the positions (as 32 or 64-bit integers) of every line start. That way, to extract the Nth line out of the text file, you can simply seek in the index stream to N*4 (or N*8 for 64-bit indexes) to discover the offset of the line start, and then seek to that position in the text file stream and read out a line.
Using this approach, you can shuffle extremely large files, without paying the in-memory cost. Of course, shuffling will mean extracting lines at random from the source file, which will not be as efficient as in-memory sorting unless the file is very small (fits in cache almost on first access) or very large (in which case memory thrashing will be worse than random seeks), or perhaps if you're not using a mechanical hard drive (e.g. SSD).
For your situation, 10K is really not a large number. Something in the region of 10 million lines, perhaps getting into several gigabytes of text (depending on line length of course), will be far more challenging, and that's where this approach (or something similar) would be necessary in 32-bit.