字符串表示:对绳索的改进?

发布于 2024-09-05 14:48:50 字数 354 浏览 4 评论 0原文

我想要一种具有快速串联和编辑操作的字符串表示形式。我已阅读论文“绳索:字符串的替代品”,但有自1995年以来这方面有任何重大改进吗?

编辑:我之前考虑过的一种可能性是使用 2-3 手指树,字符串为叶,不过我还没有对此做过详细的分析;这给出了两端的摊销常数时间添加/删除和对数(较小字符串的块数)串联,而不是绳索的反之亦然。

I want a representation for strings with fast concatenation and editing operations. I have read the paper "Ropes: an Alternative to Strings", but have there been any significant improvements in this area since 1995?

EDIT: One possibility I've considered before is using a 2-3 finger tree with strings as leaves, but I have not done a detailed analysis of this; this gives amortized constant-time addition/deletion on ends and logarithmic (in the number of chunks of the smaller string) concatenation, as opposed to vice versa for ropes.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

懷念過去 2024-09-12 14:48:50

这是一个老问题了!我想知道是否有人读过这篇文章。但它仍然很有趣。
在您的评论中,您说您寻找:

更快的渐近,或常数
因素,或更少的内存使用

好吧,绳子有 O(1) 插入和 O(n) 迭代。你没有比这更好的了。子字符串和索引显然会更加昂贵。但大型文档的大多数用例不需要编辑或随机访问。如果仅在末尾连接,一维向量/字符串列表可以改善插入时间常数。我曾经在 JavaScript 中使用它,因为它的字符串连接速度很慢。

据说内存表示的效率低于使用字符串。
我怀疑:如果您使用具有垃圾收集功能的语言,绳索允许您在多个地方使用相同的字符串片段实例。在代表 HTML 文档的绳索中,将有许多 DIVSPANLINK 元素。假设这些标签是编译时常量,并且您直接将它们添加到绳索中,这甚至可能会自动发生。即使对于这样的简短短语,绳索文档的大小也会显着减小,达到与原始字符串相同的数量级。较长的琴弦会产生净增益。

如果您还使树元素只读,则可以创建子绳(表示为绳索的较长短语),该子绳出现多次或在基于绳索的字符串之间共享。这种共享的缺点是这样的分片绳部分无法更改:要编辑它们,或者平衡树,您需要复制对象图。但如果您主要进行连接和迭代,那并不重要。在 Web 服务器中,您可以保留一个代表 CSS 样式表声明的子字符串,该声明在该服务器提供的所有 HTML 文档之间共享。

This is an old question! I wonder if anyone reads this. But still it's intrigueing.
In your comments, you say you look for:

Faster asymptotics, or constant
factors, or less memory use

Well, ropes have O(1) insertion, and O(n) iteration. You can't do better than that. Substrings and indexing is obviously going to be more costly. But most use cases for large documents don't require editing or random access. If you only concatenate at the end, a 1D vector/list of strings could improve the insertion time constant. I used to use this in JavaScript because it had such slow string concatentation.

It is said that memory representation is less efficient than using strings.
I doubt that: If you work in a language that has garbage collection, the rope allows you to use the same string fragment instance in multiple places. In a rope that represents a HTML document, there will be many DIV's, SPAN's and LINK elements. This might even happen automatically assuming these tags are compile time constants, and you add them to the rope directly. Even for such short phrases, the rope document will reduce in size significantly, to the same order of magnitude as the original string. Longer strings will produce a net gain.

If you also make the tree elemenst read only, you can create subropes (longer phrases expressed as ropes), that occur multiple times or are shared across rope based strings. The downside of this sharing is that such shard rope sections can't be changed: to edit them, or to balance the tree you need to copy the object graph. But that does not matter if you mostly concatenate and iterate. In a web server, you can keep a subrope that repesents the CSS stylesheet declaration that is shared across all HTML documents served by that server.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文