Text manipulation while keeping the original position offsets
I need to manipulate large strings in Java (deleting and adding the deleted
chars again, moving chars around), but still want to remember the
original position offsets. E.g. if the word "computer" starts at offset
133 in the original text and is then moved to position 244, I still
want the info that it was originally at position 133.
The ugliest (and most resource-hungry) solution would be to store, for
every character, its original position plus its position change. There
are surely better solutions, but also more complex ones.
Are there any good text manipulation libraries that have a solution to
my problem? I don't want to reinvent the wheel.
Regards,
Kai
Comments (3)
How large are these strings? Given the amount of memory available today, brute force may be the way to go.
You talk about moving words, but storing character positions. Why not store word positions, plus a history per word instance? Note that you could be clever and use the flyweight pattern to avoid holding multiple instances of these objects until you need them, i.e. your 'string' object holds one 'computer' word object, but records that the word occurs at positions 133, 245, 667, etc. (plus history as and when you need it).
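One way to read this suggestion, as a rough Java sketch. The names `WordIndex`, `Occurrence` and `moveTo` are invented for illustration and do not come from any library. The word text is stored once as a map key (the flyweight part), and each occurrence keeps its own list of offsets. The sketch assumes the caller reports every move; it does not shift the offsets of neighbouring words when one word moves.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: one shared key per distinct word, many occurrences.
final class WordIndex {

    /** One occurrence of a word, with its offset history (oldest first). */
    static final class Occurrence {
        private final List<Integer> offsets = new ArrayList<>();

        Occurrence(int originalOffset) {
            offsets.add(originalOffset);
        }

        int originalOffset() {
            return offsets.get(0);
        }

        int currentOffset() {
            return offsets.get(offsets.size() - 1);
        }

        /** Records a move; earlier offsets are kept as history. */
        void moveTo(int newOffset) {
            offsets.add(newOffset);
        }

        List<Integer> history() {
            return Collections.unmodifiableList(offsets);
        }
    }

    // The word text is held once here, however often it occurs.
    private final Map<String, List<Occurrence>> occurrences = new HashMap<>();

    Occurrence add(String word, int offset) {
        Occurrence occurrence = new Occurrence(offset);
        occurrences.computeIfAbsent(word, k -> new ArrayList<>()).add(occurrence);
        return occurrence;
    }

    List<Occurrence> find(String word) {
        List<Occurrence> found = occurrences.get(word);
        return found == null ? Collections.<Occurrence>emptyList() : found;
    }
}
```

With this, `index.add("computer", 133)` followed by `moveTo(244)` on the returned occurrence leaves 133 available through `originalOffset()` while `currentOffset()` reports 244.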
The problem you are referring to is officially called the "String-to-string correction problem", which is related to Delta Encoding and the Levenshtein Distance. Here is code to compute the distance (it's in Java). All the differencing code is there; you simply have to add code that keeps track of the steps so you can reverse them or track them. Note: "moving" a word or character would be a delete/insert pair of the same word that occurs together.
This should work for character, word, and substring moves.
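The code this answer linked to is not reproduced here, so the following is only a sketch of the idea of "keeping track of the steps": a textbook Levenshtein table plus a backtrace that maps each character of the edited text to its offset in the original, or -1 if it was inserted or substituted. The class and method names are hypothetical. The table takes O(n·m) memory, so for genuinely large strings this would need a more economical diff algorithm. A moved word appears as a deletion plus an insertion, so its characters come back as -1 unless the caller pairs up the deleted and inserted text.

```java
import java.util.Arrays;

// Hypothetical helper, not from any library.
final class EditTrace {

    /** Fills the standard Levenshtein table for src -> dst. */
    static int[][] table(String src, String dst) {
        int n = src.length(), m = dst.length();
        int[][] d = new int[n + 1][m + 1];
        for (int i = 0; i <= n; i++) d[i][0] = i; // delete everything
        for (int j = 0; j <= m; j++) d[0][j] = j; // insert everything
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int cost = src.charAt(i - 1) == dst.charAt(j - 1) ? 0 : 1;
                d[i][j] = Math.min(d[i - 1][j - 1] + cost,
                          Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1));
            }
        }
        return d;
    }

    static int distance(String src, String dst) {
        return table(src, dst)[src.length()][dst.length()];
    }

    /** result[j] = offset in src of the character kept at dst[j], or -1. */
    static int[] originalOffsets(String src, String dst) {
        int[][] d = table(src, dst);
        int[] origin = new int[dst.length()];
        Arrays.fill(origin, -1);
        int i = src.length(), j = dst.length();
        // Walk back from the bottom-right corner along one optimal path.
        while (i > 0 && j > 0) {
            if (src.charAt(i - 1) == dst.charAt(j - 1) && d[i][j] == d[i - 1][j - 1]) {
                origin[j - 1] = i - 1; // character kept
                i--;
                j--;
            } else if (d[i][j] == d[i - 1][j - 1] + 1) {
                i--;                   // substitution: dst character is new
                j--;
            } else if (d[i][j] == d[i - 1][j] + 1) {
                i--;                   // deletion from src
            } else {
                j--;                   // insertion into dst
            }
        }
        return origin; // any remaining dst prefix stays -1 (pure insertions)
    }
}
```

For example, `originalOffsets("abcd", "acd")` maps the three surviving characters back to offsets 0, 2 and 3 in the original.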
Before getting too stressed about efficiency, do a back-of-the-envelope calculation. When you are okay with that and have code, you can double-check with a profiler/stopwatch.
There is a ready made solution in the form of Swing text. It should be usable outside of a Swing context, although IIRC it tries to fire exceptions on the EDT (in the typical Swing thread-hostile way) - might want to check on that. There are Position objects that keep track of character positions within a Document even after insertions and deletions. If nothing else, it'll show how it can be done. Presumably the Apache Harmony implementation comes with a licence suitable for most normal people.
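A minimal sketch of the Position idea, using `javax.swing.text.PlainDocument` without any GUI. The sample text and the class name `PositionDemo` are made up for illustration. Note that a `Position` only reports where a location has drifted to, so the original offset still has to be stored alongside it. It also does not follow a cut-and-paste move: removing the text a `Position` sits inside collapses it to the removal point.

```java
import javax.swing.text.BadLocationException;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;
import javax.swing.text.Position;

// Hypothetical demo class; only the javax.swing.text calls are real API.
final class PositionDemo {

    /** Returns {originalOffset, currentOffset} for the word "computer". */
    static int[] track() {
        try {
            Document doc = new PlainDocument();
            doc.insertString(0, "my old computer is slow", null);

            // Remember where the word starts before any edits.
            int original = doc.getText(0, doc.getLength()).indexOf("computer");
            Position pos = doc.createPosition(original);

            // Edit the text in front of the word.
            doc.insertString(0, "I think ", null);   // shifts the word right
            int old = doc.getText(0, doc.getLength()).indexOf("old ");
            doc.remove(old, "old ".length());        // shifts it left again

            // The Position has followed the word through both edits.
            return new int[] { original, pos.getOffset() };
        } catch (BadLocationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] offsets = track();
        System.out.println("original=" + offsets[0] + " current=" + offsets[1]);
    }
}
```

Keeping the pair (original offset, `Position`) per tracked word gives both pieces of information the question asks for, at the cost of one small object per tracked location rather than per character.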