网格数据结构
通常,“可扩展”网格表示为列表列表(行列表,每行都有单元格列表),这些列表是某种链接列表。
在这个数据结构中操作(删除、插入)行既简单又便宜,只需重新链接以前的节点即可,但是当涉及到列时,例如删除列,它会变成一个非常长的操作,我需要“循环” ' 要删除索引单元格的所有行。显然这不是一个好的行为,至少对于我来说是这样。
我这里不是在谈论数据库;而是在谈论数据库。我发现的一个很好的例子是将一些文本文件放入文本编辑器中,(据我所知)文本编辑器主要将行分成行,并且很容易删除行。我希望删除一列与删除某些行一样便宜且高效。
最后,我需要的是一些多维网格,但我认为任何二维简单网格都适用于MD,我对吗?
Usually ‘expandable’ grids are represented as list of lists (list of rows, each row has list of cells) and those lists are some kind of linked lists.
Manipulating (removing, inserting) rows in this data structure is easy and inexpensive, it’s just matter of re-linking previous nodes, but when it comes to columns, for instance removing a column it become a very long operation, I need to ‘loop’ all rows to remove that indexes cells. Clearly this isn’t good behavior, at least for my case.
I’m not talking databases here; a good example I’ve found for this is some text file into a text editor, (as I know) text editors mostly splitting lines into rows and it’s easy to remove line. I want removing a column is as inexpensive and efficient as removing some row.
Finally, what I need is some Multi-dimensional grid but I think of any 2d simple grid would be applicable for MD, Am I right?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
我知道“链表”通常从理论角度受到赞赏,但在实践中它们通常效率低下。
我建议转向随机访问容器以获得一些速度。最简单的是数组,但双端队列或索引跳跃列表/B* 树可能更好,具体取决于我们正在讨论的数据大小。
从概念上讲,它并没有太大变化,但是您可以在 O(1)(数组、双端队列)/O(log N)(跳过列表/B* 树)操作中移动到给定索引,而不是比简单链表的 O(N) 复杂度。
然后是施展魔法的时间了。
Keith 已经揭示了基本思想:您只需将其标记为已删除,然后在遍历结构时“跳转”到其上方,而不是实际删除该列。然而,哈希表需要线性遍历才能到达第 N 列。使用芬威克树将产生一种计算真实索引的有效方法,然后您可以直接跳到那里。
请注意,将行标记为已删除的一个主要好处是明显可以进行
撤消
操作。另请注意,您可能希望构建一个压缩函数,以不时消除已删除的列,而不是让它们累积。
I know that "Linked-Lists" are usually appreciated from a theoretical point of view, but in practice they are generally inefficient.
I would suggest moving toward Random Access Containers to get some speed. The most simple would be an array, but a double-ended queue or an indexed skip list / B* tree could be better, depending on the data size we are talking about.
Conceptually, it doesn't change much (yet), however you get the ability to move to a given index in O(1) (array, deque) / O(log N) (skip list / B* tree) operations, rather than O(N) with a simple linked-list.
And then it's time for magic.
Keith has already exposed the basic idea: rather than actually deleting the column, you just need to mark it as deleted and then 'jump' above it when you walk your structure. However a hash table requires a linear walk to get to the Nth column. Using a Fenwick Tree would yield an efficient way to compute the real index, and you could then jump directly there.
Note that a key benefit of marking a row as deleted is the obvious possibility of an
undo
operation.Also note that you might want to build a compacting function, to eliminate the deleted columns from time to time, and not let them accumulate.
您可以有一个二维“链接矩阵”(我忘记了正确的术语):
每个单元格都有四个邻居,如图所示。此外,您还需要行标题和列标题来指示行/列位置,以及指向每行或列中的第一个单元格。这些最容易表示为没有向上邻居的特殊单元格(对于列标题)。
在 3 和 4 之间插入新列意味着向下迭代第 3 列中的单元格 X,并插入新的右邻居 Z。这个新单元格 Z 向左链接到 X,向右链接到 Y。您还需要添加一个新的列标题,并且垂直连接新单元格。然后4之后的所有列的位置都可以重新编号(第4列变成第5列)。
插入列的插入和链接新单元格的成本为 O(n),更新列标题的成本为 O(m)。删除过程与此类似。
因为每个单元格只有四个链接,所以行插入/删除使用相同的算法。
You could have a two dimensional "linked matrix" (I forget the proper terminology):
Each cell has four neighbours, as shown. Additionally you need row and column headers that might indicate the row/column position, as well as pointing to the first cell in each row or column. These are most easily represented as special cells without an up neighbour (for column headers).
Inserting a new column between 3 and 4 means iterating down the cells X in col 3, and inserting a new right neighbour Z. This new cell Z links leftward to X and rightward to Y. You also need to add a new column header, and link the new cells vertically. Then the positions of all the columns after 4 can be renumbered (col 4 becomes col 5).
The cost of inserting a column is O(n) for inserting and linking the new cells, and again O(m) for updating the column headers. It's a similar process for deletion.
Because each cell is just four links, the same algorithms are used for row insertion/deletion.
保持现有数据结构不变。此外,在创建时为每个列指定一个唯一的 id。删除列时,只需将其 id 添加到所有已删除列 id 的哈希表中即可。每次遍历一行时,请根据哈希表检查每个元素的列 ID(需要与元素的所有其他数据一起存储),如果已删除,则将其从行中拼接出来。
如果您有每个网格元素都可以指向的每列数据结构,则哈希表和 id 是不必要的。然后你只需要在该数据结构中删除一个位。
顺便说一句,埃德蒙的计划也适合你。尽管删除长度为 n 的行或列需要 O(n) 时间,但您大概可以根据创建这 n 个元素的成本来摊销该成本,从而使删除时间摊销为 O(1) 。
Keep your existing data structure as is. In addition, give each column a unique id when it is created. When you delete the column, just add its id to a hash table of all deleted column ids. Every time you walk a row, check each element's column id (which needs to be stored along with all other data for an element) against the hash table and splice it out of the row if it has been deleted.
The hash table and ids are unnecessary if you have a per-column data structure that each grid element can point to. Then you just need a deleted bit in that data structure.
By the way, Edmund's scheme would be fine for you as well. Even though it takes O(n) time to delete a row or column of length n, you can presumably amortize that cost against the cost of creating those n elements, making the delete O(1) amortized time.