如何用Python实现数据库风格的表

发布于 2024-10-02 18:32:54 字数 707 浏览 1 评论 0原文

我正在实现一个类似于典型数据库表的类:

  • 具有命名列和未命名行,
  • 具有主键,我可以通过主键引用行,
  • 支持按主键检索和分配,并且
  • 可以要求列标题添加唯一或非唯一任何列的索引,允许快速检索在该列中具有给定值的行(或行集),
  • 删除行的速度很快,并且实现为“软删除”:该行在物理上保留,但被标记为删除并且不会出现在任何后续检索操作中
  • 添加列速度很快 行
  • 很少添加
  • 列很少被删除

我决定直接实现该类,而不是使用 sqlite 的包装器。

什么是一个好的数据结构?


举个例子,我正在考虑的一种方法是字典。它的键是表的主键列中的值;它的值是通过以下方式之一实现的行:

  1. 作为列表。列号映射到列标题(一个方向使用列表,另一个方向使用地图)。这里,检索操作首先将列标题转换为列号,然后在列表中查找相应的元素。

  2. 作为字典。列标题是该字典的键。

不确定两者的优缺点。


我想编写自己的代码的原因是:

  • 我需要跟踪行删除。也就是说,我希望能够随时报告哪些行被删除以及出于什么“原因”(“原因”被传递给我的删除方法)。
  • 我在索引期间需要一些报告(例如,在构建非唯一索引时,我想检查某些条件并报告它们是否被违反)

I am implementing a class that resembles a typical database table:

  • has named columns and unnamed rows
  • has a primary key by which I can refer to the rows
  • supports retrieval and assignment by primary key and column title
  • can be asked to add unique or non-unique index for any of the columns, allowing fast retrieval of a row (or set of rows) which have a given value in that column
  • removal of a row is fast and is implemented as "soft-delete": the row is kept physically, but is marked for deletion and won't show up in any subsequent retrieval operations
  • addition of a column is fast
  • rows are rarely added
  • columns are rarely deleted

I decided to implement the class directly rather than use a wrapper around sqlite.

What would be a good data structure to use?


Just as an example, one approach I was thinking about is a dictionary. Its keys are the values in the primary key column of the table; its values are the rows implemented in one of these ways:

  1. As lists. Column numbers are mapped into column titles (using a list for one direction and a map for the other). Here, a retrieval operation would first convert column title into column number, and then find the corresponding element in the list.

  2. As dictionaries. Column titles are the keys of this dictionary.

Not sure about the pros/cons of the two.


The reasons I want to write my own code are:

  • I need to track row deletions. That is, at any time I want to be able to report which rows where deleted and for what "reason" (the "reason" is passed to my delete method).
  • I need some reporting during indexing (e.g., while an non-unique index is being built, I want to check certain conditions and report if they are violated)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

素罗衫 2024-10-09 18:32:54

您可能需要考虑创建一个在底层使用内存中 sqlite 表的类:

import sqlite3

class MyTable(object):
    def __init__(self):
        self.conn=sqlite3.connect(':memory:')
        self.cursor=self.conn.cursor()
        sql='''\
            CREATE TABLE foo ...
        '''
        self.execute(sql)
    def execute(self,sql,args):
        self.cursor.execute(sql,args)
    def delete(self,id,reason):
        sql='UPDATE table SET softdelete = 1, reason = %s where tableid = %s'
        self.cursor.execute(sql,(reason,id,))
    def verify(self):
        # Check that certain conditions are true
        # Report (or raise exception?) if violated
    def build_index(self):
        self.verify()
        ... 

可以通过使用 softdelete 列(bool 类型)来实现软删除。
同样,您可以有一列来存储删除原因。
取消删除只涉及更新行并更改 softdelete 值。
选择尚未删除的行可以通过 SQL 条件 WHERE softdelete != 1 来实现。

您可以编写一个 verify 方法来验证数据的条件是否得到满足。您可以从 build_index 方法中调用该方法。

另一种选择是使用 numpy 结构化掩码数组。

很难说什么最快。也许唯一可靠的判断方法是为每个数据编写代码,并使用 timeit 对实际数据进行基准测试。

You might want to consider creating a class which uses an in-memory sqlite table under the hood:

import sqlite3

class MyTable(object):
    def __init__(self):
        self.conn=sqlite3.connect(':memory:')
        self.cursor=self.conn.cursor()
        sql='''\
            CREATE TABLE foo ...
        '''
        self.execute(sql)
    def execute(self,sql,args):
        self.cursor.execute(sql,args)
    def delete(self,id,reason):
        sql='UPDATE table SET softdelete = 1, reason = %s where tableid = %s'
        self.cursor.execute(sql,(reason,id,))
    def verify(self):
        # Check that certain conditions are true
        # Report (or raise exception?) if violated
    def build_index(self):
        self.verify()
        ... 

Soft-delete can be implemented by having a softdelete column (of bool type).
Similarly, you can have a column to store reason for deletion.
Undeleting would simply involve updating the row and changing the softdelete value.
Selecting rows that have not been deleted could be achieved with the SQL condition WHERE softdelete != 1.

You could write a verify method to verify conditions on your data are satisfied. And you could call that method from within your build_index method.

Another alternative is to use a numpy structured masked array.

It's hard to say what would be fastest. Perhaps the only sure way to tell would be to write code for each and benchmark on real-world data with timeit.

心安伴我暖 2024-10-09 18:32:54

我会考虑构建一个带有元组或列表键的字典。例如: my_dict(("col_2", "row_24")) 会得到这个元素。从那里开始,编写“get_col”和“get_row”方法以及前面两个方法中的“get_row_slice”和“get_col_slice”来访问您的数据库将非常容易(如果对于非常大的数据库来说不是非常快)方法。

使用这样的完整字典有两个优点。 1)获取单个元素会比你提出的两种方法更快; 2)如果您想在列中具有不同数量的元素(或缺少的元素),这将使其变得非常简单且内存高效。

只是一个想法:)我很好奇人们会建议什么包!

干杯

I would consider building a dictionary with keys that are tuples or lists. Eg: my_dict(("col_2", "row_24")) would get you this element. Starting from there, it would be pretty easy (if not extremely fast for very large databases) to write 'get_col' and 'get_row' methods, as well as 'get_row_slice' and 'get_col_slice' from the 2 preceding ones to gain access to your methods.

Using a whole dictionary like that will have 2 advantages. 1) Getting a single element will be faster than your 2 proposed methods; 2) If you want to have different number of elements (or missing elements) in your columns, this will make it extremely easy and memory efficient.

Just a thought :) I'll be curious to see what packages people will suggest!

Cheers

简单气质女生网名 2024-10-09 18:32:54

你确实应该使用 SQLite。

对于第一个原因(跟踪删除原因),您可以通过在删除时将行“移动”到第二个表来轻松实现这一点。可以在该表或您可以加入的另一个表中的附加列中跟踪原因。如果并不总是需要删除原因,那么您甚至可以在源表上使用触发器来复制要删除的行,和/或使用可以获取原因的用户定义函数。

索引原因在某种程度上受到约束等的影响,但如果没有更多细节,我无法直接解决它。

You really should use SQLite.

For your first reason (tracking deletion reasons) you can easily implement this by having a second table that you "move" rows to on deletion. The reason can be tracked in additional column in that table or another table you can join. If a deletion reason isn't always required then you can even use triggers on your source table to copy rows about to be deleted, and/or have a user defined function that can get the reason.

The indexing reason is somewhat covered by constraints etc but I can't directly address it without more details.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文