当前位置：文江博客话题详情

在python中表达berkeley db中的多列？

发布于 2024-08-24 01:15:58 字数 103 浏览 13 评论 0原文

假设我有一个简单的表，其中包含用户名、名字、姓氏。

我如何在 berkeley Db 中表达这一点？

我目前使用 bsddb 作为接口。

干杯。

原文

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

鸢与 2024-08-31 01:15:58

您必须选择一个“列”作为键（必须是唯一的；我想在您的情况下这将是“用户名”）——这是搜索可能发生的唯一方式。其他列可以通过您喜欢的任何方式成为该键的单个字符串值，从腌制到简单连接保证永远不会出现在任何列中的字符，例如对于许多类型的“\0” “可读文本字符串”。

如果您需要能够通过不同的键进行搜索，您将需要将其他补充性和单独的 bsddb 数据库设置为主表中的“索引”——这是大量的工作，并且有很多关于该主题的文献。（或者，您可以转向更高抽象的技术，例如 sqlite，它可以为您巧妙地处理索引；-）。

回复收藏 0 原文

毁梦 2024-08-31 01:15:58

tl,dr：要在像 Berkley db 这样的有序键值存储中表达多个列，您需要了解键组合。查看我关于 bsddb 的其他答案以了解更多信息。

有多种方法可以使用有序键/值存储来做到这一点。

最简单的解决方案是使用正确的键将文档存储为 json 值。

现在，您可能希望在这些列上构建索引来检索文档，而不必迭代所有哈希图来查找正确的对象。为此，您可以使用 secondaryDB 它将自动构建索引你。或者您可以自己构建索引。

如果您不想处理密钥打包（这是启动的好主意），您可以利用DB.set_bt_compare 允许您使用 cpickle、json 或msgpack 用于键和值，同时仍然具有使创建索引和执行查询有意义的顺序。这是较慢的方法，但引入了键组合的模式。

要充分利用有序键的优势，您可以使用 Cursor.set_range(key) 来设置查询开头的数据库位置。

另一种模式称为EAV 模式，它存储遵循方案（实体、属性、值） 的元组，然后通过使用该元组的排列来构建各种索引。我通过学习原子学学到了这个模式。

对于资源消耗较少的数据库，您将采用“静态类型”方式，将尽可能多的公共信息存储在“元数据”表中，并将文档（实际上是 RDBMS 表）拆分到它们自己的哈希图中。

为帮助您入门，我们提供了一个使用 bsddb 的示例数据库（但您可以使用另一个有序键/值存储（如wiredtiger 或 leveldb）来构建它），该数据库实现了 EAV 模式。在此实现中，我将 EAV 替换为 IKV，IKV 转换为唯一标识符、键、值。总体结果是您拥有一个完全索引的无模式文档数据库。我认为这是效率和易用性之间的一个很好的折衷。

import struct

from json import dumps
from json import loads

from bsddb3.db import DB
from bsddb3.db import DBEnv
from bsddb3.db import DB_BTREE
from bsddb3.db import DB_CREATE
from bsddb3.db import DB_INIT_MPOOL
from bsddb3.db import DB_LOG_AUTO_REMOVE


def pack(*values):
    def __pack(value):
        if type(value) is int:
            return '1' + struct.pack('>q', value)
        elif type(value) is str:
            return '2' + struct.pack('>q', len(value)) + value
        else:
            data = dumps(value, encoding='utf-8')
            return '3' + struct.pack('>q', len(data)) + data
    return ''.join(map(__pack, values))


def unpack(packed):
    kind = packed[0]
    if kind == '1':
        value = struct.unpack('>q', packed[1:9])[0]
        packed = packed[9:]
    elif kind == '2':
        size = struct.unpack('>q', packed[1:9])[0]
        value = packed[9:9+size]
        packed = packed[size+9:]
    else:
        size = struct.unpack('>q', packed[1:9])[0]
        value = loads(packed[9:9+size])
        packed = packed[size+9:]
    if packed:
        values = unpack(packed)
        values.insert(0, value)
    else:
        values = [value]
    return values


class TupleSpace(object):
    """Generic database"""

    def __init__(self, path):
        self.env = DBEnv()
        self.env.set_cache_max(10, 0)
        self.env.set_cachesize(5, 0)
        flags = (
            DB_CREATE |
            DB_INIT_MPOOL
        )
        self.env.log_set_config(DB_LOG_AUTO_REMOVE, True)
        self.env.set_lg_max(1024 ** 3)
        self.env.open(
            path,
            flags,
            0
        )

        # create vertices and edges k/v stores
        def new_store(name):
            flags = DB_CREATE
            elements = DB(self.env)
            elements.open(
                name,
                None,
                DB_BTREE,
                flags,
                0,
            )
            return elements
        self.tuples = new_store('tuples')
        self.index = new_store('index')
        self.txn = None

    def get(self, uid):
        cursor = self.tuples.cursor()

        def __get():
            record = cursor.set_range(pack(uid, ''))
            if not record:
                return
            key, value = record
            while True:
                other, key = unpack(key)
                if other == uid:
                    value = unpack(value)[0]
                    yield key, value
                    record = cursor.next()
                    if record:
                        key, value = record
                        continue
                    else:
                        break
                else:
                    break

        tuples = dict(__get())
        cursor.close()
        return tuples

    def add(self, uid, **properties):
        for key, value in properties.items():
            self.tuples.put(pack(uid, key), pack(value))
            self.index.put(pack(key, value, uid), '')

    def delete(self, uid):
        # delete item from main table and index
        cursor = self.tuples.cursor()
        index = self.index.cursor()
        record = cursor.set_range(pack(uid, ''))
        if record:
            key, value = record
        else:
            cursor.close()
            raise Exception('not found')
        while True:
            other, key = unpack(key)
            if other == uid:
                # remove tuple from main index
                cursor.delete()

                # remove it from index
                value = unpack(value)[0]
                index.set(pack(key, value, uid))
                index.delete()

                # continue
                record = cursor.next()
                if record:
                    key, value = record
                    continue
                else:
                    break
            else:
                break
        cursor.close()

    def update(self, uid, **properties):
        self.delete(uid)
        self.add(uid, **properties)

    def close(self):
        self.index.close()
        self.tuples.close()
        self.env.close()

    def debug(self):
        for key, value in self.tuples.items():
            uid, key = unpack(key)
            value = unpack(value)[0]
            print(uid, key, value)

    def query(self, key, value=''):
        """return `(key, value, uid)` tuples that where
        `key` and `value` are expressed in the arguments"""
        cursor = self.index.cursor()
        match = (key, value) if value else (key,)

        record = cursor.set_range(pack(key, value))
        if not record:
            cursor.close()
            return

        while True:
            key, _ = record
            other = unpack(key)
            ok = reduce(
                lambda previous, x: (cmp(*x) == 0) and previous,
                zip(match, other),
                True
            )
            if ok:
                yield other
                record = cursor.next()
                if not record:
                    break
            else:
                break
        cursor.close()


db = TupleSpace('tmp')
# you can use a tuple to store a counter
db.add(0, counter=0)

# And then have a procedure doing the required work
# to alaways have a fresh uid
def make_uid():
    counter = db.get(0)
    counter['counter'] += 1
    return counter['counter']

amirouche = make_uid()
db.add(amirouche, username="amirouche", age=30)
print(db.get(amirouche))

tl,dr: To express multiple columns in an ordered key value store like berkley db you need to learn about key composition. Look up my other answers about bsddb to learn more.

There is several ways to do that using ordered key/value store.

The simplest solution is to store documents as json values with a correct key.

Now you probably want to build index over those columns to retrieve documents without having to iterate over all the hashmap to find the correct object. For that you can use a secondaryDB that will build automatically the index for you. Or you can build the index yourself.

If you don't want to deal with key packing (and it's a good idea for starting up), you can take advantage of DB.set_bt_compare which will allow you to use cpickle, json or msgpack for both keys and values while still having an order that makes sens to create indices and doing queries. This is slower method but introduce the pattern of key composition.

To fully take advantage what ordered key is, you can make use of Cursor.set_range(key) to set the position of the db at the beginning of a query.

Another pattern, is called the EAV pattern stores tuples that follow the scheme (entity, attribute, value) and then you build various index by using permutation of that tuple. I learned this pattern studing datomic.

For less ressource hungry database, you will go the "static typed" way and store as much as possible of common information in the "metadata" table and split documents (which are really RDBMS tables) into their own hashmap.

To get you started here is an example database using bsddb (but you could build it using another ordered key/value store like wiredtiger or leveldb) that implements the EAV pattern. In this implementation I swap EAV for IKV which translates to Unique identifier, Key, Value. The overal result is that you have a fully indexed schema less document database. I think it's a good compromise between efficiency and ease-of-use.

import struct

from json import dumps
from json import loads

from bsddb3.db import DB
from bsddb3.db import DBEnv
from bsddb3.db import DB_BTREE
from bsddb3.db import DB_CREATE
from bsddb3.db import DB_INIT_MPOOL
from bsddb3.db import DB_LOG_AUTO_REMOVE


def pack(*values):
    def __pack(value):
        if type(value) is int:
            return '1' + struct.pack('>q', value)
        elif type(value) is str:
            return '2' + struct.pack('>q', len(value)) + value
        else:
            data = dumps(value, encoding='utf-8')
            return '3' + struct.pack('>q', len(data)) + data
    return ''.join(map(__pack, values))


def unpack(packed):
    kind = packed[0]
    if kind == '1':
        value = struct.unpack('>q', packed[1:9])[0]
        packed = packed[9:]
    elif kind == '2':
        size = struct.unpack('>q', packed[1:9])[0]
        value = packed[9:9+size]
        packed = packed[size+9:]
    else:
        size = struct.unpack('>q', packed[1:9])[0]
        value = loads(packed[9:9+size])
        packed = packed[size+9:]
    if packed:
        values = unpack(packed)
        values.insert(0, value)
    else:
        values = [value]
    return values


class TupleSpace(object):
    """Generic database"""

    def __init__(self, path):
        self.env = DBEnv()
        self.env.set_cache_max(10, 0)
        self.env.set_cachesize(5, 0)
        flags = (
            DB_CREATE |
            DB_INIT_MPOOL
        )
        self.env.log_set_config(DB_LOG_AUTO_REMOVE, True)
        self.env.set_lg_max(1024 ** 3)
        self.env.open(
            path,
            flags,
            0
        )

        # create vertices and edges k/v stores
        def new_store(name):
            flags = DB_CREATE
            elements = DB(self.env)
            elements.open(
                name,
                None,
                DB_BTREE,
                flags,
                0,
            )
            return elements
        self.tuples = new_store('tuples')
        self.index = new_store('index')
        self.txn = None

    def get(self, uid):
        cursor = self.tuples.cursor()

        def __get():
            record = cursor.set_range(pack(uid, ''))
            if not record:
                return
            key, value = record
            while True:
                other, key = unpack(key)
                if other == uid:
                    value = unpack(value)[0]
                    yield key, value
                    record = cursor.next()
                    if record:
                        key, value = record
                        continue
                    else:
                        break
                else:
                    break

        tuples = dict(__get())
        cursor.close()
        return tuples

    def add(self, uid, **properties):
        for key, value in properties.items():
            self.tuples.put(pack(uid, key), pack(value))
            self.index.put(pack(key, value, uid), '')

    def delete(self, uid):
        # delete item from main table and index
        cursor = self.tuples.cursor()
        index = self.index.cursor()
        record = cursor.set_range(pack(uid, ''))
        if record:
            key, value = record
        else:
            cursor.close()
            raise Exception('not found')
        while True:
            other, key = unpack(key)
            if other == uid:
                # remove tuple from main index
                cursor.delete()

                # remove it from index
                value = unpack(value)[0]
                index.set(pack(key, value, uid))
                index.delete()

                # continue
                record = cursor.next()
                if record:
                    key, value = record
                    continue
                else:
                    break
            else:
                break
        cursor.close()

    def update(self, uid, **properties):
        self.delete(uid)
        self.add(uid, **properties)

    def close(self):
        self.index.close()
        self.tuples.close()
        self.env.close()

    def debug(self):
        for key, value in self.tuples.items():
            uid, key = unpack(key)
            value = unpack(value)[0]
            print(uid, key, value)

    def query(self, key, value=''):
        """return `(key, value, uid)` tuples that where
        `key` and `value` are expressed in the arguments"""
        cursor = self.index.cursor()
        match = (key, value) if value else (key,)

        record = cursor.set_range(pack(key, value))
        if not record:
            cursor.close()
            return

        while True:
            key, _ = record
            other = unpack(key)
            ok = reduce(
                lambda previous, x: (cmp(*x) == 0) and previous,
                zip(match, other),
                True
            )
            if ok:
                yield other
                record = cursor.next()
                if not record:
                    break
            else:
                break
        cursor.close()


db = TupleSpace('tmp')
# you can use a tuple to store a counter
db.add(0, counter=0)

# And then have a procedure doing the required work
# to alaways have a fresh uid
def make_uid():
    counter = db.get(0)
    counter['counter'] += 1
    return counter['counter']

amirouche = make_uid()
db.add(amirouche, username="amirouche", age=30)
print(db.get(amirouche))

回复收藏 0 原文

~没有更多了~