We don’t allow questions seeking recommendations for software libraries, tutorials, tools, books, or other off-site resources. You can edit the question so it can be answered with facts and citations.
Closed 3 months ago.
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
接受
或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
发布评论
评论(11)
通过 sqlite3 标准库模块 使用特殊值
使用内存 SQLite 数据库怎么样:内存:
用于连接?如果您不想编写 SQL 语句,则始终可以使用 ORM,例如 SQLAlchemy ,访问内存中的 SQLite 数据库。编辑:我注意到你说这些值可能是Python对象,而且你需要避免序列化。要求将任意 Python 对象存储在数据库中也需要序列化。
如果您必须保留这两个要求,我可以提出一个实用的解决方案吗?为什么不直接使用 Python 字典作为 Python 字典集合的索引呢?听起来您对构建每个索引都有特殊的需求;找出要查询的值,然后编写一个函数来为每个值生成和索引。字典列表中一个键的可能值将是索引的键;索引的值将是一个字典列表。通过将要查找的值作为键来查询索引。
What about using an in-memory SQLite database via the sqlite3 standard library module, using the special value
:memory:
for the connection? If you don't want to write your on SQL statements, you can always use an ORM, like SQLAlchemy, to access an in-memory SQLite database.EDIT: I noticed you stated that the values may be Python objects, and also that you require avoiding serialization. Requiring arbitrary Python objects be stored in a database also necessitates serialization.
Can I propose a practical solution if you must keep those two requirements? Why not just use Python dictionaries as indices into your collection of Python dictionaries? It sounds like you will have idiosyncratic needs for building each of your indices; figure out what values you're going to query on, then write a function to generate and index for each. The possible values for one key in your list of dicts will be the keys for an index; the values of the index will be a list of dictionaries. Query the index by giving the value you're looking for as the key.
如果内存数据库解决方案最终工作量太大,您可能会发现以下一种自行过滤的方法很有用。
get_filter
函数接受参数来定义如何过滤字典,并返回一个可以传递到内置filter
函数以过滤字典列表的函数。这很容易扩展到更复杂的过滤,例如根据值是否与正则表达式匹配进行过滤:
If the in memory database solution ends up being too much work, here is a method for filtering it yourself that you may find useful.
The
get_filter
function takes in arguments to define how you want to filter a dictionary, and returns a function that can be passed into the built infilter
function to filter a list of dictionaries.This is pretty easily extensible to more complex filtering, for example to filter based on whether or not a value is matched by a regex:
我知道的唯一解决方案是几年前我在 PyPI 上偶然发现的一个软件包, PyDbLite 。没关系,但有几个问题:
__id__
下的两个整数和__版本__
。作者似乎偶尔会做这件事。我使用它时有一些新功能,包括一些用于复杂查询的漂亮语法。
假设您撕掉了酸洗(我可以告诉您我做了什么),您的示例将是(未经测试的代码):
希望它足以让您开始。
The only solution I know is a package I stumbled across a few years ago on PyPI, PyDbLite. It's okay, but there are few issues:
__id__
and__version__
.The author does seem to be working on it occasionally. There's some new features from when I used it, including some nice syntax for complex queries.
Assuming you rip out the pickling (and I can tell you what I did), your example would be (untested code):
Hopefully it will be enough to get you started.
就“身份”而言,任何可散列的东西都应该能够进行比较,以跟踪对象身份。
Zope 对象数据库 (ZODB):
http://www.zodb.org/
PyTables 效果很好:
http://www.pytables.org/moin
Metakit for Python 也运行良好:
http://equi4.com/metakit/python.html
支持列和子列,但不支持非结构化数据
研究“流处理”,如果您的数据集非常大,这可能会很有用:
http://www.trinhhaiianh.com/stream.py/
任何内存数据库,可以序列化(写入磁盘)的文件将会有您的身份问题。如果可能的话,我建议将要存储的数据表示为本机类型(列表、字典)而不是对象。
请记住,NumPy 旨在对内存数据结构执行复杂的操作,如果您决定推出自己的解决方案,它可能会成为您的解决方案的一部分。
As far as "identity" anything that is hashable you should be able to compare, to keep track of object identity.
Zope Object Database (ZODB):
http://www.zodb.org/
PyTables works well:
http://www.pytables.org/moin
Also Metakit for Python works well:
http://equi4.com/metakit/python.html
supports columns, and sub-columns but not unstructured data
Research "Stream Processing", if your data sets are extremely large this may be useful:
http://www.trinhhaianh.com/stream.py/
Any in-memory database, that can be serialized (written to disk) is going to have your identity problem. I would suggest representing the data you want to store as native types (list, dict) instead of objects if at all possible.
Keep in mind NumPy was designed to perform complex operations on in-memory data structures, and could possibly be apart of your solution if you decide to roll your own.
我编写了一个名为 Jsonstore 的简单模块,它解决了 (2) 和 (3) 问题。您的示例如下:
I wrote a simple module called Jsonstore that solves (2) and (3). Here's how your example would go:
不确定它是否符合您的所有要求,但 TinyDB(使用内存存储)也可能值得一试:
它的简单性和强大的查询引擎使其成为某些用例的非常有趣的工具。有关更多详细信息,请参阅 http://tinydb.readthedocs.io/。
Not sure if it complies with all your requirements, but TinyDB (using in-memory storage) is also probably worth the try:
Its simplicity and powerful query engine makes it a very interesting tool for some use cases. See http://tinydb.readthedocs.io/ for more details.
如果您愿意解决序列化问题,MongoDB 可以为您工作。 PyMongo 提供的界面与您所描述的几乎相同。如果您决定序列化,那么命中不会那么糟糕,因为 Mongodb 是内存映射的。
If you are willing to work around serializing, MongoDB could work for you. PyMongo provides an interface almost identical to what you describe. If you decide to serialize, the hit won't be as bad since Mongodb is memory mapped.
只需使用 isinstance()、hasattr()、getattr() 和 setattr() 就可以完成您想要做的事情。
然而,在完成之前事情会变得相当复杂!
我想可以将所有对象存储在一个大列表中,然后对每个对象运行查询,确定它是什么并查找给定的属性或值,然后将值和对象作为元组列表返回。然后你可以很容易地对你的返回值进行排序。 copy.deepcopy 将是你最好的朋友和最大的敌人。
听起来很有趣!祝你好运!
It should be possible to do what you are wanting to do with just isinstance(), hasattr(), getattr() and setattr().
However, things are going to get fairly complicated before you are done!
I suppose one could store all the objects in a big list, then run a query on each object, determining what it is and looking for a given attribute or value, then return the value and the object as a list of tuples. Then you could sort on your return values pretty easily. copy.deepcopy will be your best friend and your worst enemy.
Sounds like fun! Good luck!
我昨天开始开发一个,但尚未发布。它为您的对象建立索引并允许您运行快速查询。所有数据都保存在 RAM 中,我正在考虑智能加载和保存方法。出于测试目的,它通过 cPickle 加载和保存。
如果您仍然感兴趣,请告诉我。
I started developing one yesterday and it isn't published yet. It indexes your objects and allows you to run fast queries. All data is kept in RAM and I'm thinking about smart load and save methods. For testing purposes it is loading and saving through cPickle.
Let me know if you are still interested.
ducks 正是您所描述的。
pip install ducks
此示例使用 dict,但 ducks 适用于任何对象类型。
ducks is exactly what you are describing.
pip install ducks
This example uses dicts, but ducks works on any object type.
添加另一个选项:odex(我是作者)
与鸭子类似,这会构建索引在 Python 对象上,并且不会序列化或持久化任何内容。
与 sqlite 类似,odex 支持成熟的逻辑表达式。
Throwing another option into the mix: odex (I’m the author)
Similar to ducks, this builds indexes on Python objects and doesn't serialize or persist anything.
Similar to sqlite, odex supports full-fledged logical expressions.