Storing Python data on a Linux system
I need to create a system to store Python data structures on a Linux system, with concurrent read and write access to the data from multiple programs/daemons/scripts. My first thought is that I would create a Unix socket that listens for connections and serves up requested data as pickled Python data structures. Any writes by the clients would get synced to disk (maybe in batches, though I don't expect high throughput, so Linux VFS caching would likely be fine). This ensures that only a single process reads and writes the data.
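A minimal sketch of that first idea, using only the standard library; the socket path and the tiny get/set protocol are assumptions for illustration, not a finished design:

```python
import os
import pickle
import socketserver

SOCK_PATH = "/tmp/datastore.sock"  # hypothetical path for the listening socket
STORE = {}  # the shared data structure, owned by this single process

class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # Toy protocol: the client sends one pickled ("get", key) or
        # ("set", key, value) tuple and reads back one pickled reply.
        request = pickle.load(self.rfile)
        if request[0] == "get":
            pickle.dump(STORE.get(request[1]), self.wfile)
        elif request[0] == "set":
            STORE[request[1]] = request[2]
            # A real version would also sync STORE to disk here (or in batches).
            pickle.dump(True, self.wfile)

if __name__ == "__main__":
    if os.path.exists(SOCK_PATH):
        os.unlink(SOCK_PATH)
    with socketserver.UnixStreamServer(SOCK_PATH, Handler) as server:
        server.serve_forever()  # all reads and writes are serialized through here
```

A client would connect with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM), pickle a tuple into the stream, and unpickle the reply. Note that unpickling data from peers is only safe when every client is trusted.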
The other idea is to just keep the pickled data structure on disk and only allow one process at a time to access it through a lockfile or token... This requires all accessing clients to respect the locking mechanism / use the access module.
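A minimal sketch of this second idea using fcntl.flock; the file path is made up, and flock is advisory, so it only protects against clients that also take the lock:

```python
import fcntl
import pickle

DATA_FILE = "/var/tmp/datastore.pickle"  # hypothetical location of the store

def load_data():
    # Shared lock: many readers can hold it simultaneously.
    with open(DATA_FILE, "rb") as f:
        fcntl.flock(f, fcntl.LOCK_SH)
        try:
            return pickle.load(f)
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)

def save_data(obj):
    # Open without truncating, then take the exclusive lock before rewriting,
    # so a concurrent reader never sees a half-written file.
    with open(DATA_FILE, "a+b") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        try:
            f.seek(0)
            f.truncate()
            pickle.dump(obj, f)
            f.flush()
        finally:
            fcntl.flock(f, fcntl.LOCK_UN)
```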
What am I overlooking? SQLite is available, but I'd like to keep this as simple as possible.
What would you do?
5 Answers
I would just use SQLite if it's available.
See this FAQ: http://www.sqlite.org/faq.html#q5 -- SQLite (with pysqlite [0]) should be able to handle your concurrency elegantly.
You can keep the data as simple key-value pairs if you like; there's no need to go all BNF on your data.
[0] http://trac.edgewall.org/wiki/PySqlite
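To make that concrete, here is a sketch of a pickled key-value store on top of the sqlite3 module that ships with Python (rather than the external pysqlite this answer links to); the path and schema are assumptions:

```python
import pickle
import sqlite3

DB_PATH = "/var/tmp/datastore.db"  # hypothetical database file

def _connect():
    # The timeout makes writers wait on SQLite's own locking instead of
    # failing immediately when another process holds the write lock.
    conn = sqlite3.connect(DB_PATH, timeout=10)
    conn.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")
    return conn

def put(key, obj):
    with _connect() as conn:  # the with-block commits on success
        conn.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)",
                     (key, pickle.dumps(obj)))

def get(key):
    row = _connect().execute("SELECT value FROM kv WHERE key = ?",
                             (key,)).fetchone()
    return pickle.loads(row[0]) if row else None
```

SQLite handles the cross-process locking itself, which is exactly what the question's two hand-rolled designs are trying to achieve.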
If you want to just store name/value pairs (e.g. filename to pickled data) you can always use Berkeley DB (http://code.activestate.com/recipes/189060-using-berkeley-db-database/). If your data is numbers-oriented, you might want to check out PyTables (http://www.pytables.org/moin). If you really want to use sockets (I would generally try to avoid that, since there are a lot of minutiae you have to worry about) you may want to look at Twisted Python (good for handling multiple connections via Python with no threading required).
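For the name/value-pair case, the standard library's dbm module gives the same flavor of store without external Berkeley DB bindings; the path and keys below are invented for illustration:

```python
import dbm
import pickle

DB_PATH = "/var/tmp/datastore"  # hypothetical path; dbm may add its own suffix

# "c" creates the database if it doesn't exist. Values must be bytes,
# so pickled objects fit naturally.
with dbm.open(DB_PATH, "c") as db:
    db["config"] = pickle.dumps({"retries": 3, "hosts": ["a", "b"]})
    restored = pickle.loads(db["config"])
```

Note that unlike Berkeley DB proper, the simple dbm backends do no locking of their own, so concurrent writers would still need something like the flock pattern sketched above.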
I'd use a database. A real one. This is why they exist (well, one of the reasons). Don't reinvent the wheel if you don't have to.
Leaving backend storage aside (plenty of options here, including ConfigParser, shelve, sqlite and anydbm), the idea of a single process handling storage while others connect to it may be usable. My first thought for doing that is Pyro (Python Remote Objects). Sockets, while always available, can get tricky.
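A sketch of that single-owner-process idea using the Pyro4 API (other Pyro versions differ; the Store class and its methods are invented for illustration):

```python
# server: the one process that owns the data
import Pyro4

@Pyro4.expose
class Store(object):
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value
        # A real version would persist self._data to its backend here.

daemon = Pyro4.Daemon()            # listens on a local TCP port by default
uri = daemon.register(Store())     # one shared Store instance for all clients
print("Store available at", uri)
daemon.requestLoop()
```

Clients then just do store = Pyro4.Proxy(uri) and call store.put(...) / store.get(...) as if the object were local, which avoids hand-rolling a socket protocol.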
You could serialize the data structures and store them as values using ConfigParser. If you created your own access lib/module to access the data, you could do the serialization in the lib, so the client code would just send and receive Python objects. You could also handle concurrency in the lib.
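A sketch of that approach; since ConfigParser values must be text while pickle produces bytes, the payload is base64-encoded here, and the file path and section/option names are made up:

```python
import base64
import configparser
import pickle

CONF_PATH = "/var/tmp/datastore.ini"  # hypothetical config file

def save(section, key, obj):
    cp = configparser.ConfigParser()
    cp.read(CONF_PATH)  # silently skips a missing file
    if not cp.has_section(section):
        cp.add_section(section)
    # pickle produces bytes; base64 turns them into safe ConfigParser text.
    cp.set(section, key, base64.b64encode(pickle.dumps(obj)).decode("ascii"))
    with open(CONF_PATH, "w") as f:
        cp.write(f)

def load(section, key):
    cp = configparser.ConfigParser()
    cp.read(CONF_PATH)
    return pickle.loads(base64.b64decode(cp.get(section, key)))
```

The concurrency handling this answer mentions would live in these same helpers, e.g. wrapping save()'s read-modify-write cycle with the flock pattern from earlier.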