Can Python's list, set, or dictionary be implemented invisibly using a database?
The Python native capabilities for lists, sets & dictionaries totally rock. Is there a way to continue using the native capability when the data becomes really big? The problem I'm working on involves matching (intersection) of very large lists. I haven't pushed the limits yet -- actually I don't really know what the limits are -- and I don't want to be surprised by a big reimplementation after the data grows as expected.
Is it reasonable to deploy on something like Google App Engine that advertises no practical scale limit and continue using the native capability as-is forever and not really think about this?
Is there some Python magic that can hide whether the list, set or dictionary is in Python-managed memory vs. in a DB -- so physical deployment of data can be kept distinct from what I do in code?
How do you, Mr. or Ms. Python Super Expert, deal with lists, sets & dicts as data volume grows?
Comments (3)
I'm not quite sure what you mean by native capabilities for lists, sets & dictionaries. However, you can create classes that emulate container types and sequence types by defining some methods with special names. That means that you could create a class that behaves like a list, but stores its data in a SQL database or in the GAE datastore. Simply speaking, this is what an ORM does. However, mapping objects to a database is very complicated, so it is probably better to use an existing ORM than to invent your own.
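To make the "methods with special names" idea concrete, here is a minimal sketch of a set-like class whose elements live in a SQLite table. The class name `SQLiteSet`, the table name `items`, and the restriction to string elements are all assumptions made for illustration; it uses only `sqlite3` and `collections.abc.MutableSet` from the standard library and is not a substitute for a real ORM.

```python
import sqlite3
from collections.abc import MutableSet


class SQLiteSet(MutableSet):
    """Hypothetical set of strings stored in a SQLite table, not in a Python set."""

    def __init__(self, path=":memory:", items=()):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS items (value TEXT PRIMARY KEY)"
        )
        for item in items:
            self.add(item)

    @classmethod
    def _from_iterable(cls, iterable):
        # The inherited operators (&, |, -) build their result through this
        # hook; returning a plain in-memory set keeps the sketch simple.
        return set(iterable)

    def __contains__(self, value):
        row = self._db.execute(
            "SELECT 1 FROM items WHERE value = ?", (value,)
        ).fetchone()
        return row is not None

    def __iter__(self):
        for (value,) in self._db.execute("SELECT value FROM items"):
            yield value

    def __len__(self):
        return self._db.execute("SELECT COUNT(*) FROM items").fetchone()[0]

    def add(self, value):
        self._db.execute(
            "INSERT OR IGNORE INTO items (value) VALUES (?)", (value,)
        )
        self._db.commit()

    def discard(self, value):
        self._db.execute("DELETE FROM items WHERE value = ?", (value,))
        self._db.commit()


a = SQLiteSet(items=["a", "b", "c"])
b = SQLiteSet(items=["b", "c", "d"])
common = a & b  # intersection via the operator inherited from MutableSet
```

Note that the inherited `&` iterates over one operand and runs a membership query against the other for every element. The calling code looks native, but the cost profile is very different from an in-memory set, which is why hiding the storage does not by itself make code scale.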
I'm afraid there is no one-size-fits-all solution. In particular, GAE is not some kind of Magic Fairy Dust you can sprinkle on your code to make it scale. There are several limitations you have to keep in mind to create an application that can scale. Some of them are general, like computational complexity; others are specific to the environment your code runs in. E.g. on GAE the maximum response time is limited to 30 seconds, and querying the datastore works differently than on other databases. It's hard to give any concrete advice without knowing your specific problem, but I doubt that GAE is the right solution.
In general, if you want to work with large datasets, you either have to keep that in mind from the start or you will have to rework your code, algorithms and data structures as the datasets grow.
You are describing my dreams! However, I think you cannot do it. I always wanted something just like LINQ for Python, but the language does not allow Python syntax to be used for native database operations, AFAIK. If it were possible, you could just write code using lists and then use the same code for retrieving data from a database.
I would not recommend writing a lot of code based only on lists and sets, because it will not be easy to migrate it to a scalable platform. I recommend using something like an ORM. GAE even has its own ORM-like system, and you can use other ones such as SQLAlchemy and SQLObject with e.g. SQLite.
Unfortunately, you cannot use awesome stuff such as list comprehensions to filter data inside the database. Sure, you can filter the data after fetching it from the DB, but then you either still need to build a query in some SQL-like language for querying objects, or you return a lot of objects from the database.
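The trade-off described above can be sketched with the standard library's `sqlite3` module. The `users` table, its columns, and the sample rows are made up for illustration: the first option pulls every row into Python and filters with a list comprehension, the second states the filter as a SQL query so the database only returns matching rows.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, age INTEGER)")
db.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("ann", 31), ("bob", 17), ("cy", 45)],
)

# Option 1: fetch every row, then filter in Python with a list comprehension.
all_rows = db.execute("SELECT name, age FROM users").fetchall()
adults_in_python = [name for name, age in all_rows if age >= 18]

# Option 2: express the filter as a query so only matching rows come back.
adults_in_sql = [
    name
    for (name,) in db.execute("SELECT name FROM users WHERE age >= ?", (18,))
]
```

Both lists hold the same names, but option 1 transfers the whole table first, which is the part that stops working once the table is large.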
OTOH, there is Buzhug, a curious non-relational database system written in Python which allows the use of natural Python syntax. I have never used it and I do not know if it is scalable so I would not put my money on it. However, you can test it and see if it can help you.
You can use an ORM (object-relational mapping): a class gets a table, an object gets a row. I like the Django ORM. You can use it for non-web apps, too. I have never used it on GAE, but I think it is possible.
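As a rough picture of "a class gets a table, an object gets a row", here is a hand-written sketch using only `sqlite3` and `dataclasses`. This is not the Django ORM or any real ORM's API; the `Person` class, the `person` table, and the `save`/`load_all` helpers are hypothetical names chosen for the example, and a real ORM would generate this plumbing for you.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class Person:
    # The class corresponds to the table "person"; its fields are the columns.
    name: str
    age: int


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE person (name TEXT, age INTEGER)")


def save(person):
    """Store one object as one row."""
    db.execute("INSERT INTO person (name, age) VALUES (?, ?)", astuple(person))
    db.commit()


def load_all():
    """Turn every row back into an object, ordered by name."""
    rows = db.execute("SELECT name, age FROM person ORDER BY name")
    return [Person(*row) for row in rows]


save(Person("ann", 31))
save(Person("bob", 17))
people = load_all()
```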