Python efficiency: lists vs. tuples

Posted on 2024-11-08 05:26:14

I have a moderate number of base objects.

These base objects will be put in collections, and these collections will be munged around: sorted, truncated, etc.

Unfortunately, n is large enough that memory consumption is slightly worrisome, and speed is becoming a concern.

My understanding is that tuples are slightly more memory-efficient, since they are deduplicated.

Anyway, I would like to know what the cpu/memory tradeoffs of lists vs. tuples are in Python 2.6/2.7.

Comments (7)

紫瑟鸿黎 2024-11-15 05:26:14

If you have a tuple and a list with the same elements, the tuple takes less space. Since tuples are immutable, you can't sort them in place, append to them, etc. I recommend watching this talk by Alex Gaynor for a quick intro on when to choose which data structure in Python.

UPDATE: Thinking about it some more, you may want to look into optimizing the space usage of your objects, e.g., via __slots__ or using namedtuple instances as proxies instead of the actual objects. This would likely lead to much bigger savings, since you have N of them and (presumably) only a few collections in which they appear. namedtuple in particular is super awesome; check out Raymond Hettinger's talk.
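As a rough illustration of the __slots__ and namedtuple suggestion, here is a minimal sketch. The class names PlainPoint, SlottedPoint and TuplePoint and their x/y fields are made up for the example, and the exact byte counts depend on the interpreter version and platform, so treat the printed numbers as indicative only.

```python
import sys
from collections import namedtuple


class PlainPoint(object):
    """Ordinary class: each instance can carry its own __dict__."""
    def __init__(self, x, y):
        self.x = x
        self.y = y


class SlottedPoint(object):
    """Same fields, but __slots__ removes the per-instance __dict__."""
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


# A namedtuple is an immutable tuple subclass with named fields.
TuplePoint = namedtuple("TuplePoint", ["x", "y"])

plain = PlainPoint(1.0, 2.0)
slotted = SlottedPoint(1.0, 2.0)
tup = TuplePoint(1.0, 2.0)

# sys.getsizeof is shallow, so add the instance __dict__ for the plain class.
plain_size = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)
print("plain + __dict__:", plain_size)
print("slotted         :", sys.getsizeof(slotted))
print("namedtuple      :", sys.getsizeof(tup))
```

The saving is per object, so it scales with the number of base objects rather than with the number of collections that refer to them.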

笨笨の傻瓜 2024-11-15 05:26:14

As others mentioned, tuples are immutable. Sorting a tuple (e.g. sorted(mytuple)) returns a list, which you would then have to convert back to a tuple.

To sort a tuple (and keep it a tuple) you'd have to do this:

mytuple = (3,2,1)
mysortedtuple = tuple(sorted(mytuple))

To sort a list you'd have to do this:

mylist = [3,2,1]
mylist.sort()

Because list.sort() works in place and you're not converting to a list and back to a tuple, the latter, in this instance, is more efficient.

Don't get hung up on using tuples over lists unless you have a good justification. If you need sorted data, tuples are not the way to go unless they are created that way in the first place. Tuples excel when the data they contain DOES NOT CHANGE, such as with configuration settings that are loaded at run-time, or data that has already been processed.

Considering that you mentioned you are processing a large dataset, you might want to look at using a functional programming style by way of generators and iterators over lists and tuples. This way you're not shuttling around and creating new containers, but just chaining iteration operations to get to the end result.
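A small sketch of what chaining iteration operations can look like. The base_objects generator is a hypothetical stand-in for the real data source, and the filter and transform steps are arbitrary placeholders.

```python
from itertools import islice


def base_objects():
    """Hypothetical data source; yields values one at a time."""
    for i in range(1000000):
        yield i


# Each stage is lazy: nothing runs until the final consumer pulls values,
# and no intermediate list of a million items is ever built.
evens = (x for x in base_objects() if x % 2 == 0)
squares = (x * x for x in evens)

# Only the final, truncated result is materialised as a list.
first_five = list(islice(squares, 5))
print(first_five)  # [0, 4, 16, 36, 64]
```

Sorting is the exception here: sorted() has to see every item, so it always materialises a full list.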


北陌 2024-11-15 05:26:14

What is the (average, min, max) number of base objects in a collection?

Tuples are "deduplicated" and lists are not? What do you think that "deduplicated" means in this context?

Lists do take up more memory than tuples, because extra memory is allocated on the presumption that a list is going to grow, and you definitely don't want to realloc() memory each time you do large_list.append(). However, on a 32-bit machine, the amortised cost of an extra list element is 4 bytes for a pointer, N bytes for the element itself, and no more than another 4 bytes for the extra memory. N is 16 bytes for a float. That means a list of floats takes up to 4 + 16 + 4 = 24 bytes per extra float, compared with 4 + 16 = 20 bytes for a tuple. A "base object" with N==100 gives a comparison of 108 versus 104. If a base object is referred to in two collections, its N bytes are shared between them, so the cost per collection is 58 versus 54. How big is your N?
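To check the container overhead on your own interpreter, a quick measurement along these lines may help. It is only a sketch: sys.getsizeof reports the container alone (not the elements it points to), and the numbers will differ from the 32-bit figures above on a 64-bit build.

```python
import sys

items = [float(i) for i in range(1000)]

as_tuple = tuple(items)   # fixed size, no spare capacity
as_list = list(items)     # built in one step from a known length

grown = []
for x in items:           # grown by append, so it may carry spare capacity
    grown.append(x)

print("tuple        :", sys.getsizeof(as_tuple))
print("list (copied):", sys.getsizeof(as_list))
print("list (grown) :", sys.getsizeof(grown))
print("one float    :", sys.getsizeof(items[0]))
```

Comparing the difference between the container sizes with the size of a single element gives a feel for how little the choice of container matters once the elements themselves are large.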

Advice: Leave your collections as lists. Concentrate on:

  • ensuring your base objects are memory-efficient

  • using generators and itertools goodies instead of temporary lists where possible

  • if you can't avoid temporary lists, ensuring that they are thrown away as soon as they are no longer needed, i.e. don't wait until the creating method returns; use an explicit del as soon as possible.
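The last two points can be sketched as follows. The load_batch and total_of_largest functions are hypothetical names invented for the example; the point is the islice call in place of a slice copy and the explicit del inside the loop.

```python
from itertools import islice


def load_batch(start, stop):
    """Hypothetical loader that returns a temporary list."""
    return [x * 0.5 for x in range(start, stop)]


def total_of_largest(batches, keep):
    """Sum the `keep` largest values of each batch."""
    total = 0.0
    for start, stop in batches:
        temp = load_batch(start, stop)
        temp.sort(reverse=True)           # in-place sort, no second list
        total += sum(islice(temp, keep))  # islice avoids the temp[:keep] copy
        # Drop the reference now; otherwise the old list stays alive
        # while the next batch is being built.
        del temp
    return total


print(total_of_largest([(0, 10), (10, 20)], 3))  # 39.0
```

Without the del, two batches exist at the same time during each call to load_batch, because the name temp still refers to the previous list until it is rebound.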

_畞蕅 2024-11-15 05:26:14

In addition to all these suggestions, you may find that numpy will fill your needs. If your objects are something that numpy handles by default (ints, native C types, etc) then that would be ideal. You can use a numpy array with custom objects as well, but that might be more work than it's worth.
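numpy is a third-party package, so the sketch below deliberately uses the standard library's array module as a stand-in. It is not numpy, but it illustrates the same underlying idea for native C types: values are stored packed in one buffer instead of as pointers to separate Python objects.

```python
import sys
from array import array

values = [float(i) for i in range(1000)]

# "d" means C double: each item is stored inline in a single buffer.
packed = array("d", values)

# A list holds pointers, so the float objects must be counted separately.
list_total = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)

print("list + float objects:", list_total)
print("array of doubles    :", sys.getsizeof(packed))
print("bytes per item      :", packed.itemsize)
```

A numpy float64 array uses the same 8 bytes per value and adds in-place sorting and slicing on top, which array.array does not provide.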

日记撕了你也走了 2024-11-15 05:26:14

You can't use them the same way. Tuples are immutable and don't support appending, sorting, etc (calling sorted on a tuple yields a list, and so on). Tuples are totally different from lists, so any performance comparison is meaningless.

诗笺 2024-11-15 05:26:14

You cannot sort an immutable object - i.e. when sorting a tuple you'll always create a new one.

冷夜 2024-11-15 05:26:14

There are at least two existing questions that are similar enough to yours that the answers (or links within them) may be useful to you. To summarize: let the features of the type (mutable vs. immutable, heterogeneous vs. homogeneous) rather than performance guide your decision, because the performance/efficiency differences are minimal.

What's the difference between list and tuples in Python?
What are differences between List, Dictionary and Tuple in Python?
