Why does memory usage increase when reopening a parquet file with pandas?

Posted on 2025-01-22 16:04:43

I generated a pandas DataFrame of 8,481,288 rows and 451 columns, where most of the columns hold integer values. While I generate this DataFrame, the total memory consumption on my PC is (more or less) 50% of my total memory. But if I save the DataFrame to parquet format, restart the kernel and read the file back, memory consumption comes close to 99%, which makes it almost unusable.

More specifically, I am saving with:

df.to_parquet('filepath.parquet')

And then, I restart the kernel and reopen with:

df = pd.read_parquet('filepath.parquet')

Then my memory consumption explodes.
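One way to narrow this down would be to compare the DataFrame's own footprint before saving and after reloading. The sketch below is only an illustration: `frame_footprint` is a hypothetical helper name, and the small `demo` frame stands in for the real data. It relies on `DataFrame.memory_usage(deep=True)`, which reports the bytes held by the frame's columns and index, not the total memory of the Python process.

```python
import numpy as np
import pandas as pd


def frame_footprint(df: pd.DataFrame) -> dict:
    """Summarise the memory held by a DataFrame's own data (hypothetical helper)."""
    # Bytes per column, without the index; deep=True also counts object contents.
    per_column = df.memory_usage(index=False, deep=True)
    # Same measurement, index included, summed into a single total.
    total = int(df.memory_usage(index=True, deep=True).sum())

    # Group the per-column byte counts by dtype name.
    by_dtype = {}
    for nbytes, dtype in zip(per_column.to_numpy(), df.dtypes):
        key = str(dtype)
        by_dtype[key] = by_dtype.get(key, 0) + int(nbytes)

    return {"total_bytes": total, "bytes_by_dtype": by_dtype}


# Stand-in data: 3 int64 columns x 4 rows x 8 bytes = 96 bytes of column data.
demo = pd.DataFrame(
    np.arange(12, dtype="int64").reshape(4, 3), columns=["a", "b", "c"]
)
report = frame_footprint(demo)
print(report["bytes_by_dtype"])
print(report["total_bytes"])
```

Running the same helper on the frame right before `to_parquet` and again right after `read_parquet` would show whether the dtypes or the frame's size changed in the round trip. If both reports match while the process memory is much higher, the extra memory is being held outside the DataFrame itself.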

Sorry for the silly question, but I couldn't find the answer in other questions. Something similar happens if I try to save in feather format.

Thank you

EDIT: two other curious things happen. When I delete the DataFrame after reopening it (del df), Python's memory usage stays at a high level, though it is about half of what it was before the deletion. Also, when I reopen the DataFrame and merge it with another one, memory usage goes back to its normal level (similar to the level before saving and reopening). This corroborates juanpa.arrivillaga's answer.
