I am building a model that uses a large dataset stored in .csv files (~50 GB). My machine runs Windows 10 with 16 GB of RAM.
Since I don't have enough RAM to load the whole dataset, I used Dask to read the file and split it into smaller datasets. That worked fine, and I was able to save them into files like these. However, when I read the files back, every cell only showed "...", like in this image.
I have tried:
!pip install dask
import dask.dataframe as dd
cat = dd.read_csv(paths.data + "cat.csv/*")
cat.head(5)
but it just kept loading, even though I was only asking for a minimal amount of data.
Can anyone please help me? Thank you.
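The "..." symbol is expected, since the data is not loaded into memory. Displaying a Dask DataFrame only shows its structure (column names and dtypes); values are read from disk only when you call something like .head() or .compute(). There is a detailed tutorial on Dask DataFrames here: https://tutorial.dask.org/04_dataframe.html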