Simple table operation has very large compilation time with MLJ

Posted on 2025-02-06 02:17:45

I am trying to use MLJ on a DataFrame (30,000 rows x 8,000 columns), but every table operation seems to take a huge amount of time to compile, even though it is fast to run.

I have given an example with code below in which a 5 x 5000 DataFrame is generated and it gets stuck on the unpack line (line 3). When I run the same code for a 5 x 5 DataFrame, line 3 outputs “2.872309 seconds (9.09 M allocations: 565.673 MiB, 6.47% gc time, 99.84% compilation time)”.

This is a crazy amount of compilation time for a seemingly simple task and I would like to know how I can reduce this.
Thank you,
Jack

using MLJ

using DataFrames

[line 1] @time arr = [[rand(1:10) for i in 1:5] for i in 1:5000];

output: 0.053668 seconds (200.76 k allocations: 11.360 MiB, 22.16% gc time, 99.16% compilation time)

[line 2] @time df = DataFrames.DataFrame(arr, :auto)

output: 0.267325 seconds (733.43 k allocations: 40.071 MiB, 4.29% gc time, 98.67% compilation time)

[line 3] @time y, X = unpack(df, ==(:x1));

does not finish running

Comments (2)

最冷一天 2025-02-13 02:17:45

It's not unexpected that the Julia compiler struggles with very wide DataFrames, which have (potentially) heterogeneous column types. That said, I'm not sure why this has to be a problem for this operation. I've checked with the MLJ maintainers, who can hopefully chime in.

In the meantime you can simply do

y, X = df.x1, select!(df, Not(:x1))

which is instantaneous. (Note that select! will drop x1 from your underlying data; if you want to copy the data, use select instead.)
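To make the difference between select and select! concrete, here is a minimal sketch on a small stand-in table. The 3 x 4 size and the names df2, X, and X2 are illustrative assumptions, not part of the original question; only DataFrame, select, select!, and Not from DataFrames.jl are used.

```julia
using DataFrames

# Small stand-in for the wide table from the question (3 rows x 4 columns),
# built the same way: a vector of column vectors with auto-generated names.
df = DataFrame([rand(1:10, 3) for _ in 1:4], :auto)

# Non-mutating: `select` returns a new DataFrame and leaves `df` untouched.
y = df.x1
X = select(df, Not(:x1))

# Mutating: `select!` removes x1 from the table it is given and returns it.
# A copy is used here so that `df` stays available for comparison.
df2 = copy(df)
X2 = select!(df2, Not(:x1))

println(names(df))   # df still has all four columns
println(names(df2))  # df2 has lost x1
```

Because y is taken before the column is dropped, it still holds the x1 values in either variant.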

情释 2025-02-13 02:17:45

Please don't cross-post a problem on multiple websites without linking.

The question has been answered at the Julia forum: https://discourse.julialang.org/t/simple-table-operation-has-very-large-compilation-time-with-mlj/82503/2. It was caused by a bug which is fixed in MLJBase 0.20.5.
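If an older MLJBase is the cause, updating the environment should pick up the fix. A minimal sketch using Julia's standard Pkg API, assuming MLJ is installed in the active environment and that MLJBase may be present only as an indirect dependency (hence the manifest mode when checking the version):

```julia
using Pkg

# Update all packages in the active environment to the newest compatible versions.
Pkg.update()

# Show the resolved MLJBase version, looking in the manifest so that
# indirect dependencies are listed as well.
Pkg.status("MLJBase"; mode = Pkg.PKGMODE_MANIFEST)
```

If the reported version is still below 0.20.5, a compatibility bound elsewhere in the environment may be holding it back.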
