Simple table operation has very large compilation time with MLJ
I am trying to use MLJ on a DataFrame (30,000 rows x 8,000 columns), but every table operation seems to take a huge amount of time to compile, even though it is fast to run once compiled.
I have given an example with code below in which a 5 x 5000 DataFrame is generated and it gets stuck on the unpack line (line 3). When I run the same code for a 5 x 5 DataFrame, line 3 outputs “2.872309 seconds (9.09 M allocations: 565.673 MiB, 6.47% gc time, 99.84% compilation time)”.
This is a crazy amount of compilation time for a seemingly simple task and I would like to know how I can reduce this.
Thank you,
Jack
using MLJ
using DataFrames
[line 1] @time arr = [[rand(1:10) for i in 1:5] for i in 1:5000];
output: 0.053668 seconds (200.76 k allocations: 11.360 MiB, 22.16% gc time, 99.16% compilation time)
[line 2] @time df = DataFrames.DataFrame(arr, :auto)
output: 0.267325 seconds (733.43 k allocations: 40.071 MiB, 4.29% gc time, 98.67% compilation time)
[line 3] @time y, X = unpack(df, ==(:x1));
does not finish running
Comments (2)
It's not unexpected that the Julia compiler struggles with very wide DataFrames, which have (potentially) heterogeneous column types. That said, I'm not sure why this has to be a problem for this operation. I've checked with the MLJ maintainers, who can hopefully chime in.
In the meantime you can simply split off the column with select!, which is instantaneous. (Note that select! will drop x1 from your underlying data; if you want to copy the data, use select instead.)
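A minimal sketch of such a select!-based split, assuming the 5 x 5000 DataFrame from the question. The exact one-liner from the original answer is not reproduced in this copy, so the form below is an assumption, not a quote:

```julia
using DataFrames

# Rebuild the wide table from the question: 5 rows x 5000 columns,
# with auto-generated column names x1, x2, ..., x5000.
arr = [[rand(1:10) for i in 1:5] for j in 1:5000]
df = DataFrame(arr, :auto)

# Grab the target column first, then drop it from df in place.
# select! mutates df; use select(df, Not(:x1)) to get a copy instead.
y = df.x1
X = select!(df, Not(:x1))
```

After this, y holds the old x1 column and X (which is df itself) has the remaining 4999 columns, without going through unpack.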
Please don't cross-post a problem on multiple websites without linking.
The question has been answered at the Julia forum: https://discourse.julialang.org/t/simple-table-operation-has-very-large-compilation-time-with-mlj/82503/2. It was caused by a bug which is fixed in MLJBase 0.20.5.
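To pick up that fix, updating MLJBase through Julia's package manager should be enough. A minimal sketch, assuming MLJBase is already installed in the active environment and that no other package pins it below 0.20.5:

```julia
using Pkg

# Ask Pkg to update MLJBase in the active environment.
Pkg.update("MLJBase")

# Print the installed version to confirm it is 0.20.5 or later.
Pkg.status("MLJBase")
```

If the reported version stays below 0.20.5, another package's compatibility bounds are probably holding it back.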