Python loop performance is too slow
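I have many records from a database. I want to convert the structure of the database records into something more like parent and child records.
So forecast_data has the following properties:
component_plan_id, region, planning_item, cfg, measure, period_str, currency, forecast_value, forecast_currency
The idea is to convert them into parent records with the properties
component_plan_id, region, planning_item, cfg, measure, currency
and the child records of each parent record would have
period_str, forecast_value, forecast_currency
What I do in the code is:
1. Get the list of unique combinations of the parent properties from the records.
2. For each combination from (1), get the records with the same properties and create a child record with period_str, forecast_value, forecast_currency for each of them.
The code below already works, but it is far too slow. Is there any way to improve the performance?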
data = []
# Format the forecast data
for rec in list(set((row.component_plan_id, row.region,
        row.planning_item, row.cfg, row.measure, row.currency) for row in forecast_data)):
    new_rec = ComponentForecastReadDto(component_plan_id = rec[0],
        region = rec[1], planning_item = rec[2],
        cfg = rec[3], measure = rec[4], currency = rec[5])
    # Get forecast value
    new_rec.forecast = []
    for rec_forecast in [x for x in forecast_data if
            x.component_plan_id == new_rec.component_plan_id and
            x.region == new_rec.region and
            x.planning_item == new_rec.planning_item and
            x.cfg == new_rec.cfg and
            x.measure == new_rec.measure and
            x.currency == new_rec.currency]:
        new_forecast = ComponentForecastValueReadDto(period_str = rec_forecast.period_str,
            forecast_value = rec_forecast.forecast_value, forecast_currency = rec_forecast.forecast_currency)
        new_rec.forecast.append(new_forecast)
    data.append(new_rec)
ComponentForecastReadDto and ComponentForecastValueReadDto are classes that inherit from pydantic.
Sample input:
| component_plan_id | region | planning_item | cfg | measure | period_str | currency | forecast_value | forecast_currency |
| 1 | America | Item 1 | cfg A | unit | 2022-06 | 2 | 100 | 200 |
| 1 | America | Item 1 | cfg A | unit | 2022-07 | 2 | 150 | 300 |
| 1 | America | Item 1 | cfg A | unit | 2022-08 | 2 | 200 | 400 |
| 1 | Asia | Item 1 | cfg A | unit | 2022-06 | 3 | 150 | 450 |
Output:
Record #1
    component_plan_id = 1
    region = America
    planning_item = Item 1
    cfg = cfg A
    measure = unit
    currency = 2
    children:
        1. period_str = 2022-06
           forecast_value = 100
           forecast_currency = 200
        2. period_str = 2022-07
           forecast_value = 150
           forecast_currency = 300
        3. period_str = 2022-08
           forecast_value = 200
           forecast_currency = 400
Record #2
    component_plan_id = 1
    region = Asia
    planning_item = Item 1
    cfg = cfg A
    measure = unit
    currency = 3
    children:
        1. period_str = 2022-06
           forecast_value = 150
           forecast_currency = 450
Comments (2)
I ended up rewriting the function. This reduces the running time from O(N^2) to O(N log N), and it now runs really fast.
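One way to get from O(N^2) to O(N log N) is to sort the rows by the parent fields once and then walk them with itertools.groupby. The sketch below assumes that approach; it is not necessarily the code this answer used. ForecastRow and the dataclass versions of the two DTOs are hypothetical stand-ins for the database rows and the pydantic models, and the sample rows are taken from the question.

```python
from dataclasses import dataclass, field
from itertools import groupby
from operator import attrgetter
from typing import List, NamedTuple


class ForecastRow(NamedTuple):
    # Hypothetical stand-in for one database record in forecast_data.
    component_plan_id: int
    region: str
    planning_item: str
    cfg: str
    measure: str
    period_str: str
    currency: int
    forecast_value: float
    forecast_currency: float


@dataclass
class ComponentForecastValueReadDto:
    # Plain dataclass standing in for the pydantic model of the same name.
    period_str: str
    forecast_value: float
    forecast_currency: float


@dataclass
class ComponentForecastReadDto:
    # Plain dataclass standing in for the pydantic model of the same name.
    component_plan_id: int
    region: str
    planning_item: str
    cfg: str
    measure: str
    currency: int
    forecast: List[ComponentForecastValueReadDto] = field(default_factory=list)


PARENT_FIELDS = ("component_plan_id", "region", "planning_item",
                 "cfg", "measure", "currency")
parent_key = attrgetter(*PARENT_FIELDS)


def group_forecasts(forecast_data):
    data = []
    # Sorting is the O(N log N) step; groupby then needs a single linear pass.
    # sorted() is stable, so children keep their input order within a group.
    ordered = sorted(forecast_data, key=parent_key)
    for key, rows in groupby(ordered, key=parent_key):
        parent = ComponentForecastReadDto(**dict(zip(PARENT_FIELDS, key)))
        parent.forecast = [
            ComponentForecastValueReadDto(r.period_str, r.forecast_value,
                                          r.forecast_currency)
            for r in rows
        ]
        data.append(parent)
    return data


forecast_data = [
    ForecastRow(1, "America", "Item 1", "cfg A", "unit", "2022-06", 2, 100, 200),
    ForecastRow(1, "America", "Item 1", "cfg A", "unit", "2022-07", 2, 150, 300),
    ForecastRow(1, "America", "Item 1", "cfg A", "unit", "2022-08", 2, 200, 400),
    ForecastRow(1, "Asia", "Item 1", "cfg A", "unit", "2022-06", 3, 150, 450),
]
data = group_forecasts(forecast_data)
```

Sorting requires the parent fields to be mutually comparable, so rows that mix None with real values in those fields would need a custom key. If that is a concern, collecting the rows into a dict keyed by the same tuple avoids the sort entirely.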
Assuming your data is in CSV format, in a file named input.csv, I used pandas.DataFrame.groupby to rewrite this. In case you want it faster, joblib's Parallel might help a bit. Both ways return the same result.
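A minimal sketch of the pandas.DataFrame.groupby variant, under some assumptions: the CSV columns carry the same names as the properties in the question, the data is read from an in-memory string instead of input.csv so the example is self-contained, and each parent is built as a plain dict rather than a pydantic model. The joblib part is not shown.

```python
import io

import pandas as pd

# Sample rows from the question, inlined in place of input.csv.
csv_text = """component_plan_id,region,planning_item,cfg,measure,period_str,currency,forecast_value,forecast_currency
1,America,Item 1,cfg A,unit,2022-06,2,100,200
1,America,Item 1,cfg A,unit,2022-07,2,150,300
1,America,Item 1,cfg A,unit,2022-08,2,200,400
1,Asia,Item 1,cfg A,unit,2022-06,3,150,450
"""

df = pd.read_csv(io.StringIO(csv_text))

parent_cols = ["component_plan_id", "region", "planning_item",
               "cfg", "measure", "currency"]
child_cols = ["period_str", "forecast_value", "forecast_currency"]

data = []
# sort=False keeps the groups in order of first appearance in the input.
for key, group in df.groupby(parent_cols, sort=False):
    record = dict(zip(parent_cols, key))
    # One dict per child row, holding only the child columns.
    record["forecast"] = group[child_cols].to_dict("records")
    data.append(record)
```

Each entry in data could then be passed to the pydantic models, for example by unpacking the dict into the parent model's constructor, if the field names match.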