Efficiently recreating a list of objects

Posted 2025-01-10 04:31:00


I am trying to convert a list of objects which has been queried using SQLAlchemy.

The issue I am having is that the process takes around 18-20 seconds to loop over, process and send the data to the frontend. The data is around 5 million rows, which is far too slow to put into production.

Here is an example of what I am using.

test = [
   {"id": 5, "date":"2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-01-01 00:00:00"},
]

test_dict = {}

for i in test:
    if i["id"] not in test_dict:
        test_dict[i["id"]] = []
    # the sample dates are plain strings; rows with datetime objects would use i["date"].isoformat()
    test_dict[i["id"]].append(i["date"])

The expected output should be, for example:

[
   {5: [date, date, date, date, date]},
   {6: [date]}
]

I totally understand this is not working code and I am not looking to fix it; I just wrote it on the fly. My main focus is what to do with the for loop to speed the process up.

Thank you everyone for your help.

Thank you everyone for your answers so far.

Providing more info: the data needs to be sent to the frontend, which then renders it on a graph. The data is updated around every minute and can also be requested between two time ranges. These time ranges are always a minimum of 35 days, so the rows returned are always a minimum of 5 million or so. Twenty seconds for the end user's graph to load is, I would say, too slow. The for loop is the cause of this bottleneck, and it would be nice to get it down to, say, 5 seconds.

Thank you

Extra info:

Processing on the database side is unfortunately not an option here. The data must be converted to the correct format inside the API; for example, concatenating the data into the correct format or converting it to JSON during the query isn't an option.
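For reference, a minimal runnable sketch of the grouping above using collections.defaultdict, which drops the per-row membership check; the row shape is assumed to match the toy example, and the dates are left as plain strings:

from collections import defaultdict

test = [
   {"id": 5, "date": "2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-02-01 00:00:00"},
   {"id": 6, "date": "2022-03-01 00:00:00"},
]

# group the date strings by id without checking for the key on every row
test_dict = defaultdict(list)
for row in test:
    test_dict[row["id"]].append(row["date"])

print(dict(test_dict))
# {5: ['2022-01-01 00:00:00', '2022-02-01 00:00:00'], 6: ['2022-03-01 00:00:00']}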


Comments (2)

心作怪 2025-01-17 04:31:00


If I understood correctly, you can use pandas DataFrames:

import pandas as pd

test = [
   {"id": 5, "date": "2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-02-01 00:00:00"},
   {"id": 5, "date": "2022-03-01 00:00:00"},
   {"id": 5, "date": "2022-04-01 00:00:00"},
   {"id": 6, "date": "2022-05-01 00:00:00"},
]

# build a frame from the list of dicts, then collect the dates per id into lists
df = pd.DataFrame.from_dict(test)
res = df.groupby("id").agg(list)
print(res)

Output:

                                                                                   date
id                                                                                      
5   [2022-01-01 00:00:00, 2022-02-01 00:00:00, 2022-03-01 00:00:00, 2022-04-01 00:00:00]
6                                                                  [2022-05-01 00:00:00]

And if you want it as a dict, you can use res.to_dict().
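As a small follow-up sketch building on the same res as above: selecting the date column first gives a plain {id: [dates]} mapping rather than the nested dict that res.to_dict() returns.

# sketch: res["date"] is a Series indexed by id, so to_dict() yields {id: [dates]}
id_to_dates = res["date"].to_dict()
print(id_to_dates)
# {5: ['2022-01-01 00:00:00', '2022-02-01 00:00:00', '2022-03-01 00:00:00', '2022-04-01 00:00:00'], 6: ['2022-05-01 00:00:00']}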

幸福丶如此 2025-01-17 04:31:00


You should probably not send 5 million objects to the frontend.

Usually we use pagination, filters, and sorting.

Then, if you are really willing to do so, the fastest way would probably be to cache your data, for instance by creating and maintaining a JSON file on your server that the clients download.
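A minimal sketch of that caching idea, assuming a background job re-groups the rows roughly once a minute and writes them to a file that the API or a static file server can hand straight to clients; the path and helper name here are made up for illustration:

import json
from collections import defaultdict

CACHE_PATH = "/tmp/graph_cache.json"  # hypothetical location

def rebuild_cache(rows):
    # rows: iterable of {"id": ..., "date": ...} dicts already fetched from the DB
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["id"]].append(row["date"])
    # note: JSON object keys are always strings, so integer ids become "5", "6", ...
    with open(CACHE_PATH, "w") as f:
        json.dump(grouped, f)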
