Efficiently recreating a list of objects

Posted 2025-01-10 04:31:00


I am trying to convert a list of objects which has been queried using SQLAlchemy.

The issue I am having is that the process takes around 18-20 seconds to loop over, process and send the data to the frontend. The data is around 5 million rows, which is far too slow to put into production.

Here is an example of what I am using.

test = [
   {"id": 5, "date":"2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-01-01 00:00:00"},
]

test_dict = {}

for i in test:
    if i["id"] not in test_dict:
        test_dict[i["id"]] = []
    # the sample dates are plain strings; rows with datetime objects would use i["date"].isoformat()
    test_dict[i["id"]].append(i["date"])

The expected output should be, for example:

[
   {5: [date, date, date, date, date]},
   {6: [date]}
]

I totally understand this is not working code and I am not looking to fix it; I just wrote it on the fly. My main focus is what to do with the for loop to speed the process up.

Thank you everyone for your help.

Thank you everyone for your answers so far.

Providing more info: the data needs to be sent to the frontend, which then renders it on a graph. The data is updated around every minute and can also be requested between two time ranges. These time ranges are always a minimum of 35 days, so the rows returned are always a minimum of 5 million or so. Twenty seconds for the end user's graph to load is, I would say, too slow. The for loop is the cause of this bottleneck, and it would be nice to get it down to, say, 5 seconds.

Thank you

Extra info:

Processing on the database side is unfortunately not an option here. The data must be converted to the correct format inside the API; for example, concatenating the data into the correct format or converting it to JSON during the query isn't an option.
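For reference, a minimal runnable sketch of the grouping above using collections.defaultdict, which drops the per-row membership check; the row shape is assumed to match the toy example, and the dates are left as plain strings:

from collections import defaultdict

test = [
   {"id": 5, "date": "2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-02-01 00:00:00"},
   {"id": 6, "date": "2022-03-01 00:00:00"},
]

# group the date strings by id without checking for the key on every row
test_dict = defaultdict(list)
for row in test:
    test_dict[row["id"]].append(row["date"])

print(dict(test_dict))
# {5: ['2022-01-01 00:00:00', '2022-02-01 00:00:00'], 6: ['2022-03-01 00:00:00']}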


Comments (2)

心作怪 2025-01-17 04:31:00


If I understood correctly, you can use pandas DataFrames:

import pandas as pd

test = [
   {"id": 5, "date": "2022-01-01 00:00:00"},
   {"id": 5, "date": "2022-02-01 00:00:00"},
   {"id": 5, "date": "2022-03-01 00:00:00"},
   {"id": 5, "date": "2022-04-01 00:00:00"},
   {"id": 6, "date": "2022-05-01 00:00:00"},
]

# build a frame from the list of dicts, then collect the dates per id into lists
df = pd.DataFrame.from_dict(test)
res = df.groupby("id").agg(list)
print(res)

Output:

                                                                                   date
id                                                                                      
5   [2022-01-01 00:00:00, 2022-02-01 00:00:00, 2022-03-01 00:00:00, 2022-04-01 00:00:00]
6                                                                  [2022-05-01 00:00:00]

And if you want it as a dict, you can use res.to_dict().
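As a small follow-up sketch building on the same res as above: selecting the date column first gives a plain {id: [dates]} mapping rather than the nested dict that res.to_dict() returns.

# sketch: res["date"] is a Series indexed by id, so to_dict() yields {id: [dates]}
id_to_dates = res["date"].to_dict()
print(id_to_dates)
# {5: ['2022-01-01 00:00:00', '2022-02-01 00:00:00', '2022-03-01 00:00:00', '2022-04-01 00:00:00'], 6: ['2022-05-01 00:00:00']}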

幸福丶如此 2025-01-17 04:31:00


You should probably not send 5 million objects to the frontend.

Usually we use pagination, filters, and sorting.

Then, if you are really willing to do so, the fastest way would probably be to cache your data, for instance by creating and maintaining a JSON file on your server that the clients download.
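A minimal sketch of that caching idea, assuming a background job re-groups the rows roughly once a minute and writes them to a file that the API or a static file server can hand straight to clients; the path and helper name here are made up for illustration:

import json
from collections import defaultdict

CACHE_PATH = "/tmp/graph_cache.json"  # hypothetical location

def rebuild_cache(rows):
    # rows: iterable of {"id": ..., "date": ...} dicts already fetched from the DB
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["id"]].append(row["date"])
    # note: JSON object keys are always strings, so integer ids become "5", "6", ...
    with open(CACHE_PATH, "w") as f:
        json.dump(grouped, f)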
