Improving Power BI Direct Query performance with PostgreSQL

Published 2025-01-14 15:46:53


I'm trying to improve performance of a reporting solution I'm building. The tech I'm using is:

  • Custom-built Python tool extracting data and storing it in
  • Azure Hosted PostgreSQL running GP compute with 8vCores
  • PowerBI Desktop -> intention is to publish on app.powerbi.com

Data is extracted in JSON format and stored in two-column tables (id::bigint, document::json), then transformed into multi-column form using views that have one-to-many relationships between them based on ids.
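A minimal sketch of that storage-plus-view layout (table and column names here are illustrative, inferred from the query plan further down, not the actual schema):

```sql
-- Raw storage: one JSON document per extracted record.
CREATE TABLE public.jobs (
    id       bigint PRIMARY KEY,
    document json   NOT NULL
);

-- A view flattening selected JSON fields into typed columns.
-- Every query against the view re-evaluates these expressions per row.
CREATE VIEW "some sch".jobs AS
SELECT id,
       (document -> 'pipeline' ->> 'project_id')::bigint AS project_id,
       document -> 'commit' ->> 'author_name'            AS user_name,
       document -> 'commit' ->> 'committer_email'        AS user_email
FROM public.jobs;
```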

The problem is that one of the tables has ~4.5M rows, and every time I perform an action in Power BI it takes 5+ minutes for the database to process the request. By "action" I mean really simple operations: adding a column to a report that uses a single view (with 4M+ entries) as the data source.

I've tried the obvious fix of adding more compute to the database, but got minimal improvement.
Before this I tried ElasticSearch for storing the data, and found it difficult to integrate with Power BI and to operate on large data sets.

Any suggestions on what I should be looking at?

Later Edit - query I'm running:

EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT DISTINCT PROJECTS.ID,
    PROJECTS.NAME,
    PROJECTS.WEB_URL,
    USERS.NAME AS OWNER_NAME,
    USERS.EMAIL AS OWNER_EMAIL,
    JOBS.USER_NAME AS JOB_USER_NAME,
    JOBS.USER_EMAIL AS JOB_USER_EMAIL
FROM "some sch".PROJECTS AS PROJECTS
LEFT JOIN "some sch".USERS AS USERS ON PROJECTS.CREATOR_USER_ID = USERS.ID
LEFT JOIN "some sch".JOBS AS JOBS ON PROJECTS.ID = JOBS.PROJECT_ID
WHERE PROJECTS.ID = 706
    OR PROJECTS.ID = 2319
    OR PROJECTS.ID = 13118
    OR PROJECTS.ID = 5201
    OR PROJECTS.ID = 8049
    OR PROJECTS.ID = 8091;

With the Query Plan:

"  Output: projects.id, projects.name, projects.web_url, ((users.document ->> 'name'::text)), ((users.document ->> 'email'::text)), (((jobs.document -> 'commit'::text) ->> 'author_name'::text)), (((jobs.document -> 'commit'::text) ->> 'committer_email'::text))"
"  Buffers: shared hit=2696055 dirtied=3763"
"  ->  Sort  (cost=1216639.41..1216984.48 rows=138030 width=200) (actual time=369683.465..369683.815 rows=13024 loops=1)"
"        Output: projects.id, projects.name, projects.web_url, ((users.document ->> 'name'::text)), ((users.document ->> 'email'::text)), (((jobs.document -> 'commit'::text) ->> 'author_name'::text)), (((jobs.document -> 'commit'::text) ->> 'committer_email'::text))"
"        Sort Key: projects.id, projects.name, projects.web_url, ((users.document ->> 'name'::text)), ((users.document ->> 'email'::text)), (((jobs.document -> 'commit'::text) ->> 'author_name'::text)), (((jobs.document -> 'commit'::text) ->> 'committer_email'::text))"
"        Sort Method: quicksort  Memory: 3792kB"
"        Buffers: shared hit=2696055 dirtied=3763"
"        ->  Hash Right Join  (cost=4835.61..1197116.61 rows=138030 width=200) (actual time=81549.985..369663.270 rows=13024 loops=1)"
"              Output: projects.id, projects.name, projects.web_url, ((users.document ->> 'name'::text)), ((users.document ->> 'email'::text)), (((jobs.document -> 'commit'::text) ->> 'author_name'::text)), (((jobs.document -> 'commit'::text) ->> 'committer_email'::text))"
"              Hash Cond: (((((jobs.document -> 'pipeline'::text) ->> 'project_id'::text))::bigint) = projects.id)"
"              Buffers: shared hit=2696055 dirtied=3763"
"              ->  Index Scan using jobs_pkey on public.jobs  (cost=0.43..1127637.25 rows=4601009 width=427) (actual time=0.078..369088.748 rows=4725016 loops=1)"
"                    Output: jobs.id, NULL::bigint, (((jobs.document -> 'pipeline'::text) ->> 'project_id'::text))::bigint, NULL::bigint, NULL::bigint, NULL::text, NULL::text, NULL::text, NULL::timestamp with time zone, NULL::timestamp with time zone, NULL::double precision, NULL::double precision, NULL::text, ((jobs.document -> 'commit'::text) ->> 'author_name'::text), ((jobs.document -> 'commit'::text) ->> 'committer_email'::text), NULL::text, NULL::text, NULL::boolean, NULL::boolean, NULL::text, NULL::text, NULL::text, NULL::boolean"
"                    Buffers: shared hit=2695341 dirtied=3763"
"              ->  Hash  (cost=4835.11..4835.11 rows=6 width=136) (actual time=52.747..52.751 rows=6 loops=1)"
"                    Output: projects.id, projects.name, projects.web_url, ((users.document ->> 'name'::text)), ((users.document ->> 'email'::text))"
"                    Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                    Buffers: shared hit=714"
"                    ->  Merge Right Join  (cost=15.70..4835.11 rows=6 width=136) (actual time=3.388..52.744 rows=6 loops=1)"
"                          Output: projects.id, projects.name, projects.web_url, ((users.document ->> 'name'::text)), ((users.document ->> 'email'::text))"
"                          Merge Cond: (users.id = projects.creator_user_id)"
"                          Buffers: shared hit=714"
"                          ->  Index Scan using users_pkey on public.users  (cost=0.29..4687.80 rows=10544 width=193) (actual time=0.042..51.642 rows=2318 loops=1)"
"                                Output: users.id, (users.document ->> 'name'::text), NULL::text, NULL::text, (users.document ->> 'email'::text), NULL::text, NULL::timestamp with time zone, NULL::timestamp with time zone, NULL::timestamp with time zone, NULL::boolean"
"                                Buffers: shared hit=678"
"                          ->  Sort  (cost=15.41..15.43 rows=6 width=80) (actual time=0.986..0.990 rows=6 loops=1)"
"                                Output: projects.id, projects.name, projects.web_url, projects.creator_user_id"
"                                Sort Key: projects.creator_user_id"
"                                Sort Method: quicksort  Memory: 25kB"
"                                Buffers: shared hit=36"
"                                ->  Subquery Scan on projects  (cost=15.26..15.34 rows=6 width=80) (actual time=0.975..0.979 rows=6 loops=1)"
"                                      Output: projects.id, projects.name, projects.web_url, projects.creator_user_id"
"                                      Buffers: shared hit=36"
"                                      ->  Sort  (cost=15.26..15.28 rows=6 width=537) (actual time=0.974..0.977 rows=6 loops=1)"
"                                            Output: projects_1.id, (((projects_1.document ->> 'creator_id'::text))::bigint), ((projects_1.document ->> 'name'::text)), NULL::text, NULL::text, ((projects_1.document ->> 'web_url'::text)), NULL::timestamp with time zone, NULL::timestamp with time zone, NULL::bigint, NULL::boolean, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric"
"                                            Sort Key: projects_1.id"
"                                            Sort Method: quicksort  Memory: 26kB"
"                                            Buffers: shared hit=36"
"                                            ->  Bitmap Heap Scan on public.projects projects_1  (cost=8.38..15.18 rows=6 width=537) (actual time=0.285..0.965 rows=6 loops=1)"
"                                                  Output: projects_1.id, ((projects_1.document ->> 'creator_id'::text))::bigint, (projects_1.document ->> 'name'::text), NULL::text, NULL::text, (projects_1.document ->> 'web_url'::text), NULL::timestamp with time zone, NULL::timestamp with time zone, NULL::bigint, NULL::boolean, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric"
"                                                  Recheck Cond: ((projects_1.id = 706) OR (projects_1.id = 2319) OR (projects_1.id = 13118) OR (projects_1.id = 5201) OR (projects_1.id = 8049) OR (projects_1.id = 8091))"
"                                                  Heap Blocks: exact=6"
"                                                  Buffers: shared hit=36"
"                                                  ->  BitmapOr  (cost=8.38..8.38 rows=6 width=0) (actual time=0.033..0.035 rows=0 loops=1)"
"                                                        Buffers: shared hit=12"
"                                                        ->  Bitmap Index Scan on projects_pkey  (cost=0.00..1.40 rows=1 width=0) (actual time=0.009..0.009 rows=1 loops=1)"
"                                                              Index Cond: (projects_1.id = 706)"
"                                                              Buffers: shared hit=2"
"                                                        ->  Bitmap Index Scan on projects_pkey  (cost=0.00..1.40 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)"
"                                                              Index Cond: (projects_1.id = 2319)"
"                                                              Buffers: shared hit=2"
"                                                        ->  Bitmap Index Scan on projects_pkey  (cost=0.00..1.40 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=1)"
"                                                              Index Cond: (projects_1.id = 13118)"
"                                                              Buffers: shared hit=2"
"                                                        ->  Bitmap Index Scan on projects_pkey  (cost=0.00..1.40 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=1)"
"                                                              Index Cond: (projects_1.id = 5201)"
"                                                              Buffers: shared hit=2"
"                                                        ->  Bitmap Index Scan on projects_pkey  (cost=0.00..1.40 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=1)"
"                                                              Index Cond: (projects_1.id = 8049)"
"                                                              Buffers: shared hit=2"
"                                                        ->  Bitmap Index Scan on projects_pkey  (cost=0.00..1.40 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1)"
"                                                              Index Cond: (projects_1.id = 8091)"
"                                                              Buffers: shared hit=2"
"Planning Time: 0.385 ms"
"Execution Time: 369688.318 ms"

Thanks,
Alex


Comments (1)

一个人的旅程 2025-01-21 15:46:53


You are facing one of PostgreSQL's biggest performance problems: its COUNT-type queries are extremely slow compared to other RDBMSs.

Read the article I wrote about it, which shows a gap of up to 1500 times slower than Microsoft SQL Server, and 114 times slower on average...

PostGreSQL vs Microsoft SQL Server – Comparison part 2 : COUNT performances

The only way to improve the performance of such queries is to create materialized views... PostgreSQL has no vertical indexes (aka columnstore indexes) and no data compression, both of which are usually used for BI.
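A minimal sketch of the materialized-view approach for the jobs data from the question (object names are illustrative; the JSON paths are taken from the query plan): the JSON extraction is paid once per refresh instead of on every Direct Query request, and the result can be indexed on the join key:

```sql
-- Materialize the flattened JSON once instead of re-parsing ~4.5M
-- documents on every query (illustrative names).
CREATE MATERIALIZED VIEW "some sch".jobs_mat AS
SELECT id,
       (document -> 'pipeline' ->> 'project_id')::bigint AS project_id,
       document -> 'commit' ->> 'author_name'            AS user_name,
       document -> 'commit' ->> 'committer_email'        AS user_email
FROM public.jobs;

-- Index the join key; a unique index on id also allows
-- REFRESH ... CONCURRENTLY, which does not block readers.
CREATE INDEX ON "some sch".jobs_mat (project_id);
CREATE UNIQUE INDEX ON "some sch".jobs_mat (id);

-- Run after each extraction cycle.
REFRESH MATERIALIZED VIEW CONCURRENTLY "some sch".jobs_mat;
```

Pointing the Power BI view at the materialized table trades data freshness (it is only as current as the last refresh) for query latency, which is usually the right trade for a reporting workload fed by a batch extraction tool.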
