Improving Power BI Direct Query performance with PostgreSQL

Published 2025-01-14 15:46:53


I'm trying to improve performance of a reporting solution I'm building. The tech I'm using is:

  • Custom-built Python tool extracting data and storing it in
  • Azure Hosted PostgreSQL running GP compute with 8vCores
  • PowerBI Desktop -> intention is to publish on app.powerbi.com

Data is extracted in JSON format and stored in two-column tables (id::bigint, document::json), then transformed into multi-column form using views that have one-to-many relationships between them based on ids.
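A minimal sketch of that storage-plus-view layout (table and column names here are illustrative, inferred from the query plan further down, not the actual schema):

```sql
-- Raw storage: one JSON document per extracted record.
CREATE TABLE public.jobs (
    id       bigint PRIMARY KEY,
    document json   NOT NULL
);

-- A view flattening selected JSON fields into typed columns.
-- Every query against the view re-evaluates these expressions per row.
CREATE VIEW "some sch".jobs AS
SELECT id,
       (document -> 'pipeline' ->> 'project_id')::bigint AS project_id,
       document -> 'commit' ->> 'author_name'            AS user_name,
       document -> 'commit' ->> 'committer_email'        AS user_email
FROM public.jobs;
```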

The problem is that one of the tables has ~4.5M rows, and every time I perform an action in Power BI it takes 5+ minutes for the database to process the request. By "action" I mean really simple operations: adding a column to a report that uses a single view (with 4M+ entries) as the data source.

I've tried the obvious fix of adding more compute to the database, but got minimal improvement.
Before this I tried ElasticSearch for storing the data, and found it difficult to integrate with Power BI and to operate on large data sets.

Any suggestions on what I should be looking at?

Later Edit - query I'm running:

EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT DISTINCT PROJECTS.ID,
    PROJECTS.NAME,
    PROJECTS.WEB_URL,
    USERS.NAME AS OWNER_NAME,
    USERS.EMAIL AS OWNER_EMAIL,
    JOBS.USER_NAME AS JOB_USER_NAME,
    JOBS.USER_EMAIL AS JOB_USER_EMAIL
FROM "some sch".PROJECTS AS PROJECTS
LEFT JOIN "some sch".USERS AS USERS ON PROJECTS.CREATOR_USER_ID = USERS.ID
LEFT JOIN "some sch".JOBS AS JOBS ON PROJECTS.ID = JOBS.PROJECT_ID
WHERE PROJECTS.ID = 706
    OR PROJECTS.ID = 2319
    OR PROJECTS.ID = 13118
    OR PROJECTS.ID = 5201
    OR PROJECTS.ID = 8049
    OR PROJECTS.ID = 8091;

With the Query Plan:

"  Output: projects.id, projects.name, projects.web_url, ((users.document ->> 'name'::text)), ((users.document ->> 'email'::text)), (((jobs.document -> 'commit'::text) ->> 'author_name'::text)), (((jobs.document -> 'commit'::text) ->> 'committer_email'::text))"
"  Buffers: shared hit=2696055 dirtied=3763"
"  ->  Sort  (cost=1216639.41..1216984.48 rows=138030 width=200) (actual time=369683.465..369683.815 rows=13024 loops=1)"
"        Output: projects.id, projects.name, projects.web_url, ((users.document ->> 'name'::text)), ((users.document ->> 'email'::text)), (((jobs.document -> 'commit'::text) ->> 'author_name'::text)), (((jobs.document -> 'commit'::text) ->> 'committer_email'::text))"
"        Sort Key: projects.id, projects.name, projects.web_url, ((users.document ->> 'name'::text)), ((users.document ->> 'email'::text)), (((jobs.document -> 'commit'::text) ->> 'author_name'::text)), (((jobs.document -> 'commit'::text) ->> 'committer_email'::text))"
"        Sort Method: quicksort  Memory: 3792kB"
"        Buffers: shared hit=2696055 dirtied=3763"
"        ->  Hash Right Join  (cost=4835.61..1197116.61 rows=138030 width=200) (actual time=81549.985..369663.270 rows=13024 loops=1)"
"              Output: projects.id, projects.name, projects.web_url, ((users.document ->> 'name'::text)), ((users.document ->> 'email'::text)), (((jobs.document -> 'commit'::text) ->> 'author_name'::text)), (((jobs.document -> 'commit'::text) ->> 'committer_email'::text))"
"              Hash Cond: (((((jobs.document -> 'pipeline'::text) ->> 'project_id'::text))::bigint) = projects.id)"
"              Buffers: shared hit=2696055 dirtied=3763"
"              ->  Index Scan using jobs_pkey on public.jobs  (cost=0.43..1127637.25 rows=4601009 width=427) (actual time=0.078..369088.748 rows=4725016 loops=1)"
"                    Output: jobs.id, NULL::bigint, (((jobs.document -> 'pipeline'::text) ->> 'project_id'::text))::bigint, NULL::bigint, NULL::bigint, NULL::text, NULL::text, NULL::text, NULL::timestamp with time zone, NULL::timestamp with time zone, NULL::double precision, NULL::double precision, NULL::text, ((jobs.document -> 'commit'::text) ->> 'author_name'::text), ((jobs.document -> 'commit'::text) ->> 'committer_email'::text), NULL::text, NULL::text, NULL::boolean, NULL::boolean, NULL::text, NULL::text, NULL::text, NULL::boolean"
"                    Buffers: shared hit=2695341 dirtied=3763"
"              ->  Hash  (cost=4835.11..4835.11 rows=6 width=136) (actual time=52.747..52.751 rows=6 loops=1)"
"                    Output: projects.id, projects.name, projects.web_url, ((users.document ->> 'name'::text)), ((users.document ->> 'email'::text))"
"                    Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                    Buffers: shared hit=714"
"                    ->  Merge Right Join  (cost=15.70..4835.11 rows=6 width=136) (actual time=3.388..52.744 rows=6 loops=1)"
"                          Output: projects.id, projects.name, projects.web_url, ((users.document ->> 'name'::text)), ((users.document ->> 'email'::text))"
"                          Merge Cond: (users.id = projects.creator_user_id)"
"                          Buffers: shared hit=714"
"                          ->  Index Scan using users_pkey on public.users  (cost=0.29..4687.80 rows=10544 width=193) (actual time=0.042..51.642 rows=2318 loops=1)"
"                                Output: users.id, (users.document ->> 'name'::text), NULL::text, NULL::text, (users.document ->> 'email'::text), NULL::text, NULL::timestamp with time zone, NULL::timestamp with time zone, NULL::timestamp with time zone, NULL::boolean"
"                                Buffers: shared hit=678"
"                          ->  Sort  (cost=15.41..15.43 rows=6 width=80) (actual time=0.986..0.990 rows=6 loops=1)"
"                                Output: projects.id, projects.name, projects.web_url, projects.creator_user_id"
"                                Sort Key: projects.creator_user_id"
"                                Sort Method: quicksort  Memory: 25kB"
"                                Buffers: shared hit=36"
"                                ->  Subquery Scan on projects  (cost=15.26..15.34 rows=6 width=80) (actual time=0.975..0.979 rows=6 loops=1)"
"                                      Output: projects.id, projects.name, projects.web_url, projects.creator_user_id"
"                                      Buffers: shared hit=36"
"                                      ->  Sort  (cost=15.26..15.28 rows=6 width=537) (actual time=0.974..0.977 rows=6 loops=1)"
"                                            Output: projects_1.id, (((projects_1.document ->> 'creator_id'::text))::bigint), ((projects_1.document ->> 'name'::text)), NULL::text, NULL::text, ((projects_1.document ->> 'web_url'::text)), NULL::timestamp with time zone, NULL::timestamp with time zone, NULL::bigint, NULL::boolean, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric"
"                                            Sort Key: projects_1.id"
"                                            Sort Method: quicksort  Memory: 26kB"
"                                            Buffers: shared hit=36"
"                                            ->  Bitmap Heap Scan on public.projects projects_1  (cost=8.38..15.18 rows=6 width=537) (actual time=0.285..0.965 rows=6 loops=1)"
"                                                  Output: projects_1.id, ((projects_1.document ->> 'creator_id'::text))::bigint, (projects_1.document ->> 'name'::text), NULL::text, NULL::text, (projects_1.document ->> 'web_url'::text), NULL::timestamp with time zone, NULL::timestamp with time zone, NULL::bigint, NULL::boolean, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::bigint, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric, NULL::numeric"
"                                                  Recheck Cond: ((projects_1.id = 706) OR (projects_1.id = 2319) OR (projects_1.id = 13118) OR (projects_1.id = 5201) OR (projects_1.id = 8049) OR (projects_1.id = 8091))"
"                                                  Heap Blocks: exact=6"
"                                                  Buffers: shared hit=36"
"                                                  ->  BitmapOr  (cost=8.38..8.38 rows=6 width=0) (actual time=0.033..0.035 rows=0 loops=1)"
"                                                        Buffers: shared hit=12"
"                                                        ->  Bitmap Index Scan on projects_pkey  (cost=0.00..1.40 rows=1 width=0) (actual time=0.009..0.009 rows=1 loops=1)"
"                                                              Index Cond: (projects_1.id = 706)"
"                                                              Buffers: shared hit=2"
"                                                        ->  Bitmap Index Scan on projects_pkey  (cost=0.00..1.40 rows=1 width=0) (actual time=0.011..0.011 rows=1 loops=1)"
"                                                              Index Cond: (projects_1.id = 2319)"
"                                                              Buffers: shared hit=2"
"                                                        ->  Bitmap Index Scan on projects_pkey  (cost=0.00..1.40 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=1)"
"                                                              Index Cond: (projects_1.id = 13118)"
"                                                              Buffers: shared hit=2"
"                                                        ->  Bitmap Index Scan on projects_pkey  (cost=0.00..1.40 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=1)"
"                                                              Index Cond: (projects_1.id = 5201)"
"                                                              Buffers: shared hit=2"
"                                                        ->  Bitmap Index Scan on projects_pkey  (cost=0.00..1.40 rows=1 width=0) (actual time=0.004..0.004 rows=1 loops=1)"
"                                                              Index Cond: (projects_1.id = 8049)"
"                                                              Buffers: shared hit=2"
"                                                        ->  Bitmap Index Scan on projects_pkey  (cost=0.00..1.40 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1)"
"                                                              Index Cond: (projects_1.id = 8091)"
"                                                              Buffers: shared hit=2"
"Planning Time: 0.385 ms"
"Execution Time: 369688.318 ms"

Thanks,
Alex


Comments (1)

一个人的旅程 2025-01-21 15:46:53


You are facing one of PostgreSQL's biggest performance problems: its COUNT-type queries are extremely slow compared to other RDBMSs.

Read the article I wrote about it, which shows a gap of up to 1500 times slower than Microsoft SQL Server, and 114 times slower on average...

PostGreSQL vs Microsoft SQL Server – Comparison part 2 : COUNT performances

The only way to improve the performance of such queries is to create materialized views... PostgreSQL has no vertical indexes (aka columnstore indexes) and no data compression, both of which are usually used for BI.
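A minimal sketch of the materialized-view approach for the jobs data from the question (object names are illustrative; the JSON paths are taken from the query plan): the JSON extraction is paid once per refresh instead of on every Direct Query request, and the result can be indexed on the join key:

```sql
-- Materialize the flattened JSON once instead of re-parsing ~4.5M
-- documents on every query (illustrative names).
CREATE MATERIALIZED VIEW "some sch".jobs_mat AS
SELECT id,
       (document -> 'pipeline' ->> 'project_id')::bigint AS project_id,
       document -> 'commit' ->> 'author_name'            AS user_name,
       document -> 'commit' ->> 'committer_email'        AS user_email
FROM public.jobs;

-- Index the join key; a unique index on id also allows
-- REFRESH ... CONCURRENTLY, which does not block readers.
CREATE INDEX ON "some sch".jobs_mat (project_id);
CREATE UNIQUE INDEX ON "some sch".jobs_mat (id);

-- Run after each extraction cycle.
REFRESH MATERIALIZED VIEW CONCURRENTLY "some sch".jobs_mat;
```

Pointing the Power BI view at the materialized table trades data freshness (it is only as current as the last refresh) for query latency, which is usually the right trade for a reporting workload fed by a batch extraction tool.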
