Greenplum vs. PostgreSQL

Posted on 2024-10-22 04:25:08

What are the arguments for and against using Greenplum instead of PostgreSQL in a web app (Django) environment?

My gut reaction is to prefer PostgreSQL's open-source approach and huge knowledge base.

My configuration (though I'd love to hear about any other configuration) is a medium-sized business with 2 web servers and (at the moment) 2 database servers.

Areas to contrast are binary data crunching, the number of nodes in replication, and my personal favorite: community support and skilled engineer support.

What are the pros and cons of using Greenplum instead of PostgreSQL?

Comments (7)

浮萍、无处依 2024-10-29 04:25:08

I don't know much about Greenplum, except from quickly skimming the link you sent. A data warehouse is not the same thing as a transactional operational data store. The former is for ad hoc queries, statistical analysis, dimensional analysis, and read-mostly access to historical data. The latter is for real-time reads and writes of operational data. They're complementary.

I'm guessing that you want PostgreSQL.

Who is pushing Greenplum on you and why? If it's being presented as an alternative, I'd dig deeper and rebut the argument.

绅刃 2024-10-29 04:25:08

Greenplum is an MPP (massively parallel processing) adaptation of PostgreSQL. It's optimized for warehousing and/or analytics on large data sets and would not perform that well in a transactional environment. If you need a large DW environment, look at Greenplum. If you need OLTP or smaller database sizes (under 10 TB), look at PostgreSQL.

陌路终见情 2024-10-29 04:25:08

Greenplum is an MPP analytical (OLAP) DBMS. PostgreSQL is an OLTP DBMS. In general, there is not a single solution on the market that is good at both OLAP and OLTP at the same time; you can find my thoughts on it here.

A web app backend will always create an OLTP workload. Greenplum has a big overhead for transaction processing because it is a distributed system, so don't expect it to deliver more than 500-600 TPS. Postgres, by contrast, can reach hundreds of thousands of TPS with the right tuning.

Conversely, for an OLAP workload, Postgres can offer you only single-host processing, with no partitioning with dynamic partition elimination, no compression, and no columnar store, whereas Greenplum can crunch your data in parallel across the cluster.

So the solution you are looking for is a typical data warehouse case: use an OLTP solution for the high transactional workload, extract the data to the DWH with ETL/ELT, and then run complex data-crunching queries on it.
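
The OLTP-to-DWH flow described above can be sketched roughly as follows. This is only an illustration, not a production pipeline: it uses two in-memory `sqlite3` databases as stand-ins for PostgreSQL (the OLTP side) and Greenplum (the warehouse side), and the `orders` and `daily_sales` tables and their columns are hypothetical names invented for the example.

```python
import sqlite3


def extract_daily_totals(oltp):
    """Extract step: aggregate operational rows into one row per day."""
    return oltp.execute(
        "SELECT order_date, COUNT(*), SUM(amount) "
        "FROM orders GROUP BY order_date ORDER BY order_date"
    ).fetchall()


def load_daily_totals(dwh, rows):
    """Load step: upsert one summary row per day into the warehouse table."""
    dwh.execute(
        "CREATE TABLE IF NOT EXISTS daily_sales ("
        "order_date TEXT PRIMARY KEY, order_count INTEGER, revenue REAL)"
    )
    dwh.executemany("INSERT OR REPLACE INTO daily_sales VALUES (?, ?, ?)", rows)
    dwh.commit()


def demo():
    # Stand-in for the OLTP database (hypothetical schema).
    oltp = sqlite3.connect(":memory:")
    oltp.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, order_date TEXT, amount REAL)"
    )
    oltp.executemany(
        "INSERT INTO orders (order_date, amount) VALUES (?, ?)",
        [("2024-01-01", 10.0), ("2024-01-01", 5.0), ("2024-01-02", 7.5)],
    )
    oltp.commit()

    # Stand-in for the data warehouse.
    dwh = sqlite3.connect(":memory:")
    load_daily_totals(dwh, extract_daily_totals(oltp))
    return dwh.execute(
        "SELECT order_date, order_count, revenue "
        "FROM daily_sales ORDER BY order_date"
    ).fetchall()


if __name__ == "__main__":
    for row in demo():
        print(row)
```

The web app only ever talks to the OLTP side; the heavy analytical queries run against the summary table on the warehouse side, so the two workloads do not compete.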

At the moment both PostgreSQL and Greenplum are open-source products, so you are free to choose either of them, but of course the PostgreSQL community is bigger at the moment.

过去的过去 2024-10-29 04:25:08

Since Greenplum uses parallel processing, there will be overhead when running lots of tiny read queries, because the master node needs to communicate with the underlying data nodes to retrieve the answers to all of these queries. For a query that takes milliseconds, expect performance an order of magnitude slower on Greenplum.
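
A back-of-the-envelope model may make the point above more concrete. It assumes, purely for illustration, that every query pays a fixed coordination cost on top of its actual work; the 9 ms figure below is a made-up number, not a Greenplum measurement, and the function name is hypothetical.

```python
def slowdown(work_ms, dispatch_overhead_ms):
    """Ratio of total latency (work plus fixed overhead) to the work alone."""
    return (work_ms + dispatch_overhead_ms) / work_ms


if __name__ == "__main__":
    # Hypothetical fixed cost of dispatching a query to the data nodes.
    overhead_ms = 9.0
    for work_ms in (1.0, 100.0, 60_000.0):
        factor = slowdown(work_ms, overhead_ms)
        print(f"{work_ms:>10.0f} ms of work -> {factor:.4f}x total latency")
```

Under these assumed numbers a 1 ms lookup becomes ten times slower, while a one-minute analytical query barely notices the same fixed cost.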

离去的眼神 2024-10-29 04:25:08

If you are looking for a PostgreSQL-based data warehousing solution, I would also look at GridSQL. It is a parallelization layer over multiple PostgreSQL instances, and is free and open source.

As mentioned in other comments, it will not perform well for many small millisecond-scale queries, but it will help you greatly with long-running queries. GridSQL also does not include DW optimizations such as the columnar storage that Greenplum has, but you can take advantage of constraint exclusion partitioning (e.g., subtables by date range) combined with parallelism to get your query results faster.
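
As a rough sketch of the date-range subtables mentioned above, the helper below generates PostgreSQL DDL for one monthly child table using table inheritance and a `CHECK` constraint, which is what lets the planner skip irrelevant children when `constraint_exclusion` is enabled. The function name and the `events` / `created_at` identifiers are hypothetical, and the script only builds SQL text; it does not connect to a database.

```python
from datetime import date


def monthly_child_ddl(parent, column, year, month):
    """Build DDL for a child table that holds one calendar month of rows."""
    start = date(year, month, 1)
    # First day of the following month is the exclusive upper bound.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    child = f"{parent}_{start:%Y_%m}"
    return (
        f"CREATE TABLE {child} (\n"
        f"    CHECK ({column} >= DATE '{start}' AND {column} < DATE '{end}')\n"
        f") INHERITS ({parent});"
    )


if __name__ == "__main__":
    # Hypothetical parent table and timestamp column.
    print(monthly_child_ddl("events", "created_at", 2024, 12))
```

A query that filters on the date column then only needs to scan the child tables whose constraints can match the filter.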

You can even use it on a single multi-core server, as PostgreSQL will only use a single core when processing a query.

温馨耳语 2024-10-29 04:25:08

I think Greenplum takes better advantage of parallel processing. It's based on PostgreSQL, though.

Greenplum has a free community edition. You can always download it and test it in your own environment.

夜夜流光相皎洁 2024-10-29 04:25:08

If any data crunching takes longer than an hour, you'll get linear performance boosts for every core you add. It's not really worth the effort for anything that takes less time to crunch through.
