I work in the telecom industry. We deal with large datasets and complex EDW (enterprise data warehouse) models. We started with Teradata, and it was good for a few years. Then the data grew exponentially, and, as you know, expanding Teradata is expensive. So we evaluated the alternatives: EMC Greenplum, Oracle Exadata, HP Vertica and IBM Netezza.
Speed (time to generate 20 reports), fastest first: 1. Vertica, 2. Netezza, 3. Greenplum, 4. Oracle.
Compression ratio: Vertica had a natural advantage as a columnar store. Among the others, IBM is good too. The worst, per the benchmarks, were EMC and Oracle, as you would expect, since both want to sell tons of storage and hardware.
Scalability: all of them scale well.
Loading time: EMC is the best here; the others (Teradata, Vertica, Oracle, IBM) are good too.
Concurrent user queries: Vertica, then EMC Greenplum, and only then IBM. Oracle Exadata is comparatively slow for every type of query, but much better than the old-school Oracle 10g.
Price (most to least expensive): Teradata > Oracle > IBM > HP > EMC
Note: you need to compare apples to apples: the same number of cores, RAM, data volume, and reports.
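The apples-to-apples note is the methodological crux of any benchmark like the one above. A ranking only means something when every vendor ran with the same cores, RAM, data volume and reports. Below is a minimal Python sketch of that rule. The `BenchmarkRun` record and its field names are hypothetical illustrations, not part of any real benchmarking tool, and no real measurements are included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """One vendor's result on one test setup (hypothetical record)."""
    vendor: str
    cores: int
    ram_gb: int
    data_tb: float
    reports: int
    seconds: float

    def config(self):
        # Everything except the vendor and the measured time must match
        # for two runs to be comparable.
        return (self.cores, self.ram_gb, self.data_tb, self.reports)


def rank_by_speed(runs):
    """Return vendor names fastest-first, refusing to mix configurations."""
    configs = {run.config() for run in runs}
    if len(configs) != 1:
        raise ValueError(f"runs use different configurations: {sorted(configs)}")
    return [run.vendor for run in sorted(runs, key=lambda r: r.seconds)]
```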
We chose Vertica for its hardware-independent pricing model, lower price and good performance. Now all 40+ users are happy generating reports without waiting, and it all fits on low-cost HP DL380 servers. It is great for OLAP/EDW use cases.
All of this analysis applies only to the EDW/analytics/OLAP case. I am still an Oracle fanboy for OLTP, rich PL/SQL, connectivity, etc. on any hardware or system. Exadata handles a mixed workload decently, but its price/performance ratio is unreasonable, and you still need to migrate 10g code to Exadata best practices (MPP-style design, bulk processing, etc.), which is more time-consuming than they claim.
We've been working with Hadoop for 4 years, and with Vertica for 2. We had massive loading and indexing problems with our tables in MySQL. We were running on fumes with our home-grown sharding solution. We could have invested heavily in developing a more sophisticated sharding solution, which would have been quite painful, imo. We could also have thought harder about what data we absolutely needed to keep in a SQL database.
But at the end of the day, we chose to switch from MySQL to Vertica. Vertica's performance patterns are quite different from MySQL's, which comes with its own headaches. But it can load a lot of data very quickly, and it is good at heavy-duty queries that would make MySQL's head spin.
The way I see it, Vertica is a solution when you are already invested in SQL and need a heavier-duty SQL database. I'm not an expert, so I couldn't tell you what a transition to Oracle or DB2 would have been like compared to Vertica, either in terms of integration effort or monetary cost.
Vertica offers a lot of features we've barely looked into. Those might be very attractive to others with use cases different to ours.
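The sharding pain described in this answer is easy to see with the simplest possible scheme. The poster does not describe their actual scheme, so the sketch below assumes naive hash-modulo sharding as an illustrative stand-in. It measures how many keys change shards when one shard is added.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Naive modulo sharding: hash the key, take it modulo the shard count."""
    # md5 is used only for a stable, well-spread hash, not for security.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys that land on a different shard after resharding."""
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"customer-{i}" for i in range(10_000)]
    # Growing from 4 to 5 shards relocates most rows, which is one reason
    # hand-rolled sharding gets painful as data grows.
    print(f"{fraction_moved(keys, 4, 5):.0%} of keys move")
```

Consistent hashing or a lookup directory reduces how many rows move. Building and operating either one, though, is exactly the "more sophisticated sharding solution" weighed in the answer above.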
I'm a Vertica DBA and, before that, was a developer working with Vertica. Michael Stonebraker (the guy behind Ingres, Vertica, and other databases) has some critiques of NoSQL that are worth listening to.
Basically, here are the advantages of Vertica as I see them:
It's rather fast on large amounts of data.
Its performance is similar (as far as I can gather) to other data warehousing solutions, but its advantage is clustering on commodity hardware, so you can scale by adding more commodity machines. It looks cheap in terms of overall cost per TB. (Going from memory, not an exact quote.)
Again, it's for data warehousing.
You get to use traditional SQL and tables. It's under the hood that's different.
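The "scale by adding commodity hardware" advantage rests on shared-nothing, scatter-gather execution. Each node aggregates only its own slice of the data, and a coordinator merges the small partial results. The toy Python sketch below uses threads as stand-ins for separate servers and a hypothetical `(region, amount)` row shape. It illustrates the general MPP idea, not Vertica's actual executor.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def local_aggregate(rows):
    """Runs on one node: partial SUM(amount) GROUP BY region over its own rows."""
    partial = Counter()
    for region, amount in rows:
        partial[region] += amount
    return partial


def cluster_aggregate(partitions):
    """Scatter the partial aggregation to every node, then gather and merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(local_aggregate, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts instead of replacing them
    return dict(total)
```

Adding a node adds another partition that is scanned in parallel. The merge step still sees only one small partial result per node.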
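I can't speak to the other products, but I'm sure a lot of them are fine too.
Edit: Here's a talk from Stonebraker: http://www.slideshare.net/Dataversity/newsql-vs-nosql-for-new-oltp-michael-stonebraker-voltdb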
Pivotal, formerly Greenplum, is the well-funded spinoff of EMC, VMware and GE. Pivotal's market is enterprises (and homeland cybersecurity agencies) with multi-petabyte databases that need complex analytics and high-speed ETL. Greenplum originated as a PostgreSQL database redesigned for MapReduce-style MPP, with later additions for columnar support and HDFS. It marries the best of SQL and NoSQL, making NewSQL.
Features:
In the first half of 2015, most of their code, including Greenplum DB and HAWQ, will go open source. Some advanced management and performance features at the top of the stack will remain proprietary.
MPP (massively parallel processing), shared-nothing RDBMS designed for multi-terabyte to multi-petabyte environments.
Full SQL compliance, supporting SQL-92, SQL-99, SQL:2003 OLAP, etc. 100% compatible with PostgreSQL 8.2. The only SQL-on-Hadoop engine capable of running all 99 queries of the TPC-DS benchmark without rewriting; the competition cannot run many of them and is significantly slower (SIGMOD whitepaper).
ACID compliance.
Supports data stored in HDFS, Hive, HBase, Avro, ProtoBuf, Delimited Text and Sequence Files.
Solr/Lucene integration for multi-lingual full-text search embedded in the SQL.
Incorporates Open Source Software: Spring, Cloud Foundry, Redis.io, RabbitMQ, Grails, Groovy, Open Chorus, Pig, ZooKeeper, Mahout, MADlib, MapR. Some of these are used at EBSCO.
Native connectivity to HBase, which is a popular column-store-like technology for Hadoop.
VMware's participation in the $150M investment in MongoDB will likely lead to integration of petabyte-scale XML files.
Table-by-table specification of distribution keys allows you to design your table schemas to take advantage of node-local joins and GROUP BYs, but it performs well even without this.
Row and/or Column-oriented data storage. It is the only database where a table can be polymorphic with both columnar and row-based partitions as defined by the DBA.
A column-store table can have a different compression algorithm per column because different datatypes have different compression characteristics to optimize their storage.
Advanced MapReduce-like cost-based query optimizer (CBO): queries can be run on hundreds of thousands of nodes.
It is the only database with a dynamic, distributed pipeline execution model for query processing. While older databases rely on materialized execution, Greenplum doesn't have to write data to disk at every intermediate query step. It streams data to the next stage of the query plan in memory and never has to materialize it to disk, so it is much faster than anything anybody has demonstrated on Hadoop.
Complex queries on large data sets are solved in seconds or even sub-seconds.
Data management – provides table statistics, table security.
Deep analytics – including data mining or machine learning algorithms using MADlib. Deep Semantic Textual Analytics using GPText.
Graph analysis: billion-edge distributed in-memory graph database and algorithms using GraphLab.
Integration of SQL, Solr indexes, GPText, MADlib and GraphLab in a single query for massive syntactical parsing and graph/matrix affinity analysis for deep search analytics.
Fully ODBC/JDBC compliant.
Distributed ETL rate of 16 TB/hr!! Integration with Talend available.
Cloud support: Pivotal plans to package its Cloud Foundry software so that it can be used to host Pivotal atop other clouds as well, including Amazon Web Services' EC2. Pivotal data management will be available for use in a variety of cloud settings and will not be dependent on a proprietary VMware system. Will target OpenStack, vSphere, vCloud Director, or private brands. IBM announced it has standardized on Cloud Foundry for its PaaS. Confluence page.
Two hardware "appliance" offerings: Isilon NAS & Greenplum DCA.
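To make the distribution-key bullet concrete: when two tables are hash-distributed on the column they are joined on, every matching pair of rows lands on the same segment. Each segment can then join its own slice without moving data across the network. In Greenplum you declare the key with the `DISTRIBUTED BY` clause of `CREATE TABLE`. The Python sketch below only simulates the idea, using in-memory lists as segments and a hypothetical `cust_id` key. It is not how Greenplum hashes or executes joins.

```python
import zlib


def segment_for(value, num_segments):
    """Pick a segment for a row by hashing its distribution-key value."""
    # crc32 is deterministic across runs (Python's built-in hash() of str is not).
    return zlib.crc32(str(value).encode("utf-8")) % num_segments


def distribute(rows, key, num_segments):
    """Spread rows (dicts) across segments by the given distribution key."""
    segments = [[] for _ in range(num_segments)]
    for row in rows:
        segments[segment_for(row[key], num_segments)].append(row)
    return segments


def colocated_join(left_segments, right_segments, key):
    """Join segment i of one table only with segment i of the other.

    This is correct only because both tables were distributed on the join
    key, so matching rows are guaranteed to sit on the same segment.
    """
    joined = []
    for left, right in zip(left_segments, right_segments):
        index = {}
        for row in right:
            index.setdefault(row[key], []).append(row)
        for row in left:
            for match in index.get(row[key], []):
                joined.append({**row, **match})
    return joined
```

If the tables were distributed on different keys, the database would first have to redistribute or broadcast one side before joining. That slower path is the case the bullet says Greenplum still handles well.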
There is a lot of confusion about when to use a row database like MySQL or Oracle, a columnar DB like Infobright or Vertica, a NoSQL variant, or Hadoop. We wrote a white paper to try to help sort out which technologies are best suited for which use cases: you can download Emerging Database Landscape (scroll halfway down) or watch an on-demand webinar on the same topic.
Hope either is useful for you.
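Here is a rough intuition for the row vs. columnar distinction. A columnar store keeps each column contiguous, so an analytic query reads only the columns it touches. Sorted or low-cardinality columns also shrink dramatically under simple encodings such as run-length encoding. The toy Python sketch below assumes a hypothetical `(day, region, minutes)` table. Real columnar engines choose among several encodings per column and are far more elaborate.

```python
from itertools import groupby


def to_columns(rows, names):
    """Pivot row tuples into one list per column (a column-store layout)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def sum_column(columns, name):
    """An analytic aggregate touches only the one column it needs."""
    return sum(columns[name])
```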
Cassandra, Greenplum and Vertica all handle huge amounts of data but in very different ways.
Some made-up use cases where each database has its strengths:
Use Cassandra for:
Use Greenplum for:
Use Vertica for: