The main difference between a conventional RDBMS and Hive is specialization. While MySQL is a general-purpose database suited to both transactional processing (OLTP) and analytics (OLAP), Hive is built for analytics only. Technically, the main difference is the lack of update/delete functionality: data can only be added and selected. At the same time, Hive can process data volumes that MySQL or another conventional RDBMS cannot handle (on a modest budget).
MPP (massively parallel processing) databases are the closest to Hive in functionality: they offer full SQL support and still scale to hundreds of machines.
Another serious difference is the query language.
Hive does not support full SQL, even in SELECT, because of its implementation. In my view, the main gap is the lack of joins on any condition other than equality.
The Hive query language syntax is also a bit different, so you cannot connect report-generation software directly to Hive.
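To make the equality-only join point concrete, here is a minimal sketch of the usual workaround: move the non-equality condition out of ON and into a WHERE filter over a cross join. It uses Python's built-in sqlite3 purely as a stand-in engine so the example runs anywhere; the tables `events` and `windows` and their columns are hypothetical, and whether a given Hive version accepts the first form depends on the release.

```python
import sqlite3

# In-memory SQLite is only a stand-in here; table and column names are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events (id INTEGER, ts INTEGER);
    CREATE TABLE windows (name TEXT, start_ts INTEGER, end_ts INTEGER);
    INSERT INTO events VALUES (1, 5), (2, 15), (3, 25);
    INSERT INTO windows VALUES ('early', 0, 10), ('late', 20, 30);
""")

# Range condition inside ON: fine in a typical RDBMS, but the kind of join
# the answer above says Hive does not support.
non_equi = conn.execute("""
    SELECT e.id, w.name
    FROM events e
    JOIN windows w ON e.ts BETWEEN w.start_ts AND w.end_ts
    ORDER BY e.id
""").fetchall()

# Workaround shape: cross join first, then filter the pairs in WHERE.
cross_filtered = conn.execute("""
    SELECT e.id, w.name
    FROM events e
    CROSS JOIN windows w
    WHERE e.ts >= w.start_ts AND e.ts <= w.end_ts
    ORDER BY e.id
""").fetchall()

print(non_equi)
print(cross_filtered)
```

Both queries return the same rows, but the second one conceptually builds every event/window pair before filtering, which can get expensive on large tables.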
Basically, Hive is a SQL-like scripting language built on MapReduce. When you issue commands, they are interpreted and run over the distributed system. Since the files being crunched are flat, this is equivalent to running the corresponding code in Hadoop and gathering the data. The whole flow is much slower than it would be if you used MySQL.
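As a rough illustration of what "equivalent code in Hadoop" means, the sketch below runs the three MapReduce phases in plain Python for something like SELECT country, SUM(amount) ... GROUP BY country. The flat-file layout (tab-separated country and amount) and all function names are assumptions made for the example; real Hive generates and schedules these stages itself.

```python
from collections import defaultdict

# Hypothetical flat-file rows: country <TAB> amount
lines = ["us\t10", "de\t5", "us\t7", "fr\t1", "de\t2"]

def map_phase(line):
    """Parse one raw line and emit a (key, value) pair."""
    country, amount = line.split("\t")
    yield country, int(amount)

def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Aggregate the values of one key (the SUM in the query)."""
    return key, sum(values)

mapped = [pair for line in lines for pair in map_phase(line)]
result = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
print(result)
```

On a cluster, each phase is spread over many machines and the intermediate data goes through disk and network, which is a large part of why the flow is slower than a single MySQL query on small data.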
Hive vs. traditional database
Hive --> Schema on READ: the schema is not verified when the data is loaded.
Traditional database --> Schema on WRITE: the table schema is enforced at data load time, i.e. if the data being loaded does not conform to the schema, it is rejected.
Hive --> Scales out very easily at low cost.
Traditional database --> Not very scalable; scaling up is costly.
Hive --> Follows the Hadoop model of write once, read many times.
Traditional database --> Data can be read and written many times.
Hive --> Record-level updates are not possible.
Traditional database --> Record-level updates, insertions and deletes, transactions and indexes are all possible.
Hive --> OLTP (On-line Transaction Processing) is not yet supported, but OLAP (On-line Analytical Processing) is.
Traditional database --> Both OLTP (On-line Transaction Processing) and OLAP (On-line Analytical Processing) are supported.
Otherwise, please check the URL below:
https://sensaran.wordpress.com/2016/01/30/comparison-with-hive-with-traditional-database/
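The schema-on-read versus schema-on-write contrast in the list above can be sketched in a few lines of Python. This is only an illustration under assumed names (`SCHEMA`, `RAW`, and both loader functions are hypothetical): the strict loader rejects the whole load on the first bad row, while the lazy reader keeps the raw lines and yields None for a field that does not parse, similar to how Hive returns NULL for such a field at query time.

```python
# Hypothetical two-column schema and raw comma-separated input.
SCHEMA = [("id", int), ("price", float)]
RAW = ["1,9.99", "2,not-a-number", "3,4.50"]

def load_schema_on_write(lines):
    """Validate every row at load time; a bad value raises ValueError."""
    table = []
    for line in lines:
        values = line.split(",")
        table.append(tuple(cast(v) for (_, cast), v in zip(SCHEMA, values)))
    return table

def read_schema_on_read(lines):
    """Apply the schema only while reading; unparseable fields become None."""
    for line in lines:
        row = []
        for (_, cast), value in zip(SCHEMA, line.split(",")):
            try:
                row.append(cast(value))
            except ValueError:
                row.append(None)
        yield tuple(row)

try:
    load_schema_on_write(RAW)
    rejected = False
except ValueError:
    rejected = True  # the strict load refuses the data set

rows = list(read_schema_on_read(RAW))
print(rejected, rows)
```

The trade-off is that schema on read makes loading cheap and fast, but pushes the cost of discovering bad data to every query.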
This is not quite a response to the original question, but it exceeded the maximum comment size by 47 characters.
When you build an OLAP data warehouse on HDFS and Hive, you are not entirely barred from updating the fact data. You can do it the same way many good RDBMS-based data warehouses do: by exchanging partitions between the stage and the warehouse. Table partitions in Hive are implemented as HDFS directories, so exchanging partitions is (almost) instantaneous: it takes only the time needed to rename an HDFS directory. Admittedly, you have to call HDFS directly, bypassing the Hive interface, and you would likely use plain MapReduce to maintain the stage, but in the data warehouses developed by the company I work for, it proved to be a good approach.
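A minimal sketch of the partition exchange described above, using the local filesystem as a stand-in for HDFS so that it runs anywhere. The directory layout (`facts/dt=2016-01-30`), the file name `part-00000`, and the helper `swap_partition` are all hypothetical; on a real cluster the renames would go through HDFS (for example with `hdfs dfs -mv`) rather than `os.rename`.

```python
import os
import tempfile

def swap_partition(stage_dir, warehouse_dir):
    """Replace a warehouse partition with the staged one using renames only."""
    old_dir = warehouse_dir + ".old"
    if os.path.exists(warehouse_dir):
        os.rename(warehouse_dir, old_dir)   # move the current data aside
    os.rename(stage_dir, warehouse_dir)     # publish the rebuilt partition
    return old_dir

# Build a tiny fake stage and warehouse, one partition directory each.
root = tempfile.mkdtemp()
stage = os.path.join(root, "stage", "facts", "dt=2016-01-30")
live = os.path.join(root, "warehouse", "facts", "dt=2016-01-30")
for path, content in ((stage, "corrected rows\n"), (live, "original rows\n")):
    os.makedirs(path)
    with open(os.path.join(path, "part-00000"), "w") as f:
        f.write(content)

swap_partition(stage, live)

with open(os.path.join(live, "part-00000")) as f:
    live_content = f.read()
print(live_content)
```

No data files are copied, which is why the exchange is nearly instantaneous. Note that the two renames are not atomic as a pair, so a query running between them could briefly see the partition missing.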
Hive was invented at Facebook and is much like SQL, but with little support for inner queries (subqueries). It lets you use all types of joins and group functions as in SQL, and it also provides user-defined functions (UDFs), which can be written in Java or any other language and then used in Hive.
Hive is mainly used when the data is large enough that partitioning or clustering pays off; it is not generally used for single-row inserts or updates the way SQL databases are.
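For custom logic written outside Java, one common route is Hive's TRANSFORM clause, which streams rows to an external script as tab-separated lines on standard input and reads tab-separated lines back. The sketch below shows what such a script might look like in Python; the two input columns (user and URL) and the function names are assumptions for illustration, and the demo feeds the script an in-memory stream instead of a real Hive job.

```python
import io
import sys

def transform_line(line):
    """Turn 'user<TAB>url' into 'user<TAB>lowercased host'."""
    user, url = line.rstrip("\n").split("\t")
    host = url.split("/")[2] if "://" in url else url
    return f"{user}\t{host.lower()}"

def main(stream_in=sys.stdin, stream_out=sys.stdout):
    """Process every input row; a streaming job would use stdin and stdout."""
    for line in stream_in:
        stream_out.write(transform_line(line) + "\n")

# Local demo with in-memory streams in place of the pipes Hive would provide.
out = io.StringIO()
main(io.StringIO("alice\thttp://Example.com/a\nbob\tplain-host\n"), out)
print(out.getvalue())
```

When deployed as a script, the last three lines would be replaced by a plain call to `main()` so that it reads from standard input.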