How to connect to Hadoop/Hive from .NET
I am working on a solution where I will have a Hadoop cluster running Hive, and I want to send jobs and Hive queries from a .NET application to be processed and get notified when they are done. I can't find any way to interface with Hadoop other than directly from a Java app. Is there an API I can access that I am just not finding?
Apparently it is possible to connect to Hadoop with non-Java solutions - see Do I have to write my application in Java?
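One of the options that FAQ describes is Hadoop Streaming: any executable that reads input records from stdin and writes tab-separated key/value lines to stdout can act as the map or reduce step, so the logic could just as well live in a .NET console program. Below is a minimal word-count sketch of that contract, written in Python for brevity; the word-count task and the function names are illustrative only.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map step: emit "word<TAB>1" for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Reduce step: Streaming sorts mapper output by key before the reducer
    sees it, so equal words arrive on consecutive lines and can be summed."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


# Run as "python wordcount.py map" or "python wordcount.py reduce";
# with no argument nothing is read from stdin.
if __name__ == "__main__" and sys.argv[1:] in (["map"], ["reduce"]):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

On a cluster, a script like this would be handed to the hadoop-streaming jar through its -mapper and -reducer options (together with -input and -output); where that jar lives depends on the Hadoop distribution.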
With Hadoop: there is no straightforward way to connect from C#, because the Hadoop communication tier works with Java only and is not cross-platform. It is probably possible, but only in very non-trivial ways.
I know there is a patch to add Protocol Buffers support to Hadoop, but at the time of writing (Aug 2011) it has not been released yet.
With Hive the situation is better, because Hive has a Thrift interface, which supports C#. You can download the Hive Thrift interface definitions and generate a C# client on your own, but beware that the generated code requires some hacking. Instead, I would recommend downloading the DLL from https://bitbucket.org/vadim/hive-sharp/downloads/hive-sharp-lib.dll or using the NuGet package manager and searching for "hive": http://nuget.org/List/Packages/Hive.Sharp.Lib
Disclaimer: I'm the author.
There is a Hortonworks ODBC driver. I haven't used it personally, but it should let you work with Hive as with any other ODBC data source. Once the ODBC driver is installed, you can use the OdbcConnection class to connect to Hive.
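With a driver installed, the driver-specific part is the connection string handed to OdbcConnection. The sketch below (Python for brevity) assembles one from key/value pairs, wrapping the driver name and any value containing ODBC delimiter characters in braces and doubling a literal "}" inside braces. The driver name and the Host/Port/Schema/HiveServerType keys are assumptions modelled on the Microsoft/Simba Hive ODBC driver, and the host is hypothetical; check them against your driver's documentation.

```python
def build_odbc_connection_string(attributes):
    """Join ODBC attributes into a "key=value;key=value" connection string."""
    special = set(";{}=")
    parts = []
    for key, value in attributes.items():
        text = str(value)
        # Brace the driver name (the usual convention) and any value whose
        # characters or surrounding spaces would confuse the key=value; grammar.
        if (key.lower() == "driver" or text != text.strip()
                or any(ch in special for ch in text)):
            # Inside braces, a literal "}" is written as "}}".
            text = "{" + text.replace("}", "}}") + "}"
        parts.append(f"{key}={text}")
    return ";".join(parts)


# Hypothetical values; the key names are assumptions based on the
# Microsoft/Simba Hive ODBC driver.
example = build_odbc_connection_string({
    "Driver": "Microsoft Hive ODBC Driver",
    "Host": "hive.example.com",
    "Port": 10000,
    "Schema": "default",
    "HiveServerType": 2,
})
print(example)
# Driver={Microsoft Hive ODBC Driver};Host=hive.example.com;Port=10000;Schema=default;HiveServerType=2
```

In C#, the resulting string would simply be passed to `new OdbcConnection(connectionString)` from System.Data.Odbc.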
As noted in other answers, you can use the Thrift API. For that you need to generate C# classes from the interface definition files, which you can download from the Hive source repository. This approach works for me.
You can use IKVM to convert the Hadoop client Java libraries into .NET assemblies that you can use from C#. I haven't used IKVM with the Hive client, but I have converted some other Hadoop client library with IKVM and, surprisingly, it worked.
EDIT:
It is possible to access Hive from C# using Microsoft's ODBC connector. Download the NuGet package "Microsoft.Hadoop.Hive" and follow the example provided at http://msdn.microsoft.com/en-us/library/dn749834.aspx
The trick lies in building the connection string. The best way I came up with was to download the Microsoft Hive ODBC Driver (http://www.microsoft.com/en-us/download/details.aspx?id=40886), install it, then use Server Explorer inside Visual Studio to add a new connection and let it build the connection string for me. To do this, I used the following steps:
Either use this data source, or copy the connection string it builds for you and use it within your application.
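If you copy the string Server Explorer generates, it can help to split it back into its parts, for example to move Host and Port into application configuration. A minimal parser sketch (Python for brevity), assuming the usual ODBC rules: ";" separates pairs, a value may be wrapped in braces, and "}}" inside braces stands for a literal "}". The sample string in the test is hypothetical.

```python
def parse_odbc_connection_string(conn_str):
    """Split "key=value;..." into a dict, honouring {braced} values."""
    result = {}
    i, n = 0, len(conn_str)
    while i < n:
        # Skip empty segments and spaces before the next key.
        while i < n and conn_str[i] in "; ":
            i += 1
        eq = conn_str.find("=", i)
        if eq == -1:
            break  # trailing text without "=" is ignored
        key = conn_str[i:eq].strip()
        i = eq + 1
        while i < n and conn_str[i] == " ":
            i += 1
        if i < n and conn_str[i] == "{":
            # Braced value: read up to a single "}" ("}}" is an escaped "}").
            i += 1
            chars = []
            while i < n:
                if conn_str[i] == "}":
                    if i + 1 < n and conn_str[i + 1] == "}":
                        chars.append("}")
                        i += 2
                        continue
                    i += 1
                    break
                chars.append(conn_str[i])
                i += 1
            value = "".join(chars)
            semi = conn_str.find(";", i)
            i = n if semi == -1 else semi + 1
        else:
            semi = conn_str.find(";", i)
            end = n if semi == -1 else semi
            value = conn_str[i:end].strip()
            i = end + 1
        if key:
            result[key] = value
    return result
```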
The Thrift API is another way for other languages to access HDFS and Hive.
See if this helps. I have tried to connect to Hadoop via C#
How to communicate to Hadoop via Hive using .NET/C#
Use the Hbase.Net library from https://hbasenet.codeplex.com/
Then you can connect to HBase/Hive as shown below:
FYI, we are using the Hortonworks sandbox and it connects fine.
In the above example, 10.20.14.179 is the host and 9090 is the port.
Also, the following from https://community.hortonworks.com/questions/25101/is-there-a-way-to-connect-to-hbase-using-c.html might help:
There is no native C# HBase client. However, there are several options for interacting with HBase from C#.
C# HBase Thrift client - Thrift allows for defining service endpoints and data models in a common format and using code generators to create language-specific bindings. HBase provides a Thrift server and definitions. There are many examples online for creating a C# HBase Thrift client.
Marlin - Marlin is a C# client for interacting with Stargate (the HBase REST API) that ultimately became hbase-sdk-for-net. I have not personally tested it against HBase 1.x+, but since it uses Stargate, I expect it to work. If you are planning to use Stargate and implement your own client, which I would recommend over Thrift, make sure to use protobufs to avoid the JSON serialization overhead. Using an HTTP-based approach also makes it much easier to load-balance requests over multiple gateways.
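With Stargate, a row is fetched with a plain HTTP GET on a URL of the form http://<host>:<port>/<table>/<row>. When JSON is requested (Accept: application/json), row keys, column names and cell values all come back base64-encoded, which adds to the JSON overhead mentioned above; protobuf is requested with Accept: application/x-protobuf. A minimal sketch (Python for brevity) of building the URL and decoding a JSON response, assuming the stored bytes are UTF-8 text; table and row names are hypothetical.

```python
import base64
import json
from urllib.parse import quote


def row_url(base_url, table, row):
    """URL for fetching one row from the HBase REST (Stargate) gateway."""
    return f"{base_url.rstrip('/')}/{quote(table, safe='')}/{quote(row, safe='')}"


def decode_stargate_row_json(body):
    """Turn a Stargate JSON row response into {row_key: {column: value}}.

    Stargate base64-encodes the row key, the "family:qualifier" column name
    and the cell value ("$"); this sketch assumes all of them are UTF-8 text.
    """
    rows = {}
    for row in json.loads(body).get("Row", []):
        key = base64.b64decode(row["key"]).decode("utf-8")
        cells = rows.setdefault(key, {})
        for cell in row.get("Cell", []):
            column = base64.b64decode(cell["column"]).decode("utf-8")
            cells[column] = base64.b64decode(cell["$"]).decode("utf-8")
    return rows
```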
Phoenix Query Server - Phoenix is a SQL skin on HBase. Phoenix Query Server is a REST API for submitting SQL queries to Phoenix. Here is some example code; however, I have not yet tested it.
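Phoenix Query Server speaks the Avatica protocol: with JSON serialization, each call is a JSON object POSTed to the server (port 8765 by default) whose "request" field names the operation. A rough sketch (Python for brevity) of the openConnection, createStatement and prepareAndExecute payloads. The field names follow my reading of the Avatica JSON reference; newer Avatica versions document maxRowsTotal in place of maxRowCount, and recent Phoenix releases default to protobuf serialization, so treat both as assumptions to check against your Phoenix version.

```python
import json
import urllib.request
import uuid


def new_connection_id():
    """The client supplies the connection id; a random UUID avoids clashes."""
    return str(uuid.uuid4())


def open_connection_request(connection_id):
    return json.dumps({"request": "openConnection", "connectionId": connection_id})


def create_statement_request(connection_id):
    # The server's response to this carries the statementId used below.
    return json.dumps({"request": "createStatement", "connectionId": connection_id})


def prepare_and_execute_request(connection_id, statement_id, sql, max_row_count=-1):
    # -1 means no row limit; "maxRowCount" is the assumed (older) field name.
    return json.dumps({
        "request": "prepareAndExecute",
        "connectionId": connection_id,
        "statementId": statement_id,
        "sql": sql,
        "maxRowCount": max_row_count,
    })


def make_request(base_url, payload):
    """Build (but do not send) the HTTP POST carrying one Avatica request."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Each request would then be sent with urllib.request.urlopen(make_request(...)), in the order openConnection, createStatement, prepareAndExecute.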
Simba HBase ODBC Driver - uses ODBC to connect to HBase. I've heard positive feedback on this approach, especially from tools like Tableau. It is not open source and requires purchasing a license.