Connecting to Hive with SQLAlchemy and an odbc.ini file
I'd like to connect to Hive using SQLAlchemy and an odbc.ini file.
I have an odbc.ini file that looks like this:
[Hive]
Description = ODBC Hive connection to Hadoop cluster
Driver = /opt/cloudera/hiveodbc/lib/64/libclouderahiveodbc64.so
HOST = host1,host2,host3
Schema = default
ServiceDiscoveryMode = 1
ZKNamespace = hiveserver2
HiveServerType = 2
AuthMech = 1
ThriftTransport = 1
UseNativeQuery = 0
KrbHostFQDN = _HOST
KrbServiceName = hive
ServicePrincipalCanonicalization = 1
SSL = 0
TwoWaySSL = 0
I tried to connect with
import sqlalchemy
import pandas as pd
query_test = """SELECT * FROM my_table limit 1;"""
engine = sqlalchemy.create_engine("hive://@Hive")
conn = engine.connect()
print(pd.read_sql(query_test, conn))
which produces the error
TTransportException: failed to resolve sockaddr for Hive:10000
in
~/folder/lib64/python3.6/site-packages/thrift/transport/TSocket.py in open(self)
125 msg = 'failed to resolve sockaddr for ' + str(self._address)
126 logger.exception(msg)
--> 127 raise TTransportException(type=TTransportException.NOT_OPEN, message=msg, inner=gai)
128 for family, socktype, _, _, sockaddr in addrs:
129 handle = self._do_open(family, socktype)
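What is happening in this error: in a hive:// URL, the part after the @ is a hostname for PyHive, not an ODBC DSN name, so "Hive" goes straight to DNS resolution with PyHive's default port 10000. A quick way to see this, using only the standard library to mimic how the URL is parsed:

```python
from urllib.parse import urlsplit

# "Hive" after the @ is parsed as a hostname, not looked up in odbc.ini.
url = urlsplit("hive://@Hive")
print(url.hostname)  # hive (urlsplit lowercases the hostname)
print(url.port)      # None -- PyHive then falls back to its default, 10000
```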
I've been told that this is because we have multiple hosts and have to go through something called ZooKeeper.
What I can't understand is why there is no problem when I use pyodbc. The following works just fine:
import pandas as pd
import pyodbc

pyodbc.autocommit = True
cnxn = pyodbc.connect(dsn='Hive', autocommit=True)
data = pd.read_sql(query, cnxn, params=params)
cnxn.close()
How should I configure my SQLAlchemy code to work with my odbc.ini file?
Installed versions:
PyHive-0.6.5
thrift-0.16.0
thrift_sasl-0.4.3
SQLAlchemy-1.4.36
sasl-0.3.1
pyodbc-4.0.32
You seem to want to use the Hive ODBC Connector from Cloudera to connect to Hive, but then you use a hive:// URI, which means SQLAlchemy is going to try to use PyHive, which is unaware of odbc.ini.
To connect through ODBC from SQLAlchemy you need to use a <dialect>+pyodbc:// URI, such as mssql+pyodbc://, mysql+pyodbc:// or sybase+pyodbc://.
There is no equivalent dialect for Hive + ODBC, so even if you manage a Hive connection through pyodbc in SQLAlchemy, it would not know how to speak to Hive. I found the repo AurelienGalicher/PyHiveODBC, which seems to contain a first draft of a Hive + ODBC dialect, but I have not tested it.
The PyHive documentation points to connecting SQLAlchemy directly through a hive:// URI.
So either you need to develop a Hive + ODBC dialect, or use PyHive, which does not use ODBC at all (and therefore no odbc.ini DSN).
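If you stay with PyHive, one workaround for the ZooKeeper service-discovery setup above is to ask ZooKeeper for a live HiveServer2 instance yourself and then build a plain hive:// URL from the answer. A sketch, assuming the kazoo client library (pip install kazoo) and the hiveserver2 ZKNamespace from the odbc.ini above; the serverUri=... znode naming follows HiveServer2's usual registration format, so check what your cluster actually publishes:

```python
def parse_hs2_znode(znode):
    """Extract (host, port) from a HiveServer2 znode name such as
    'serverUri=host2:10000;version=2.1.1;sequence=0000000012'."""
    fields = dict(part.split("=", 1) for part in znode.split(";"))
    host, port = fields["serverUri"].rsplit(":", 1)
    return host, int(port)


def resolve_hs2_from_zookeeper(zk_hosts, namespace="hiveserver2"):
    """Return (host, port) of one HiveServer2 instance registered in ZooKeeper."""
    from kazoo.client import KazooClient  # assumed dependency: pip install kazoo

    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    try:
        znodes = zk.get_children("/" + namespace)
    finally:
        zk.stop()
    if not znodes:
        raise RuntimeError("no HiveServer2 instance registered in ZooKeeper")
    return parse_hs2_znode(znodes[0])


# Hypothetical usage -- host1..host3 are the ZooKeeper quorum from odbc.ini:
# host, port = resolve_hs2_from_zookeeper("host1:2181,host2:2181,host3:2181")
# engine = sqlalchemy.create_engine(f"hive://{host}:{port}/default")
```

This sidesteps ODBC entirely: PyHive talks Thrift directly to the host ZooKeeper hands back, which is roughly what the Cloudera ODBC driver does internally when ServiceDiscoveryMode = 1.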