Setting up basic password authentication for JDBC connections to Presto on EMR
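My use case is simple. I deployed an EMR cluster running Presto via CDK, using the AWS Glue Data Catalog as the metastore. Only the default user will run queries on the cluster. By default the master user is hadoop, and I can use it to connect to the cluster over JDBC and run queries. However, I can establish that connection without a password. I have read the Presto documentation, which mentions LDAP, Kerberos, and file-based authentication. I just want this to behave like a MySQL database, where I have to connect with a username and password. But for the life of me, I cannot find the configuration that turns passwords on. These are my settings so far: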
{
classification: 'spark-hive-site',
configurationProperties: {
'hive.metastore.client.factory.class': 'com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory',
},
},
{
classification: 'emrfs-site',
configurationProperties: {
'fs.s3.maxConnections': '5000',
'fs.s3.maxRetries': '200',
},
},
{
classification: 'presto-connector-hive',
configurationProperties: {
'hive.metastore.glue.datacatalog.enabled': 'true',
'hive.parquet.use-column-names': 'true',
'hive.max-partitions-per-writers': '7000000',
'hive.table-statistics-enabled': 'true',
'hive.metastore.glue.max-connections': '20',
'hive.metastore.glue.max-error-retries': '10',
'hive.s3.use-instance-credentials': 'true',
'hive.s3.max-error-retries': '200',
'hive.s3.max-client-retries': '100',
'hive.s3.max-connections': '5000',
},
},
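Which setting can I use to set a password for hadoop? Kerberos, LDAP, and file-based authentication all seem overly complicated for such a simple use case. Am I missing something obvious?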
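EDIT: After reading countless pages of documentation and talking to AWS Support, I decided to move to Trino, but I ran into more problems. This is the current configuration in my CDK deployment: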
configurations: [
{
classification: 'spark-hive-site',
configurationProperties: {
'hive.metastore.client.factory.class': 'com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory',
},
},
{
classification: 'emrfs-site',
configurationProperties: {
'fs.s3.maxConnections': '5000',
'fs.s3.maxRetries': '200',
},
},
{
classification: 'presto-connector-hive',
configurationProperties: {
'hive.metastore.glue.datacatalog.enabled': 'true',
'hive.parquet.use-column-names': 'true',
'hive.max-partitions-per-writers': '7000000',
'hive.table-statistics-enabled': 'true',
'hive.metastore.glue.max-connections': '20',
'hive.metastore.glue.max-error-retries': '10',
'hive.s3.use-instance-credentials': 'true',
'hive.s3.max-error-retries': '200',
'hive.s3.max-client-retries': '100',
'hive.s3.max-connections': '5000',
},
},
{
classification: 'trino-config',
configurationProperties: {
'query.max-memory-per-node': `${instanceMemory * 0.15}GB`, // 15% of a node
'query.max-total-memory-per-node': `${instanceMemory * 0.5}GB`, // 50% of a node
'query.max-memory': `${instanceMemory * 0.5 * coreInstanceGroupNodeCount}GB`, // 50% of the cluster
'query.max-total-memory': `${instanceMemory * 0.8 * coreInstanceGroupNodeCount}GB`, // 80% of the cluster
'query.low-memory-killer.policy': 'none',
'task.concurrency': vcpuCount.toString(),
'task.max-worker-threads': (vcpuCount * 4).toString(),
'http-server.authentication.type': 'PASSWORD',
'http-server.http.enabled': 'false',
'internal-communication.shared-secret': 'abcdefghijklnmopqrstuvwxyz',
'http-server.https.enabled': 'true',
'http-server.https.port': '8443',
'http-server.https.keystore.path': '/home/hadoop/fullCert.pem',
},
},
{
classification: 'trino-password-authenticator',
configurationProperties: {
'password-authenticator.name': 'file',
'file.password-file': '/home/hadoop/password.db',
'file.refresh-period': '5s',
'file.auth-token-cache.max-size': '1000',
},
},
],
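The four memory properties above are plain arithmetic on `instanceMemory`, `coreInstanceGroupNodeCount`, and fixed fractions. The sketch below computes the same four properties from named constants, so each comment sits next to the number it describes. `trinoMemorySettings` is a hypothetical helper, and the fractions are copied from the configuration. Rounding down to whole GB is an assumption added here; it keeps the values as integers instead of fractional sizes such as `9.6GB`.

```typescript
// Hypothetical helper: derives the trino-config memory properties used above.
// The fractions mirror the configuration in the question; flooring to whole GB
// is an added assumption, not a Trino requirement.
type MemorySettings = Record<string, string>;

const PER_NODE_QUERY_FRACTION = 0.15; // query.max-memory-per-node
const PER_NODE_TOTAL_FRACTION = 0.5;  // query.max-total-memory-per-node
const CLUSTER_QUERY_FRACTION = 0.5;   // query.max-memory (x core nodes)
const CLUSTER_TOTAL_FRACTION = 0.8;   // query.max-total-memory (x core nodes)

// Formats a GB amount as a whole-GB size string, e.g. 9.6 -> "9GB".
function gb(value: number): string {
  return `${Math.floor(value)}GB`;
}

function trinoMemorySettings(instanceMemoryGb: number, coreNodeCount: number): MemorySettings {
  if (instanceMemoryGb <= 0 || !Number.isInteger(coreNodeCount) || coreNodeCount < 1) {
    throw new Error("instanceMemoryGb must be > 0 and coreNodeCount a positive integer");
  }
  return {
    "query.max-memory-per-node": gb(instanceMemoryGb * PER_NODE_QUERY_FRACTION),
    "query.max-total-memory-per-node": gb(instanceMemoryGb * PER_NODE_TOTAL_FRACTION),
    "query.max-memory": gb(instanceMemoryGb * CLUSTER_QUERY_FRACTION * coreNodeCount),
    "query.max-total-memory": gb(instanceMemoryGb * CLUSTER_TOTAL_FRACTION * coreNodeCount),
  };
}
```

In the CDK stack, the result could be spread into the `trino-config` properties, for example `configurationProperties: { ...trinoMemorySettings(instanceMemory, coreInstanceGroupNodeCount), 'query.low-memory-killer.policy': 'none' }`. The remaining properties would stay as they are.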
I started here: https://trino.io/docs/current/security/tls.html
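I am using this approach:
"Secure the Trino server directly. This requires you to obtain a valid certificate, and add it to the Trino coordinator's configuration."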
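I got an internal wildcard certificate from my company. It gave me:
- Certificate text
- Private key
- Certificate chain
From here: https://trino.io/docs/current/security/inspect-pem.html
it looks like I need to combine these 3 files into one, so I did this: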
-----BEGIN RSA PRIVATE KEY-----
Content of private key
-----END RSA PRIVATE KEY-----
-----BEGIN CERTIFICATE-----
Content of certificate text
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
First content of chain
-----END CERTIFICATE-----
-----BEGIN CERTIFICATE-----
Second content of chain
-----END CERTIFICATE-----
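Building that bundle is a plain concatenation in the order shown: private key, leaf certificate, then the chain. The sketch below assumes Node.js and a hypothetical helper `buildPemBundle`. The input file names are placeholders, and the order simply mirrors the layout above; it is not a requirement taken from the Trino docs.

```typescript
import { readFileSync, writeFileSync } from "fs";

// Hypothetical helper: concatenates the PEM pieces in the order shown above
// (private key, leaf certificate, then every certificate in the chain file(s))
// into one bundle file. All paths are placeholders supplied by the caller.
function buildPemBundle(
  keyPath: string,
  certPath: string,
  chainPaths: string[],
  outPath: string,
): string {
  const parts = [keyPath, certPath, ...chainPaths].map((path) => {
    const text = readFileSync(path, "utf8").trim();
    // Cheap sanity check: every input should start with PEM armor.
    if (!text.startsWith("-----BEGIN ")) {
      throw new Error(`${path} does not look like a PEM file`);
    }
    return text;
  });
  // Trimming each part and joining with "\n" avoids blank gaps, and also
  // avoids a missing newline between one END line and the next BEGIN line.
  const bundle = parts.join("\n") + "\n";
  // The bundle contains the private key. Mode 0o600 applies only if this call
  // creates the file; an existing file keeps its current permissions.
  writeFileSync(outPath, bundle, { mode: 0o600 });
  return bundle;
}
```

A chain file that already holds both intermediate certificates can be passed as a single entry in `chainPaths`. Separate files work too.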
Then, in a bootstrap action, I copy the file to all nodes, so that I can use the following configuration:
'http-server.https.enabled': 'true',
'http-server.https.port': '8443',
'http-server.https.keystore.path': '/home/hadoop/fullCert.pem',
I have confirmed that the file is deployed to the nodes. Then I continued with: https://trino.io/docs/current/security/password-file.html
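Before debugging Trino itself, it can help to confirm that the file on a node (for example `/home/hadoop/fullCert.pem`) has the expected shape. This TypeScript sketch uses hypothetical function names. It only lists the PEM armor labels in order and counts keys and certificates. It does not check that the key matches the certificate or that the chain is valid; tools such as `openssl` are better suited for that.

```typescript
// Hypothetical check: returns the label of every BEGIN/END block in a PEM
// bundle, in file order (e.g. "RSA PRIVATE KEY", "CERTIFICATE").
// It inspects only the armor lines, not the base64 payload.
function pemBlockLabels(pem: string): string[] {
  const labels: string[] = [];
  // Created per call so the global regex's lastIndex never leaks between calls.
  const re = /-----BEGIN ([A-Z0-9 ]+)-----[\s\S]*?-----END \1-----/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(pem)) !== null) {
    labels.push(m[1]);
  }
  return labels;
}

// Counts private keys (any label ending in "PRIVATE KEY") and certificates.
function summarizeBundle(pem: string): { privateKeys: number; certificates: number } {
  const labels = pemBlockLabels(pem);
  return {
    privateKeys: labels.filter((l) => l.endsWith("PRIVATE KEY")).length,
    certificates: labels.filter((l) => l === "CERTIFICATE").length,
  };
}
```

On a node, `summarizeBundle(readFileSync("/home/hadoop/fullCert.pem", "utf8"))` (with `readFileSync` from Node's `fs`) should report one private key and three certificates for a bundle laid out like the one above.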
I also know that this part works, because if I run the Trino CLI directly on the master node with the wrong password, I get a credentials error.
Right now, this is what I am running:
[hadoop@ip-10-0-10-245 ~]$ trino-cli --server https://localhost:8446 --catalog awsdatacatalog --user hadoop --password --insecure
trino> select 1;
Query 20220701_201620_00001_9nksi failed: Insufficient active worker nodes. Waited 5.00m for at least 1 workers, but only 0 workers are active
In /var/log/trino/server.log I see:
2022-07-01T21:30:12.966Z WARN http-client-node-manager-51 io.trino.metadata.RemoteNodeState Error fetching node state from https://ip-10-0-10-245.ec2.internal:8446/v1/info/state: Failed communicating with server: https://ip-10-0-10-245.ec2.internal:8446/v1/info/state
2022-07-01T21:30:13.902Z ERROR Announcer-0 io.airlift.discovery.client.Announcer Service announcement failed after 8.11ms. Next request will happen within 1000.00ms
2022-07-01T21:30:14.913Z ERROR Announcer-1 io.airlift.discovery.client.Announcer Service announcement failed after 10.35ms. Next request will happen within 1000.00ms
2022-07-01T21:30:15.921Z ERROR Announcer-3 io.airlift.discovery.client.Announcer Service announcement failed after 8.40ms. Next request will happen within 1000.00ms
2022-07-01T21:30:16.930Z ERROR Announcer-0 io.airlift.discovery.client.Announcer Service announcement failed after 8.59ms. Next request will happen within 1000.00ms
2022-07-01T21:30:17.938Z ERROR Announcer-1 io.airlift.discovery.client.Announcer Service announcement failed after 8.36ms. Next request will happen within 1000.00ms
And without --insecure:
[hadoop@ip-10-0-10-245 ~]$ trino-cli --server https://localhost:8446 --catalog awsdatacatalog --user hadoop --password
trino> select 1;
Error running command: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
trino>
This happens even though I followed this guidance for uploading the .pem file to S3 as an asset:
Am I wrong in thinking that something this simple shouldn't be this complicated? I would really appreciate any help here.
Based on the message you are getting from Trino, "Insufficient active worker nodes", the authentication system is working, and you are now having problems with secure internal communication. Specifically, the machines are having problems talking to each other. I would start by disabling internal TLS and verifying that everything works. Only then would I work on enabling it (assuming you need it in your environment). To disable TLS, use:
Then restart all your machines. You should not see "Service announcement failed". There might be a couple of these while the machines are starting up, but once they establish communication the error messages should stop.
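To check that the errors actually stop after the restart, you can count announcement failures in the coordinator log. The sketch below is a hypothetical TypeScript helper. It assumes the log format from the excerpt in the question: one entry per line, timestamp first, lines in chronological order.

```typescript
// Hypothetical helper: summarizes "Service announcement failed" errors in the
// text of a Trino server.log. Other warnings, such as "Error fetching node
// state", are ignored.
interface AnnouncementSummary {
  failures: number;           // number of matching lines
  lastFailure: string | null; // timestamp of the last matching line, if any
}

function summarizeAnnouncements(logText: string): AnnouncementSummary {
  let failures = 0;
  let lastFailure: string | null = null;
  for (const rawLine of logText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (!line.includes("Service announcement failed")) {
      continue;
    }
    failures += 1;
    lastFailure = line.split(/\s+/)[0]; // first field is the timestamp
  }
  return { failures, lastFailure };
}
```

Run it against `/var/log/trino/server.log` (read with `fs.readFileSync`) twice, a minute or so apart. If `lastFailure` stops advancing after startup, the nodes are announcing successfully.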