创建 SLURM 多集群设置时 munge 凭据无效
我正在构建一个 SLURM 多集群设置,其中有一个本地托管的 slurmdbd 和 Oracle 云中的 slurmctld 节点。 slurmctld 能够连接到 slurmdbd,但当我尝试以任何方式连接到数据库时,会收到此错误消息:
sacct: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to <IP_ADDRESS>: Failed to unpack SLURM_PERSIST_INIT message
sacct: error: slurmdbd: Sending PersistInit msg: No error
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
sacct: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to <IP_ADDRESS>: Failed to unpack SLURM_PERSIST_INIT message
sacct: error: slurmdbd: Sending PersistInit msg: No error
sacct: error: slurmdbd: DBD_GET_JOBS_COND failure: Unspecified error
查看 slurmdbd 集群上的 /var/log/slurm/slurmdbd.log 文件,它记录了此错误:
[2022-03-11T08:29:47.541] error: Munge decode failed: Invalid credential
[2022-03-11T08:29:47.541] auth/munge: _print_cred: ENCODED: Wed Dec 31 19:00:00 1969
[2022-03-11T08:29:47.541] auth/munge: _print_cred: DECODED: Wed Dec 31 19:00:00 1969
[2022-03-11T08:29:47.541] error: slurm_unpack_received_msg: auth_g_verify: REQUEST_PERSIST_INIT has authentication error: Unspecified error
[2022-03-11T08:29:47.541] error: slurm_unpack_received_msg: Protocol authentication error
[2022-03-11T08:29:47.551] error: CONN:10 Failed to unpack SLURM_PERSIST_INIT message
为了确保我的凭据有效,我已通过 SCP 将 slurmdbd 的 MUNGE 密钥复制到 slurmctld,确保 UID 和 GID所有节点上的 slurm 和 munge 用户的数量都是相同的,并确保时钟全部同步。当我在任一服务器上进行 munge 和 unmunge 操作时,它都会成功解码加密消息。但是,当我尝试使用 echo foo | 从一台服务器向另一台服务器验证凭据时ssh 用户@服务器 munge | unmunge 命令,它给我一个响应 unmunge: error: invalid credential
。我可以做什么才能仍然收到此回复?我应该怎样做才能确保我的凭证有效?
I am building a SLURM multi-cluster setup, with a slurmdbd hosted on-premises and a slurmctld node in Oracle Cloud. The slurmctld is able to connect to the slurmdbd, but receives this error message when I try to connect to the database in any way:
sacct: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to <IP_ADDRESS>: Failed to unpack SLURM_PERSIST_INIT message
sacct: error: slurmdbd: Sending PersistInit msg: No error
JobID JobName Partition Account AllocCPUS State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
sacct: error: slurm_persist_conn_open: Something happened with the receiving/processing of the persistent connection init message to <IP_ADDRESS>: Failed to unpack SLURM_PERSIST_INIT message
sacct: error: slurmdbd: Sending PersistInit msg: No error
sacct: error: slurmdbd: DBD_GET_JOBS_COND failure: Unspecified error
Looking at the /var/log/slurm/slurmdbd.log file on my slurmdbd cluster, it records this error:
[2022-03-11T08:29:47.541] error: Munge decode failed: Invalid credential
[2022-03-11T08:29:47.541] auth/munge: _print_cred: ENCODED: Wed Dec 31 19:00:00 1969
[2022-03-11T08:29:47.541] auth/munge: _print_cred: DECODED: Wed Dec 31 19:00:00 1969
[2022-03-11T08:29:47.541] error: slurm_unpack_received_msg: auth_g_verify: REQUEST_PERSIST_INIT has authentication error: Unspecified error
[2022-03-11T08:29:47.541] error: slurm_unpack_received_msg: Protocol authentication error
[2022-03-11T08:29:47.551] error: CONN:10 Failed to unpack SLURM_PERSIST_INIT message
To ensure that my credentials are valid, I have copied the slurmdbd's MUNGE key to the slurmctld via SCP, ensured that the UID and GID of the slurm and munge users on all nodes are identical, and made sure that the clocks are all in sync. When I munge and unmunge on either server, it successfully decodes the encrypted message. However, when I try to authenticate the credential from one server to the other using the echo foo | ssh user@server munge | unmunge
command, it gives me a response of unmunge: error: invalid credential
. What could I be doing to still receive this response? What should I do to make sure that my credential is valid?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论