Prometheus + Longhorn = wrong volume size


I am not really sure if this is a Prometheus issue, or just Longhorn, or maybe a combination of the two.

Setup:

  • Kubernetes K3s v1.21.9+k3s1
  • Rancher Longhorn Storage Provider 1.2.2
  • Prometheus Helm Chart 32.2.1 and image: quay.io/prometheus/prometheus:v2.33.1

Problem:

Infinitely growing PV in Longhorn, even beyond the defined max size. Currently using 75G on a 50G volume.

Description:

I have a really small 3-node cluster with not too many deployments running. Currently there is only one "real" application; the rest is just Kubernetes system stuff so far.
Apart from etcd, I am using all the default scraping rules.
The PV is filling up by a bit more than 1 GB per day, which seems fine to me.

The problem is that, for whatever reason, the data used inside Longhorn keeps growing indefinitely. I have configured retention rules for the Helm chart with retention: 7d and retentionSize: 25GB, so the retentionSize should never be reached anyway.
When I log into the container's shell and do a du -sh in /prometheus, it shows ~8.7GB being used, which looks fine to me as well.
The problem is that when I look at the Longhorn UI, the used space is growing all the time. The PV has existed for ~20 days now and is currently using almost 75GB of a defined max of 50GB. When I take a look at the Kubernetes node itself and inspect the folder that Longhorn uses to store its PV data, I see the same values of used space as in the Longhorn UI, while inside the Prometheus container everything looks fine to me.
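
For context, here is a minimal sketch of where those retention settings would sit in the chart values, assuming the chart is kube-prometheus-stack (its 32.x releases match the version numbering above); the storageClassName and the 50Gi request are assumptions based on the setup described here:

```yaml
# values.yaml (sketch) -- the retention settings described above,
# assuming the kube-prometheus-stack value layout
prometheus:
  prometheusSpec:
    retention: 7d          # delete TSDB blocks older than 7 days
    retentionSize: 25GB    # hard cap on TSDB size, well below the PVC size
    storageSpec:
      volumeClaimTemplate:
        spec:
          storageClassName: longhorn   # assumption: Longhorn StorageClass name
          resources:
            requests:
              storage: 50Gi            # the 50G PV that keeps growing on the node
```

At ~1 GB of new data per day, a 7d retention lines up with the ~8.7GB that du -sh reports inside the container; the growth is only visible at the Longhorn layer.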

I hope someone has an idea what the problem could be. I have not experienced this issue with any other deployment so far; all others are fine and actually decrease in used size when something inside the container gets deleted.

Comments (2)

还在原地等你 2025-01-19 02:46:50

I had the same problem recently, and it was because Longhorn does not automatically reclaim blocks that are freed by your application, i.e. Prometheus. This causes the volume's actual size to grow indefinitely, beyond the configured size of the PVC. This is explained in the Longhorn Volume Actual Size documentation. You can trigger Longhorn to reclaim these blocks by using the Trim Filesystem feature, which should bring the size down to what you can see being used within the container. You can also set this up to run on a schedule to maintain it over time.
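
As a rough illustration of the scheduled variant, below is a sketch of a Longhorn RecurringJob using the filesystem-trim task. Note that this task only exists in Longhorn releases newer than the 1.2.2 from the question, and the job name, cron schedule and group are illustrative:

```yaml
# Sketch: trim filesystems every night so Longhorn reclaims blocks freed
# by Prometheus. Assumes a Longhorn version that supports the
# filesystem-trim recurring-job task (newer than 1.2.2).
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: nightly-trim
  namespace: longhorn-system
spec:
  name: nightly-trim
  task: filesystem-trim   # reclaim space freed inside the volume's filesystem
  cron: "0 3 * * *"       # every day at 03:00
  groups:
    - default             # volumes without explicit recurring-job labels
  retain: 0               # not used by filesystem-trim
  concurrency: 1          # trim one volume at a time
```

A one-off trim can also be triggered on just the Prometheus volume (via the Longhorn UI or API) if you prefer not to schedule it.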

Late response, but hopefully it helps anyone else faced with the same issue in the future.

痴意少年 2025-01-19 02:46:50

Can the snapshots be the reason for the increasing size?
As I understand it, Longhorn takes snapshots, and they are added to the total actual size used on the node if the data in a snapshot differs from the current data in the volume. That happens in your case because old metrics are deleted and new ones are received.

See this comment and this one.
I know I'm answering late, but I came across the same issue and maybe this helps someone.
