How should I design a distributed logging system in Kubernetes?
I'm designing a distributed application composed of several Spring microservices that will be deployed on Kubernetes. It is a batch-processing app: a typical request can take several minutes of processing, and the work is distributed across the services, with Kafka as the message broker.
A requirement of the project is that each request generates a log file, which needs to be stored on the application file store for retrieval. The current design is that all the processing services write log messages (tagged with the associated unique request ID) to Kafka, and a dedicated logging microservice consumes these messages, does some formatting, and persists them to the log file associated with the given request ID.
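For concreteness, here is a minimal sketch of what that logging consumer could look like using the plain kafka-clients API. The topic name, group ID, and on-disk layout are placeholders I made up for illustration, not part of the actual design:

```java
// Sketch of the dedicated logging consumer: reads (requestId, logLine)
// records from Kafka and appends each line to that request's log file.
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class RequestLogConsumer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");          // placeholder address
        props.put("group.id", "request-log-writer");           // placeholder group ID
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("request-logs"));       // placeholder topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Messages are assumed to be keyed by request ID,
                    // with the formatted log line as the value.
                    Path logFile = Path.of("/logs", record.key() + ".log");
                    Files.writeString(logFile, record.value() + System.lineSeparator(),
                            StandardCharsets.UTF_8,
                            StandardOpenOption.CREATE, StandardOpenOption.APPEND);
                }
            }
        }
    }
}
```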
I'm very unfamiliar with how files should be stored in web applications. Should I store these log files on the local file system? If so, wouldn't that mean this "logging service" couldn't be scaled? For example, if I scaled the logging service to 2 instances, then in theory each instance would only have access to half of the log files. And if a user makes a request to retrieve a log file, there is no guarantee that the requested file will be on whichever logging-service instance the Kubernetes load balancer routed them to.
What is the currently accepted "best practice" for having a file system in a distributed application? Or should I just accept that the logging service can never be scaled up?
One possible solution I can think of would be to store the text log files as TEXT rows in our MySQL database, making the logging service effectively stateless. If someone could point out any potential issues with this, that would be much appreciated.
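To make that idea concrete, here is a rough sketch of the schema and writer it implies; the table and column names are hypothetical. Each consumed Kafka message becomes one row, and a request's log is reassembled with a simple ordered SELECT:

```java
// Sketch of the MySQL-backed variant: one TEXT row per log line,
// keyed by request ID, so any logging-service instance can write.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class MySqlLogStore {
    /*
     * Assumed schema:
     *   CREATE TABLE request_log (
     *     id         BIGINT AUTO_INCREMENT PRIMARY KEY,
     *     request_id VARCHAR(64) NOT NULL,
     *     line       TEXT NOT NULL,
     *     INDEX idx_request (request_id, id)
     *   );
     */
    private final Connection conn;

    public MySqlLogStore(String url, String user, String password) throws Exception {
        conn = DriverManager.getConnection(url, user, password);
    }

    /** Appends one log line for the given request. */
    public void append(String requestId, String line) throws Exception {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO request_log (request_id, line) VALUES (?, ?)")) {
            ps.setString(1, requestId);
            ps.setString(2, line);
            ps.executeUpdate();
        }
    }

    /** Reassembles the full log for one request, in insertion order. */
    public String fetch(String requestId) throws Exception {
        StringBuilder sb = new StringBuilder();
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT line FROM request_log WHERE request_id = ? ORDER BY id")) {
            ps.setString(1, requestId);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    sb.append(rs.getString(1)).append('\n');
                }
            }
        }
        return sb.toString();
    }
}
```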
Comments (1)
Don't do this. Use a Fluentd / Filebeat / Promtail / Splunk forwarder sidecar that gathers stdout from the container processes.
Or have your services write to a Kafka logs topic rather than creating files.
With either option, ship the logs to a backend such as Elasticsearch, Grafana Loki, or Splunk.
https://kubernetes.io/docs/concepts/cluster-administration/logging/#sidecar-container-with-a-logging-agent
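A minimal sketch of the service side of either option, assuming SLF4J/Logback with an encoder configured to emit MDC fields (the key name here is an assumption): every log line goes to stdout tagged with the request ID, and the collector, not the application, handles aggregation and retrieval.

```java
// Sketch: tag every log line with the request ID via SLF4J's MDC and
// write to stdout, so a node agent or sidecar (Fluentd, Filebeat,
// Promtail, ...) can collect the lines and the backend can reassemble
// a per-request log by filtering on that field.
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class BatchProcessor {
    private static final Logger log = LoggerFactory.getLogger(BatchProcessor.class);

    public void handle(String requestId, String payload) {
        MDC.put("requestId", requestId);        // hypothetical MDC key
        try {
            log.info("processing started");     // carries requestId, given an
            // ... the actual batch work ...    // encoder that emits MDC fields
            log.info("processing finished");
        } finally {
            MDC.remove("requestId");            // don't leak into the next request
        }
    }
}
```

With a JSON console encoder configured, the requestId field becomes directly filterable in Elasticsearch, Loki, or Splunk, which recovers the "one log per request" view without the application ever touching a file.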
No, each of these services is designed to be scaled.
Sure, but Elasticsearch and Solr are purpose-built for collecting and searching plain text; MySQL is not.
Don't treat logs as something application-specific. In other words, your solution shouldn't be unique to Spring.