What is a good NoSQL / non-relational database solution for an audit/logging database?

Published 2024-08-31 09:25:51

What would be a suitable database for the following? I am especially interested in your experiences with non-relational NoSQL systems.
Are they any good for this kind of usage? Which system have you used and would recommend, or should I go with a normal relational database (DB2)?

I need to gather audit-trail/logging type information from a bunch of sources to a
centralized server, where I could generate reports efficiently and examine what is happening in the system.

Typically an audit/logging event would always consist of some mandatory fields, for example:

  • globally unique id (somehow generated by the program that produced this event)
  • timestamp
  • event type (e.g. user logged in, error happened, etc.)
  • some information about the source (server1, server2)

Additionally, the event could contain 0-N key-value pairs, where a value might be up to a few kilobytes of text.

  • It must run on a Linux server
  • It should handle a high amount of data (100 GB, for example)
  • It should support some kind of efficient full-text search
  • It should allow concurrent reading and writing
  • It should be flexible about adding new event types and adding/removing key-value pairs on new events. Flexible = no changes to the database schema should be required; the application generating the events can just add new event types/new fields as needed.
  • Queries against the database should be efficient, for reporting and for exploring what happened. For example:
    • How many events with type=X occurred in some time period.
    • Get all events where field A has value Y.
    • Get all events with type X where field A has value 1, field B is not 2, and the event occurred in the last 24 hours.
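Independent of which storage engine is chosen, the flexible event layout and the three example queries above can be sketched in plain Python over a list of dicts. This is only an illustration of the data model (all names here are mine, not from any particular database):

```python
from datetime import datetime, timedelta, timezone

# Each event: the mandatory fields, plus an open-ended "fields" dict
# holding the 0-N free-form key-value pairs.
events = [
    {"id": "a1", "ts": datetime.now(timezone.utc) - timedelta(hours=1),
     "type": "X", "source": "server1", "fields": {"A": 1, "B": 3}},
    {"id": "a2", "ts": datetime.now(timezone.utc) - timedelta(days=2),
     "type": "X", "source": "server2", "fields": {"A": 1, "B": 2}},
    {"id": "a3", "ts": datetime.now(timezone.utc),
     "type": "Y", "source": "server1", "fields": {"A": "Y"}},
]

cutoff = datetime.now(timezone.utc) - timedelta(hours=24)

# 1. How many events with type=X occurred in the last 24 hours?
count_x = sum(1 for e in events if e["type"] == "X" and e["ts"] >= cutoff)

# 2. All events where field A has value "Y".
a_is_y = [e for e in events if e["fields"].get("A") == "Y"]

# 3. Type X, field A == 1, field B != 2, within the last 24 hours.
hits = [e for e in events
        if e["type"] == "X" and e["fields"].get("A") == 1
        and e["fields"].get("B") != 2 and e["ts"] >= cutoff]
```

A real database would answer these via indexes rather than a linear scan, but the filter semantics are exactly what each candidate system below has to support.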

Comments (4)

日久见人心 2024-09-07 09:25:51

The two I've seen used successfully are MongoDB and Cassandra.
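For MongoDB in particular, the question's last example maps naturally onto a query filter. A minimal sketch, assuming events are stored as documents with the mandatory fields at the top level and the free-form pairs under a `fields` sub-document (that layout is my assumption, not something stated in this answer):

```python
from datetime import datetime, timedelta, timezone

# Filter for: type == "X", field A == 1, field B != 2, last 24 hours.
# "$ne" and "$gte" are standard MongoDB query operators; you would pass
# this dict to collection.find(query) via a driver such as PyMongo.
query = {
    "type": "X",
    "fields.A": 1,
    "fields.B": {"$ne": 2},
    "ts": {"$gte": datetime.now(timezone.utc) - timedelta(hours=24)},
}
```

A compound index on `(type, ts)` would keep such queries cheap as the collection grows toward the 100 GB range.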

烟柳画桥 2024-09-07 09:25:51

should I go with a normal relational database (DB2)?

Yes, you should! If you just want to store stuff and scan it, you might as well write to a file: very fast, no overhead! But the minute you want to summarize data over time (the last 24 hours, or between times t and t+1), and the more you care about the data as something other than lines of text, a proper RDBMS is without question your friend.
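To make the relational option concrete, here is a minimal sketch using SQLite as a stand-in for DB2 (the table and column names are mine): a fixed `events` table for the mandatory fields plus a key-value side table gives the required schema flexibility, while time-window aggregation stays a plain SQL query.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE events (
        id     TEXT PRIMARY KEY,   -- globally unique id from the producer
        ts     TEXT NOT NULL,      -- ISO-8601 timestamp
        type   TEXT NOT NULL,
        source TEXT NOT NULL
    );
    CREATE TABLE event_fields (    -- the 0-N free-form key-value pairs
        event_id TEXT REFERENCES events(id),
        key      TEXT NOT NULL,
        value    TEXT
    );
    CREATE INDEX idx_events_type_ts ON events(type, ts);
""")

now = datetime.now(timezone.utc)
con.execute("INSERT INTO events VALUES (?, ?, ?, ?)",
            ("e1", now.isoformat(), "login", "server1"))
con.execute("INSERT INTO event_fields VALUES (?, ?, ?)",
            ("e1", "user", "alice"))

# How many events of a given type occurred in the last 24 hours?
cutoff = (now - timedelta(hours=24)).isoformat()
(count,) = con.execute(
    "SELECT COUNT(*) FROM events WHERE type = ? AND ts >= ?",
    ("login", cutoff)).fetchone()
```

New event types and new field keys need no schema change; only genuinely new kinds of indexing would.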

嘴硬脾气大 2024-09-07 09:25:51

We used Redis to do all our centralized logging for all our app servers at mflow.com. It is very fast; based on these benchmarks it does about 110,000 SETs per second and about 81,000 GETs per second. It has a VM implementation (for when your dataset exceeds available memory) which swaps infrequently-used values out to disk.

It's an advanced data-structures server that can store any binary-safe data, with native support for strings, lists, sets, sorted sets and hashes. Based on discussions on the mailing list, it is heavily used by a lot of people to store analytics data.
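As an illustration of how those data structures could map onto this use case (key names and layout are my own sketch, not mflow's actual scheme): each event can live in a hash, with a sorted set scored by timestamp for time-range scans and a per-type set for type lookups. The helper below only builds the commands as plain tuples; with a client such as redis-py you would issue them via `hset`, `zadd`, and `sadd`.

```python
import time

def event_commands(event_id, event_type, source, fields, ts=None):
    """Return (command, ...) tuples describing how one audit event would
    be stored in Redis: a hash for the event body, a sorted set indexed
    by timestamp for range queries, and a per-type set for type lookups."""
    ts = ts if ts is not None else time.time()
    key = f"event:{event_id}"
    body = {"type": event_type, "source": source, "ts": ts}
    body.update({f"f:{k}": v for k, v in fields.items()})
    return [
        ("HSET", key, body),
        ("ZADD", "events:by_ts", {key: ts}),         # "last 24h" scans
        ("SADD", f"events:type:{event_type}", key),  # type=X lookups
    ]

cmds = event_commands("e1", "login", "server1", {"user": "alice"}, ts=100.0)
```

Note that Redis has no built-in full-text search over values, so that requirement would need extra indexing on top of this layout.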

∞梦里开花 2024-09-07 09:25:51

I'd recommend taking a look at the logs database I work on - VictoriaLogs. It fits the following requirements:

  • It accepts log lines with an arbitrary number of fields and needs no configuration for this.
  • It allows fast full-text search across all the fields without the need to configure anything.
  • It runs on Linux amd64 (as well as on other operating systems and architectures).
  • It is optimized for efficiently storing and processing terabytes of logs on a single node. It compresses stored logs, so they occupy much less disk space than the original uncompressed logs.
  • It supports concurrent data ingestion and querying.
  • It integrates well with traditional command-line tools for log analysis - head, less, grep, awk, jq, etc. See these docs.
  • It provides an easy-to-use query language for logs - LogsQL.
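To connect this back to the question's examples: the last query (type X, field A is 1, field B is not 2, within the last 24 hours) might look roughly like the following in LogsQL. The operator syntax here is from my recollection of the LogsQL docs and should be verified against them before use:

```
_time:24h type:X fieldA:1 -fieldB:2
```

The `_time:` filter selects the time window, plain `field:value` pairs filter on field contents, and a leading `-` negates a filter.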