What is a good NoSQL / non-relational database solution for an audit/logging database?
What would be a suitable database for the following? I am especially interested in your experiences with non-relational NoSQL systems.
Are they any good for this kind of usage? Which system have you used and would recommend, or should I go with a normal relational database (DB2)?
I need to gather audit-trail/logging type information from a bunch of sources to a
centralized server, where I could generate reports efficiently and examine what is happening in the system.
Typically, an audit/logging event always consists of some mandatory fields, for example:
- globally unique ID (somehow generated by the program that produced this event)
- timestamp
- event type (e.g. user logged in, error occurred, etc.)
- some information about the source (server1, server2)
Additionally, the event could contain 0-N key-value pairs, where a value might be up to a few kilobytes of text.
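The mandatory fields plus the free-form pairs described above can be sketched as a single schema-free document. This is only an illustration of the data model; the field names (`id`, `ts`, `type`, `source`) are assumptions, not from any particular system:

```python
import json
import uuid

# One audit event as a schema-free document: four mandatory fields,
# followed by any number of free-form key-value pairs.
event = {
    "id": str(uuid.uuid4()),          # globally unique id
    "ts": "2024-01-01T12:00:00Z",     # timestamp
    "type": "user_login",             # event type
    "source": "server1",              # which server produced it
    # 0-N optional key-value pairs; values may be kilobytes of text
    "user": "alice",
    "detail": "login via SSO",
}
print(json.dumps(event, indent=2))
```

A document store accepts each such event as-is, so new event types or fields require no schema change.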
- It must run on a Linux server
- It should handle a high amount of data (100 GB, for example)
- It should support some kind of efficient full-text search
- It should allow concurrent reading and writing
- It should be flexible about adding new event types and adding/removing key-value pairs on new events. Flexible = no changes to the database schema should be required; the application generating the events can simply add new event types/new fields as needed.
- Queries against the database should be efficient, for reporting and exploring what happened. For example:
- How many events with type=X occurred in some time period?
- Get all events where field A has value Y.
- Get all events with type X, where field A has value 1, field B is not 2, and the event occurred in the last 24 hours.
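The three example queries above can be sketched in pure Python against an in-memory list of event dicts; a document store such as MongoDB would express the same filters declaratively. The placeholder names (type X, fields A and B) come straight from the question; the sample data is invented:

```python
from datetime import datetime, timedelta, timezone

now = datetime(2024, 1, 2, tzinfo=timezone.utc)
events = [
    {"type": "X", "ts": now - timedelta(hours=1),  "A": 1, "B": 3},
    {"type": "X", "ts": now - timedelta(hours=30), "A": 1, "B": 3},  # older than 24h
    {"type": "X", "ts": now - timedelta(hours=2),  "A": 1, "B": 2},  # B == 2
    {"type": "Y", "ts": now - timedelta(hours=3),  "A": 1},
]
cutoff = now - timedelta(hours=24)

# 1) How many events with type=X occurred in a time period
count_x = sum(1 for e in events if e["type"] == "X" and cutoff <= e["ts"] <= now)

# 2) All events where field A has value 1
a_is_1 = [e for e in events if e.get("A") == 1]

# 3) type X, A == 1, B != 2, within the last 24 hours
recent = [e for e in events
          if e["type"] == "X" and e.get("A") == 1
          and e.get("B") != 2 and e["ts"] >= cutoff]

print(count_x, len(a_is_1), len(recent))  # 2 4 1
```

Whatever database is chosen should run such filters via indexes rather than a full scan once the data reaches the 100 GB range.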
4 Answers
The two I've seen used successfully are MongoDB and Cassandra.
Yes, you should! If you just want to store stuff and scan it, you might as well write to a file: very fast, no overhead! But the minute you want to summarize data over time (the last 24h, or between time t and t+1), and you care about the data as something other than lines of text, a proper RDBMS is without question your friend.
We used Redis to do all our centralized logging for all our app servers at mflow.com. It is very fast; based on these benchmarks it does about 110,000 SETs per second and about 81,000 GETs per second. It has a VM implementation (for when your dataset exceeds available memory) which swaps infrequently used values out to disk.
It's an advanced data-structures server that can store any binary-safe data, with native support for strings, lists, sets, sorted sets and hashes. Based on discussions on the mailing list, it is heavily used by a lot of people to store analytics data.
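One common layout for Redis-based logging (an assumption on my part, not necessarily what mflow.com did) is a list per source for the raw events plus a sorted set keyed by timestamp for time-range queries. The sketch below keeps it runnable without a server by using a plain dict as a stand-in; with the real `redis` client you would issue `LPUSH` and `ZADD` instead:

```python
import json

# Stand-in for a Redis server so the sketch runs without one.
# "lists" maps a list key to its elements; "zsets" maps a sorted-set
# key to {member: score}.
fake_redis = {"lists": {}, "zsets": {}}

def log_event(event: dict) -> str:
    """LPUSH the JSON event onto logs:<source> and index it by timestamp."""
    key = f"logs:{event['source']}"                              # e.g. logs:server1
    fake_redis["lists"].setdefault(key, []).insert(0, json.dumps(event))  # LPUSH
    # ZADD logs:by_time <timestamp> <event id>, enabling ZRANGEBYSCORE
    # for "events between t and t+1" style queries.
    fake_redis["zsets"].setdefault("logs:by_time", {})[event["id"]] = event["ts"]
    return key

key = log_event({"id": "evt-1", "ts": 1700000000, "source": "server1",
                 "type": "user_login", "user": "alice"})
print(key)  # logs:server1
```

Note that Redis has no built-in full-text search in this setup, so that requirement would need a secondary index or another system alongside it.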
I'd recommend taking a look at the logs database I work on, VictoriaLogs. It fits the following requirements: