What is the best choice for handling real-time data in memory?
Clients send real-time data to the server, which performs simple analysis on it: finding the data that falls within a specific range, or sorting some of it. Most of the data is discarded after the analysis, so there is no need to save it to disk.
I want to use an in-memory database to handle it. Is MySQL's MEMORY engine a good choice? What about a key-value in-memory cache engine such as Redis? Because I need to compare the data, a pure key-value store may not meet my requirements.
Comments (3)
To me it sounds as if you would be better off without a database, but that depends on the structure of your data and the kind of operations you have to perform.
If the structure is simple and the operations are easy, you should probably store the data in the data structures of the programming platform you are using.
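
As a rough illustration of the "plain data structures" route, here is a minimal Python sketch that keeps numeric readings sorted in a list so that range lookups are cheap. The class name `SortedReadings` and the sample values are hypothetical, and it assumes each reading is a single comparable number.

```python
import bisect


class SortedReadings:
    """Keeps numeric readings in ascending order for range queries."""

    def __init__(self):
        self._values = []

    def add(self, value):
        # insort keeps the list sorted: binary search plus an O(n) shift.
        bisect.insort(self._values, value)

    def in_range(self, low, high):
        # Returns every reading in the inclusive range [low, high].
        start = bisect.bisect_left(self._values, low)
        end = bisect.bisect_right(self._values, high)
        return self._values[start:end]

    def clear(self):
        # Discard everything once the analysis is done.
        self._values.clear()


readings = SortedReadings()
for value in [42.0, 7.5, 19.0, 88.1, 19.0]:
    readings.add(value)

in_window = readings.in_range(10, 50)
```

The list is always sorted, so "sort some data" comes for free, and calling `clear()` after each analysis pass matches the requirement that most data is thrown away.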
Redis supports advanced data structures, which makes it a pretty handy key-value data store. However, if your data requires complex relationships, you should probably check out MongoDB, OrientDB, or Riak (http://wiki.basho.com/Memory.html), which should all support memory-based storage engines.
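
One of those data structures, the sorted set, may cover the range and sorting needs from the question. The sketch below assumes the redis-py client, whose `zadd` and `zrangebyscore` methods map to the Redis commands ZADD and ZRANGEBYSCORE. The function names, the key, and the reading identifiers are hypothetical, and the functions take the client as a parameter so nothing connects to a server on import.

```python
def record_reading(client, key, reading_id, value):
    # ZADD stores reading_id as a member with `value` as its score.
    client.zadd(key, {reading_id: value})


def readings_in_range(client, key, low, high):
    # ZRANGEBYSCORE returns members whose score is in [low, high],
    # ordered by ascending score; withscores=True yields (member, score) pairs.
    return client.zrangebyscore(key, low, high, withscores=True)


# Possible usage, assuming a Redis server on localhost and redis-py installed:
#   import redis
#   client = redis.Redis(decode_responses=True)
#   record_reading(client, "readings", "sensor-1:0001", 19.0)
#   record_reading(client, "readings", "sensor-2:0001", 42.0)
#   readings_in_range(client, "readings", 10, 50)
```

Using the measured value as the score is a design choice for this sketch; if two readings can share an identifier, the later write overwrites the earlier score.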
If you plan to use the MEMORY engine of MySQL, there are a few gotchas:
By default, indexes are implemented as hash tables rather than B-trees. If you need to sort the data or run range queries, B-tree indexes are the better fit.
Locking granularity is the table. A read/write lock protects against concurrent DML operations. While raw performance is not bad, scalability is poor when you have many writers at the same time.
All rows have a fixed width (beware if you need to store VARCHARs).
Furthermore, as with most other RDBMSs, the MySQL protocol is synchronous: each time a client writes to the database, it waits for a reply. If you have a lot of data, batching write operations is almost mandatory to get good performance.
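
To make the batching idea concrete, here is a minimal Python sketch of a write buffer that collects rows and hands them over in groups, so the client pays one round trip per batch rather than one per row. `WriteBatcher`, `max_rows`, and the `flush` callback are hypothetical names. With a DB-API driver the callback could, for example, wrap `cursor.executemany(...)`; here it simply appends to a list so the sketch runs without a database.

```python
class WriteBatcher:
    """Buffers rows and passes them to `flush` in groups of `max_rows`."""

    def __init__(self, flush, max_rows=500):
        self._flush = flush        # callable that receives a list of rows
        self._max_rows = max_rows
        self._rows = []

    def add(self, row):
        self._rows.append(row)
        if len(self._rows) >= self._max_rows:
            self.flush()

    def flush(self):
        # Send whatever is buffered, including a final partial batch.
        if self._rows:
            rows, self._rows = self._rows, []
            self._flush(rows)


batches = []  # stand-in for the database: records each batch it receives
batcher = WriteBatcher(batches.append, max_rows=3)
for i in range(7):
    batcher.add((i, i * 1.5))
batcher.flush()

batch_sizes = [len(batch) for batch in batches]
```

A real deployment would probably also flush on a timer, so that rows do not sit in the buffer indefinitely when traffic is low.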
It really depends on the volume, the number of clients, and the throughput. If the requirements are low, any storage solution (including MySQL) will work fine. If more performance or scalability is required, other solutions will likely be better.
What you want to write is probably a DIRT (data-intensive real-time) application. Good storage solutions for this are MongoDB (upsert support, one-way protocol for write operations, etc.) and Redis (in-memory, O(1) operations for many commands, pipelining, etc.).
Depending on your needs, data modeling and processing will arguably be easier with MongoDB thanks to its B-tree indexes and map/reduce support. It will probably be a bit more complex with Redis, but if you choose the correct data structures, you will end up with more deterministic performance.
Finally, you might also avoid storing the data altogether by processing it on the fly. You can achieve this with a streaming engine such as those used on high-speed trading platforms. For instance, if you are ready to code in Java, ESPER is an excellent CEP (complex event processing) solution for processing data streams and/or establishing correlations between streams using a SQL-like language.
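
The on-the-fly approach can be sketched without any engine at all. The Python example below is not ESPER and does not use its query language; it is a hand-rolled sliding time window that keeps only the events from the last `span_seconds` and maintains a running mean. The class name and the sample events are hypothetical, and the sketch assumes events arrive in non-decreasing timestamp order.

```python
from collections import deque


class SlidingWindow:
    """Keeps events from the last `span_seconds` and a running total."""

    def __init__(self, span_seconds):
        self._span = span_seconds
        self._events = deque()  # (timestamp, value) pairs, oldest first
        self._total = 0.0

    def push(self, timestamp, value):
        self._events.append((timestamp, value))
        self._total += value
        # Evict events that have fallen out of the window.
        while self._events and self._events[0][0] <= timestamp - self._span:
            _, old_value = self._events.popleft()
            self._total -= old_value

    def count(self):
        return len(self._events)

    def mean(self):
        return self._total / len(self._events) if self._events else None


window = SlidingWindow(span_seconds=10)
for timestamp, value in [(0, 1.0), (5, 3.0), (12, 5.0)]:
    window.push(timestamp, value)

current_count = window.count()
current_mean = window.mean()
```

Old events are dropped as soon as they leave the window, so memory use is bounded by the event rate times the window span.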