How should I implement a database that scales to upper tens of thousands of requests/second?
By upper tens of thousands of requests/second, I mean 60,000 to 90,000+ requests/second.
My setup consists of the following:
user ---> web app --> message queue --> parser --> database?
I should mention that the parser can currently parse/stuff around 18,750 records/second using COPY, so we are limited on that end until we start adding more parsers -- this isn't a huge concern for me right now.
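For context, COPY-based bulk loading from Ruby looks roughly like the minimal sketch below, using the pg gem; the database name, table, and column names are assumptions based on the queries later in this question, not the real schema:

    require 'pg'

    conn = PG.connect(dbname: 'game')  # hypothetical database name

    # Placeholder for rows coming out of the parser.
    records = [{ player: 'alice', type: 'award', amount: 5.0, hand: 42 }]

    # COPY streams rows in one round trip, skipping per-row INSERT
    # overhead -- the fastest bulk-load path in Postgres.
    conn.copy_data("COPY actions (player, type, amount, hand) FROM STDIN WITH (FORMAT csv)") do
      records.each do |r|
        conn.put_copy_data("#{r[:player]},#{r[:type]},#{r[:amount]},#{r[:hand]}\n")
      end
    end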
I have a system that requires the ability to bulk upload as many records as it can, as fast as it can. This same system (or it could be a different one, depending on how you would approach it) should be able to respond to analytical-type queries such as these:
    wonq  = "select sum(amount) from actions where player = '@player' and " +
            "(type = 'award' or type = 'return') and hand = hand_num"
    lostq = "select sum(amount) from actions where player = '@player' and " +
            "type != 'award' and type != 'return' and hand = hand_num"
...10-15 thousand times (PER USER), since they are keyed off another table. Needless to say, we paginate these results at 10 per page for now.
I've looked at the following: (assuming these are all on the same server)
mysql (regular run-of-the-mill RDBMS) -- was able to get into the 15-20 thousand requests/second range; under current conditions, if we try to scale this out we need a separate host/database every time we need to scale -- this is not doable
couchdb (document-oriented db) -- didn't break 700 requests/second; I was really hoping this was going to save our ass -- not a chance!
vertica (column-oriented db) -- was hitting 60,000 requests/second; closed source and very pricey; this is still an option, but I personally did not like it at all
tokyocabinet (hash-based db) -- is currently weighing in at 45,000 inserts/second and 66,000 selects/second; yesterday when I wrote this I was using an FFI-based adapter that was performing at around 5,555 requests/second; this is by far THE fastest, most awesome database I've seen yet!!
terracotta (VM cluster) -- currently evaluating this along with jmaglev (can't wait until maglev itself comes out) -- this is THE SLOWEST!
Maybe I'm just approaching this problem wrong, but I've ALWAYS heard that RDBMSs were slow as all hell -- so where are these super-fast systems I've heard about?
Testing Conditions::
Just so ppl know my specs on my dev box are:
dual 3.2ghz intel, 1 gig ram
Mysql mysql.cnf edits were:
key_buffer = 400M # was 16M innodb_log_file_size = 100M # non existent before innodb_buffer_pool_size = 200M # non existent before
UPDATE::
It turns out that terracotta might have a place in our application structure but it flat out WILL NOT be replacing our database anytime soon as it's speeds are terrible and it's heap utilization sucks.
On the other hand, I was very happy to see that tokyocabinet's NON-FFI ruby library (meaning tyrant/cabinet) is super fast and right now that is first place.
Comments (8)
For crazy-big scalability, you'll want to focus on two things: sharding and caching.
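To illustrate the sharding half, a minimal sketch of hash-based routing in Ruby with the pg gem; the shard hosts, database name, and shard count are all assumptions for the example:

    require 'pg'
    require 'zlib'

    # One connection per shard; hosts and database name are hypothetical.
    SHARDS = [
      PG.connect(host: 'db0', dbname: 'game'),
      PG.connect(host: 'db1', dbname: 'game'),
    ]

    # Hashing on player keeps all of one player's rows on a single
    # shard, so the per-player sum queries stay single-node.
    def shard_for(player)
      SHARDS[Zlib.crc32(player) % SHARDS.size]
    end

    shard_for('alice').exec_params(
      'select sum(amount) from actions where player = $1', ['alice'])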
Well, the big player in the game is Oracle, but that's big bucks.
If you want to go cheap, then you will have to pay the price in different terms.
user ---> web app --> message queue --> parser --> database?
What do you need the message queue for?
Those are normally a big performance issue.
Sharding and caching, as ojrac said.
Another option is to take a step back and figure out how to do the work with fewer queries! From the little information you gave, I can't help but think "there must be a better way". From the examples you gave, some summary tables (with optional caching) might be an easy win (see the sketch below).
Hypertable and the like give better performance for some data access patterns, but yours sounds very suited to typical databases.
And yeah, CouchDB is disappointingly slow.
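As a concrete but hypothetical version of the summary-table idea, assuming the actions schema from the question and a reasonably recent PostgreSQL: keep a per-player, per-hand total up to date at write time, so each analytical query becomes a primary-key lookup instead of a scan.

    require 'pg'

    conn = PG.connect(dbname: 'game')  # hypothetical database name

    # One row per (player, hand) instead of re-summing thousands of
    # action rows on every page view.
    conn.exec(<<~SQL)
      create table if not exists player_hand_totals (
        player text,
        hand   integer,
        won    numeric not null default 0,
        primary key (player, hand)
      )
    SQL

    # Call once per parsed action row; the upsert keeps totals current.
    def record_win(conn, player, hand, amount)
      conn.exec_params(<<~SQL, [player, hand, amount])
        insert into player_hand_totals (player, hand, won)
        values ($1, $2, $3)
        on conflict (player, hand) do update
          set won = player_hand_totals.won + excluded.won
      SQL
    end

    record_win(conn, 'alice', 42, 5.0)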
Have you tried PostgreSQL? It should be faster than MySQL. But in any case, you would need to balance the load over multiple servers (split the database). You can have multiple databases (e.g. one for each client) and then one centralized one that syncs with those small ones...
Have you tried Redis? They promise 110,000 SETs/second and 81,000 GETs/second. It's an advanced key-value db with support for lists and sets.
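For a sense of how the question's sum queries could map onto a key-value model, a minimal sketch with the redis-rb gem (assumes a Redis new enough for INCRBYFLOAT; the key-naming scheme is made up for the example):

    require 'redis'

    redis = Redis.new  # defaults to localhost:6379

    # Accumulate per-player, per-hand totals at parse time so each
    # "sum(amount)" becomes an O(1) GET instead of a table scan.
    # Hypothetical key scheme: "won:<player>:<hand>" / "lost:<player>:<hand>"
    def record_action(redis, player, hand, type, amount)
      bucket = %w[award return].include?(type) ? 'won' : 'lost'
      redis.incrbyfloat("#{bucket}:#{player}:#{hand}", amount)
    end

    record_action(redis, 'alice', 42, 'award', 5.0)
    redis.get('won:alice:42')  # => "5.0"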
I doubt any system will give you the out-of-the-box performance you need. You are probably going to start hitting hard limits on the machine you are on (with just about any write-intensive db you will hit I/O limits pretty fast). Some analysis might be required, but the disk is nearly always the bottleneck. More RAM will help, as will using solid-state disks.
However, you will probably need clustering of some kind regardless of which db you actually use. You can shard the data itself, or, with MySQL, setting up read slaves will spread the load across nodes and should give you the throughput you are looking for -- a sketch follows below.
Also: MongoDB is awesome. Might be worth a look.
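A minimal sketch of the read-slave setup with the mysql2 gem, assuming one master and two replicas; the hosts and database name are hypothetical, and replication itself is configured on the MySQL side:

    require 'mysql2'

    MASTER = Mysql2::Client.new(host: 'db-master', database: 'game')
    SLAVES = [
      Mysql2::Client.new(host: 'db-slave1', database: 'game'),
      Mysql2::Client.new(host: 'db-slave2', database: 'game'),
    ]

    # All writes hit the master; reads are spread across the slaves.
    def read_conn
      SLAVES.sample
    end

    read_conn.prepare(
      'select sum(amount) from actions where player = ? and hand = ?'
    ).execute('alice', 42)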
The typical way to quickly store data durably in a write-heavy app is to use an append-only log. If it is deployed properly, such that the log file is on its own spinning disk, disk seek time is minimized per write/append operation.
After each write, one can update metadata that records the byte offset for a given primary key.
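A minimal sketch of that pattern in plain Ruby; the in-memory hash stands in for the offset metadata, and the newline-delimited value format is just for the example:

    # Append-only log: every write goes to the end of the file,
    # and an in-memory index maps primary key -> byte offset.
    class AppendLog
      def initialize(path)
        @file  = File.open(path, 'ab')  # append-only, binary
        @index = {}                     # primary key => byte offset
      end

      def put(key, value)
        @file.seek(0, IO::SEEK_END)
        @index[key] = @file.pos         # record offset before writing
        @file.write("#{value}\n")       # values must not contain newlines here
        @file.flush                     # push to the OS; use fsync for real durability
      end

      def get(key)
        offset = @index[key] or return nil
        File.open(@file.path, 'rb') do |f|
          f.seek(offset)
          f.readline.chomp
        end
      end
    end

    log = AppendLog.new('/tmp/actions.log')
    log.put('alice:42', '5.0')
    log.get('alice:42')  # => "5.0"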
There is a mysql storage engine that does this if you want to use mysql. Another option is one of the new nosql databases, such as fleetdb.
Have you tried using an SSD as well?
There are lots of options to solve this problem but they are likely going to require some manual labor.