How should I implement a database that scales to upper tens of thousands of requests/second?
By upper tens of thousands of requests/second, I mean 60,000 to 90,000+ requests/second.
My setup consists of the following:
user ---> web app --> message queue --> parser --> database?
I should mention that the parser can currently parse/stuff around 18,750 records/second using COPY, so we are limited on that end until we start adding more parsers -- this isn't a huge concern for me right now.
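For context, COPY-based bulk loading from Ruby looks roughly like the minimal sketch below, using the pg gem; the database name, table, and column names are assumptions based on the queries later in this question, not the real schema:

    require 'pg'

    conn = PG.connect(dbname: 'game')  # hypothetical database name

    # Placeholder for rows coming out of the parser.
    records = [{ player: 'alice', type: 'award', amount: 5.0, hand: 42 }]

    # COPY streams rows in one round trip, skipping per-row INSERT
    # overhead -- the fastest bulk-load path in Postgres.
    conn.copy_data("COPY actions (player, type, amount, hand) FROM STDIN WITH (FORMAT csv)") do
      records.each do |r|
        conn.put_copy_data("#{r[:player]},#{r[:type]},#{r[:amount]},#{r[:hand]}\n")
      end
    end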
I have a system that requires the ability to bulk upload as many records as it can, as fast as it can. This same system (or it could be a different one, depending on how you would approach it) should be able to respond to analytical-type queries such as these:
    wonq  = "select sum(amount) from actions where player = '@player' and " +
            "(type = 'award' or type = 'return') and hand = hand_num"
    lostq = "select sum(amount) from actions where player = '@player' and " +
            "type != 'award' and type != 'return' and hand = hand_num"
...10-15 thousand times (PER USER), since they are keyed off another table. Needless to say, we paginate these results at 10 per page for now.
I've looked at the following: (assuming these are all on the same server)
mysql (regular run-of-the-mill RDBMS) -- was able to get into the 15-20 thousand requests/second range; under current conditions, if we try to scale this out we need a separate host/database every time we need to scale -- this is not doable
couchdb (document-oriented db) -- didn't break 700 requests/second; I was really hoping this was going to save our ass -- not a chance!
vertica (column-oriented db) -- was hitting 60,000 requests/second; closed source and very pricey; this is still an option, but I personally did not like it at all
tokyocabinet (hash-based db) -- is currently weighing in at 45,000 inserts/second and 66,000 selects/second; yesterday when I wrote this I was using an FFI-based adapter that was performing at around 5,555 requests/second; this is by far THE fastest, most awesome database I've seen yet!!
terracotta (VM cluster) -- currently evaluating this along with jmaglev (can't wait until maglev itself comes out) -- this is THE SLOWEST!
Maybe I'm just approaching this problem wrong, but I've ALWAYS heard that RDBMSs were slow as all hell -- so where are these super-fast systems I've heard about?
Testing Conditions::
Just so ppl know my specs on my dev box are:
dual 3.2ghz intel, 1 gig ram
Mysql mysql.cnf edits were:
key_buffer = 400M # was 16M innodb_log_file_size = 100M # non existent before innodb_buffer_pool_size = 200M # non existent before
UPDATE::
It turns out that terracotta might have a place in our application structure but it flat out WILL NOT be replacing our database anytime soon as it's speeds are terrible and it's heap utilization sucks.
On the other hand, I was very happy to see that tokyocabinet's NON-FFI ruby library (meaning tyrant/cabinet) is super fast and right now that is first place.
Comments (8)
For crazy-big scalability, you'll want to focus on two things: sharding and caching.
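To illustrate the sharding half, a minimal sketch of hash-based routing in Ruby with the pg gem; the shard hosts, database name, and shard count are all assumptions for the example:

    require 'pg'
    require 'zlib'

    # One connection per shard; hosts and database name are hypothetical.
    SHARDS = [
      PG.connect(host: 'db0', dbname: 'game'),
      PG.connect(host: 'db1', dbname: 'game'),
    ]

    # Hashing on player keeps all of one player's rows on a single
    # shard, so the per-player sum queries stay single-node.
    def shard_for(player)
      SHARDS[Zlib.crc32(player) % SHARDS.size]
    end

    shard_for('alice').exec_params(
      'select sum(amount) from actions where player = $1', ['alice'])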
Well, the big player in the game is Oracle, but that's big bucks.
If you want to go cheap, then you will have to pay the price in different terms.
user ---> web app --> message queue --> parser --> database?
What do you need the message queue for?
Those are normally a big performance issue.
Sharding and caching, as ojrac said.
Another option is to take a step back and figure out how to do the work with fewer queries! From the little information you gave, I can't help but think "there must be a better way". From the examples you gave, some summary tables (with optional caching) might be an easy win (see the sketch below).
Hypertable and the like give better performance for some data access patterns, but yours sounds very suited to typical databases.
And yeah, CouchDB is disappointingly slow.
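As a concrete but hypothetical version of the summary-table idea, assuming the actions schema from the question and a reasonably recent PostgreSQL: keep a per-player, per-hand total up to date at write time, so each analytical query becomes a primary-key lookup instead of a scan.

    require 'pg'

    conn = PG.connect(dbname: 'game')  # hypothetical database name

    # One row per (player, hand) instead of re-summing thousands of
    # action rows on every page view.
    conn.exec(<<~SQL)
      create table if not exists player_hand_totals (
        player text,
        hand   integer,
        won    numeric not null default 0,
        primary key (player, hand)
      )
    SQL

    # Call once per parsed action row; the upsert keeps totals current.
    def record_win(conn, player, hand, amount)
      conn.exec_params(<<~SQL, [player, hand, amount])
        insert into player_hand_totals (player, hand, won)
        values ($1, $2, $3)
        on conflict (player, hand) do update
          set won = player_hand_totals.won + excluded.won
      SQL
    end

    record_win(conn, 'alice', 42, 5.0)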
Have you tried PostgreSQL? It should be faster than MySQL. But in any case, you would need to balance the load over multiple servers (split the database). You can have multiple databases (e.g. one for each client) and then one centralized one that syncs with those small ones...
Have you tried Redis? They promise 110,000 SETs/second and 81,000 GETs/second. It's an advanced key-value db with support for lists and sets.
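For a sense of how the question's sum queries could map onto a key-value model, a minimal sketch with the redis-rb gem (assumes a Redis new enough for INCRBYFLOAT; the key-naming scheme is made up for the example):

    require 'redis'

    redis = Redis.new  # defaults to localhost:6379

    # Accumulate per-player, per-hand totals at parse time so each
    # "sum(amount)" becomes an O(1) GET instead of a table scan.
    # Hypothetical key scheme: "won:<player>:<hand>" / "lost:<player>:<hand>"
    def record_action(redis, player, hand, type, amount)
      bucket = %w[award return].include?(type) ? 'won' : 'lost'
      redis.incrbyfloat("#{bucket}:#{player}:#{hand}", amount)
    end

    record_action(redis, 'alice', 42, 'award', 5.0)
    redis.get('won:alice:42')  # => "5.0"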
I doubt any system will give you the out-of-the-box performance you need. You are probably going to start hitting hard limits on the machine you are on (with just about any write-intensive db you will hit I/O limits pretty fast). Some analysis might be required, but the disk is nearly always the bottleneck. More RAM will help, as will using solid-state disks.
However, you will probably need clustering of some kind regardless of which db you actually use. You can shard the data itself, or, with MySQL, setting up read slaves will spread the load across nodes and should give you the throughput you are looking for -- a sketch follows below.
Also: MongoDB is awesome. Might be worth a look.
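A minimal sketch of the read-slave setup with the mysql2 gem, assuming one master and two replicas; the hosts and database name are hypothetical, and replication itself is configured on the MySQL side:

    require 'mysql2'

    MASTER = Mysql2::Client.new(host: 'db-master', database: 'game')
    SLAVES = [
      Mysql2::Client.new(host: 'db-slave1', database: 'game'),
      Mysql2::Client.new(host: 'db-slave2', database: 'game'),
    ]

    # All writes hit the master; reads are spread across the slaves.
    def read_conn
      SLAVES.sample
    end

    read_conn.prepare(
      'select sum(amount) from actions where player = ? and hand = ?'
    ).execute('alice', 42)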
The typical way to quickly store data durably in a write-heavy app is to use an append-only log. If it is deployed properly, such that the log file is on its own spinning disk, disk seek time is minimized per write/append operation.
After each write, one can update metadata that records the byte offset for a given primary key.
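A minimal sketch of that pattern in plain Ruby; the in-memory hash stands in for the offset metadata, and the newline-delimited value format is just for the example:

    # Append-only log: every write goes to the end of the file,
    # and an in-memory index maps primary key -> byte offset.
    class AppendLog
      def initialize(path)
        @file  = File.open(path, 'ab')  # append-only, binary
        @index = {}                     # primary key => byte offset
      end

      def put(key, value)
        @file.seek(0, IO::SEEK_END)
        @index[key] = @file.pos         # record offset before writing
        @file.write("#{value}\n")       # values must not contain newlines here
        @file.flush                     # push to the OS; use fsync for real durability
      end

      def get(key)
        offset = @index[key] or return nil
        File.open(@file.path, 'rb') do |f|
          f.seek(offset)
          f.readline.chomp
        end
      end
    end

    log = AppendLog.new('/tmp/actions.log')
    log.put('alice:42', '5.0')
    log.get('alice:42')  # => "5.0"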
There is a mysql storage engine that does this if you want to use mysql. Another option is one of the new nosql databases, such as fleetdb.
Have you tried using an SSD as well?
There are lots of options to solve this problem but they are likely going to require some manual labor.