How do we design and test for large volumes of concurrent data in Rails?

Posted 2024-10-15 02:13:41


Greetings Stackers.

We're working on a project which stores second-to-second tracking data for participants in psych experiments. Our current design has a Flash client which collects 60 seconds' worth of timestamp/activity pairings and then posts the data as strings, along with a little participant metadata, to our Rails (3.0.3) / MySQL (5.1) application. Edit: We're using vanilla Passenger/Nginx on the front end. Rails splits the timestamp/activity strings into parallel arrays, generates a single raw SQL insert statement, and then shoves everything into one massive table, i.e.:
(simplified code)

# Split the comma-separated POST payload into parallel arrays.
@feedback_data  = params[:feedbackValues].split(",")
@feedback_times = params[:feedbackTimes].split(",")

# Build one (participantId, studyId, timestamp, activityLevel) tuple per pair.
inserts = []
base = "(" + @userid + "," + @studyid + ","
@feedback_data.each_with_index do |value, i|
  record  = base + "'" + @feedback_times[i].to_s + "',"   # quote the datetime literal
  record += "'" + value.to_s + "')"
  inserts.push(record)
end

# One multi-row INSERT for the whole 60-second batch.
sql = "INSERT INTO excitement_datas (participantId, studyId, timestamp, activityLevel) VALUES #{inserts.join(", ")}"
ActiveRecord::Base.connection.execute sql

Yields:

INSERT INTO excitement_datas (participantId, studyId, timestamp, activityLevel)
VALUES (3,5,'2011-01-27 05:02:21','47'),(3,5,'2011-01-27 05:02:22','56'), etc.
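
For reference, a variant that escapes every value through the adapter's quoting rather than hand-built string concatenation might look like this (just a sketch using the same instance variables, not what we currently run):

# Sketch only: same multi-row insert, but every value passes through connection.quote.
conn   = ActiveRecord::Base.connection
tuples = @feedback_data.each_with_index.map do |value, i|
  "(#{conn.quote(@userid)}, #{conn.quote(@studyid)}, " \
  "#{conn.quote(@feedback_times[i])}, #{conn.quote(value)})"
end
conn.execute("INSERT INTO excitement_datas (participantId, studyId, timestamp, activityLevel) VALUES #{tuples.join(', ')}")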

The design has generated a lot of debate on the team. Studies will have tens or hundreds of concurrent participants. I've staggered the 60-second POST interval for each client so that incoming data is distributed more evenly, but I'm still getting lots of doom-and-gloom predictions.

What else can or should we do to improve the scalability of this design in Rails?

What tools / techniques can I use to accurately predict how this performs under load?

Many thanks.


Answers (1)

半边脸i 2024-10-22 02:13:41


This is more of an architecture issue than a code issue. Your code looks sane, and generating only one SQL query is a good approach. What's your application server, though?

If you are using, say, a single Thin server, then requests will block while the database is executing the SQL query, leaving the app unresponsive.

With Passenger or Unicorn you'd get more concurrency, but each request would still be stuck behind a fairly slow SQL query.

If you're really worried about that query, you could try an intermediate Memcache or RabbitMQ layer that stores a job for each received request, then have a background task (or many of them) pick the jobs up and do the slow insert. Memcache and Rabbit respond faster than MySQL, and all you're handing them is the raw request payload.

This means the request would complete very quickly, handing the heavy lifting off to your worker tasks. Delayed Job could be something to look at, or Workling, or Bunny/EventMachine for Rabbit.

Memcache persistence might be an issue for you, so I'd recommend Rabbit if you fancy the queue-based approach.
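
For example, a minimal sketch of that hand-off with Delayed Job (the job class name and column names here are illustrative assumptions, not taken from your code):

# A plain Ruby job object; a Delayed Job worker later calls #perform.
class FeedbackInsertJob < Struct.new(:user_id, :study_id, :times_csv, :values_csv)
  def perform
    times  = times_csv.split(",")
    values = values_csv.split(",")
    conn   = ActiveRecord::Base.connection
    tuples = values.each_with_index.map do |v, i|
      "(#{conn.quote(user_id)}, #{conn.quote(study_id)}, #{conn.quote(times[i])}, #{conn.quote(v)})"
    end
    conn.execute("INSERT INTO excitement_datas (participantId, studyId, timestamp, activityLevel) VALUES #{tuples.join(', ')}")
  end
end

# In the controller: enqueue the raw strings and return immediately.
Delayed::Job.enqueue FeedbackInsertJob.new(@userid, @studyid, params[:feedbackTimes], params[:feedbackValues])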

On top of that, you could look at Apache Bench to see how you're actually doing already:

http://httpd.apache.org/docs/2.0/programs/ab.html
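
For example, something along these lines would replay the Flash client's POST against a test box (the URL, payload file, and numbers are made up; ab's -p flag reads the POST body from a file and -T sets its content type):

ab -n 1000 -c 50 -p feedback_post.txt -T "application/x-www-form-urlencoded" http://localhost:3000/feedback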
