Technology stack for very frequent GPS data collection

Posted 2024-09-04 05:11:48


I am working on a project that involves GPS data collection from many users (say 1000) every second (while they move). I am planning to use a dedicated database instance on EC2 with MySQL on persistent block storage, and to run a Ruby on Rails application with an nginx frontend.
I haven't worked on such a data collection application before. Am I missing something here?

I will have another instance which will act as the application server and use the data from the same EBS.
If anybody has dealt with such a system before, any advice would be much appreciated.


溺渁∝ 2024-09-11 05:11:48


I would be most worried about MySQL and the disk being your bottleneck. I'm going to assume you're already familiar with the Ruby/Rails trade-off of always needing to throw more hardware at the application layer in return for higher programmer productivity. However, you're going to need to scale MySQL for writes, and that can be a tricky proposition if you're actually talking about more than 1000 QPS (1000 users, writing once a second). I would recommend taking whatever configuration of MySQL you're planning on using and throwing a serious amount of write traffic at it. If it falls over at anything under, say, 3000 QPS (always give yourself breathing room for spikes), you're going to need to either revise your plan (data every second? really?) or write to something like memcache first and use scheduled tasks to write to the database in one go (MySQL 3.22.5 and later supports multiple inserts in a single query, and there's also the LOAD DATA INFILE method, which can be used in conjunction with /dev/shm). You can also look into delayed insertion if you're not using InnoDB.
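The buffered-write idea in this answer can be sketched in Ruby: accumulate incoming fixes in memory and flush them as one multi-row INSERT instead of a thousand single-row writes per second. The `gps_points` table, column names, and the `db.execute` interface are hypothetical, and a real implementation would need escaping or prepared statements rather than string interpolation.

```ruby
# Sketch of the buffered-write approach: collect GPS fixes in memory and
# flush them with a single multi-row INSERT. Table/column names and the
# db adapter interface (#execute) are made up for illustration.
class GpsBuffer
  FLUSH_SIZE = 500

  def initialize(db)
    @db    = db        # anything responding to #execute(sql)
    @mutex = Mutex.new
    @rows  = []
  end

  # Called once per incoming fix; flushes when the buffer fills.
  def push(user_id, lat, lon, recorded_at)
    batch = nil
    @mutex.synchronize do
      @rows << [user_id, lat, lon, recorded_at]
      batch = @rows.slice!(0..-1) if @rows.size >= FLUSH_SIZE
    end
    flush(batch) if batch
  end

  private

  # One INSERT with many VALUES tuples -- the multi-row form the answer
  # mentions -- instead of FLUSH_SIZE separate round trips to MySQL.
  def flush(rows)
    values = rows.map { |u, la, lo, t| "(#{u}, #{la}, #{lo}, '#{t}')" }.join(", ")
    @db.execute("INSERT INTO gps_points (user_id, lat, lon, recorded_at) VALUES #{values}")
  end
end
```

The same buffer could instead be kept in memcache (as suggested above) and drained by a scheduled task, which also survives application-process restarts better than an in-process array.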

I'm biased of course (I work for Google), but I would be using App Engine for this. We run stuff that gets way more write traffic than this all the time on App Engine and it works great. It scales out of the box, there's no need to start up new images, and you don't have to deal with the issues of scaling SQL-based persistence. Also you get a ton of free quota to work with before billing starts. You can run JRuby if you really want a Ruby environment, or you can opt for Python, which is a bit better supported. Deployment is also much easier for something like this, even if you're using Vlad or Capistrano with EC2.

Edit: Here's a very conservative estimate of your data growth. 16 bytes is just the minimum required to store a lat/lon coordinate pair (two doubles). In the real world, indexes and other database overhead will increase this number. Adjust the formula accordingly based on real data to figure out how quickly you'll hit the 150 GB limit.
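Working that estimate through numerically, using only the figures stated above (1000 users, one fix per second, 16 bytes per fix, no index or row overhead):

```ruby
# Conservative growth estimate from the answer: raw coordinate bytes only.
USERS           = 1000
BYTES_PER_FIX   = 16                  # two IEEE-754 doubles (lat + lon)
SECONDS_PER_DAY = 24 * 60 * 60

bytes_per_day = USERS * BYTES_PER_FIX * SECONDS_PER_DAY   # 1_382_400_000
gb_per_day    = bytes_per_day / 1_000_000_000.0
days_to_150gb = 150.0 / gb_per_day

puts "#{gb_per_day.round(2)} GB/day, ~#{days_to_150gb.round} days to 150 GB"
# => 1.38 GB/day, ~109 days to 150 GB
```

So even the bare coordinates fill 150 GB in under four months; with indexes, timestamps, and row overhead, the real figure arrives considerably sooner.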

如果没有你 2024-09-11 05:11:48


You should use PostgreSQL for this. Postgres has better support for spatial data types (point, line, polygon, etc.), along with functions for operating on and indexing such data. You may want to use the GeoKit gem for Ruby on Rails for various operations at the ActiveRecord level.
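A minimal sketch of what this looks like using PostgreSQL's built-in geometric `point` type with a GiST index (libraries like GeoKit layer on top of storage like this); the table and column names are invented for illustration:

```ruby
# PostgreSQL DDL and a proximity query using the built-in geometric
# point type. Held in Ruby heredocs so they can be fed to any db driver.
CREATE_SQL = <<~SQL
  CREATE TABLE gps_points (
    user_id     integer     NOT NULL,
    recorded_at timestamptz NOT NULL,
    location    point       NOT NULL     -- stored as (longitude, latitude)
  );
  -- GiST index makes containment/bounding-box queries cheap.
  CREATE INDEX gps_points_location_idx ON gps_points USING gist (location);
SQL

# "All fixes within ~0.01 degrees of a centre point" as a circle-contains-
# point query; the centre coordinates here are arbitrary examples.
NEARBY_SQL = <<~SQL
  SELECT user_id, recorded_at
  FROM gps_points
  WHERE circle(point(77.59, 12.97), 0.01) @> location;
SQL
```

Note that the plain `point` type measures distance in degrees, not metres; for real geodesic distance you would reach for the earthdistance module or PostGIS.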

And I agree with webdestroya - every second?
