How to read and write very large amounts of data from devices
I am building a location tracking system for one of our clients. They have GPS devices installed in vehicles, each programmed with a server IP and port number, and we have developed a TCP listener that listens for the requests sent by the devices. Each device sends a request every minute.

The actual problem is that 100,000 GPS devices send a request every minute, so storing the information is very difficult. I am not able to figure out the best approach to store the data coming from the devices. Should I store it in a file or in memcached?

Please help me find the best way to handle this problem. Note that each GPS device is GPRS enabled, so it has internet connectivity and can talk to our backend server.

Sometimes a device cannot find an internet connection, since the vehicles move all over the place, but once it regains connectivity it sends all of its buffered packets at once, i.e. the ones it was supposed to send every minute.

So I am looking for the best way to handle this problem. I am using the Java programming language.

Thanks in advance!
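For context, a minimal sketch of the kind of TCP listener the question describes might look like the following; the port number, thread-pool size, and class and method names are all invented for illustration and are not taken from the actual system:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class GpsTcpListener {

    public static void main(String[] args) throws Exception {
        int port = 9000;                                   // illustrative port
        ExecutorService pool = Executors.newFixedThreadPool(200);

        try (ServerSocket server = new ServerSocket(port)) {
            while (true) {
                Socket socket = server.accept();           // one connection per device report
                pool.submit(() -> handle(socket));
            }
        }
    }

    private static void handle(Socket socket) {
        try (Socket s = socket;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII))) {
            String line;
            // a device that was offline may flush several buffered packets in one connection
            while ((line = in.readLine()) != null) {
                store(line);
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static void store(String rawPacket) {
        // placeholder: this is exactly the open question (file, cache, queue, or database)
        System.out.println("received: " + rawPacket);
    }
}
```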
Comments (2)
One option that comes to mind is Apache Flume, as a way to collect the data into Hadoop.

Another (commercial) option is Splunk.
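If Flume were chosen, the TCP listener could hand each packet to a Flume agent through the Flume client SDK (flume-ng-sdk). A rough sketch, assuming an agent with an Avro source is reachable at the (illustrative) host and port passed in, and that a sink such as HDFS is configured on the agent side:

```java
import java.nio.charset.StandardCharsets;

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class FlumeForwarder {

    private final RpcClient client;

    public FlumeForwarder(String agentHost, int agentPort) {
        // connects to the Avro source of a running Flume agent
        this.client = RpcClientFactory.getDefaultInstance(agentHost, agentPort);
    }

    public void forward(String gpsPacket) throws EventDeliveryException {
        // wrap the raw packet in a Flume event; the agent's sink (e.g. HDFS) takes it from there
        Event event = EventBuilder.withBody(gpsPacket, StandardCharsets.UTF_8);
        client.append(event);
    }

    public void close() {
        client.close();
    }
}
```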
To be able to handle this amount of data, I'd set up a bunch of message queue servers to queue all incoming data, and a set of listeners to take messages from these queues and interpret them. ActiveMQ, RabbitMQ, and HornetQ can all theoretically handle thousands of messages per second.
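As a rough sketch of that hand-off, the listener could publish each raw packet to a queue with plain JMS. The ActiveMQ broker URL and queue name below are invented; with HornetQ or RabbitMQ only the ConnectionFactory setup would differ:

```java
import javax.jms.Connection;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

public class GpsQueuePublisher {

    public static void publish(String gpsPacket) throws Exception {
        ActiveMQConnectionFactory factory =
                new ActiveMQConnectionFactory("tcp://localhost:61616");   // illustrative broker URL

        Connection connection = factory.createConnection();
        try {
            Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
            Queue queue = session.createQueue("gps.incoming");            // illustrative queue name
            MessageProducer producer = session.createProducer(queue);

            // the queue listeners consume these messages, parse them and persist the positions
            TextMessage message = session.createTextMessage(gpsPacket);
            producer.send(message);
        } finally {
            connection.close();
        }
    }
}
```

In a real deployment the connection, session, and producer would of course be created once and reused rather than opened per packet.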
HornetQ, for example, has a high-performance journal that balances very efficiently between an in-memory journal and paging to the file system. On Linux, it has native integration with LibAIO to optimize file system interaction.

If you set up a hardware load balancer, you can configure the GPS devices to communicate with the load balancer, which will forward the traffic to one of the message queue servers.

The bottleneck might then be getting the data from the message queue listeners into your database. To avoid this, you could use MySQL Cluster's horizontal partitioning.
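On the database side, the queue listeners would typically write positions in batches rather than one row per message. A hedged JDBC sketch, where the JDBC URL, table, and column names are invented and the table could use MySQL Cluster's NDB engine with key partitioning:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.Timestamp;
import java.util.List;

public class PositionBatchWriter {

    // Illustrative schema, e.g.:
    //   CREATE TABLE position (device_id BIGINT, recorded_at DATETIME, lat DOUBLE, lon DOUBLE,
    //     PRIMARY KEY (device_id, recorded_at)) ENGINE=NDBCLUSTER PARTITION BY KEY (device_id);
    private static final String SQL =
            "INSERT INTO position (device_id, recorded_at, lat, lon) VALUES (?, ?, ?, ?)";

    public void write(List<Position> batch) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://db-host:3306/tracking", "user", "password");   // illustrative DSN
             PreparedStatement ps = conn.prepareStatement(SQL)) {

            for (Position p : batch) {
                ps.setLong(1, p.deviceId);
                ps.setTimestamp(2, new Timestamp(p.timestampMillis));
                ps.setDouble(3, p.lat);
                ps.setDouble(4, p.lon);
                ps.addBatch();
            }
            ps.executeBatch();   // one round trip for the whole batch
        }
    }

    public static class Position {
        public long deviceId;
        public long timestampMillis;
        public double lat;
        public double lon;
    }
}
```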