Streaming web application - Twitter, Facebook, NoSQL or SQL?

Posted on 2024-12-01 00:15:45

So we have a design challenge: we have an absolutely clean slate to develop a system which presents the processing results of various social networking feeds, like Twitter and Facebook, on the web and via an API service such as REST. The processing part has already been completed; however, we now need somewhere to store the results.

The result format looks something like a message ID, the date of the message, the processed timestamp, and then a collection of various processing scores. There will be around 200 million messages in this database, so the first thing we need is something to store this data. We are thinking a NoSQL document database might be interesting to try, given that we need to be able to select over a range of dates, which discounts column-family-style databases (as I believe key range scanning in HBase is slow). Or the better option may be to simply store this data in good old MySQL or VoltDB. Does anyone have example use cases or stories about implementing such a system?
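To make the storage requirement concrete, here is a minimal sketch of the record layout and the date-range query, using Python's built-in sqlite3 purely as a stand-in for MySQL or any other store. The table name, column names, and the choice to keep the scores as a JSON string are assumptions for illustration, not part of the actual system.

```python
import json
import sqlite3


def create_store(path=":memory:"):
    """Open a database and create the (hypothetical) messages table."""
    conn = sqlite3.connect(path)
    # Dates are stored as ISO 8601 text, which sorts in chronological order.
    # The scores collection is stored as a JSON string (an assumption).
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS messages (
            message_id   TEXT PRIMARY KEY,
            message_date TEXT NOT NULL,
            processed_at TEXT NOT NULL,
            scores       TEXT NOT NULL
        )
        """
    )
    # The index on message_date is what keeps range selects cheap.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_messages_date ON messages (message_date)"
    )
    return conn


def add_message(conn, message_id, message_date, processed_at, scores):
    """Insert one processed message."""
    conn.execute(
        "INSERT INTO messages VALUES (?, ?, ?, ?)",
        (message_id, message_date, processed_at, json.dumps(scores)),
    )


def messages_between(conn, start, end):
    """Return (id, date, scores) for start <= message_date < end, oldest first."""
    rows = conn.execute(
        "SELECT message_id, message_date, scores FROM messages "
        "WHERE message_date >= ? AND message_date < ? "
        "ORDER BY message_date",
        (start, end),
    )
    return [(mid, date, json.loads(scores)) for mid, date, scores in rows]
```

Whatever database is chosen, the same shape applies: one ordered index (or sort key) on the message date, and a half-open range query against it.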

The next thing will be to develop a web application. We need a charting service which can take data in real time and update the interface. We are thinking of using Highcharts for this purpose. Is there anything better?

Finally, we need some sort of API service which can act like a Comet application and stream data, something like Twitter's streaming API. I was thinking the best option for this would be Node.js.
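As a rough illustration of what such a stream could look like on the wire, here is a sketch of newline-delimited JSON framing over a long-lived connection. The framing (one JSON object per line, terminated by CRLF, with blank lines as keep-alives) is an assumption modelled loosely on that style of API, and the function names are hypothetical. It is written in Python only to show the protocol; the same framing works from any server, Node.js included.

```python
import json


def encode_stream(results):
    """Yield each result as one compact JSON line terminated by CRLF."""
    for result in results:
        line = json.dumps(result, separators=(",", ":"))
        yield line.encode("utf-8") + b"\r\n"


def decode_stream(chunks):
    """Reassemble JSON objects from byte chunks of arbitrary size.

    Network reads do not respect message boundaries, so the client keeps
    a buffer and only parses once a full CRLF-terminated line has arrived.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        while b"\r\n" in buffer:
            line, buffer = buffer.split(b"\r\n", 1)
            if line:  # an empty line is treated as a keep-alive and skipped
                yield json.loads(line)
```

The server side would write the output of `encode_stream` to an open HTTP response as results arrive, and a client would feed whatever bytes it reads into `decode_stream`.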

So I guess the question is: are the technologies we have selected the best for the job, are there any good example use cases out there, and is there anything anyone would recommend?

Cheers!

Comments (2)

梦屿孤独相伴 2024-12-08 00:15:45

About storage: there are four types of NoSQL storage: key/value, column database, document database, and graph database. Each one is slower than the previous one but also gives you more features. If you only need to store data, a key/value or column database is your choice. With this type of storage, data processing is done by hand and you may need some kind of MapReduce implementation, maybe Hadoop. Document and graph databases give you some kind of query support, so you can move part of the data processing into the database (e.g. date filters). If I had to choose a NoSQL store, I would run tests with a graph database (e.g. Neo4j) and, if I had performance issues, switch to a column database (e.g. Cassandra) with MapReduce.
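To show what "data processing done by hand" with map/reduce means here, this is a tiny in-memory sketch that averages each score per day. The message layout (a `message_date` string and a `scores` dict keyed by score name) is an assumption for illustration; a real job would run the same two functions on something like Hadoop rather than in a single process.

```python
from collections import defaultdict


def map_message(message):
    """Map step: emit ((day, score_name), value) pairs for one message."""
    day = message["message_date"][:10]  # keep only YYYY-MM-DD
    for name, value in message["scores"].items():
        yield (day, name), value


def reduce_scores(key, values):
    """Reduce step: average all values emitted for one (day, score_name) key."""
    return sum(values) / len(values)


def map_reduce(messages):
    """Run map, group by key (the shuffle), then reduce each group."""
    groups = defaultdict(list)
    for message in messages:
        for key, value in map_message(message):
            groups[key].append(value)
    return {key: reduce_scores(key, values) for key, values in groups.items()}
```

With a document database, the date filter and possibly the grouping could be pushed into the query instead of being written like this.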

About charts: Highcharts seems a good option. I don't know about SVG browser support or whether there are performance issues, but on my machine it looks very nice.

About data streaming: my only experience, and it is limited, is with Node.js, and it would be my first choice. There are a few other implementations, like Tornado (tornadoweb) for Python and Misultin, Mochiweb, and Cowboy for Erlang. I found a link with a benchmark of these servers, and it seems the Erlang servers are faster than Node.js. You could also look at them.

爱要勇敢去追 2024-12-08 00:15:45

You can also use Solr/Lucene with sharding. Throughput can be increased by using a master/slave Solr setup.
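For anyone unfamiliar with sharding, the general idea can be sketched as below: each message is assigned to one of N shards by hashing its ID, so the 200 million documents are spread evenly. This is only an illustration of hash partitioning with hypothetical function names; it is not a description of how Solr itself routes documents.

```python
import hashlib


def shard_for(message_id, shard_count):
    """Pick a shard index from a stable hash of the message ID.

    hashlib is used instead of the built-in hash() because the built-in
    string hash changes between Python processes.
    """
    digest = hashlib.sha256(message_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def partition(message_ids, shard_count):
    """Group message IDs into shard_count lists, one list per shard."""
    shards = [[] for _ in range(shard_count)]
    for message_id in message_ids:
        shards[shard_for(message_id, shard_count)].append(message_id)
    return shards
```

A date-range query then has to be sent to every shard and the results merged, which is the trade-off of hashing on ID rather than partitioning by date.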
