社交网络的图形数据库
我正在做一个类似于社交网络的系统。用户数量上限最终必须为 50.000 或最多 70.000。
目前我正在使用 mysqli+prepared 语句。 ERD现在有30张表,最终可能达到40张表。
所以,我的问题是:我从未使用过图形数据库...我已经通过 mysql 工作台完成了 ERD,并且已经开发了一些代码。对于该项目的预期用户数量,是否建议从 MySQL 更改为图形数据库?我的sql代码和数据库模型可以用吗?这个改动有什么好处吗?
你怎么认为 ?
谢谢
I am doing a system similar to a social network. The number max of users must be eventually 50.000 or 70.000 at best.
At the moment i am using mysqli+prepared statments. The ERD have now 30 tables, eventually may reach to 40 tables.
So, my question is: i never used a graph database...i have the ERD done by mysql workbench and some code already developed. For the number expected of the users in this project, is recommended change from MySQL to a graph database? my sql code and database model can be availed? there is any advantage with this change?
what do you think ?
thanks
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
如果您可以访问递归查询(MySQL 中不是这种情况,但可以在 PostgreSQL)并且您的查询涉及最大深度标准(这可能是您在社交网络上的情况),或者它们是否正确索引。
有多种方法可以对图进行索引。在您的情况下,您的图表可能并不密集,因为您正在处理几乎独立的多个森林(您通常会处理紧密聚集的用户组),因此您有很多选择。
最容易实现的是传递闭包(基本上,调用预先计算所有潜在路径)。在您的情况下,它很可能是部分的(例如,深度 2 或深度 3)。这允许在单独的表中完全索引相关节点,以实现非常快速的图形查询。使用触发器或存储过程来保持同步。
如果您的图表比这更密集,您可能需要考虑使用 GRIPP 指数。与嵌套集非常相似,如果您删除
(rgt - lft - 1) / 2
= number of Children 属性,并使用 float 值作为 lft/rgt,则后者效果最好(如更新最快)而不是整数。 (这样做可以避免在插入/移动节点时重新索引图表的整个块。)Graphs are nice and fast when stored in SQL, if you've access to recursive queries (which is not the case in MySQL, but which are available in PostgreSQL) and your queries involve a max-depth criteria (which is probably your case on a social network), or if they're indexed properly.
There are multiple methods to index graphs. In your case your graph probably isn't dense, as in you're dealing with multiple forests which are nearly independent (you'll usually be dealing with tightly clustered groups of users), so you've plenty options.
The easiest to implement is a transitive closure (which is, basically, pre-calculating all of the potential paths is called). In your case it may very well be partial (say, depth-2 or depth-3). This allows to fully index related nodes in a separate table, for very fast graph queries. Use triggers or stored procedures to keep it in sync.
If your graph is denser than that, you may want to look into using a GRIPP index. Much like with nested sets, the latter works best (as in updated fastest) if you drop the
(rgt - lft - 1) / 2
= number of children property, and use float values for lft/rgt instead of integers. (Doing so avoids to reindex entire chunks of the graph when you insert/move nodes.)