Are these the ways to write queries without joins in a NoSQL scalable website architecture?
I keep hearing that one of the ways to architect a scalable website is to not use joins. How in the world do you do that, since most data is relational?
My limited research has yielded these thoughts:
A) If your data is inherently relational, then do use a relational database, i.e., use the right tool for the job.
B) Maintain a denormalized version of your data.
C) For data that can be forced to be non-relational, you can use NoSQL. Design the data model in such a way that joins are not necessary.
D) If you must relate your data then the application layer must manually implement joins by fetching the data sets one-by-one and manually relating the results.
E) Since manual joins at the application layer are very slow, try to do them offline (not while the user is waiting).
F) Use Map-Reduce.
Is this correct? Are there any more answers?
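To make point D concrete, here is a minimal sketch of an application-layer join, assuming the two lists below stand in for the results of two separate join-free queries. The record shapes and the name `join_posts_to_users` are hypothetical, not taken from any particular datastore. It indexes one side by key (a hash join), so each record is visited once instead of comparing every pair.

```python
from collections import defaultdict

# Hypothetical results of two separate, join-free queries.
users = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]
posts = [
    {"id": 10, "user_id": 1, "title": "Scaling reads"},
    {"id": 11, "user_id": 2, "title": "Sharding notes"},
    {"id": 12, "user_id": 1, "title": "Cache layers"},
]


def join_posts_to_users(users, posts):
    """Hash join: index posts by user_id, then walk the users once."""
    titles_by_user = defaultdict(list)
    for post in posts:
        titles_by_user[post["user_id"]].append(post["title"])
    # .get() avoids creating empty entries for users without posts.
    return [
        {"name": user["name"], "titles": titles_by_user.get(user["id"], [])}
        for user in users
    ]


joined = join_posts_to_users(users, posts)
```

The list in `joined` is also roughly what a denormalized record (point B) would store up front, so the same shape could be computed offline and saved rather than rebuilt per request.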
Comments (2)
High Scalability has excellent articles on this. Check out the reddit one for how they handled joins: http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html
There is also an existing Stack Overflow question with a bunch of links in the answers for similar info:
Techniques for writing a scalable website
Cache as much as possible, i.e. when you have expensive queries (with or without joins), try to cache the query result rather than executing the query again. To make the whole site fast, cache objects at the highest possible layer: try to cache whole pages; if that does not work, try caching page fragments, then the data objects that provide the data to populate the pages, and so on.
Caveat: There's some truth to the old saying "There are only 2 hard things in Computer Science - cache invalidation and naming things", so be careful what you put in your cache and for how long.
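As a rough illustration of caching a query result with an explicit lifetime, here is a minimal in-process sketch. The class `TTLCache`, the key `"front_page"`, and the function `expensive_query` are hypothetical names for this example; a real site would more likely put the same logic in front of a shared cache server.

```python
import time


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested
        self._store = {}    # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]              # still fresh: skip the query
        value = compute()                # missing or expired: recompute
        self._store[key] = (now + self.ttl, value)
        return value

    def invalidate(self, key):
        """Drop an entry early, e.g. right after the underlying data changes."""
        self._store.pop(key, None)


calls = []


def expensive_query():
    # Stand-in for a slow query; records each real execution.
    calls.append(1)
    return ["row-1", "row-2"]


cache = TTLCache(ttl_seconds=60)
first = cache.get_or_compute("front_page", expensive_query)
second = cache.get_or_compute("front_page", expensive_query)  # served from cache
```

The short TTL plus an explicit `invalidate` is one way to bound how stale a cached page or fragment can get, which is the "how long" part of the caveat above.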