How to design a Cassandra (or other NoSQL) schema?
We are about to move a project on Apache Cassandra from test to pilot, and as an RDBMS team we have probably missed something.
Basic rules (or lessons learned):
- be sure you have either big data or almost no data (nothing in between)
- do not count on extremely cheap storage (cheap or inexpensive might be better)
- think of your primary key as if it were a reverse index
- think of time (or another data creation order) as if it were a row/clustering key
- forget about 100% foreign keys whenever you can
- sample if you can
- do not care about dups
- JSON and asynchronous time aggregation on the client can take load off the CPUs
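To make the key-related rules concrete, here is a toy in-memory sketch in Python. It is not Cassandra code and uses no driver; the class `WideRowTable`, the sensor ids and the readings are all hypothetical. It only illustrates the idea that a read names one partition and slices it by a time-ordered clustering key, and that writing the same primary key twice behaves like an overwrite (as Cassandra writes do), which is why duplicates are harmless.

```python
from collections import defaultdict


class WideRowTable:
    """Toy model of one table keyed by (partition key, clustering key)."""

    def __init__(self):
        # partition key -> {clustering key -> row}
        self._partitions = defaultdict(dict)

    def upsert(self, partition_key, clustering_key, row):
        # Writing the same full primary key again overwrites the old row,
        # so replaying a duplicate event leaves a single row behind.
        self._partitions[partition_key][clustering_key] = row

    def read_range(self, partition_key, start, end):
        # A read names one partition, then slices it in clustering key order.
        partition = self._partitions.get(partition_key, {})
        return [partition[k] for k in sorted(partition) if start <= k < end]


table = WideRowTable()
# Hypothetical sensor readings: partition by sensor id, cluster by event time.
table.upsert("sensor-1", 1000, {"temp": 20.5})
table.upsert("sensor-1", 1060, {"temp": 20.7})
table.upsert("sensor-1", 1060, {"temp": 20.7})  # duplicate delivery, harmless
table.upsert("sensor-2", 1000, {"temp": 18.0})

readings = table.read_range("sensor-1", 1000, 1100)
print(readings)
```

The point of the sketch is the access pattern: the lookup is decided by the key layout up front, rather than by joins or foreign keys at query time.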
ETL:
- sample history if you can (or sample it just for reporting use on a separate reporting cluster)
- single-threaded data streams spread over a couple of servers will come in handy
- if you can afford asynchronous processing, you can profit from knowledge of data patterns
- throw scrap data away (horizontally and vertically), or it will mislead BI people or, in the worst case, even board members
- do not care about dups
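A rough Python sketch of what we mean by sampling and by time aggregation on the client. Everything here is an assumption made for illustration: the function names `keep_in_sample` and `aggregate_by_bucket`, the hash-based sampling scheme and the example events are not taken from our actual pipeline, and the aggregation is shown synchronously for brevity.

```python
import hashlib
from collections import defaultdict


def keep_in_sample(key: str, percent: int) -> bool:
    """Deterministically keep roughly `percent` percent of keys.

    Hashing the key means the same key is always kept or always dropped,
    so repeated ETL runs sample the same subset.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent


def aggregate_by_bucket(events, bucket_seconds):
    """Sum event values per fixed-width time bucket.

    events: iterable of (epoch_seconds, value) pairs.
    Returns {bucket_start_seconds: total}.
    """
    totals = defaultdict(float)
    for ts, value in events:
        bucket_start = ts - (ts % bucket_seconds)
        totals[bucket_start] += value
    return dict(totals)


# Hypothetical events: (epoch seconds, measured value).
events = [(1000, 1.0), (1010, 2.0), (1070, 4.0)]
per_minute = aggregate_by_bucket(events, 60)
print(per_minute)
```

With a deterministic sample, a separate reporting cluster can be loaded with the same subset of history on every run, and the per-bucket sums are computed by the client instead of by the database nodes.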
The question is: am I still missing something?
Are there other ways to achieve even better performance?