Cassandra Design for Product ID Management
I'm new to Cassandra and want to start with a simple test.
Our data model in a traditional RDBMS is as follows:
Table Company (Id, Name)
Table Product (Id, Name, Company ID), where Company ID is an FK reference to the Company table
Table ProductInstance (Id, ProductID), where ProductID is an FK reference to the Product table
Table ProductInstanceRating (Id, ProductInstanceID, Comment), where ProductInstanceID is an FK reference to the ProductInstance table
Any suggestions on how the design should look in Cassandra?
Update:
I tried to look at this from the querying perspective.
Data to be captured:
1) A Product Reference is a composite key consisting of Product Name, Product Lot Number, and Customer ID
Product Name : 12456
Product Lot Number : PQ23
Customer ID : 879456
The unique Product Reference will then be something like 12456|PQ23|879456
2) Each instance of the aforementioned Product gets its own hash-like number (784A, 876T, etc.), and this number is unique within a particular Product Reference
A Product Instance reference will be something like 12456|PQ23|879456|784A
3) Each Product Instance number can receive more than one rating
In that case, the queries will be something like:
Query1) To add/insert a rating for a Product Instance, fetch the row for the Product, i.e. 12456|PQ23|879456
Query2) Fetch the ProductInstance ID within this row (maybe the name of a column family)
Query3) Add the rating information as a column:value pair
Should the design be something like this?
12456|PQ23|879456 {
    784A {
        timestamp1: {rating: valueA, person name: valueX}
        timestamp2: {rating: valueB, person name: valueY}
    }
    876T {
        timestamp1: {rating: valueC, person name: valueX}
        timestamp2: {rating: valueB, person name: valueY}
    }
}
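To make the proposed layout concrete, here is a minimal Python sketch that models it with plain nested dictionaries rather than a real Cassandra client. The names `ratings_by_product`, `product_key` and `add_rating` are hypothetical, and the numeric ratings and integer timestamps are assumed sample values standing in for valueA, timestamp1 and so on.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for the proposed layout:
# row key -> instance number -> timestamp -> rating columns.
ratings_by_product = defaultdict(lambda: defaultdict(dict))


def product_key(name, lot, customer_id):
    """Build the composite row key, e.g. '12456|PQ23|879456'."""
    return "|".join([name, lot, customer_id])


def add_rating(row_key, instance_id, timestamp, rating, person_name):
    """Queries 1-3 in one step: find the row and instance, then add the columns."""
    ratings_by_product[row_key][instance_id][timestamp] = {
        "rating": rating,
        "person name": person_name,
    }


# Assumed sample values; the question only gives placeholders.
key = product_key("12456", "PQ23", "879456")
add_rating(key, "784A", 1, 4, "valueX")
add_rating(key, "784A", 2, 5, "valueY")
add_rating(key, "876T", 1, 3, "valueX")
```

Because the row key and instance number are both known at write time, the insert needs no prior read; Queries 1 to 3 collapse into a single write in this sketch.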
Thereafter, we would want to:
Query4) Fetch all products that have instances
Query5) Fetch all product instances that have ratings
Query6) Fetch the highest rating for a product
Query7) Fetch the average rating for a product
Is there a better and more efficient way to implement this?
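As a rough illustration of what Queries 4 to 7 would have to do against the nested layout, the following Python sketch runs them over an in-memory dictionary. It is not Cassandra client code; the sample data, the numeric ratings and all function names are assumptions made for the example.

```python
# Hypothetical sample data in the nested layout from the question:
# row key -> instance number -> timestamp -> rating columns.
data = {
    "12456|PQ23|879456": {
        "784A": {1: {"rating": 4}, 2: {"rating": 5}},
        "876T": {1: {"rating": 3}},
    },
    "99999|ZZ01|111111": {},            # a product with no instances yet
    "55555|AB12|222222": {"100X": {}},  # an instance with no ratings yet
}


def products_with_instances(data):
    """Query 4: row keys that hold at least one instance."""
    return [key for key, instances in data.items() if instances]


def instances_with_ratings(data):
    """Query 5: full instance references that hold at least one rating."""
    return [
        f"{key}|{instance_id}"
        for key, instances in data.items()
        for instance_id, ratings in instances.items()
        if ratings
    ]


def rating_values(data, key):
    """Collect every rating stored under one product row."""
    return [
        columns["rating"]
        for ratings in data[key].values()
        for columns in ratings.values()
    ]


def highest_rating(data, key):
    """Query 6: maximum rating for a product, or None if it has none."""
    values = rating_values(data, key)
    return max(values) if values else None


def average_rating(data, key):
    """Query 7: mean rating for a product, or None if it has none."""
    values = rating_values(data, key)
    return sum(values) / len(values) if values else None
```

Queries 6 and 7 touch a single row, so they map onto a row-key lookup followed by client-side arithmetic. Queries 4 and 5 walk every row in this sketch, which is the kind of access a row-key store handles poorly without an extra index.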
Comments (1)
In Cassandra, you need to think about what queries you wish to run (unlike an RDBMS, where you can make a fairly abstract model of the domain and then devise SQL queries afterwards), because you can only do row-key lookups or use fairly limited secondary indexes. Any other indexes need to be explicitly created within your column families.
For this reason it is common to denormalise. However, maintaining the data may then be more difficult, as there are no cascading deletes etc. in Cassandra.
One candidate structure might be to have a ProductInstance column family to store the ratings, then a Product column family as an index to find the instances of a given product, then a Company column family as an index to find the products for a particular company. Or you could just denormalise and put the Company as a column in the Product column family.
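The candidate structure can be sketched in Python with three dictionaries standing in for the three column families. This is only an in-memory model under assumed names (`company_cf`, `product_cf`, `product_instance_cf`, the company id "acme" and the helper functions are all hypothetical), intended to show how the index rows are written alongside the data and cleaned up by hand.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for three column families (row key -> columns).
company_cf = defaultdict(dict)           # company id  -> {product id: ""}
product_cf = defaultdict(dict)           # product id  -> {instance id: ""}
product_instance_cf = defaultdict(dict)  # instance id -> {timestamp: rating}


def add_instance(company_id, product_id, instance_id):
    """Write both index entries whenever a new instance is registered."""
    company_cf[company_id][product_id] = ""
    product_cf[product_id][instance_id] = ""


def add_rating(instance_id, timestamp, rating):
    """Store one rating as a column in the instance's row."""
    product_instance_cf[instance_id][timestamp] = rating


def ratings_for_company(company_id):
    """Walk the two index rows, then read each instance row by key."""
    result = {}
    for product_id in company_cf[company_id]:
        for instance_id in product_cf[product_id]:
            result[instance_id] = dict(product_instance_cf[instance_id])
    return result


def delete_product(company_id, product_id):
    """No cascading deletes: remove instance rows and index entries by hand."""
    for instance_id in list(product_cf.get(product_id, {})):
        product_instance_cf.pop(instance_id, None)
    product_cf.pop(product_id, None)
    company_cf[company_id].pop(product_id, None)


# Assumed sample values.
add_instance("acme", "12456|PQ23|879456", "12456|PQ23|879456|784A")
add_rating("12456|PQ23|879456|784A", 1, 4)
```

Every read here is a lookup by row key, at the cost of extra writes on insert and explicit cleanup on delete.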
To repeat, the 'correct' structure for you can only be determined when you know what queries you wish to support.