Database modelling for a key that stores millions of values
I want to store all tags against the documents in which they appear and make them searchable by other services/clients. Scale:
- 10 billion search queries per day
- 10 million new tag CRUD operations per day (a tag deleted from or appended to a document)
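Converting those daily volumes into average per-second rates makes the read/write ratio concrete: reads outnumber tag writes 1000 to 1, which suggests favouring a model with cheap reads even if writes cost a bit more. This is a back-of-envelope sketch that assumes traffic is spread evenly over the day; real peaks will be higher.

```python
# Back-of-envelope: convert the daily volumes above into average per-second rates.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: int) -> float:
    """Average rate per second for a daily volume (ignores peak-hour skew)."""
    return per_day / SECONDS_PER_DAY


search_qps = per_second(10_000_000_000)  # 10 billion search queries/day
tag_writes_ps = per_second(10_000_000)   # 10 million tag CRUD operations/day

print(f"searches/s ~ {search_qps:,.0f}")       # searches/s ~ 115,741
print(f"tag writes/s ~ {tag_writes_ps:,.0f}")  # tag writes/s ~ 116
```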
So suppose "hello" appears in 10 million documents. When a user queries for "hello", I want to return the list of document_ids in which it occurs.
How should I model the data for this?
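Whatever the storage engine, the logical shape being asked for is an inverted index (tag to doc_ids), ideally paired with a forward map (doc_id to tags) so that removing a document tells you exactly which tag entries to touch. The in-memory sketch below only illustrates that shape; the class and method names are hypothetical, and a real deployment would persist both maps in the chosen database.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory model: tag -> doc_ids, plus doc_id -> tags."""

    def __init__(self) -> None:
        self.docs_by_tag: dict[str, set[str]] = defaultdict(set)
        self.tags_by_doc: dict[str, set[str]] = defaultdict(set)

    def attach(self, doc_id: str, tag: str) -> None:
        # Both directions are updated together so they never disagree.
        self.docs_by_tag[tag].add(doc_id)
        self.tags_by_doc[doc_id].add(tag)

    def detach(self, doc_id: str, tag: str) -> None:
        # discard() is a no-op if the pair was never stored.
        self.docs_by_tag[tag].discard(doc_id)
        self.tags_by_doc[doc_id].discard(tag)

    def remove_document(self, doc_id: str) -> None:
        # The forward map lists exactly which tag entries must be cleaned up.
        for tag in self.tags_by_doc.pop(doc_id, set()):
            self.docs_by_tag[tag].discard(doc_id)

    def docs_for(self, tag: str) -> set[str]:
        # .get() avoids creating empty entries for unknown tags.
        return set(self.docs_by_tag.get(tag, ()))
```

For example, attaching "hello" to doc_id1 and doc_id2 and then detaching it from doc_id1 leaves `docs_for("hello")` equal to `{"doc_id2"}`.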
Option 1:
Use a key-value NoSQL store such as DynamoDB
key: "hello"
value: [doc_id1, doc_id2, .......]
Issue: whenever any document related to this tag changes, we have to read the whole value (the full list of doc_ids), modify it, and write it back.
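The cost described for option 1 can be sketched with a plain dict standing in for the key-value table (a hypothetical stand-in, not a DynamoDB client). Every single-doc change decodes and re-encodes the entire list, so the work grows with the list length rather than with the size of the change; two concurrent writers doing this can also overwrite each other unless the write is made conditional. Separately, DynamoDB caps a single item at 400 KB, so one item could not hold 10 million doc_ids in any case.

```python
import json

# Hypothetical stand-in for a key-value table: key -> serialized list of doc_ids.
kv_store: dict[str, str] = {}


def append_doc(tag: str, doc_id: str) -> int:
    """Option 1 update path: read the whole list, modify it, write it all back.

    Returns how many existing doc_ids had to be decoded to make the change.
    """
    raw = kv_store.get(tag, "[]")   # 1. read the entire value
    doc_ids = json.loads(raw)       # 2. decode every doc_id
    touched = len(doc_ids)
    if doc_id not in doc_ids:       # 3. O(n) membership check on a list
        doc_ids.append(doc_id)
    kv_store[tag] = json.dumps(doc_ids)  # 4. rewrite the entire value
    return touched
```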
Option 2:
Store each tag/document pair in an individual row, using something like MongoDB
"hello": doc_id1
"hello": doc_id2
Issue: suppose doc_id122 removes the "hello" tag; then we would have to fetch all "hello" entries to delete this one, because our database will be sharded on tag_name.
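The fetch-everything problem in option 2 only arises if rows can be found by tag alone. Assuming an index on the compound key (tag_name, doc_id), removing "hello" from doc_id122 becomes a point delete inside the one shard that owns "hello". In MongoDB terms, that would roughly be a compound index on both fields plus a `deleteOne` whose filter includes the shard key, so it is routed to a single shard. The sketch below models this in memory; the shard count and helper names are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_SHARDS = 4  # hypothetical shard count, for illustration only

# shard -> tag -> doc_id -> row. The inner dict plays the role of an index on
# (tag_name, doc_id), so no operation below walks all rows of a tag to find one.
shards = [defaultdict(dict) for _ in range(NUM_SHARDS)]


def shard_for(tag: str) -> int:
    """Route by tag_name (the shard key); sha1 is stable across processes."""
    digest = hashlib.sha1(tag.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def add_tag(tag: str, doc_id: str) -> None:
    shards[shard_for(tag)][tag][doc_id] = {"tag_name": tag, "doc_id": doc_id}


def remove_tag(tag: str, doc_id: str) -> bool:
    """Point delete on the full (tag_name, doc_id) key: one shard, one lookup."""
    rows = shards[shard_for(tag)].get(tag)
    if rows is None:
        return False
    return rows.pop(doc_id, None) is not None


def docs_for(tag: str) -> list[str]:
    """All doc_ids for a tag, served from a single shard."""
    return sorted(shards[shard_for(tag)].get(tag, {}))
```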
Option 3: column-based store (e.g. Cassandra)
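For option 3, a natural layout (assumed here, not given in the question) is a table with partition key = tag and clustering key = doc_id. Cassandra keeps rows within a partition sorted by clustering columns, so deleting one (tag, doc_id) row is a point operation and results can be paged with a query like `WHERE tag = ? AND doc_id > ? LIMIT 50`. The in-memory class below only mimics that sorted-partition behaviour; it is not a Cassandra client. One caveat: a single partition holding 10 million rows is very large, and a common mitigation is adding a bucket number to the partition key.

```python
import bisect
from typing import Optional


class WidePartition:
    """Hypothetical model of one tag's partition, sorted by doc_id."""

    def __init__(self) -> None:
        self._doc_ids: list[str] = []  # kept sorted, like a clustering column

    def insert(self, doc_id: str) -> None:
        i = bisect.bisect_left(self._doc_ids, doc_id)
        if i == len(self._doc_ids) or self._doc_ids[i] != doc_id:
            self._doc_ids.insert(i, doc_id)  # upsert semantics: no duplicates

    def delete(self, doc_id: str) -> None:
        i = bisect.bisect_left(self._doc_ids, doc_id)
        if i < len(self._doc_ids) and self._doc_ids[i] == doc_id:
            del self._doc_ids[i]

    def page(self, after: Optional[str] = None, limit: int = 50) -> list[str]:
        """Up to `limit` doc_ids strictly greater than the cursor `after`."""
        start = 0 if after is None else bisect.bisect_right(self._doc_ids, after)
        return self._doc_ids[start:start + limit]
```

A client would pass the last doc_id of each page as `after` to fetch the next one.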
Option 4: Elasticsearch
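Elasticsearch is itself built on an inverted index, so for option 4 the tag lookup becomes a search request. The sketch below only builds a Query DSL body that would be sent to `GET /<index>/_search`; it assumes a hypothetical index whose documents carry a `tags` field mapped as `keyword` and a numeric `views` field. Both field names are illustrative, not taken from the question.

```python
import json


def tag_search_body(tag: str, size: int = 50) -> dict:
    """Build an Elasticsearch search body: exact tag match, most viewed first."""
    return {
        "size": size,
        # `term` matches the keyword value exactly; putting it in a bool
        # `filter` skips relevance scoring, since we sort by popularity instead.
        "query": {"bool": {"filter": [{"term": {"tags": tag}}]}},
        "sort": [{"views": {"order": "desc"}}],
        "_source": False,  # hits still carry _id, which is all we need
    }


print(json.dumps(tag_search_body("hello"), indent=2))
```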
Additional requirements:
- We want to support autosuggest on tags in our tag service.
- Results must be returned according to some ranking (we can't return 1 million in the first go), so return the top 50 most popular documents (e.g. most viewed or most clapped). I think Elasticsearch internally offers the option to rank documents higher based on the TF-IDF algorithm.
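Both requirements can be sketched without any particular engine. For autosuggest, the sketch uses a plain binary search over a sorted tag vocabulary (a swapped-in simple technique; Elasticsearch offers a dedicated completion suggester for this). For ranking, note that TF-IDF (and BM25, Elasticsearch's default similarity since 5.0) scores how relevant text is to a query, not how popular a document is, so "most viewed / most clapped" is usually a top-k over stored counters. The vocabulary, counters, and `clap_weight` below are hypothetical.

```python
import bisect
import heapq
from typing import Iterable, Mapping

# Hypothetical sorted tag vocabulary.
ALL_TAGS = sorted(["hello", "help", "helm", "world", "web", "hell"])


def suggest(prefix: str, limit: int = 10) -> list[str]:
    """Tags starting with `prefix`: they form one contiguous run in sorted order."""
    out: list[str] = []
    i = bisect.bisect_left(ALL_TAGS, prefix)
    while i < len(ALL_TAGS) and len(out) < limit and ALL_TAGS[i].startswith(prefix):
        out.append(ALL_TAGS[i])
        i += 1
    return out


def top_docs(
    doc_ids: Iterable[str],
    views: Mapping[str, int],
    claps: Mapping[str, int],
    k: int = 50,
    clap_weight: float = 2.0,
) -> list[str]:
    """Top-k doc_ids by a hypothetical score: views + clap_weight * claps."""

    def score(doc_id: str) -> float:
        return views.get(doc_id, 0) + clap_weight * claps.get(doc_id, 0)

    # nlargest keeps a k-sized heap, so memory stays O(k) even for 10M doc_ids.
    return heapq.nlargest(k, doc_ids, key=score)
```

For example, `suggest("hel")` returns the four tags beginning with "hel" in alphabetical order, and `top_docs` returns at most 50 doc_ids, highest score first.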