Unable to specify conditions for DB [Collection]
I have been trying to solve this by myself. The exercise statement is:
Exercise 5.1 (code) Using the data stored in the MongoDB collection with the reviews, count:
The number of distinct users who made a review (i.e. count the number of distinct values for the field reviewerID)
The number of distinct users who gave at least one bad rating (less than or equal to 2 stars)
The number of distinct books which received a review (i.e. count the number of distinct values for the field asin).
Here is my code:
n_users = len(db[reviews_collection].distinct("reviewerID"))
n_users_bad_rating = len(db[reviews_collection].distinct("reviewerID", {"overall": {"$lte": 2.0}}))
n_books = len(db[reviews_collection].distinct("asin"))
print(f"There are {n_users} distinct users.")
print(f"There are {n_users_bad_rating} distinct users who gave at least one bad rating (less or equal to 2 stars).")
print(f"There are {n_books} distinct books in the reviews.")
Here is the output I get:
There are 3613 distinct users.
There are 0 distinct users who gave at least one bad rating (less or equal to 2 stars).
There are 3807 distinct books in the reviews.
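One way to cross-check the 0 is to compute the same count in plain Python over the fetched documents. This is only a sketch: the function name `count_distinct_bad_raters` and the sample documents are made up for illustration, and it assumes `docs` is an iterable of review dicts such as the ones returned by `db[reviews_collection].find()`. It converts `overall` with `float()` so that ratings stored as strings are still counted, which a MongoDB numeric comparison like `{"$lte": 2.0}` would not match.

```python
def count_distinct_bad_raters(docs, max_stars=2.0):
    """Count distinct reviewerID values with at least one rating <= max_stars."""
    bad_raters = set()
    for doc in docs:
        reviewer = doc.get("reviewerID")
        if reviewer is None:
            continue  # skip documents without a reviewer
        try:
            # float() tolerates ratings stored as strings such as "2.0"
            rating = float(doc.get("overall"))
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric ratings
        if rating <= max_stars:
            bad_raters.add(reviewer)
    return len(bad_raters)


# Hypothetical sample documents, not taken from the real collection.
sample_docs = [
    {"reviewerID": "A1", "overall": 5.0},
    {"reviewerID": "A2", "overall": 2.0},
    {"reviewerID": "A2", "overall": 1.0},    # same user, counted once
    {"reviewerID": "A3", "overall": "1.0"},  # rating stored as a string
    {"reviewerID": "A4"},                    # no rating at all
]
n_bad = count_distinct_bad_raters(sample_docs)
print(f"{n_bad} distinct users gave at least one bad rating.")
```

If this count is non-zero on the real data while the `distinct` call with a filter returns 0, the stored type of `overall` (or the collection actually being queried) may differ from what the sample document suggests.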
Example of what the data in the collection looks like:
{'_id': ObjectId('6252c21c2ee307d4d522af0a'),
'appreciation': 'liked',
'asin': 'B000R93D4Y',
'book_index': 0,
'helpful': [3, 3],
'overall': 5.0,
'reviewText': 'A strange world full of strange creatures, knights, and '
'beautiful maidens. The magical aspect of healing was a nice '
'touch.',
'reviewTime': '06 23, 2013',
'reviewerID': 'A195CNOUUIT4SU',
'summary': 'Great tale of dragons',
'train_val_test': 'train',
'unixReviewTime': 1371945600,
'user_index': 0}
Question: Why am I unable to use conditions? I have other exercises in the notebook where I am asked to query the database, and they work perfectly fine unless I try to specify conditions. When I use a loop instead, it tells me that I have a "TypeError: string indexes must be int"
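The loop itself is not shown, so the following is only a guess at one common way to hit that kind of TypeError in Python: iterating over a single document (a dict) yields its keys, which are strings, and indexing a string with a field name fails. The helper names below are hypothetical and the example uses a plain dict instead of a live database.

```python
doc = {"reviewerID": "A195CNOUUIT4SU", "overall": 5.0}


def ratings_from_keys(document):
    # Bug pattern: iterating over a dict yields its keys (strings),
    # so item["overall"] tries to index a string with a string.
    return [item["overall"] for item in document]


def ratings_from_documents(documents):
    # Intended pattern: iterate over a sequence of documents (dicts).
    return [d["overall"] for d in documents]


try:
    ratings_from_keys(doc)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

ratings = ratings_from_documents([doc])
print(error_message)
print(ratings)
```

The same error appears when looping over the result of `distinct("reviewerID")`, because that result is a list of strings rather than a list of documents.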
Comments (1)
Here is one way to find the distinct values of reviewerID having overall <= 2 via the aggregation framework:
playground
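The contents of the linked playground are not reproduced here, so the following is a sketch of what such a pipeline might look like, not necessarily the one behind the link. It assumes a pymongo-style collection object with an `aggregate` method; the helper names `bad_rater_pipeline` and `count_bad_raters` and the output field `n_users_bad_rating` are chosen for illustration.

```python
def bad_rater_pipeline(max_stars=2.0):
    """Build an aggregation pipeline counting distinct reviewers with a bad rating."""
    return [
        # Keep only reviews rated max_stars or lower.
        {"$match": {"overall": {"$lte": max_stars}}},
        # One group per reviewer, which removes duplicates.
        {"$group": {"_id": "$reviewerID"}},
        # Count the remaining groups.
        {"$count": "n_users_bad_rating"},
    ]


def count_bad_raters(collection, max_stars=2.0):
    """Run the pipeline on a pymongo-style collection (assumed interface)."""
    rows = list(collection.aggregate(bad_rater_pipeline(max_stars)))
    # $count emits no document when nothing matched, so fall back to 0.
    return rows[0]["n_users_bad_rating"] if rows else 0
```

With the names from the question, this would be called as `count_bad_raters(db[reviews_collection])`. Dropping the `$count` stage returns the matching reviewer IDs themselves, one per document in the `_id` field.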