How to count both sides of a many-to-many relationship in Google App Engine
Consider a GAE (python) app that lets users comment on songs. The expected number of users is 1,000,000+. The expected number of songs is 5,000.
The app must be able to:
- Give the number of songs a user has commented on
- Give the number of users who have commented on a song
Counter management must be transactional so that the counters always reflect the underlying data.
It seems GAE apps must keep these types of counts calculated at all times since querying for them at request time would be inefficient.
My Data Model
from google.appengine.ext import db

# BaseModel is not shown in the question; presumably a db.Model subclass.

class Song(BaseModel):
    name = db.StringProperty()
    # Number of users commenting on the song
    user_count = db.IntegerProperty('user count', default=0, required=True)
    date_added = db.DateTimeProperty('date added', auto_now_add=True)
    date_updated = db.DateTimeProperty('date updated', auto_now=True)

class User(BaseModel):
    email = db.StringProperty()
    # Number of songs commented on by the user
    song_count = db.IntegerProperty('song count', default=0, required=True)
    date_added = db.DateTimeProperty('date added', auto_now_add=True)
    date_updated = db.DateTimeProperty('date updated', auto_now=True)

class SongUser(BaseModel):
    # Will be child of User
    song = db.ReferenceProperty(Song, required=True, collection_name='songs')
    comment = db.StringProperty('comment', required=True)
    date_added = db.DateTimeProperty('date added', auto_now_add=True)
    date_updated = db.DateTimeProperty('date updated', auto_now=True)
Code
This handles the user's song count transactionally but not the song's user count.
s = Song(name='Hey Jude')
s.put()

u = User(email='[email protected]')
u.put()

def add_mapping(song_key, song_comment, user_key):
    u = User.get(user_key)
    # SongUser declares `comment` (not `song_comment`) and has no `user`
    # property; the user is recorded as the parent entity.
    su = SongUser(parent=u, song=song_key, comment=song_comment)
    u.song_count += 1
    u.put()
    su.put()

# Transactionally add mapping and increase user's song count
db.run_in_transaction(add_mapping, s.key(), 'Awesome', u.key())

# Increase song's user count (non-transactional)
s.user_count += 1
s.put()
The question is: How can I manage both counters transactionally?
Based on my understanding this would be impossible since User, Song, and SongUser would have to be a part of the same entity group. They can't be in one entity group because then all my data would be in one group and it could not be distributed by user.
Comments (1)
You really shouldn't have to worry about handling the user's count of songs on which they have commented inside a transaction because it seems unlikely that a User would be able to comment on more than one song at a time, right?
Now, it is definitely the case that many users could be commenting on the same song at one time, so that is where you have to worry about making sure that the data isn't made invalid by a race condition.
However, if you keep the count of users who have commented on a song inside the Song entity, and lock the entity with a transaction, you are going to get very high contention for that entity, and datastore timeouts will cause your application lots of problems.
The answer to this problem is Sharded Counters.
To make sure that you can create a new SongUser entity and update the related Song's sharded counter together, you should consider giving the SongUser entity the related Song as its parent. That puts them in the same entity group, so you can both create the SongUser and update the sharded counter in the same transaction. The SongUser's relationship to the User who created it can be held in a ReferenceProperty.
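To make the sharded-counter idea concrete, here is a minimal in-memory sketch. It is not App Engine code: the `ShardedCounter` class, its method names, and the shard count of 20 are all assumptions made for illustration, with a list of integers standing in for shard entities and one lock per shard standing in for a per-entity transaction. In a real app each shard would be a datastore entity updated inside its own transaction.

```python
import random
import threading


class ShardedCounter:
    """In-memory illustration of a sharded counter (hypothetical, not the GAE API)."""

    def __init__(self, num_shards=20):
        # Each slot stands in for one shard entity holding a partial count.
        self._counts = [0] * num_shards
        # One lock per shard stands in for a transaction on that shard alone.
        self._locks = [threading.Lock() for _ in range(num_shards)]

    def increment(self):
        # Pick a shard at random so concurrent writers rarely hit the same one.
        index = random.randrange(len(self._counts))
        with self._locks[index]:
            self._counts[index] += 1

    def total(self):
        # The full count is the sum of all partial counts.
        return sum(self._counts)


def count_comments(num_writers=8, comments_per_writer=500):
    """Simulate many users commenting on one song at the same time."""
    counter = ShardedCounter()

    def worker():
        for _ in range(comments_per_writer):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_writers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.total()
```

The point of the design is that writes are spread over many small records, so no single record becomes a bottleneck, while reads pay the cost of summing the shards.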
Regarding your concern about the two updates (the transactional one and the User update) not both succeeding: that is always a possibility. Since either update can fail, you will need proper exception handling to ensure that both succeed. That's an important point: in-transaction updates are not guaranteed to succeed. You may get a TransactionFailedError exception if the transaction cannot complete for any reason.
So, if your transaction completes without raising an exception, run the update to User in its own transaction. That gets you automatic retries of the update to User should some error occur. Unless there's something about possible contention on the User entity that I don't understand, the possibility that it will not eventually succeed is vanishingly small. If that is an unacceptable risk, then I don't think App Engine has a perfect solution to this problem for you.
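The two-step flow described above can be sketched in plain Python. Everything here is hypothetical: `TransientUpdateError` stands in for a failed datastore transaction, `run_with_retries` imitates the automatic retries that running a function in a transaction would give you, and the two callables passed to `add_comment` stand in for the song-side and user-side updates.

```python
class TransientUpdateError(Exception):
    """Hypothetical stand-in for a transaction that failed to commit."""


def run_with_retries(update, attempts=3):
    """Call update() until it succeeds or the attempts are used up."""
    for attempt in range(1, attempts + 1):
        try:
            return update()
        except TransientUpdateError:
            if attempt == attempts:
                # Out of retries: let the caller decide what to do.
                raise


def add_comment(update_song_side, update_user_side):
    # Step 1: create the mapping and bump the song's counter together.
    run_with_retries(update_song_side)
    # Step 2: reached only if step 1 did not raise; bump the user's count.
    run_with_retries(update_user_side)
```

If step 2 still fails after its retries, the user's count is off by one while the song's count is correct, which is the residual risk discussed here.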
First ask yourself: is it really that bad if the count of songs that someone has commented on is off by one? Is this as critical as updating a bank account balance or completing a stock sale?