AppEngine 上 ACL 的高效组成员资格测试
我正在为数据存储中的对象创建访问控制列表。每个 ACL 条目都可以有一个允许访问相应条目的所有用户 ID 的列表。然后,我获取用户可以访问的实体列表的查询将非常简单:
select * from ACL where accessors = {userId} and searchTerms >= {search}
问题是,这只能在达到索引条目限制之前支持 2500 个用户,当然,将一个包含大量用户的 ACL 条目放在其中会非常昂贵,因为需要更改许多索引条目。
因此,我考虑添加允许访问实体的用户组列表。这可以大大减少每个 ACL 条目所需的索引条目数量,但查询会变得更长,因为我必须查询用户所在的每个可能的组:
select * from ACL where accessors = {userId} and searchTerms >= {search}
for (GroupId id : theSetOfGroupsTheUserBelongsTo) {
select * from ACL where accessingGroups = {id} and searchTerms >= {search}
}
mergeAllTheseResultsTogether()
这将花费很长时间,更难以分页,等等 吗
任何人都可以推荐一种从 ACL 中获取实体列表且不限制访问用户数量的方法
? 编辑更多详细信息:
我正在对学校使用的一长串学术主题进行搜索和排序。有些主题是由管理员创建的,应该是全校范围的。其他的是由教师创建的,并且可能仅与这些教师相关。我想创建一个类似 google-docs-list 的集合层次结构,将每个主题视为一个文档。 searchTerms 字段将是主题名称中的单词列表 - 没有很多内部文本可供搜索。每个主题将至少位于一个集合(组织的“根”集合)中,并且可能位于多达 10-20 个其他集合中,所有集合均由不同的人员管理。理想情况下,文档可能出现的集合数量没有上限。我在这里的困难是生成特定用户至少具有读取访问权限的所有实体的列表 - 谷歌文档中的模拟将是“所有项目”视图。
I'm creating an access control list for objects in my datastore. Each ACL entry could have a list of all user ids allowed to access the corresponding entry. Then my query to get the list of entities a user can access would be pretty simple:
select * from ACL where accessors = {userId} and searchTerms >= {search}
The problem is that this can only support 2500 users before it hits the index entry limit, and of course it would be very expensive to put an ACL entry with a lot of users because many index entries would need to be changed.
So I thought about adding a list of GROUPs of users that are allowed to access an entity. That could drastically lower the number of index entries needed for each ACL entry, but querying gets longer because I have to query for every possible group that a user is in:
select * from ACL where accessors = {userId} and searchTerms >= {search}
for (GroupId id : theSetOfGroupsTheUserBelongsTo) {
select * from ACL where accessingGroups = {id} and searchTerms >= {search}
}
mergeAllTheseResultsTogether()
which would take a long time, be much more difficult to page through, etc.
Can anyone recommend a way to fetch a list of entities from an ACL that doesn't limit the number of accessing users?
Edit for more detail:
I'm searching and sorting on a long set of academic topics in use at a school. Some of the topics are created by administrators and should be school-wide. Others are created by teachers and are probably only relevant to those teachers. I want to create a google-docs-list-like hierarchy of collections that treats each topic like a document. The searchTerms field would be a list of words in the topic name - there is not a lot of internal text to search. Each topic will be in at least one collection (the organization's "root" collection) and could be in as many as 10-20 other collections, all managed by different people. Ideally there'd be no upper limit to the number of collections a document might appear in. My struggle here is to produce a list of all of the entities a particular user has at least read access to - the analog in google docs would be the "All Items" view.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
data:image/s3,"s3://crabby-images/d5906/d59060df4059a6cc364216c4d63ceec29ef7fe66" alt="扫码二维码加入Web技术交流群"
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
假设您的文档和组权限的更改频率低于用户查询(或时间要求较低),我建议这样做(这就是我解决类似问题的方法):
在您的 ACL 中,包含字段
ACL 的 key_name 类似于
"indexed_document_id||index_num"
键中的
index_num
允许您有多个实体存储用户列表,以防超过 5000 个(列表中项目的数据存储限制)或您希望列表中包含的任意数量降低加载一项的成本(尽管您不需要经常这样做)。不要忘记要访问的文档应该是索引实体的父级。这样您就可以执行
select __key__
查询,而不是select *
(这避免了必须反序列化访问器和 searchTerms 字段)。您可以搜索并返回实体的parent(),而无需访问任何字段。有关该内容和其他 gae 搜索设计的更多信息,请访问此 博客发布。遗憾的是,该块帖子没有涵盖像我们这样的 ACL 索引。免责声明:我现在遇到了这种设计的问题,因为用户可以访问的文档是由他们是否关注该用户来控制的。这意味着,如果他们关注或取消关注,可能会有大量现有文档需要用户添加/删除。如果你是这种情况,那么如果你遵循我的技术,你可能会陷入和我一样的困境。我目前计划通过随着时间的推移在后台更新旧文档的索引来处理这个问题。回答这个问题的其他人可能有一个解决方案 - 如果没有,我可能会将其作为一个单独的问题发布。
对此数据结构的操作分析:
添加索引文档:
O(n*m),其中 n 是用户数量,m 是搜索查询数量
查询索引文档:
从 ACL 中选择 __key__ 其中 accessors = {userid} 和 searchTerms >= { search}
(虽然我不确定为什么你实际上这样做“> =,在我的查询中它总是“=”)O(n +m) 其中 n 是用户数量,m 是搜索词的数量 - 这相当快。它使用两个索引的锯齿形合并连接(一个在访问器上,一个在搜索项上)。这假设 gae 索引扫描是线性的。对于“=”查询,它们可能是对数的,但我不了解它们索引的设计,也没有做过任何测试来验证。另请注意,您不需要加载索引实体的任何属性。
添加用户对特定文档的访问权限
从 ACL 中选择 __key__,其中 accessor = {userid} 且parent = {key(document)}
select * from ACL where Parent = {key(document)} and numberOfAccessors < {5000(或任何最大值)} limit 1
O(n),其中 n 是有权访问该文档的人数。
删除用户对特定文档的访问权限
从 ACL 中选择 *,其中访问者 = {userid} 且父级 = {key(document)}
O(n),其中 n 是有权访问该文档的人数。
压缩索引
如果您执行大量删除操作,则必须偶尔执行此操作。不确定检测这一点的最佳方法。
select * from ACL whereparent = {key(document)} and numberOfAccessors < {2500(或最大值的一半)}
O(n),其中 n 是有权访问该文档的人数
Assuming that your documents and group permissions change less often (or are less time critical) than user queries, I suggest this (which is how i'm solving a similar problem):
In your ACL, include the fields
The key_name for ACL would be something like
"indexed_document_id||index_num"
index_num
in the key allows you potentially have multiple entities storing the list of users, incase there are more than 5000 (the datastore limit on items in a list) or however many you want to have in a list to reduce the cost of loading one up (though you wont need to do that often).Don't forget that the document to be accessed should be the parent of the index entity. that way you can do a
select __key__
query rather than aselect *
(this avoids having to deserialize the accessor and searchTerms fields). You can search and return the parent() of the entity without needing to access any of the fields. More on that and other gae search design at this blog post. Sadly that block post doesn't cover ACL indexes like ours.Disclaimer: I've now encountered a problem with this design in that what document a user has access to is controlled by whether they are following that user. That means that if they follow or unfollow, there could be a large number of existing documents the user needs to be added/removed from. If this is the case for you, then you might be stuck in the same hole as me if you follow my technique. I currently plan to handle this by updating the indexes for old documents in the background, over time. Someone else answering this question might have a solution to it baked in - if not I may post it as a separate question.
Analysis of operations on this datastructure:
Add an indexed document:
O(n*m) where n is number of users and m is number of search queries
Query an indexed document:
select __key__ from ACL where accessors = {userid} and searchTerms >= {search}
(though i'm not sure why you do ">=" actually, in my queries it's always "=")O(n+m) where n is the number of users and m is the number of search terms - this is pretty fast. it uses the zig-zag merge join of two indexes (one on accessors, one on searchterms). this assumes that gae index scans are linear. they might be logarithmic for "=" queries but i'm not privy to the design of their indexes nor have i done any tests to verify. note also that you dont need to load any of the properties of the index entity.
Add access for a user to a particular document
select __key__ from ACL where accessor = {userid} and parent = {key(document)}
select * from ACL where parent = {key(document)} and numberOfAccessors < {5000 (or whatever your max is)} limit 1
O(n) where n is the number of people who have access to the document.
Remove access for a user to a particular document
select * from ACL where accessor = {userid} and parent = {key(document)}
O(n) where n is the number of people who have access to the document.
Compact the indexes
You'll have to do this once in a while if you do a lot of removals. not sure the best way to detect this.
select * from ACL where parent = {key(document)} and numberOfAccessors < {2500 (or half wahtever your max is)}
O(n) where n is the number of people who have access to the document