AppEngine 上 ACL 的高效组成员资格测试

发布于 2024-12-28 21:27:01 字数 1039 浏览 2 评论 0原文

我正在为数据存储中的对象创建访问控制列表。每个 ACL 条目都可以有一个允许访问相应条目的所有用户 ID 的列表。然后，我获取用户可以访问的实体列表的查询将非常简单：

select * from ACL where accessors = {userId} and searchTerms >= {search}

问题是，这只能在达到索引条目限制之前支持 2500 个用户，当然，将一个包含大量用户的 ACL 条目放在其中会非常昂贵，因为需要更改许多索引条目。

因此，我考虑添加允许访问实体的用户组列表。这可以大大减少每个 ACL 条目所需的索引条目数量，但查询会变得更长，因为我必须查询用户所在的每个可能的组：

select * from ACL where accessors = {userId} and searchTerms >= {search}
for (GroupId id : theSetOfGroupsTheUserBelongsTo) {
    select * from ACL where accessingGroups = {id} and searchTerms >= {search}
}

mergeAllTheseResultsTogether()

这将花费很长时间，更难以分页，等等吗

任何人都可以推荐一种从 ACL 中获取实体列表且不限制访问用户数量的方法

？ 编辑更多详细信息：

我正在对学校使用的一长串学术主题进行搜索和排序。有些主题是由管理员创建的，应该是全校范围的。其他的是由教师创建的，并且可能仅与这些教师相关。我想创建一个类似 google-docs-list 的集合层次结构，将每个主题视为一个文档。 searchTerms 字段将是主题名称中的单词列表 - 没有很多内部文本可供搜索。每个主题将至少位于一个集合（组织的“根”集合）中，并且可能位于多达 10-20 个其他集合中，所有集合均由不同的人员管理。理想情况下，文档可能出现的集合数量没有上限。我在这里的困难是生成特定用户至少具有读取访问权限的所有实体的列表 - 谷歌文档中的模拟将是“所有项目”视图。

原文

I'm creating an access control list for objects in my datastore. Each ACL entry could have a list of all user ids allowed to access the corresponding entry. Then my query to get the list of entities a user can access would be pretty simple:

select * from ACL where accessors = {userId} and searchTerms >= {search}

The problem is that this can only support 2500 users before it hits the index entry limit, and of course it would be very expensive to put an ACL entry with a lot of users because many index entries would need to be changed.

So I thought about adding a list of GROUPs of users that are allowed to access an entity. That could drastically lower the number of index entries needed for each ACL entry, but querying gets longer because I have to query for every possible group that a user is in:

select * from ACL where accessors = {userId} and searchTerms >= {search}
for (GroupId id : theSetOfGroupsTheUserBelongsTo) {
    select * from ACL where accessingGroups = {id} and searchTerms >= {search}
}

mergeAllTheseResultsTogether()

which would take a long time, be much more difficult to page through, etc.

Can anyone recommend a way to fetch a list of entities from an ACL that doesn't limit the number of accessing users?

Edit for more detail:

I'm searching and sorting on a long set of academic topics in use at a school. Some of the topics are created by administrators and should be school-wide. Others are created by teachers and are probably only relevant to those teachers. I want to create a google-docs-list-like hierarchy of collections that treats each topic like a document. The searchTerms field would be a list of words in the topic name - there is not a lot of internal text to search. Each topic will be in at least one collection (the organization's "root" collection) and could be in as many as 10-20 other collections, all managed by different people. Ideally there'd be no upper limit to the number of collections a document might appear in. My struggle here is to produce a list of all of the entities a particular user has at least read access to - the analog in google docs would be the "All Items" view.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

哆兒滾 2025-01-04 21:27:01

假设您的文档和组权限的更改频率低于用户查询（或时间要求较低），我建议这样做（这就是我解决类似问题的方法）：

在您的 ACL 中，包含字段

访问器 <-- all可以访问文档的用户
ID numberOfAccessors <-- 每当更改该字段时存储访问器的长度
searchTerms

ACL 的 key_name 类似于 "indexed_document_id||index_num"

键中的 index_num 允许您有多个实体存储用户列表，以防超过 5000 个（列表中项目的数据存储限制）或您希望列表中包含的任意数量降低加载一项的成本（尽管您不需要经常这样做）。

不要忘记要访问的文档应该是索引实体的父级。这样您就可以执行 select __key__ 查询，而不是 select * （这避免了必须反序列化访问器和 searchTerms 字段）。您可以搜索并返回实体的parent()，而无需访问任何字段。有关该内容和其他 gae 搜索设计的更多信息，请访问此博客发布。遗憾的是，该块帖子没有涵盖像我们这样的 ACL 索引。

免责声明：我现在遇到了这种设计的问题，因为用户可以访问的文档是由他们是否关注该用户来控制的。这意味着，如果他们关注或取消关注，可能会有大量现有文档需要用户添加/删除。如果你是这种情况，那么如果你遵循我的技术，你可能会陷入和我一样的困境。我目前计划通过随着时间的推移在后台更新旧文档的索引来处理这个问题。回答这个问题的其他人可能有一个解决方案 - 如果没有，我可能会将其作为一个单独的问题发布。

对此数据结构的操作分析：

添加索引文档：

对于每个有权访问该文档的组，创建一个实体，其中包含可以在访问者字段中访问该文档的所有用户
。适合一个字段，创建更多实体并增加index_num值（使用分片计数器）。

O(n*m)，其中 n 是用户数量，m 是搜索查询数量

查询索引文档：

从 ACL 中选择 __key__ 其中 accessors = {userid} 和 searchTerms >= { search} （虽然我不确定为什么你实际上这样做“> =，在我的查询中它总是“=”）
从这些键中获取所有父键
过滤掉重复项
获取这些父文档

O（n +m) 其中 n 是用户数量，m 是搜索词的数量 - 这相当快。它使用两个索引的锯齿形合并连接（一个在访问器上，一个在搜索项上）。这假设 gae 索引扫描是线性的。对于“=”查询，它们可能是对数的，但我不了解它们索引的设计，也没有做过任何测试来验证。另请注意，您不需要加载索引实体的任何属性。

添加用户对特定文档的访问权限

检查用户是否已具有访问权限：从 ACL 中选择 __key__，其中 accessor = {userid} 且parent = {key(document)}
如果没有，请添加它： select * from ACL where Parent = {key(document)} and numberOfAccessors < {5000（或任何最大值）} limit 1
将 {userid} 附加到访问者并放置实体

O(n)，其中 n 是有权访问该文档的人数。

删除用户对特定文档的访问权限

从 ACL 中选择 *，其中访问者 = {userid} 且父级 = {key(document)}
从访问者中删除 {userid} 并放入实体

O(n)，其中 n 是有权访问该文档的人数。

压缩索引

如果您执行大量删除操作，则必须偶尔执行此操作。不确定检测这一点的最佳方法。

要查明特定文档是否有需要压缩的内容：select * from ACL whereparent = {key(document)} and numberOfAccessors < {2500（或最大值的一半）}
对于每一对/任意一对：删除一个，将访问器附加到另一个

O(n)，其中 n 是有权访问该文档的人数

Assuming that your documents and group permissions change less often (or are less time critical) than user queries, I suggest this (which is how i'm solving a similar problem):

In your ACL, include the fields

accessors <-- all userids that can access the document
numberOfAccessors <-- store the length of accessors whenever you change that field
searchTerms

The key_name for ACL would be something like "indexed_document_id||index_num"

index_num in the key allows you potentially have multiple entities storing the list of users, incase there are more than 5000 (the datastore limit on items in a list) or however many you want to have in a list to reduce the cost of loading one up (though you wont need to do that often).

Don't forget that the document to be accessed should be the parent of the index entity. that way you can do a select __key__ query rather than a select * (this avoids having to deserialize the accessor and searchTerms fields). You can search and return the parent() of the entity without needing to access any of the fields. More on that and other gae search design at this blog post. Sadly that block post doesn't cover ACL indexes like ours.

Disclaimer: I've now encountered a problem with this design in that what document a user has access to is controlled by whether they are following that user. That means that if they follow or unfollow, there could be a large number of existing documents the user needs to be added/removed from. If this is the case for you, then you might be stuck in the same hole as me if you follow my technique. I currently plan to handle this by updating the indexes for old documents in the background, over time. Someone else answering this question might have a solution to it baked in - if not I may post it as a separate question.

Analysis of operations on this datastructure:

Add an indexed document:

For each group that has access to the document, create an entity which includes all users that can access it in the accessors field
If there are too many to fit in one field, make more entities and increment that index_num value (using sharded counters).

O(n*m) where n is number of users and m is number of search queries

Query an indexed document:

select __key__ from ACL where accessors = {userid} and searchTerms >= {search} (though i'm not sure why you do ">=" actually, in my queries it's always "=")
Get all the parent keys from these keys
Filter out duplicates
Get those parent documents

O(n+m) where n is the number of users and m is the number of search terms - this is pretty fast. it uses the zig-zag merge join of two indexes (one on accessors, one on searchterms). this assumes that gae index scans are linear. they might be logarithmic for "=" queries but i'm not privy to the design of their indexes nor have i done any tests to verify. note also that you dont need to load any of the properties of the index entity.

Add access for a user to a particular document

Check if the user already has access: select __key__ from ACL where accessor = {userid} and parent = {key(document)}
If not, add it: select * from ACL where parent = {key(document)} and numberOfAccessors < {5000 (or whatever your max is)} limit 1
Append {userid} to accessors and put the entity

O(n) where n is the number of people who have access to the document.

Remove access for a user to a particular document

select * from ACL where accessor = {userid} and parent = {key(document)}
Remove {userid} from accessors and put the entity

O(n) where n is the number of people who have access to the document.

Compact the indexes

You'll have to do this once in a while if you do a lot of removals. not sure the best way to detect this.

To find out whether there's anything to compact for a particular document: select * from ACL where parent = {key(document)} and numberOfAccessors < {2500 (or half wahtever your max is)}
For each/any pair of these: delete one, appending the accessors to the other

O(n) where n is the number of people who have access to the document

回复收藏 0 原文

~没有更多了~