SOLR Permissions / Filtering Results Depending on Access Rights
For example, I have documents A, B, and C. User 1 must only be able to see documents A and B. User 2 must only be able to see document C. Is it possible to do this in SOLR without filtering by metadata? If I use a metadata filter, every time access rights change, I have to reindex.
[Update 2/14/2012] Unfortunately, in the client's case, change is frequent. The data is confidential and usually managed only by its owners, who are internal users. The specific case is that they need to be able to share those documents with certain external users and specify access levels for those users. Most of the time this is an ad hoc task that is not identified ahead of time.
Comments (6)
I would suggest storing the access roles (yes, it's plural) as document metadata. Here the required field access_roles is a facetable, multi-valued string field. The user owning the document is a default access role for that document.
To change the access roles of a document, you edit access_roles. When Jane searches, the access roles she belongs to will be part of the query. Solr will retrieve only the documents that match the user's access roles.
When Jane (user_jane), manager at the Vienna office (manager_vienna), searches, her query fetches all documents that contain user_jane OR manager_vienna in access_roles: Doc1 and Doc2.
When Bob (user_bob), member of a special team (specia_team), searches, his query fetches Doc2 for him.
Queries adapted from http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams
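A minimal sketch of the role filter described above, assuming the roles go into an fq (filter query) parameter on access_roles. The contents of Doc1 and Doc2 and the helper names are assumptions made for illustration, and visible_docs is only an in-memory stand-in for what Solr would return.

```python
from urllib.parse import urlencode

# Hypothetical index contents; these role assignments are assumed for illustration.
DOCS = {
    "Doc1": {"user_jane"},
    "Doc2": {"manager_vienna", "specia_team"},
}


def access_filter(roles):
    """Build a Solr filter query that matches any of the given roles."""
    if not roles:
        raise ValueError("a user must have at least one access role")
    terms = " OR ".join('"{}"'.format(role) for role in sorted(roles))
    return "access_roles:({})".format(terms)


def search_params(user_query, roles):
    """URL-encode the request: q is what the user typed, fq restricts by role."""
    return urlencode({"q": user_query, "fq": access_filter(roles)})


def visible_docs(roles):
    """In-memory stand-in for the documents the filter would let through."""
    wanted = set(roles)
    return sorted(doc for doc, allowed in DOCS.items() if allowed & wanted)


if __name__ == "__main__":
    print(search_params("report", ["user_jane", "manager_vienna"]))
    print(visible_docs(["user_jane", "manager_vienna"]))
    print(visible_docs(["user_bob", "specia_team"]))
```

Keeping the roles in fq rather than in q means the restriction does not affect relevance scoring, and Solr can cache the filter separately from the main query.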
You might want to check the document-level security patches:
https://issues.apache.org/jira/browse/SOLR-1872
https://issues.apache.org/jira/browse/SOLR-1834
I think my approach would be similar to @aitchnyu's answer. I would, however, NOT use individual users in the metadata.
If you assign groups to each document, you will have to reindex for security reasons less often.
For a given document, you might have access_roles: group_1, group_3
This way, group_1 and group_3 always retain rights to the document. However, I can vary which groups each user belongs to and adjust the query accordingly.
When the query is generated, it always passes the user's groups as part of the query. If I belong to group_1 and group_2, my query filters on those two groups.
Since the groups are generated dynamically in the query, I simply remove a user from the group, and when a new query is issued, it no longer includes the removed group. So after removing the user from group_1, a new query would filter on group_2 only.
All documents that require group_1 will no longer be accessible to the user.
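A rough sketch of the idea, assuming group membership lives outside Solr (the memberships dict and the user name alice are hypothetical stand-ins for LDAP or a database). The filter is rebuilt on every search, so a membership change takes effect on the next query without touching the index.

```python
# Hypothetical membership store; in practice this could be LDAP or a database.
memberships = {"alice": {"group_1", "group_2"}}


def group_filter(user):
    """Build the access_roles filter from the user's current groups.

    Returns None when the user has no groups, in which case the caller
    should skip the search and show no results.
    """
    groups = sorted(memberships.get(user, ()))
    if not groups:
        return None
    return "access_roles:({})".format(" OR ".join(groups))


if __name__ == "__main__":
    print(group_filter("alice"))            # filter built from both groups
    memberships["alice"].discard("group_1")  # revoke a group; no reindex needed
    print(group_filter("alice"))            # the next query only mentions group_2
```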
This allows most changes to be made in real time without the need to reindex the documents. The only time you would have to reindex for security reasons is if you decided that a particular group should no longer have access to a document.
In many real-world scenarios, that should be relatively uncommon. It seems much more likely that HR documents will always be available to the HR department, while a specific user may not always be part of the HR group.
Hope that helps.
You can implement your security model using Solr's PostFilter. For more information see http://searchhub.org/2012/02/22/custom-security-filtering-in-solr/
Note: you should probably cache your access rights, otherwise performance will be terrible.
Keep in mind that Solr is a pure text-based search engine and indexing system built for fast searching, so you should not expect RDBMS-style capabilities from it. Solr does not provide security for the documents being indexed; you have to write such an implementation yourself if you want it. In that case you have two options.
1) Just index the documents into Solr and keep the authorization details in an RDBMS. Query Solr for your search and collect the results returned. Then fire another query at the DB for the doc IDs returned by Solr to see whether the user has access to them, and filter out the documents the current user has no access to. You are done! But not really; your problems only start here. What if all the results returned by Solr get filtered out? (Assuming you are not fetching all the documents at once, meaning you retrieve only the top 1000 results from the Solr result set, since otherwise the search would not be fast.) You have to query Solr again for the next batch of results, and repeat these steps until you have enough results to display.
2) The second approach is to index the authorization metadata along with the document in Solr, as aitchnyu has explained. To answer your question about sharing documents with an external user: along with the user group and role details, you index these external users' user IDs into the access_roles field, or you can add another field, access_user, to your schema. You can then modify the search queries for external users to include the access_user field in your filter query.
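The repeated fetching in the first option can be sketched roughly as follows. The functions fake_search and fake_is_allowed are hypothetical in-memory stand-ins for a paged Solr request and a database permission lookup, and the batch sizes are arbitrary.

```python
# In-memory stand-ins (assumptions for illustration, not real Solr or DB calls).
ALL_IDS = ["doc{}".format(n) for n in range(50)]
ACL = {"ext_user": {"doc7", "doc23", "doc41"}}


def fake_search(start, rows):
    """Stand-in for a Solr request paged with start/rows."""
    return ALL_IDS[start:start + rows]


def fake_is_allowed(user, doc_id):
    """Stand-in for the permission lookup in the database."""
    return doc_id in ACL.get(user, set())


def authorized_page(search, is_allowed, user, wanted=10, batch=1000, max_batches=20):
    """Collect up to `wanted` hits the user may see, fetching more batches as needed."""
    results = []
    start = 0
    for _ in range(max_batches):
        ids = search(start, batch)
        if not ids:
            break  # result set exhausted
        results.extend(i for i in ids if is_allowed(user, i))
        if len(results) >= wanted:
            break  # enough permitted hits to fill the page
        start += len(ids)
    return results[:wanted]


if __name__ == "__main__":
    print(authorized_page(fake_search, fake_is_allowed, "ext_user", wanted=2, batch=10))
```

The max_batches cap bounds the worst case, where a user has access to almost nothing and every batch is filtered away; it also shows why this option gets slow for users with sparse permissions.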
Now the most important thing: updates to indexed documents. This is of course a tedious task, but with careful design and asynchronous processing, along with Solr's partial document update feature (Solr 4.0 and later), you can achieve reasonably good TPS with Solr. If you are using Solr < 4.0, you can have separate systems for searching and updating, and with careful use of a load balancer and master-slave replication strategies you will have a smile on your face!
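As an illustration of the partial update mentioned above, the sketch below only builds the JSON body for an atomic update that adds external user IDs to the access_user field. It assumes the unique key field is called id and that the body would be posted to the collection's update handler; the document and user IDs are made up.

```python
import json


def share_update(doc_id, external_user_ids):
    """JSON body for an atomic update that adds users to access_user.

    The "add" modifier appends values to a multi-valued field; "set"
    would replace the field's values instead.
    """
    return json.dumps([
        {"id": doc_id, "access_user": {"add": list(external_user_ids)}}
    ])


if __name__ == "__main__":
    # Hypothetical IDs; the result would be sent as the body of an update request.
    print(share_update("Doc1", ["ext_42"]))
```

Only the document ID and the changed field travel over the wire, which is what makes frequent ad hoc sharing changes cheaper than resubmitting whole documents.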
There are no built-in mechanisms in Solr that I am aware of that will allow you to control access to documents without maintaining the rights in the metadata. The approach outlined by aitchnyu seems reasonable if you keep it at a true role level and do not assign user-specific permissions to a document. That way you can assign roles to users, and this will grant them the ability to see documents in the index. Granted, you will still need to reindex documents when the roles change, but hopefully you can identify most of the needed roles ahead of time and reduce the need for frequent reindexing.