Creating a Forms Authentication cookie for a search engine crawler

Posted 2024-12-18 10:08:27 · 472 words · 3 views · 0 comments

Big picture: I have been asked to create a search engine for our company's intranet. The search engine will crawl pages supplied to it by XML files for each independent application on the intranet. The problem is that the entire intranet uses Forms Authentication, so the crawler needs access to each application without actually having user credentials (e.g. a username and password).

Each application within the intranet has its access controlled by a permission manager, which is essentially a wrapper on the default Role Manager ASP.NET comes with. Each application can define its own roles and assign people who have those roles.

Please note that there are potentially hundreds of applications.

The crawler has access to the permission manager's database, so it knows what all the roles are. Therefore my idea was to have the crawler create a cookie that identifies it as having all roles for each application.

The problem I'm running into is this: how do I create a Forms Authentication cookie that already has the roles assigned in it, without creating a corresponding user (IPrincipal)?

It is entirely possible that I've failed to completely understand how Forms Authentication works, and if so, please tell me what I can do differently.


Comments (3)

灯下孤影 2024-12-25 10:08:27

This is probably not what you want to hear, but...

I would just have the crawler authenticate like anyone else.

Given that this is a crawler you control, why fight Forms Authentication? Seems logical to create a user with all required roles in each application (hopefully you have a central administration point for the hundreds of apps, else I would not want to be an administrator there ;-)

If you do anything that allows "just the crawler" special access (bypass user-based authentication based on... what? The crawler's user agent? A specific origin IP?), you create a security hole that a hacker can leverage to gain access to all of the intranet applications that have otherwise been diligently secured with user IDs, passwords and roles (in fact, the security hole is particularly wide because you propose granting access to EVERY role in the system).
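
To illustrate the "authenticate like anyone else" route, here is a minimal Python sketch of a crawler that posts to a login form once and then reuses the cookie the server hands back. Everything specific in it is an assumption made for illustration: the form field names `username` and `password`, the helper names, and the idea that a plain POST is enough (a real ASP.NET login page usually also expects hidden fields such as `__VIEWSTATE`).

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_opener_with_cookies():
    """Return an opener that stores and resends cookies, plus its cookie jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_login_request(login_url, username, password):
    """Build the POST a browser would send from the login form.

    The field names "username" and "password" are placeholders; check the
    actual login page for the real names and any hidden fields it requires.
    """
    body = urllib.parse.urlencode({"username": username, "password": password})
    return urllib.request.Request(login_url, data=body.encode("utf-8"), method="POST")


def crawl(opener, login_url, username, password, page_urls):
    """Log in once, then fetch each page with the auth cookie attached."""
    opener.open(build_login_request(login_url, username, password)).close()
    pages = {}
    for url in page_urls:
        with opener.open(url) as response:
            pages[url] = response.read()
    return pages
```

With this shape the crawler is just another account: its access is granted and revoked through the same permission manager as everyone else's.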

扮仙女 2024-12-25 10:08:27

It sounds like what you want is an appropriately encrypted System.Web.Security.FormsAuthenticationTicket (which then gets attached to HTTP requests as a cookie).

The encryption logic is located in System.Web.Security.FormsAuthentication.Encrypt(), which I think uses the MachineKey as the encryption key. Also have a look at the GetAuthCookie() logic (using Reflector).

You might have to write your own version of the encryption method, but what you want to do should be possible, provided you have a copy of the remote site's encryption keys. You don't need the user's passwords -- only the user name is encoded into the Ticket.
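
To show why holding the key is enough, here is a conceptual Python sketch of a keyed ticket. It is not ASP.NET's actual ticket format or algorithm: it only signs the payload with HMAC-SHA256 and does not encrypt it, and the function names and JSON layout are made up for illustration. The `user_data` field is loosely modelled on the `UserData` property of `FormsAuthenticationTicket`, which is one place a role list could travel, provided the receiving application is written to read it.

```python
import base64
import hashlib
import hmac
import json
import time


def issue_ticket(key, name, roles, lifetime_seconds=1800, now=None):
    """Create a signed cookie value carrying a user name, expiry and role list."""
    issued = int(time.time() if now is None else now)
    payload = json.dumps(
        {"name": name, "expires": issued + lifetime_seconds, "user_data": ",".join(roles)},
        sort_keys=True,
    ).encode("utf-8")
    # Anyone holding the key can produce a signature the server will accept.
    signature = hmac.new(key, payload, hashlib.sha256).digest()
    return base64.urlsafe_b64encode(signature + payload).decode("ascii")


def read_ticket(key, cookie_value, now=None):
    """Return the ticket dict, or None if the signature is wrong or it has expired."""
    raw = base64.urlsafe_b64decode(cookie_value.encode("ascii"))
    signature, payload = raw[:32], raw[32:]  # a SHA-256 digest is 32 bytes
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    ticket = json.loads(payload)
    current = time.time() if now is None else now
    if ticket["expires"] < current:
        return None
    return ticket
```

Note that no password appears anywhere: the server trusts the ticket purely because it was produced with the shared key, which is also why a leaked key exposes every application that uses it.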

花辞树 2024-12-25 10:08:27

It seems to me that the problem is not yet well defined (at least to me!).
Why do you need to crawl the pages and index them if there are fine-grained permissions on them?! How do you show search results without violating the permissions? Why not index the back end, bypassing the pages altogether (I mean index the database records, not the pages)?
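
One possible answer to the "without violating the permissions" question is to store the roles allowed to view each page alongside its index entry and filter at query time. The Python sketch below assumes exactly that; `IndexedPage`, `allowed_roles` and `search` are hypothetical names, and the substring match merely stands in for a real search engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedPage:
    url: str
    text: str
    allowed_roles: frozenset  # roles permitted to view this page


def search(index, query, user_roles):
    """Return URLs of pages that match the query and that the user may see."""
    needle = query.lower()
    roles = frozenset(user_roles)
    return [
        page.url
        for page in index
        # A page is shown only if the user holds at least one allowed role.
        if needle in page.text.lower() and page.allowed_roles & roles
    ]
```

For example, a user holding only an `Everyone` role would see a public FAQ page but not a page restricted to `HR.Admin`, even though the crawler indexed both.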
