Millions of anonymous ASP.Net profiles?

Posted on 2024-08-31 02:37:01

UPDATE: I've just realised that we are using Google Mini Search to crawl the website so that we can support Google Search. It is probably creating an anonymous profile for every crawl, and maybe even for every page it fetches. Is that possible?

Hi all, some advice needed!

Our website receives approximately 50,000 hits a day, and we use anonymous ASP.Net membership profiles/users. This is resulting in millions (4.5m currently) of "active" profiles, and the database is crawling, even though we have a nightly task that cleans up all the inactive ones.

There is no way that we have 4.5m unique visitors (our county's population is only half a million). Could this be caused by crawlers and spiders?

Also, if we have to live with this huge number of profiles, is there any way of optimising the DB?

Thanks

Kev

Comments (2)

浊酒尽余欢 2024-09-07 02:37:01

Update following conversation:

Might I suggest that you implement a filter that identifies crawlers via request headers and logs their anonymous ID cookie. Later that same day you can decrypt the logged cookies and delete the anonymous aspnet_Profile and aspnet_Users records with the associated UserId.

You might be fighting a losing battle but at least you will get a clear idea of where all the traffic is coming from.
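
As a rough illustration of the header-based filter, here is a minimal sketch that classifies a request by its User-Agent string. The class name, the token list, and the decision to treat a missing User-Agent as a robot are all my own assumptions, not anything from your setup; in particular, check your logs for the exact string your Google Mini sends before relying on the "gsa-crawler" token.

```csharp
using System;

public static class CrawlerDetector
{
    // Illustrative substrings only (assumption); extend this from your own logs.
    private static readonly string[] CrawlerTokens =
    {
        "googlebot", "gsa-crawler", "bingbot", "slurp", "spider", "crawler", "bot/"
    };

    // Returns true when the User-Agent looks like an automated client.
    public static bool IsLikelyCrawler(string userAgent)
    {
        // Assumption: a request with no User-Agent at all is treated as a robot.
        if (string.IsNullOrEmpty(userAgent))
        {
            return true;
        }

        foreach (string token in CrawlerTokens)
        {
            if (userAgent.IndexOf(token, StringComparison.OrdinalIgnoreCase) >= 0)
            {
                return true;
            }
        }

        return false;
    }
}
```

Inside a module or filter you would call something like CrawlerDetector.IsLikelyCrawler(request.UserAgent) and log request.AnonymousID when it returns true. ASP.Net also exposes Request.Browser.Crawler, which may be enough on its own, though its accuracy depends on the browser definition files installed on the server.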


AnonymousId cookies and, by proxy, anonymous profiles stay valid for months after last use (the default cookieTimeout is 100,000 minutes, roughly 69 days, with sliding expiration). This can result in the anon profiles piling up.

A very simple way to handle this is to use ProfileManager.

ProfileManager.DeleteInactiveProfiles(ProfileAuthenticationOption.Anonymous, DateTime.Now.AddDays(-7));

will clear out all the anonymous profiles that have not been accessed in the last 7 days.
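
If you want to see how much a given cutoff would remove before you run the delete, a small wrapper along these lines might help. This is only a sketch: the class and method names are made up, and with millions of rows the provider's delete may run long enough to hit the command timeout, so treat it as a starting point.

```csharp
using System;
using System.Diagnostics;
using System.Web.Profile;

public static class ProfileCleanup
{
    // Hypothetical helper: counts, then deletes, anonymous profiles idle for maxIdleDays.
    public static int DeleteStaleAnonymousProfiles(int maxIdleDays)
    {
        DateTime cutoff = DateTime.Now.AddDays(-maxIdleDays);

        // How many anonymous profiles have been inactive since the cutoff.
        int stale = ProfileManager.GetNumberOfInactiveProfiles(
            ProfileAuthenticationOption.Anonymous, cutoff);
        Trace.WriteLine(string.Format("{0} stale anonymous profiles found.", stale));

        // Returns the number of profiles actually deleted.
        return ProfileManager.DeleteInactiveProfiles(
            ProfileAuthenticationOption.Anonymous, cutoff);
    }
}
```

Calling ProfileCleanup.DeleteStaleAnonymousProfiles(7) from your nightly task would match the 7-day example above.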

But that leaves you with the anonymous records in aspnet_Users. Membership does not expose a method similar to ProfileManager for deleting stale anonymous users.

So...

The best bet is a raw SQL attack: delete the rows from aspnet_Profile that you consider stale, and then run the same kind of query on aspnet_Users where IsAnonymous = 1.

Good luck with that. Once you get it cleaned up, just stay on top of it.
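
The raw SQL cleanup described above could be sketched roughly like this, run from a console app or scheduled job. Everything here is an assumption on my part: the class name, the use of LastActivityDate as the definition of "stale", the batch size, and the expectation that stale anonymous users have no rows left in other tables that reference aspnet_Users (roles, personalization and so on). If they do, the second delete will fail on the foreign key and you would need to clear those tables first.

```csharp
using System;
using System.Data.SqlClient;

public static class AnonymousUserCleanup
{
    // Profiles first, because aspnet_Profile references aspnet_Users.
    private const string DeleteProfilesSql =
        @"DELETE TOP (@batch) p
          FROM dbo.aspnet_Profile AS p
          INNER JOIN dbo.aspnet_Users AS u ON u.UserId = p.UserId
          WHERE u.IsAnonymous = 1 AND u.LastActivityDate < @cutoff";

    private const string DeleteUsersSql =
        @"DELETE TOP (@batch) FROM dbo.aspnet_Users
          WHERE IsAnonymous = 1 AND LastActivityDate < @cutoff";

    // Deletes stale anonymous rows in small batches so each statement stays short.
    // cutoffUtc should be UTC, since the membership tables store UTC timestamps.
    public static int Purge(string connectionString, DateTime cutoffUtc, int batchSize)
    {
        using (var connection = new SqlConnection(connectionString))
        {
            connection.Open();
            int total = RunUntilEmpty(connection, DeleteProfilesSql, cutoffUtc, batchSize);
            total += RunUntilEmpty(connection, DeleteUsersSql, cutoffUtc, batchSize);
            return total;
        }
    }

    // Repeats one batched DELETE until it affects no more rows.
    private static int RunUntilEmpty(
        SqlConnection connection, string sql, DateTime cutoffUtc, int batchSize)
    {
        int total = 0;
        int affected;
        do
        {
            using (var command = new SqlCommand(sql, connection))
            {
                command.Parameters.AddWithValue("@batch", batchSize);
                command.Parameters.AddWithValue("@cutoff", cutoffUtc);
                affected = command.ExecuteNonQuery();
            }
            total += affected;
        } while (affected > 0);
        return total;
    }
}
```

A call such as AnonymousUserCleanup.Purge(connectionString, DateTime.UtcNow.AddDays(-7), 5000) would clear a week-old backlog. Batching is a design choice for a table this size: one huge DELETE can hold locks and grow the log for a long time, while small batches let normal traffic through between statements.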


Updated Update:

The code below is only valid on IIS7, and only if you channel all requests through ASP.Net.

You could implement a module that watches for requests to robots.txt, gets the anonymous ID cookie, and stashes it in a robots table. You can then use that table to safely purge the robots' rows from your membership/profile tables every night. This might help.

Example:

using System;
using System.Diagnostics;
using System.Web;

namespace NoDomoArigatoMisterRoboto
{
    public class RobotLoggerModule : IHttpModule
    {
        #region IHttpModule Members

        public void Init(HttpApplication context)
        {
            context.PreSendRequestHeaders += PreSendRequestHeaders;
        }

        public void Dispose()
        {
            //noop
        }

        #endregion

        private static void PreSendRequestHeaders(object sender, EventArgs e)
        {
            HttpRequest request = ((HttpApplication)sender).Request;

            // Well-behaved crawlers fetch robots.txt; ordinary browsers normally do not.
            bool isRobot = 
                request.Url.GetLeftPart(UriPartial.Path).EndsWith("robots.txt", StringComparison.InvariantCultureIgnoreCase);

            // Null when anonymous identification is not enabled for the site.
            string anonymousId = request.AnonymousID;

            if (anonymousId != null && isRobot)
            {
                // log this id for pruning later
                Trace.WriteLine(string.Format("{0} is a robot.", anonymousId));
            }
        }
    }
}

Reference: http://www.codeproject.com/Articles/39026/Exploring-Web-config-system-web-httpModules.aspx


站稳脚跟 2024-09-07 02:37:01

You could try deleting anonymous profiles in the Session_End event in your Global.asax.cs file.
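
A minimal sketch of that idea is below. It assumes InProc session state (Session_End does not fire in the other modes) and that the anonymous profile's user name is the anonymous ID; the session key name is made up. Bear in mind that it also throws away the profile of a human visitor whose anonymous cookie outlives the session, so it only fits if you do not need anonymous data to survive between visits.

```csharp
using System;
using System.Web;
using System.Web.Profile;

public class Global : HttpApplication
{
    // Hypothetical session key used to carry the anonymous ID to Session_End.
    private const string AnonymousIdKey = "AnonymousProfileId";

    protected void Session_Start(object sender, EventArgs e)
    {
        // Request is available here but not in Session_End, so remember the ID now.
        if (!Request.IsAuthenticated && Request.AnonymousID != null)
        {
            Session[AnonymousIdKey] = Request.AnonymousID;
        }
    }

    protected void Session_End(object sender, EventArgs e)
    {
        // Only sessions that started anonymously stored an ID.
        string anonymousId = Session[AnonymousIdKey] as string;
        if (anonymousId != null)
        {
            // Removes the profile data; the aspnet_Users row still needs separate cleanup.
            ProfileManager.DeleteProfile(anonymousId);
        }
    }
}
```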

There is every likelihood that your site is being crawled, by a legitimate search engine crawler and/or by an illegitimate crawler looking for vulnerabilities that would allow hackers to take control of your site/server. You should look into this, regardless of which solution you choose for removing old profiles.

If you are using the default Profile Provider, which keeps all of the profile information in a single column, you might want to read Scott Guthrie's article on a better-performing table-based profile provider.
