Need an alternative to filters/observers for a Ruby on Rails project
Rails has a nice set of filters (before_validation, before_create, after_save, etc) as well as support for observers, but I'm faced with a situation in which relying on a filter or observer is far too computationally expensive. I need an alternative.
The problem: I'm logging web server hits to a large number of pages. What I need is a trigger that will perform an action (say, send an email) when a given page has been viewed more than X times. Due to the huge number of pages and hits, using a filter or observer will result in a lot of wasted time because, 99% of the time, the condition it tests will be false. The email does not have to be sent out right away (i.e. a 5-10 minute delay is acceptable).
What I am instead considering is implementing some kind of process that sweeps the database every 5 minutes or so and checks to see which pages have been hit more than X times, recording that state in a new DB table, then sending out a corresponding email. It's not exactly elegant, but it will work.
Does anyone else have a better idea?
Comments (4)
I have to write something here so that stackoverflow code-highlights the first line.
Have a cron job run
rake fancy_counter:process
however often you want it to run.
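A minimal sketch of the logic such a rake task might run on each sweep, written in plain Ruby so it can be tried without Rails. The method names `pages_to_notify` and `sweep`, the hash of hit counts, and the set of already-notified pages are all assumptions made for illustration; in a real app the counts would presumably come from a grouped database query and the notified state from the new table mentioned in the question.

```ruby
require "set"

# Hypothetical sweep logic for a task such as fancy_counter:process.
# hit_counts:       Hash of page path => total hits so far
# already_notified: Set of page paths emailed on an earlier sweep
# threshold:        the "X" from the question
def pages_to_notify(hit_counts, already_notified, threshold)
  hit_counts
    .select { |page, hits| hits > threshold && !already_notified.include?(page) }
    .keys
    .sort
end

# Runs one sweep: yields each newly qualifying page (e.g. to send an email)
# and remembers it so later sweeps do not notify again.
def sweep(hit_counts, already_notified, threshold)
  newly_over = pages_to_notify(hit_counts, already_notified, threshold)
  newly_over.each do |page|
    yield page if block_given?
    already_notified << page
  end
  newly_over
end
```

Keeping the "already notified" state separate from the counts is what stops the same page from triggering an email on every sweep.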
I was once part of a team that wrote a custom ad server, which had the same requirements: monitor the number of hits per document, and do something once they reach a certain threshold. This server was going to be powering an existing very large site with a lot of traffic, and scalability was a real concern. My company hired two Doubleclick consultants to pick their brains.
Their opinion was: the fastest way to persist any information is to write it in a custom Apache log directive. So we built a site where every time someone hit a document (ad, page, all the same), the server that handled the request would write a SQL statement to the log: "INSERT INTO impressions (timestamp, page, ip, etc) VALUES (x, 'path/to/doc', y, etc);" -- all output dynamically with data from the webserver. Every 5 minutes, we would gather these files from the web servers and dump them all into the master database one at a time. Then, at our leisure, we could parse that data to do anything we pleased with it.
Depending on your exact requirements and deployment setup, you could do something similar. The computational requirement to check if you're past a certain threshold is still probably even smaller (guessing here) than executing the SQL to increment a value or insert a row. You could get rid of both bits of overhead by logging hits (special format or not), and then periodically gather them, parse them, input them to the database, and do whatever you want with them.
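As a rough illustration of the "log hits, then gather and parse them periodically" idea, here is a small Ruby sketch. The log format (timestamp, page path, IP separated by whitespace) and the names `count_hits` and `pages_over` are assumptions for this example only; they are not the format the ad server above used.

```ruby
# Assumed (hypothetical) log format, one hit per line:
#   2008-10-01T12:00:00Z /path/to/doc 10.0.0.1
# Accepts any enumerable of lines (an Array, or File.foreach("hits.log")).
def count_hits(lines)
  counts = Hash.new(0)
  lines.each do |line|
    _timestamp, page, _ip = line.split
    next if page.nil? # skip blank or malformed lines
    counts[page] += 1
  end
  counts
end

# Pages whose hit count is strictly greater than the threshold.
def pages_over(counts, threshold)
  counts.select { |_page, hits| hits > threshold }.keys
end
```

The web request itself then only pays for one log write; counting and threshold checks happen in the periodic batch.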
When saving your Hit model, update a redundant column in your Page model that stores a running total of hits. This costs you 2 extra queries, so maybe each hit takes twice as long to process, but you can then decide whether you need to send the email with a simple if.
Your original solution isn't bad either.
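The running-total idea can be sketched as follows. This uses a plain `Struct` as a stand-in for the Page model, and the names `hits_count`, `THRESHOLD`, and `record_hit` are hypothetical; in a Rails app the increment would be a database update rather than an in-memory change.

```ruby
# Stand-in for the Page model with a redundant hit counter column.
Page = Struct.new(:path, :hits_count)

THRESHOLD = 100 # hypothetical value of X

# Increments the running total and returns true only on the single hit
# that first pushes the page past the threshold, so the email goes out once.
def record_hit(page, threshold = THRESHOLD)
  page.hits_count += 1
  page.hits_count == threshold + 1
end
```

Comparing against `threshold + 1` rather than using `>` is a design choice made here so that later hits do not trigger the email again.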
Rake tasks are nice! But you will end up writing more custom code for each background job you add. Check out the Delayed Job plugin http://blog.leetsoft.com/2008/2/17/delayed-job-dj
DJ is an asynchronous priority queue that relies on one simple database table. According to the DJ website, you can create a job using the Delayed::Job.enqueue() method.
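A minimal sketch of what such a job could look like, assuming the Delayed Job convention that a job is any object responding to `perform`. The job name `ThresholdEmailJob` and its fields are hypothetical and not part of Delayed Job; the `perform` body only builds a message where a real job would call a mailer.

```ruby
# Hypothetical job: ThresholdEmailJob, page_path and hits are illustrative
# names and are not provided by Delayed Job itself.
ThresholdEmailJob = Struct.new(:page_path, :hits) do
  def perform
    # A real job would hand off to a mailer here; this sketch only
    # builds the message text.
    "Page #{page_path} has been viewed #{hits} times"
  end
end

job = ThresholdEmailJob.new("/path/to/doc", 1001)

# Enqueue only when Delayed Job is loaded, so the sketch also runs standalone.
Delayed::Job.enqueue(job) if defined?(Delayed::Job)
```

A worker process would later pick the job up from the database table and call `perform` on it, outside the web request.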