Looking for best practices for building a feed reader/aggregator on cron
I have a social networking site that is beginning to gain momentum and has a growing user base. We currently let users import their blog, Flickr and Twitter feeds. We use the PHP library SimplePie to read the feeds, then check the DB to make sure we do not already have an entry for each feed item found. If the feed item is new, we store it in the DB. Each feed updater runs on its own cron job, so we have one for Twitter feeds, one for Flickr and one for blogs.
I have noticed the site gets sluggish, most likely while the cron tasks are running. There must be a better way to do this. Any thoughts?
Comments (2)
The general idea is fine; I would not change that.
If you are sure that the cron tasks are causing the performance problems, then I would run them on a separate server. Having a "batch server" run these sorts of jobs separately from the front-end web server is quite a common solution.
But I would not embark on any changes to improve performance without being absolutely sure what the problem is. For all I know, your database schema could just be horribly inefficient.
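As one illustration of how the schema can matter for the duplicate check described in the question, here is a minimal sketch. It is written in Python with the standard-library sqlite3 module purely for illustration; the site itself uses PHP and its real schema is unknown. The table name `feed_items`, the columns `feed_id` and `guid`, and both function names are assumptions. A unique index lets the database reject duplicates during the insert, so the updater does not need a separate lookup per item.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a database with a hypothetical feed_items table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS feed_items (
               id      INTEGER PRIMARY KEY,
               feed_id INTEGER NOT NULL,
               guid    TEXT    NOT NULL,
               title   TEXT,
               UNIQUE (feed_id, guid)  -- index that backs the duplicate check
           )"""
    )
    return conn


def store_new_items(conn, feed_id, items):
    """Insert (guid, title) pairs, skipping stored ones; return rows added."""
    before = conn.total_changes
    with conn:  # one transaction per feed rather than one per item
        conn.executemany(
            "INSERT OR IGNORE INTO feed_items (feed_id, guid, title) "
            "VALUES (?, ?, ?)",
            [(feed_id, guid, title) for guid, title in items],
        )
    return conn.total_changes - before
```

Whether this helps depends on what profiling shows. If the duplicate check is already an indexed lookup, the time is going somewhere else.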
Ben James makes a good point there: you need to be 100% sure that the cron jobs are the cause. I wouldn't jump to getting a new server yet, though, not until you are unable to optimize what you already have.
What type of sluggishness do you experience?
Once you have all the variables, do an analysis so you know where to optimize.
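A rough way to start that analysis is to time each stage of a feed update separately, so you can see whether fetching, the duplicate check or the insert dominates. The sketch below is in Python for illustration only; `timed`, `update_feed`, `report` and the three callables passed in are hypothetical names, not part of SimplePie or of the site's code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per stage name.
timings = defaultdict(float)


@contextmanager
def timed(stage):
    """Add the time spent inside the with-block to timings[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] += time.perf_counter() - start


def update_feed(fetch, is_duplicate, store):
    """Run one feed update, timing each stage separately."""
    with timed("fetch"):
        items = fetch()
    for item in items:
        with timed("dedupe"):
            duplicate = is_duplicate(item)
        if not duplicate:
            with timed("store"):
                store(item)


def report():
    """Return (stage, seconds) pairs, slowest stage first."""
    return sorted(timings.items(), key=lambda kv: kv[1], reverse=True)
```

Logging the output of `report()` at the end of each cron run, next to the web server's response times for the same period, would show whether the slow periods really line up with the updaters and which stage is responsible.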