Question about caching for a high-traffic website
Suppose we are building an e-commerce site that allows consumers to search for products by typing in keywords. Say there are at most 200,000 products, and millions of consumers use the system. Let’s say the product table is updated fairly frequently. Since the number of products is not that high, we can probably store the entire product table in memory and search against it instead of hitting the database. We are hoping to create distributed caches that store the same data but reside on different servers (for high-availability and performance reasons). We need to be able to synchronize data among these caches and invalidate them when the product table is modified.
Our application is built using ASP.NET MVC and NHibernate. I am trying to understand whether NHibernate’s level-2 caching would help in my situation. I would really appreciate it if you guys could shed some light on this.
I understand that level-2 caching will help cache query results. So if two different users search using the same keyword, the L2 cache will serve the result from the cache instead of from the database. But that doesn’t help us much, since the product table is updated frequently and the cached result will be stale.
My question is: am I understanding L2 caching correctly? And is there anything that would help manage caching the way I would like to (multiple caches holding the same data, synchronization between the caches, and cache invalidation)? Any thoughts are highly appreciated.
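The following is a minimal Python sketch of the requirement, not NHibernate code, and every class and function name in it is hypothetical. Each server keeps a full in-memory copy of the product table. Every write is broadcast to all copies so they stay in sync. `ChangeBus` is an in-process stand-in for whatever messaging a real deployment would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    product_id: int
    name: str
    description: str


class ChangeBus:
    """Hypothetical in-process stand-in for a pub/sub channel between servers."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback):
        self._subscribers.append(callback)

    def publish(self, event):
        # Deliver the change event to every subscribed replica.
        for callback in self._subscribers:
            callback(event)


class ProductCacheReplica:
    """One server's full in-memory copy of the product table (hypothetical)."""

    def __init__(self, bus):
        self._products = {}
        bus.subscribe(self._on_change)

    def _on_change(self, event):
        kind, product = event
        if kind == "upsert":
            self._products[product.product_id] = product
        elif kind == "delete":
            self._products.pop(product.product_id, None)

    def search(self, keyword):
        # Naive case-insensitive substring search over the in-memory table.
        kw = keyword.lower()
        return sorted(
            p.product_id
            for p in self._products.values()
            if kw in p.name.lower() or kw in p.description.lower()
        )


def save_product(bus, product):
    # Hypothetical write path: persist to the DB (omitted), then notify all replicas.
    bus.publish(("upsert", product))


def delete_product(bus, product):
    # Hypothetical delete path: remove from the DB (omitted), then notify all replicas.
    bus.publish(("delete", product))
```

This pushes each changed row to every replica rather than only invalidating it. A replica that misses a message stays stale, so a real setup would likely also need versioning or a periodic full reload.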
Comments (3)
Having used both the second-level cache (with the memcached provider) and the NHibernate.Search add-on, it seems to me you could benefit from both.
The NHibernate.Search component depends on Lucene.Net, and keyword search is decoupled from the database itself. A separate index is created for each mapped class. Optimizations can be set at the property level using attributes, which gives you an extra level of granularity. Additionally, you can implement best-match results and suggestions (see Lucene in Action and/or Hibernate Search in Action). Note that you don't have to maintain the index yourself unless you explicitly request an index rebuild. The implementation manages everything behind the scenes, although you can manipulate the index if you wish. So adding, deleting, or updating a product will automatically update the corresponding index.
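A toy inverted index in Python can illustrate the kind of index maintenance described above. This is not Lucene's or NHibernate.Search's API. `InvertedIndex` and `tokenize` are hypothetical, and the sketch calls `index`/`remove` explicitly at the points where NHibernate.Search would react to entity changes automatically.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Very simple analyzer: lowercase and split on non-alphanumerics.
    # Real Lucene analyzers can also do stemming, stop-word removal, etc.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class InvertedIndex:
    """Toy per-entity-type index mapping term -> set of entity ids."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._terms_by_id = {}

    def index(self, entity_id, text):
        # Re-indexing an existing id first drops its old terms (the update case).
        self.remove(entity_id)
        terms = tokenize(text)
        self._terms_by_id[entity_id] = terms
        for term in terms:
            self._postings[term].add(entity_id)

    def remove(self, entity_id):
        # Delete case: drop the id from every posting list it appears in.
        for term in self._terms_by_id.pop(entity_id, set()):
            ids = self._postings[term]
            ids.discard(entity_id)
            if not ids:
                del self._postings[term]

    def search(self, query):
        # AND semantics: return ids that contain every query term.
        terms = tokenize(query)
        if not terms:
            return []
        return sorted(set.intersection(*(self._postings.get(t, set()) for t in terms)))
```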
With the second-level cache, you get an instant performance boost. In a test environment with a data set of approximately 2 million rows, I saw more than a 20% improvement even at an extremely low request count. The gain grows as the request count increases. The application first checks the second-level cache. If the data is not there, it hits the DB to fetch the required rows and inserts them into the cache for future queries. You can also manage things like cache duration and other configuration settings, and explicitly clear the cache (all of it, part of it, or particular entries) if you wish. Note that the cache state is managed by the application during save/update/delete.
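The following is a minimal sketch of the read path described above, assuming a simple key/value cache with a time-to-live. `FakeDatabase` and `ReadThroughCache` are hypothetical names, not NHibernate classes. The `clock` parameter exists only so that expiry can be tested.

```python
import time


class FakeDatabase:
    """Hypothetical stand-in for the DB that counts how often it is read."""

    def __init__(self, rows):
        self.rows = dict(rows)
        self.reads = 0

    def load(self, key):
        self.reads += 1
        return self.rows.get(key)


class ReadThroughCache:
    """Check the cache first; on a miss or expiry, load from the DB and cache it."""

    def __init__(self, db, ttl_seconds=300.0, clock=time.monotonic):
        self._db = db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]  # cache hit
        value = self._db.load(key)  # miss or expired: hit the DB
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def save(self, key, value):
        # Write to the DB, then evict the stale entry so the next read reloads it.
        self._db.rows[key] = value
        self._entries.pop(key, None)

    def clear(self, keys=None):
        # Explicitly clear everything, or only the given keys.
        if keys is None:
            self._entries.clear()
        else:
            for key in keys:
                self._entries.pop(key, None)
```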
For scalability:
* The second-level cache depends on the provider (e.g. memcached is highly performant and scalable, and supports distributed instances).
* For Lucene.Net/NHibernate.Search, you will need to choose a specific location for the indexes, and that location must be readable and writable by all web application instances. The weak link here is I/O and file contention. A machine with a very fast file system will help prevent that from becoming a bottleneck. I am thinking of your scenario, with many thousands of search requests per second.
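memcached servers do not replicate data to each other. The client decides which server holds each key, often using consistent hashing. The sketch below illustrates that technique. `ConsistentHashRing` is hypothetical, and using MD5 with 100 points per server is an arbitrary choice. Note that this partitions the data across servers rather than keeping the same data on each one, which matters given the question's goal.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps cache keys to server names. Removing a server remaps only the keys
    that were on it; adding one moves only the keys that land on its new points."""

    def __init__(self, servers, replicas=100):
        self._replicas = replicas
        self._ring = []  # sorted list of (hash, server) points
        for server in servers:
            self.add(server)

    @staticmethod
    def _hash(value):
        # 32-bit hash taken from the first 8 hex digits of MD5.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest()[:8], 16)

    def add(self, server):
        # Place several virtual points per server to spread keys evenly.
        for i in range(self._replicas):
            bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    def remove(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def server_for(self, key):
        if not self._ring:
            raise LookupError("no cache servers configured")
        h = self._hash(key)
        # First ring point at or after the key's hash, wrapping around to the start.
        idx = bisect.bisect_left(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```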
As a side note, I would highly recommend NHibernate.Search. It is much faster than LIKE queries and easier to use than implementing SQL Server's full-text search inside the application (which I have done).
Whether a second-level cache will help depends on exactly how frequently your product table is updated relative to cache hits. If you add 100 new products an hour but receive 10,000 queries an hour, even a 10% cache hit rate will make a big difference. If the rates are reversed, a second-level cache will be of almost no value.
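Here is a back-of-the-envelope version of that comparison, using the answer's own numbers. Write $h$ for the cache hit rate, $Q$ for queries per hour, and $U$ for product updates per hour:

$$\text{DB reads avoided per hour} = h \cdot Q = 0.10 \times 10\,000 = 1\,000 \quad \text{versus} \quad U = 100 \text{ updates per hour}$$

With the rates reversed ($Q = 100$, $U = 10\,000$), there are on average $U / Q = 100$ updates between consecutive queries. Assume that each update invalidates the cached query results for the product table. Then a cached result would almost never be read again before it is invalidated.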
I suggest you set up a stress-test environment that closely approximates your production environment and benchmark the various second-level cache providers.
Also check that your DB is configured properly for an update-heavy scenario.
I recommend using NHibernate.Search with Lucene. It works together with the second-level cache. Lucene performs sophisticated text searches very quickly and returns the entity keys to NHibernate. NHibernate then pulls the full entities out of its second-level cache. The NHibernate.Search extension does the work of keeping your Lucene index in sync.
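The flow described above (the index returns keys, and the entities come from the cache) could be sketched in Python as follows. `SearchThenLoad` and its three collaborators are hypothetical stand-ins for the Lucene index, the second-level cache, and the database. None of them is an NHibernate or Lucene API.

```python
class SearchThenLoad:
    """Hypothetical composition: a full-text index yields ids, entities come from
    an id-keyed cache, and cache misses fall back to one batched DB load."""

    def __init__(self, search_ids, cache, load_from_db):
        self._search_ids = search_ids  # callable: query -> list of ids
        self._cache = cache  # dict-like: id -> entity
        self._load_from_db = load_from_db  # callable: list of ids -> {id: entity}

    def search(self, query):
        ids = self._search_ids(query)
        missing = [i for i in ids if i not in self._cache]
        if missing:
            # One DB round trip for all misses, then populate the cache.
            self._cache.update(self._load_from_db(missing))
        # Keep the index's ordering; skip ids the DB no longer has.
        return [self._cache[i] for i in ids if i in self._cache]
```

The ids are kept in the order the index returned them because a full-text engine typically ranks results by relevance.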
TekPub did a recent episode on your exact scenario of searching product descriptions. The episode compares NHibernate queries, SQL full-text indexing, and Lucene with NHibernate.Search.