Caching the entire data universe?
I'm working with a few large data sets that I'd like to cache. Querying the database each time is somewhat slow, and the data won't change over the course of a day, so I'm looking for a way to load the whole dataset once when it's ready and hold it in memory to read from as needed. However, the database calls are based on subsets of the data, something like:
def get_db_results(pk=None, start_date=None, end_date=None):
    # Build the WHERE clause from whichever filters were supplied.
    query = "SELECT * FROM dbo.large_table"
    clauses = []
    if pk:
        clauses.append(f"id = {pk}")
    if start_date:
        clauses.append(f"date >= '{start_date}'")
    if end_date:
        clauses.append(f"date <= '{end_date}'")
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return pd.read_sql(query, conn)
Are there any existing libraries that would be able to infer from the cache which specific records are available based on those inputs and return just that data? Or would I need to manually pull down the data and check the in-memory values for the specified set? Are there any best practices for this sort of setup?
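For reference, a minimal sketch of the "pull everything down once, filter in memory" approach mentioned above, using plain pandas boolean indexing rather than any caching library. The `load_table` and `get_cached_results` names are hypothetical, and the in-line DataFrame stands in for a one-time `pd.read_sql("SELECT * FROM dbo.large_table", conn)` call:

```python
import pandas as pd

_cache = None  # module-level cache; assumes one load per process per day

def load_table():
    """Load the full table into memory on first use (hypothetical stand-in
    for a single pd.read_sql() of dbo.large_table)."""
    global _cache
    if _cache is None:
        _cache = pd.DataFrame({
            "id": [1, 1, 2],
            "date": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-01-15"]),
            "value": [10.0, 20.0, 30.0],
        })
    return _cache

def get_cached_results(pk=None, start_date=None, end_date=None):
    """Apply the same pk/date filters as the SQL query, but against
    the cached DataFrame instead of the database."""
    df = load_table()
    mask = pd.Series(True, index=df.index)
    if pk is not None:
        mask &= df["id"] == pk
    if start_date is not None:
        mask &= df["date"] >= start_date
    if end_date is not None:
        mask &= df["date"] <= end_date
    return df[mask].copy()
```

This trades a slower first call for fast subsequent lookups, and mirrors the SQL predicates one-to-one, so callers can switch between the two code paths without changes.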