Caching the entire data universe?
I'm working with a few large data sets that I'd like to cache. Querying the database each time is somewhat slow, and the data won't change over the course of a day, so I'm looking for a way to load the whole dataset once when it's ready and hold it in memory to read from as needed. However, the database calls are based on subsets of the data, something like:
def get_db_results(pk=None, start_date=None, end_date=None):
    # Build the WHERE clause from whichever filters were supplied.
    query = "SELECT * FROM dbo.large_table"
    clauses = []
    if pk:
        clauses.append(f"id = {pk}")
    if start_date:
        clauses.append(f"date >= '{start_date}'")
    if end_date:
        clauses.append(f"date <= '{end_date}'")
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return pd.read_sql(query, conn)
Are there any existing libraries that would be able to infer from the cache which specific records are available based on those inputs and return just that data? Or would I need to manually pull down the data and check the in-memory values for the specified set? Are there any best practices for this sort of setup?
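For reference, a minimal sketch of the "pull everything down once, filter in memory" approach mentioned above, using plain pandas boolean indexing rather than any caching library. The `load_table` and `get_cached_results` names are hypothetical, and the in-line DataFrame stands in for a one-time `pd.read_sql("SELECT * FROM dbo.large_table", conn)` call:

```python
import pandas as pd

_cache = None  # module-level cache; assumes one load per process per day

def load_table():
    """Load the full table into memory on first use (hypothetical stand-in
    for a single pd.read_sql() of dbo.large_table)."""
    global _cache
    if _cache is None:
        _cache = pd.DataFrame({
            "id": [1, 1, 2],
            "date": pd.to_datetime(["2023-01-01", "2023-02-01", "2023-01-15"]),
            "value": [10.0, 20.0, 30.0],
        })
    return _cache

def get_cached_results(pk=None, start_date=None, end_date=None):
    """Apply the same pk/date filters as the SQL query, but against
    the cached DataFrame instead of the database."""
    df = load_table()
    mask = pd.Series(True, index=df.index)
    if pk is not None:
        mask &= df["id"] == pk
    if start_date is not None:
        mask &= df["date"] >= start_date
    if end_date is not None:
        mask &= df["date"] <= end_date
    return df[mask].copy()
```

This trades a slower first call for fast subsequent lookups, and mirrors the SQL predicates one-to-one, so callers can switch between the two code paths without changes.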