Caching the entire data universe?



I'm working with a few large data sets that I'm looking to cache. Querying the database each time is somewhat slow, and the data won't change over the course of a day, so I was looking for a way to load the whole dataset once and hold it in memory to read from as needed. However, the database calls are based on subsets of the data, something like:

import pandas as pd

def get_db_results(pk=None, start_date=None, end_date=None):
    query = "SELECT * FROM dbo.large_table"
    conditions, params = [], []

    # Collect only the filters that were supplied, so WHERE/AND stay valid.
    if pk is not None:
        conditions.append("id = ?")
        params.append(pk)
    if start_date is not None:
        conditions.append("date >= ?")
        params.append(start_date)
    if end_date is not None:
        conditions.append("date <= ?")
        params.append(end_date)
    if conditions:
        query += " WHERE " + " AND ".join(conditions)

    # '?' placeholders assume an ODBC/SQLite-style driver; parameterizing
    # also avoids interpolating raw values into the SQL string.
    return pd.read_sql(query, conn, params=params)
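
The load-once, filter-in-memory pattern I have in mind looks roughly like the sketch below; the module-level cache, the once-per-day refresh check, and conn are all assumptions on my part:

import datetime
import pandas as pd

_cache = {"df": None, "loaded_on": None}

def get_cached_results(pk=None, start_date=None, end_date=None):
    # Reload the full table at most once per calendar day, since the
    # data doesn't change over the course of a day.
    today = datetime.date.today()
    if _cache["df"] is None or _cache["loaded_on"] != today:
        _cache["df"] = pd.read_sql("SELECT * FROM dbo.large_table", conn)
        _cache["loaded_on"] = today

    df = _cache["df"]
    # Apply the same filters as the SQL version, but against memory.
    if pk is not None:
        df = df[df["id"] == pk]
    if start_date is not None:
        df = df[df["date"] >= start_date]
    if end_date is not None:
        df = df[df["date"] <= end_date]
    return df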

Are there any existing libraries that could infer from the cache which specific records are available based on those inputs and return just that data? Or would I need to manually pull down the data and check the in-memory values for the specified set? Are there any best practices for this sort of setup?
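
The closest library-based approach I've found is plain per-argument memoization, e.g. with the cachetools package; a sketch (the maxsize and ttl values here are arbitrary choices of mine):

from cachetools import TTLCache, cached

# Cache up to 128 distinct argument combinations, each for 24 hours.
@cached(cache=TTLCache(maxsize=128, ttl=24 * 60 * 60))
def get_db_results_memoized(pk=None, start_date=None, end_date=None):
    return get_db_results(pk=pk, start_date=start_date, end_date=end_date)

But that only hits when the exact same (pk, start_date, end_date) combination repeats; it doesn't infer that an already-cached result contains the records a narrower query asks for, which is the part I'm hoping a library can handle.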
