Data caching with ClickHouse
Intro
I use ClickHouse as a data warehouse (tables with billions of rows).
Users interact with the DWH through my application backend, which generates SQL queries to ClickHouse.
Different users can access the same data (sometimes with different WHERE conditions).
In the future, ClickHouse is expected to scale across multiple servers.
The task
At the moment, I cache the results of frequent SQL queries by creating new tables from the tables stored in the database and declaring a table TTL of 1 day.
If another query hits a cache table within that day, I run ALTER TABLE to extend its TTL by another day.
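A minimal Python sketch of the sliding-TTL scheme described above, modelled in memory rather than against a live ClickHouse server. The class name `SlidingTtlCache`, the key derivation (SHA-256 of the whitespace-collapsed SQL text, so identical queries from different users share one entry) and the injectable clock are assumptions for illustration; in the real setup each entry would be a cache table and the expiry extension would be the `ALTER TABLE` step.

```python
import hashlib
import time
from typing import Callable, Dict, Optional

ONE_DAY = 24 * 60 * 60  # TTL in seconds, matching the 1-day TTL described above


def cache_key(sql: str) -> str:
    """Hypothetical key: hash of the SQL text with whitespace collapsed."""
    normalized = " ".join(sql.split())
    return "cache_" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


class SlidingTtlCache:
    """In-memory model: every hit pushes the entry's expiry one TTL into the future."""

    def __init__(self, ttl: float = ONE_DAY, clock: Callable[[], float] = time.time):
        self.ttl = ttl
        self.clock = clock
        self._expires: Dict[str, float] = {}
        self._results: Dict[str, object] = {}

    def get(self, sql: str) -> Optional[object]:
        key = cache_key(sql)
        now = self.clock()
        expires = self._expires.get(key)
        if expires is None or expires <= now:
            # Missing or expired: forget any stale entry and report a miss.
            self._expires.pop(key, None)
            self._results.pop(key, None)
            return None
        # Hit inside the TTL window: extend expiry (the ALTER TABLE step).
        self._expires[key] = now + self.ttl
        return self._results[key]

    def put(self, sql: str, result: object) -> str:
        # Store a fresh result (the CREATE TABLE ... AS SELECT step).
        key = cache_key(sql)
        self._results[key] = result
        self._expires[key] = self.clock() + self.ttl
        return key
```

Collapsing whitespace is deliberately naive: it would also merge queries that differ only in whitespace inside string literals, so a real key function would need a proper SQL normalizer.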
I doubt that this method is efficient.
I also keep a separate table that records each cache table's name and last access time, so that my application can drop obsolete empty tables.
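The tables end up empty rather than disappearing because a MergeTree table-level TTL removes expired rows, not the table itself, which is why the bookkeeping table is needed. A minimal in-memory sketch of that bookkeeping, assuming a hypothetical `CacheRegistry` class and a `max_idle` threshold in seconds; it emits `DROP TABLE IF EXISTS` statements for tables idle longer than the threshold. Table names are assumed to be generated by the application, never taken from user input, since they are interpolated into SQL unquoted.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CacheRegistry:
    """Hypothetical stand-in for the bookkeeping table
    (cache table name -> last access time as a Unix timestamp)."""

    max_idle: float  # seconds a cache table may go unused before it is dropped
    last_access: Dict[str, float] = field(default_factory=dict)

    def touch(self, table: str, now: float) -> None:
        # Record an access; against ClickHouse this would be an INSERT into the registry.
        self.last_access[table] = now

    def stale_tables(self, now: float) -> List[str]:
        # Tables whose idle time has reached the threshold, in name order.
        return sorted(t for t, ts in self.last_access.items() if now - ts >= self.max_idle)

    def cleanup_statements(self, now: float) -> List[str]:
        """Return DROP statements for stale tables and forget them."""
        stale = self.stale_tables(now)
        for table in stale:
            del self.last_access[table]
        return [f"DROP TABLE IF EXISTS {table}" for table in stale]
```

For example, with `max_idle=100`, a table last touched at time 0 is dropped by a cleanup at time 120, while one touched at time 50 survives until a cleanup at time 150 or later.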
Are there established patterns for efficiently serving the most frequently used data, or ready-made mechanisms for this in ClickHouse?
I would also be grateful for links to literature on this topic, or for suggestions on approaching the problem from a different angle.
Comment
ClickHouse does not have a caching mechanism for query results. Instead, it relies heavily on the file system cache.