Fivetran connecting to a PostgreSQL database that is restored daily
I have set up a Fivetran connector to sync a PostgreSQL database on an EC2 server to Snowflake. The connection seems to work (no errors), but the data is not actually being updated.
On the EC2 server, a script runs every day that pulls down the latest dump of our app's production database and restores it on the EC2 server; the Fivetran connector is then expected to sync the database to Snowflake. However, data from after the initial setup date is not being synced to Snowflake. Can Fivetran be used in such a setup? If so, do you know what might be causing the sync to fail?
Comments (1)
Yes, but it's not ideal.
It's hard to answer this question without more context. However, Fivetran uses log-based replication to replicate your DB (the WAL in the case of PostgreSQL), so if you restore the DB every single day, Fivetran will lose track of the changes and will need to re-sync the whole database.
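To make the "lose track" point more concrete, here is a minimal Python sketch of the idea, assuming a log-based setup in which the sync tool remembers the last WAL position (LSN) it read through a replication slot. The function names `parse_lsn` and `cursor_is_usable` are hypothetical and only illustrate the reasoning; this is not how Fivetran is implemented internally. The only PostgreSQL fact relied on is that an LSN is printed as two hexadecimal numbers separated by a slash.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to a 64-bit integer.

    The text form is two hex numbers: the high 32 bits, a slash, the low 32 bits.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def cursor_is_usable(saved_lsn: str, server_lsn: str, slot_exists: bool) -> bool:
    """Return True if an incremental sync could plausibly resume (hypothetical check).

    Assumptions for this sketch:
    - saved_lsn is the last position the sync tool confirmed,
    - server_lsn is the server's current WAL position,
    - slot_exists says whether the tool's replication slot is still present.
    """
    if not slot_exists:
        # The restore removed or replaced the slot: the change history is gone.
        return False
    # A saved position ahead of the server means the WAL timeline was replaced.
    return parse_lsn(saved_lsn) <= parse_lsn(server_lsn)


if __name__ == "__main__":
    # Saved position from before the restore, server position after it.
    print(cursor_is_usable("16/B374D848", "0/3000060", slot_exists=True))
```

When a check like this fails, the only safe option is a full re-sync, which is why a daily restore works against incremental replication.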
The point made by NickW is completely valid: why not replicate directly from the production DB? I assume the answer is along the lines of needing to modify the data first. If so, you can use column blocking and/or hashing to prevent sensitive data from being transferred, or to obfuscate it before it's loaded into Snowflake.
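As a rough illustration of what column blocking and hashing mean, the Python sketch below drops some columns entirely and replaces others with a keyed hash. In Fivetran these are connector settings rather than code you write, so everything here (`mask_row`, the column names, the secret key) is hypothetical and only shows the concept.

```python
import hashlib
import hmac

# Hypothetical column choices for this example.
BLOCKED_COLUMNS = frozenset({"ssn"})      # never leaves the source
HASHED_COLUMNS = frozenset({"email"})     # transferred only as a hash


def hash_value(value: str, key: bytes) -> str:
    """Return a keyed SHA-256 digest so equal inputs still match in joins."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_row(row: dict, key: bytes,
             blocked=BLOCKED_COLUMNS, hashed=HASHED_COLUMNS) -> dict:
    """Drop blocked columns and hash sensitive ones; pass the rest through."""
    masked = {}
    for column, value in row.items():
        if column in blocked:
            continue  # column blocking: the value is not transferred at all
        if column in hashed and value is not None:
            masked[column] = hash_value(str(value), key)
        else:
            masked[column] = value
    return masked


if __name__ == "__main__":
    row = {"id": 1, "email": "user@example.com", "ssn": "000-00-0000"}
    print(mask_row(row, key=b"example-secret"))
```

Because the hash is deterministic for a given key, the hashed column can still be used for joins and deduplication downstream without exposing the original value.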