Converting (syncing) data between SQL and HBase

Posted on 2024-11-15 16:32:01

We are overhauling our product by moving completely from Microsoft and the .NET family to open source (two of the reasons are cost cutting and exponential growth in our data).

We plan to move our data model completely from SQL Server (relational data) to Hadoop (the well-known key-value ecosystem).

In the beginning, we want to support both versions (say 1.0 and the new v2.0). To keep the data consistent, we plan to sync the data between both systems. That is a fairly challenging and error-prone task, but we don't have any other option.

I am a bit confused about where to start, so I am turning to the community of experts. Any strategy, existing literature, or other guidance in this direction would be very helpful.


Comments (2)

香草可樂 2024-11-22 16:32:01

I am not entirely sure how your code is structured, but if you currently have a data or persistence layer, or at least a database access class that all your SQL goes through, you could override the save functions to write changes to both databases. If you do not have a data layer, you may want to consider writing one before starting the transition.
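To make the dual-write idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: `sqlite3` stands in for SQL Server, `InMemoryKeyValueStore` is a hypothetical stand-in for an HBase table, and the `customers` table, the `info:` column prefix and the `customer:<id>` row-key format are made up. It only shows the shape of a save function that writes to both stores.

```python
import sqlite3


class InMemoryKeyValueStore:
    """Hypothetical stand-in for an HBase table: row key -> {column: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, columns):
        self.rows.setdefault(row_key, {}).update(columns)


class CustomerRepository:
    """Data layer whose save() writes every change to both stores."""

    def __init__(self, sql_conn, kv_store):
        self.sql_conn = sql_conn
        self.kv_store = kv_store
        self.failed_kv_writes = []  # row keys that still need to be retried

    def save(self, customer_id, name, email):
        # The relational database stays the system of record, so write it first.
        # INSERT OR REPLACE is SQLite syntax; SQL Server would need its own upsert.
        with self.sql_conn:
            self.sql_conn.execute(
                "INSERT OR REPLACE INTO customers (id, name, email) VALUES (?, ?, ?)",
                (customer_id, name, email),
            )
        row_key = f"customer:{customer_id}"
        try:
            self.kv_store.put(row_key, {"info:name": name, "info:email": email})
        except Exception:
            # Do not fail the v1.0 write because the new store is unavailable;
            # record the key so a later repair job can copy the row again.
            self.failed_kv_writes.append(row_key)


def make_demo_repository():
    """Build a repository backed by an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    return CustomerRepository(conn, InMemoryKeyValueStore())
```

Calling `make_demo_repository().save(1, "Ada", "ada@example.com")` stores the row in both places. Note that the two writes are not atomic, so some retry or reconciliation step (here just the `failed_kv_writes` list) is still needed.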

Otherwise, you could add triggers in MSSQL to update Hadoop, though I am not sure what you could do on the Hadoop side to keep MSSQL in sync.

Or you could have a process that runs every x minutes and syncs the two databases.
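A rough sketch of one pass of such a periodic job, in Python. It assumes each table has an `updated_at` column that is bumped on every change (not something stated in the question), uses `sqlite3` in place of SQL Server, and uses a plain dict in place of the HBase table. A scheduler such as cron would call `sync_once` every x minutes and persist the returned watermark between runs.

```python
import sqlite3


def sync_once(conn, kv_rows, last_synced):
    """Copy rows changed after last_synced into kv_rows; return the new watermark."""
    cursor = conn.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_synced,),
    )
    watermark = last_synced
    for customer_id, name, updated_at in cursor:
        # Overwriting the whole entry makes a re-run of the same window harmless.
        kv_rows[f"customer:{customer_id}"] = {"info:name": name}
        watermark = max(watermark, updated_at)
    return watermark
```

This only goes in one direction and does not notice deleted rows; those would need something extra, for example soft deletes or a full comparison from time to time.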

Personally, I would avoid maintaining two databases of record. Moving changes from a new, experimental database back to your stable database seems risky: you stand the chance of corrupting your stable system. Instead, I would write a converter to move data from your relational DB to Hadoop. Then every night or so, copy your data into Hadoop and use it for the development and testing of your new system. I think test users would understand if you said your beta version is just a test playground and won't affect your live product. If you plan on making major changes to your UI and fear some users will not want to transition to 2.0, then you might be trying to tackle too much at once.
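The core of such a converter is the mapping from a relational row to a row key plus column/value cells. Here is a small Python sketch of that mapping only. The `table:id` row-key format and the single column family `d` are assumptions for illustration, and values are kept as strings for readability, whereas HBase itself stores bytes.

```python
def row_to_cells(table, key_column, row, family="d"):
    """Map one relational row (a dict) to (row_key, {"family:column": value})."""
    row_key = f"{table}:{row[key_column]}"
    cells = {
        f"{family}:{column}": str(value)
        for column, value in row.items()
        # The key lives in the row key already; NULL columns are simply left out,
        # since a sparse key-value table does not need a cell for them.
        if column != key_column and value is not None
    }
    return row_key, cells
```

For example, `row_to_cells("orders", "id", {"id": 42, "customer_id": 7, "note": None})` returns `("orders:42", {"d:customer_id": "7"})`. The nightly job would run this over each table and write the results with whatever HBase client you end up using.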

Those are the solutions I came up with... Good luck!

喜你已久 2024-11-22 16:32:01

Consider using a queuing tool like Flume (http://www.cloudera.com/blog/2010/07/whats-new-in-cdh3b2-flume/) to split your input between both systems.
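The idea of splitting input between both systems can be sketched without any particular tool. The Python below is not Flume's API; it uses the standard library `queue` module, and the sinks are hypothetical callables standing in for "write to SQL Server" and "write to HBase". It only illustrates the fan-out pattern: every incoming event is queued once and delivered to each system.

```python
import queue


def fan_out(events, sinks):
    """Drain the queue and hand every event to each sink, in arrival order."""
    delivered = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return delivered
        for sink in sinks:
            sink(event)
        delivered += 1
```

With `sinks=[write_to_sql, write_to_hbase]` (both hypothetical), neither system is downstream of the other, which avoids having to sync changes back from the new store. A real queuing tool would add what this sketch lacks: durability and retries when a sink is down.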
