Are web services suitable for ETL purposes?

Posted on 2024-08-16 03:58:51

My company is considering using web services as the means of running its ETL process. However, I don't think web services fit this purpose, for several reasons:
1. A web service could consume a lot of memory when generating large XML.
2. XML is a bloated format.
3. Requests may time out if the server takes a long time to generate the data.
4. File size limitations? (On Windows it's 2 GB, if I remember correctly.)

I am not a web service expert, so I need your opinions. :)

Thanks.

Comments (6)

海之角 2024-08-23 03:58:51

There are plenty of technologies in the Web Services tool shed that circumvent all the problems you describe. There is stream-oriented XML shredding, there are XML compression formats for delivery, there are protocols that deal with fragmentation and fairness, and there are many storage systems that can hold terabytes upon terabytes of data.
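
To make the "stream-oriented shredding" and "compression for delivery" points concrete, here is a minimal Python sketch (not code from the answer): it writes rows into a gzip-compressed XML stream one element at a time, then shreds them back with `xml.etree.ElementTree.iterparse`, clearing processed elements so memory stays roughly flat regardless of document size. The `<rows>`/`<row id="...">` layout and the names `write_rows_gzipped` and `shred_rows` are hypothetical, and plain gzip is only a simple stand-in for dedicated XML compression formats.

```python
import gzip
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def write_rows_gzipped(rows, sink):
    """Stream (id, name) rows into `sink` as gzip-compressed XML.

    Each <row> is written as soon as it is produced, so the whole
    document is never held in memory at once.
    """
    with gzip.GzipFile(fileobj=sink, mode="wb") as gz:
        gz.write(b"<rows>")
        for row_id, name in rows:
            gz.write(f'<row id="{int(row_id)}">{escape(name)}</row>'.encode("utf-8"))
        gz.write(b"</rows>")


def shred_rows(source):
    """Yield (id, name) tuples from a gzip-compressed XML stream.

    iterparse consumes the input incrementally; clearing the root after
    each <row> discards elements that have already been processed.
    """
    with gzip.GzipFile(fileobj=source, mode="rb") as gz:
        events = ET.iterparse(gz, events=("start", "end"))
        _, root = next(events)  # first event: start of <rows>
        for event, elem in events:
            if event == "end" and elem.tag == "row":
                yield int(elem.get("id")), elem.text or ""
                root.clear()  # drop the finished row(s) from the tree


if __name__ == "__main__":
    buf = io.BytesIO()  # stands in for an HTTP request/response body
    write_rows_gzipped(((i, f"customer {i}") for i in range(100_000)), buf)
    print("compressed bytes:", buf.tell())
    buf.seek(0)
    print("rows shredded:", sum(1 for _ in shred_rows(buf)))
```

In a real service, `sink` and `source` would be the HTTP body streams rather than an in-memory buffer; the point is that neither side ever materializes the full document.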

If by "web service" you imagine some college-freshman homework concoction of an interface that accepts a single glop argument containing a 2 GB serialized table, then all your arguments are valid. But if you give your requirements to an experienced team that knows the concepts involved in WS-ReliableMessaging and WS-Transaction, then there is no reason not to build an ETL process around Web Services. Note that I do not advocate the SOAP protocols per se, but I do advocate knowing and understanding the concepts involved.
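
As an illustration only, the following Python sketch simulates in-process three ideas associated with reliable messaging: splitting a large payload into numbered chunks, retransmitting anything not acknowledged, and having the receiver drop duplicates so that retransmission is safe. It is not the WS-ReliableMessaging wire protocol; `Receiver`, `send_reliably` and the deliberately flaky `make_flaky_channel` are hypothetical names, and the loss pattern is invented for the demo.

```python
from dataclasses import dataclass, field


def split(payload: bytes, chunk_size: int) -> list:
    """Cut the payload into fixed-size chunks; the list index is the sequence number."""
    return [payload[i:i + chunk_size] for i in range(0, len(payload), chunk_size)]


@dataclass
class Receiver:
    """Hypothetical receiving endpoint that tolerates duplicate deliveries."""
    total: int
    chunks: dict = field(default_factory=dict)

    def accept(self, seq: int, data: bytes) -> int:
        self.chunks.setdefault(seq, data)  # keep the first copy, ignore duplicates
        return seq                         # acknowledgement for this sequence number

    def complete(self) -> bool:
        return len(self.chunks) == self.total

    def assemble(self) -> bytes:
        return b"".join(self.chunks[i] for i in range(self.total))


def send_reliably(payload: bytes, chunk_size: int, channel, max_rounds: int = 10) -> int:
    """Resend unacknowledged chunks until every one is acknowledged.

    `channel(seq, data)` returns the acknowledged seq, or None if either the
    message or its acknowledgement was lost.
    """
    chunks = split(payload, chunk_size)
    pending = set(range(len(chunks)))
    for _ in range(max_rounds):
        for seq in sorted(pending):
            if channel(seq, chunks[seq]) == seq:
                pending.discard(seq)
        if not pending:
            return len(chunks)
    raise TimeoutError(f"{len(pending)} chunk(s) still unacknowledged")


def make_flaky_channel(receiver: Receiver):
    """Deterministic fake network: loses some first attempts, and some acks."""
    attempts = {}

    def channel(seq: int, data: bytes):
        attempts[seq] = attempts.get(seq, 0) + 1
        first_try = attempts[seq] == 1
        if seq % 3 == 0 and first_try:
            return None  # request lost before reaching the receiver
        ack = receiver.accept(seq, data)
        if seq % 3 == 1 and first_try:
            return None  # delivered, but the ack was lost: sender will resend
        return ack

    return channel


if __name__ == "__main__":
    payload = bytes(range(256)) * 40  # 10,240 bytes of sample data
    receiver = Receiver(total=len(split(payload, 1000)))
    sent = send_reliably(payload, 1000, make_flaky_channel(receiver))
    print(sent, receiver.complete(), receiver.assemble() == payload)  # 11 True True
```

The duplicate-tolerant `accept` is what makes blind retransmission acceptable: a chunk whose ack was lost is simply delivered twice and stored once.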

That being said, whether a Web Service-oriented ETL process makes sense for you depends on a whole set of other reasons. However, your rebuttal of Web Service technologies does not hold water.

倥絔 2024-08-23 03:58:51

I would not use a web service for an ETL task. There are specialized tools for that task (e.g., Ab Initio, Informatica, etc.) that are better suited.

If you have a large amount of data, I'd say that the price of the extra latency that the network would introduce would be prohibitive.
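
A back-of-the-envelope way to see how the network's latency cost grows, under a deliberately simple model that is an assumption rather than a measurement: each call costs one round trip plus the time to push its bytes through the link. The 10 million rows, 200-byte rows, 50 ms round trip and 100 Mbit/s link below are made-up illustrative figures.

```python
def transfer_seconds(rows: int, row_bytes: int, rows_per_call: int,
                     rtt_s: float, bandwidth_bytes_per_s: float) -> float:
    """Round trips plus raw transmission time.

    Ignores server-side processing, serialization, TCP slow start and
    retries, so real numbers will be worse.
    """
    calls = -(-rows // rows_per_call)  # ceiling division on integers
    return calls * rtt_s + rows * row_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    ROWS, ROW_BYTES = 10_000_000, 200     # hypothetical 2 GB extract
    RTT_S, BANDWIDTH = 0.050, 100e6 / 8   # hypothetical 50 ms RTT, 100 Mbit/s link
    for batch in (1, 1_000, 10_000):
        secs = transfer_seconds(ROWS, ROW_BYTES, batch, RTT_S, BANDWIDTH)
        print(f"{batch:>6} rows/call: {secs:,.0f} s")
```

With these assumed figures, a row-at-a-time interface spends about 500,000 s (almost six days) on round trips alone, while the 160 s of actual transmission is identical in every case; batching shrinks the round-trip term but never the transmission term.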

撩起发的微风 2024-08-23 03:58:51

It really does depend on what you are doing and how you are trying to accomplish it. In general, web services require more care and feeding than you would normally put into an ETL process, but they can also be surprisingly effective at the task. You have not given enough specifics about your scenario for me to say whether it would work.

I have worked on web services that transmit and receive 100+ MB documents, some encoded in XML and some not, and do it in seconds (on a closed local network). These services required a good deal of tuning and planning, but they worked well for our scenario, and they allowed a wide variety of clients to connect and transmit differing amounts of data through a fairly standard interface. This differed from some of our other ETL jobs, where the job was specific to each client and had to be set up and maintained for each client.

It all depends on what you are doing and what your constraints are.

If you are going to pursue this route, sit down and draft the process from beginning to end, including how you want clients to connect, how to verify that the data was received, and how to verify that the job is finished. Consider some of the scenarios, the clients and the types of data being transmitted, and then work out what would be needed. Contrast that with what is already available in other tools, and with how much time you have to get it done.
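
As one possible shape for the "verify that the data was received" and "verify that the job is finished" steps, here is a minimal in-process Python sketch. It assumes a design where the client sends a SHA-256 checksum alongside the data and then polls for status instead of holding one long request open, which also sidesteps the time-out concern from the question. `JobService`, `wait_for` and the status strings are hypothetical; a real service would expose these as HTTP endpoints and persist job state.

```python
import hashlib
import time
import uuid
from concurrent.futures import ThreadPoolExecutor


class JobService:
    """Hypothetical ETL endpoint: checksum on receipt, background run, status polling."""

    def __init__(self, transform, workers: int = 2):
        self._transform = transform  # the actual ETL step, e.g. parse + load
        self._pool = ThreadPoolExecutor(max_workers=workers)
        self._jobs = {}              # job id -> Future

    def submit(self, payload: bytes, sha256_hex: str) -> str:
        # Verify the data was received intact before accepting the job.
        if hashlib.sha256(payload).hexdigest() != sha256_hex:
            raise ValueError("checksum mismatch: payload corrupted or truncated")
        job_id = str(uuid.uuid4())
        # Start the work in the background; the caller gets a job id immediately,
        # so no single request has to stay open for the whole load.
        self._jobs[job_id] = self._pool.submit(self._transform, payload)
        return job_id

    def status(self, job_id: str) -> str:
        # Lets the client verify that the job actually finished (or failed).
        future = self._jobs[job_id]
        if not future.done():
            return "RUNNING"
        return "FAILED" if future.exception() is not None else "DONE"

    def result(self, job_id: str):
        return self._jobs[job_id].result()

    def close(self) -> None:
        self._pool.shutdown(wait=True)


def wait_for(service: JobService, job_id: str,
             poll_s: float = 0.01, timeout_s: float = 5.0) -> str:
    """Client-side polling loop with its own deadline."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        state = service.status(job_id)
        if state != "RUNNING":
            return state
        time.sleep(poll_s)
    raise TimeoutError(f"job {job_id} still running after {timeout_s} s")


if __name__ == "__main__":
    service = JobService(lambda data: data.decode("utf-8").upper().splitlines())
    payload = b"a,1\nb,2"
    job = service.submit(payload, hashlib.sha256(payload).hexdigest())
    print(wait_for(service, job), service.result(job))  # DONE ['A,1', 'B,2']
    service.close()
```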

谷夏 2024-08-23 03:58:51

I'm really wondering why your company is not considering a real ETL tool, like those mentioned by duffymo in his answer, or Talend or CloverETL if open source is an option.

  1. They are, in general, built for ETL purposes :)
  2. Building your own solution sounds like reinventing the wheel.
  3. Many of them have web-service-oriented features (see, for example, Export a job as webservice in Talend's wiki, or CloverETL Server HTTP Launch Services).

I'm not an ETL product expert and I didn't check them all but I'm pretty sure this is something to consider.

手心的海 2024-08-23 03:58:51

Look up MTOM, to start with, which allows arbitrary non-XML data to be streamed in a web service.
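
Why MTOM helps: without it, binary data embedded in a SOAP message has to travel as `xs:base64Binary` text, and base64 turns every 3 bytes into 4 characters. With MTOM/XOP the SOAP stack instead ships the raw bytes as a separate MIME part referenced from the XML. The short Python sketch below only measures the inflation that MTOM avoids; it does not implement MTOM, which is handled by the web service framework.

```python
import base64
import os


def inline_base64_size(raw: bytes) -> int:
    """Bytes needed to carry `raw` inline in XML as base64 text (no line breaks)."""
    return len(base64.b64encode(raw))


def expected_base64_size(n: int) -> int:
    """Closed form: 4 output characters per (padded) 3-byte group."""
    return 4 * ((n + 2) // 3)


if __name__ == "__main__":
    raw = os.urandom(3 * 1024 * 1024)  # 3 MiB of incompressible binary data
    inline = inline_base64_size(raw)
    print(len(raw), inline, inline / len(raw))  # 3145728 4194304 1.3333333333333333
```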

活雷疯 2024-08-23 03:58:51

Web services are just fine for ETL tasks. Remember that each task is going to get handled in its own thread for free, and you're guaranteed proper cleanup between requests. Using web services inside something like Tomcat wouldn't be nearly as heavy as you think.

If you're concerned over the bloat of XML, consider JSON format.
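
A quick way to gauge how much the format itself costs for your data is to serialize the same records both ways and compare byte counts. This Python sketch assumes element-per-field XML; attribute-style XML is noticeably smaller, and compressing either format typically narrows the gap, so it is worth measuring with your real schema. The sample records are made up.

```python
import json
import xml.etree.ElementTree as ET


def to_xml(records: list) -> bytes:
    """Element-per-field XML: <rows><row><id>..</id>...</row>...</rows>."""
    root = ET.Element("rows")
    for rec in records:
        row = ET.SubElement(root, "row")
        for key, value in rec.items():
            ET.SubElement(row, key).text = str(value)
    return ET.tostring(root, encoding="utf-8")


def to_json(records: list) -> bytes:
    """Compact JSON array of objects (no whitespace between tokens)."""
    return json.dumps(records, separators=(",", ":")).encode("utf-8")


if __name__ == "__main__":
    records = [{"id": i, "name": f"customer {i}", "country": "NL"} for i in range(1_000)]
    xml_bytes, json_bytes = len(to_xml(records)), len(to_json(records))
    print(f"XML: {xml_bytes} bytes, JSON: {json_bytes} bytes, "
          f"ratio {xml_bytes / json_bytes:.2f}")
```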
