Is spring-batch for me, even if I'm not using itemReader and itemWriter?

Posted on 2024-12-28 12:21:14

spring-batch newbie: I have a series of batches that

  • read all new records (since the last execution) from some sql tables
  • upload all the new records to hadoop
  • run a series of map-reduce (pig) jobs on all the data (old and new)
  • download all the output to local and run some other local processing on all the output

point is, I don't have any obvious "item" - I don't want to relate to the specific lines of text in my data, I work with all of it as one big chunk and don't want any commit intervals and such...

however, I do want to keep all these steps loosely coupled - as in, steps a+b+c might succeed for several days and accumulate processed stuff while step d keeps failing, and then when it finally succeeds it will read and process all of the output of its previous steps.

SO: is my "item" a fictive "working-item" which will signify the entire new data? do I maintain a series of queues myself and pass this fictive working-item between them?

thanks!
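One way to picture the question: the four steps above could be wired as a single Spring Batch job of Tasklet steps, with no items and no commit intervals at all. A minimal sketch, assuming Spring Batch 5 builders (all bean, step, and job names here are hypothetical, and the tasklet bodies are placeholders):

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.job.builder.JobBuilder;
import org.springframework.batch.core.repository.JobRepository;
import org.springframework.batch.core.step.builder.StepBuilder;
import org.springframework.batch.repeat.RepeatStatus;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.transaction.PlatformTransactionManager;

@Configuration
public class PipelineJobConfig {

    // Each stage is one Tasklet step: the whole stage runs as a single unit
    // of work, so there are no items and no commit intervals.
    @Bean
    public Job pipelineJob(JobRepository repo, PlatformTransactionManager tx) {
        Step readNew = step("readNewRecords", repo, tx);   // a: read new sql records
        Step upload  = step("uploadToHadoop", repo, tx);   // b: push them to hadoop
        Step runPig  = step("runPigJobs", repo, tx);       // c: map-reduce over all data
        Step local   = step("localPostProcess", repo, tx); // d: download + local processing
        // On restart after a failure in d, completed steps a+b+c are skipped
        // by default and the job resumes at d.
        return new JobBuilder("pipelineJob", repo)
                .start(readNew).next(upload).next(runPig).next(local)
                .build();
    }

    // Placeholder Tasklet body; the real stage logic would go in the lambda.
    private Step step(String name, JobRepository repo, PlatformTransactionManager tx) {
        return new StepBuilder(name, repo)
                .tasklet((contribution, chunkContext) -> RepeatStatus.FINISHED, tx)
                .build();
    }
}
```

The loose coupling the question asks for falls out of Spring Batch's restart semantics: step completion is recorded in the job repository, so a failed step can be retried later without re-running its predecessors.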


2 Answers

不醒的梦 2025-01-04 12:21:15

people always assume that the only use of Spring Batch is chunk processing. that is a huge feature, but what's overlooked is the visibility into the processing and job control.

give 5 people the same task with no Spring Batch and they're going to implement flow control and visibility their own way. give 5 people the same task with Spring Batch and you may end up with custom tasklets all done differently, but getting access to the job metadata and starting and stopping jobs is going to be consistent. from my perspective it's a great tool for job management. if you already have your jobs written, you can implement them as custom tasklets if you don't want to rewrite them to conform to the 'item' paradigm. you'll still see benefits.
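For instance, an existing upload routine could be wrapped in a custom Tasklet without rewriting it around items; a sketch (the LegacyUploader interface stands in for whatever component already does the work - it is hypothetical):

```java
import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class HadoopUploadTasklet implements Tasklet {

    // Hypothetical existing component that already does the real work.
    public interface LegacyUploader {
        void uploadNewRecords() throws Exception;
    }

    private final LegacyUploader uploader;

    public HadoopUploadTasklet(LegacyUploader uploader) {
        this.uploader = uploader;
    }

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext)
            throws Exception {
        uploader.uploadNewRecords();  // run the whole chunk of work in one shot
        return RepeatStatus.FINISHED; // one-shot: no repeat, no commit interval,
                                      // but step start/stop/restart metadata is
                                      // still recorded by the framework
    }
}
```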

江湖正好 2025-01-04 12:21:15

I don't see the problem. Your scenario seems like a classic application of Spring Batch to me.

  • read all new records (since the last execution) from some sql tables

Here, an item is a record

  • upload all the new records to hadoop

Same here

  • run a series of map-reduce (pig) jobs on all the data (old and new)

Sounds like a StepListener or ChunkListener

  • download all the output to local and run some other local processing on all the output

That's the next step.


The only problem I see is if you don't have Domain Objects for your records. But even then, you can work with maps or arrays, while still using ItemReaders and ItemWriters.
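A sketch of what map-based items might look like, assuming Spring Batch 5's ItemReader/ItemWriter signatures (the class names and the println placeholder are illustrative, not part of any real API):

```java
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;

// Items don't have to be domain objects: each record is just a Map.
public class MapRecordReader implements ItemReader<Map<String, Object>> {

    private final Iterator<Map<String, Object>> rows;

    public MapRecordReader(List<Map<String, Object>> rows) {
        this.rows = rows.iterator();
    }

    @Override
    public Map<String, Object> read() {
        // Returning null tells Spring Batch the input is exhausted.
        return rows.hasNext() ? rows.next() : null;
    }
}

class MapRecordWriter implements ItemWriter<Map<String, Object>> {

    @Override
    public void write(Chunk<? extends Map<String, Object>> chunk) {
        for (Map<String, Object> row : chunk) {
            System.out.println(row); // placeholder: upload/persist the record here
        }
    }
}
```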
