Is spring-batch for me, even though I don't use itemReader and itemWriter?
Spring Batch newbie here: I have a series of batches that
- read all new records (since the last execution) from some sql tables
- upload all the new records to hadoop
- run a series of map-reduce (pig) jobs on all the data (old and new)
- download all the output to local and run some other local processing on all the output
The point is, I don't have any obvious "item". I don't want to deal with the individual lines of text in my data; I work with all of it as one big chunk and don't want any commit intervals and such.
However, I do want to keep all these steps loosely coupled. For example, steps a+b+c might succeed for several days and accumulate processed output while step d keeps failing; when step d finally succeeds, it will read and process all of the output of the previous steps.
So: is my "item" a fictitious "working item" that signifies the entire new data? Do I maintain a series of queues myself and pass these fictitious working items between them?
Thanks!
Comments (2)
People always assume that Spring Batch is only useful for chunk processing. That is a huge feature, but what gets overlooked is the visibility into processing and the job control.
Give 5 people the same task without Spring Batch and they'll each implement flow control and visibility their own way. Give 5 people the same task with Spring Batch and you may end up with custom tasklets all done differently, but access to the job metadata and starting and stopping jobs will be consistent. From my perspective it's a great tool for job management. If you already have your jobs written and don't want to rewrite them to conform to the "item" paradigm, you can implement them as custom tasklets. You'll still see the benefits.
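To make the tasklet idea concrete, here is a rough sketch of its shape, not Spring Batch itself. In Spring Batch each coarse step would implement the framework's `Tasklet` interface and the job repository would record each step's status. To stay runnable without the framework, the code below substitutes a hypothetical `CoarseStep` interface and an in-memory status map; every name in it (`CoarseStep`, `StepRunner`, `TaskletSketch`, the step names, the "disk full" failure) is made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a tasklet: one call performs the whole step, no items.
interface CoarseStep {
    void execute() throws Exception;
}

// Hypothetical stand-in for the per-step job metadata a framework would keep.
class StepRunner {
    private final Map<String, CoarseStep> steps = new LinkedHashMap<>();
    private final Map<String, String> lastStatus = new LinkedHashMap<>();

    StepRunner add(String name, CoarseStep step) {
        steps.put(name, step);
        return this;
    }

    // Runs steps in insertion order; stops at the first failure and records the reason.
    boolean run() {
        for (Map.Entry<String, CoarseStep> e : steps.entrySet()) {
            try {
                e.getValue().execute();
                lastStatus.put(e.getKey(), "COMPLETED");
            } catch (Exception ex) {
                lastStatus.put(e.getKey(), "FAILED: " + ex.getMessage());
                return false;
            }
        }
        return true;
    }

    Map<String, String> status() {
        return lastStatus;
    }
}

class TaskletSketch {
    // Output of steps a to c that step d has not consumed yet.
    final List<String> pending = new ArrayList<>();
    // Everything step d has processed so far.
    final List<String> processed = new ArrayList<>();
    // Toggle that simulates step d failing for a few days.
    boolean localProcessingWorks = false;

    StepRunner buildJob(String batchLabel) {
        return new StepRunner()
            .add("readSql", () -> { /* read new records (placeholder) */ })
            .add("uploadToHadoop", () -> { /* upload (placeholder) */ })
            .add("runPig", () -> pending.add(batchLabel))
            .add("localProcessing", () -> {
                if (!localProcessingWorks) {
                    throw new IllegalStateException("disk full");
                }
                processed.addAll(pending); // consume everything accumulated so far
                pending.clear();
            });
    }

    public static void main(String[] args) {
        TaskletSketch s = new TaskletSketch();
        s.buildJob("day1").run();       // step d fails, output accumulates
        s.buildJob("day2").run();       // step d fails again
        s.localProcessingWorks = true;
        s.buildJob("day3").run();       // step d succeeds and drains the backlog
        System.out.println(s.processed); // prints [day1, day2, day3]
    }
}
```

The point of the sketch is that no "item" is needed: each step is a single unit of work, and the last step simply picks up whatever earlier runs left behind.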
I don't see the problem. Your scenario seems like a classic application of Spring Batch to me.
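- "read all new records (since the last execution) from some sql tables": here, an item is a record.
- "upload all the new records to hadoop": same here.
- "run a series of map-reduce (pig) jobs on all the data (old and new)": sounds like a StepListener or ChunkListener.
- "download all the output to local and run some other local processing on all the output": that's the next step.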
The only problem I see is if you don't have Domain Objects for your records. But even then, you can work with maps or arrays, while still using ItemReaders and ItemWriters.
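As a sketch of the "maps instead of domain objects" idea, the code below treats each record as a `Map<String, Object>` and moves records from a reader to a writer in chunks. It does not use Spring Batch: `RecordReader` and `RecordWriter` are hypothetical interfaces that only mirror the general contract of an item reader (return null when the input is exhausted) and an item writer (receive a batch of items), and `MapItemSketch`, `copy`, and `row` are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical mirror of an item reader: returns null when there is nothing left.
interface RecordReader {
    Map<String, Object> read();
}

// Hypothetical mirror of an item writer: receives one chunk of items at a time.
interface RecordWriter {
    void write(List<Map<String, Object>> chunk);
}

class MapItemSketch {
    // Reads until the reader returns null, handing items to the writer in chunks.
    // Returns the number of chunks written.
    static int copy(RecordReader reader, RecordWriter writer, int chunkSize) {
        List<Map<String, Object>> chunk = new ArrayList<>();
        int chunks = 0;
        Map<String, Object> item;
        while ((item = reader.read()) != null) {
            chunk.add(item);
            if (chunk.size() == chunkSize) {
                writer.write(new ArrayList<>(chunk)); // pass a copy, then reuse the buffer
                chunk.clear();
                chunks++;
            }
        }
        if (!chunk.isEmpty()) { // flush the final partial chunk
            writer.write(new ArrayList<>(chunk));
            chunks++;
        }
        return chunks;
    }

    // Builds a record as a plain map rather than a domain object.
    static Map<String, Object> row(int id, String name) {
        Map<String, Object> r = new HashMap<>();
        r.put("id", id);
        r.put("name", name);
        return r;
    }

    public static void main(String[] args) {
        Iterator<Map<String, Object>> rows =
            Arrays.asList(row(1, "a"), row(2, "b"), row(3, "c")).iterator();
        List<String> uploaded = new ArrayList<>();
        int chunks = copy(
            () -> rows.hasNext() ? rows.next() : null,
            chunk -> chunk.forEach(r -> uploaded.add(r.get("id") + ":" + r.get("name"))),
            2);
        // prints: 2 chunks, uploaded [1:a, 2:b, 3:c]
        System.out.println(chunks + " chunks, uploaded " + uploaded);
    }
}
```

With a chunk size of 2 and three records, the writer is called twice, which is the commit-interval behavior the question wanted to avoid for the whole-dataset steps but that fits the record-level read and upload steps.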