Data ingestion into Amazon Redshift
I have multiple data sources from which I need to build and implement a DWH in AWS. I have one challenge with one of my unstructured data sources (data coming from different APIs). How can I ingest data from this source into Amazon Redshift? Can we first pull it into an Amazon S3 bucket and then integrate S3 with Amazon Redshift? What is a better approach?

Yes, S3 first. Your APIs can write to S3 directly, and/or you can use a service like Kinesis (with or without Firehose) to populate S3. From there, the rest is just work in Redshift.
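A minimal sketch of the "S3 first" step, assuming the API responses are lists of JSON objects and that each batch is written as newline-delimited JSON (a layout Redshift's `COPY` can read). The bucket, the `raw/<source>/YYYY/MM/DD/` key layout, and the helper names are hypothetical; only `boto3`'s `put_object` is a real API call, and it is imported lazily so the pure helpers run without AWS access.

```python
import json
from datetime import datetime, timezone


def to_ndjson(records):
    """Serialize API records (dicts) as newline-delimited JSON, one object per line."""
    # Compact separators and sorted keys keep the output small and deterministic.
    return "".join(
        json.dumps(r, separators=(",", ":"), sort_keys=True) + "\n" for r in records
    )


def s3_key(source, fetched_at, prefix="raw"):
    """Build a date-partitioned key, e.g. raw/orders-api/2024/01/31/120000.json.

    This layout is a hypothetical convention, not something Redshift requires.
    """
    return f"{prefix}/{source}/{fetched_at:%Y/%m/%d/%H%M%S}.json"


def upload(records, bucket, source):
    """Write one batch to S3 (needs boto3 installed and AWS credentials configured)."""
    import boto3  # lazy import so the helpers above work without boto3

    key = s3_key(source, datetime.now(timezone.utc))
    boto3.client("s3").put_object(
        Bucket=bucket, Key=key, Body=to_ndjson(records).encode("utf-8")
    )
    return key
```

Partitioning keys by source and date keeps each API's raw data separate and lets a later load target a single prefix.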

Without knowing more about the sources, yes, S3 is likely the right approach. Whether you need latency in seconds, minutes, or hours will be an important consideration.
If latency is not a driving concern, simply land the API data in S3 and load it into Redshift from there.
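For the batch path, the load from S3 into Redshift is typically a `COPY` statement. The sketch below only builds the SQL text, assuming the files are newline-delimited JSON under one S3 prefix and that an IAM role associated with the cluster grants read access to the bucket. The table name, prefix, role ARN, and the helper name `build_copy_sql` are placeholders. Values are interpolated without escaping, so they should come from trusted configuration, not user input.

```python
def build_copy_sql(table, s3_prefix, iam_role_arn, region=None):
    """Return a Redshift COPY statement that loads newline-delimited JSON from S3.

    JSON 'auto' matches top-level JSON keys to column names of the target table.
    Arguments are inserted verbatim (no escaping), so pass trusted values only.
    """
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must be an s3:// URI")
    sql = (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS JSON 'auto'"
    )
    # REGION is only needed when the bucket lives in a different region than the cluster.
    if region:
        sql += f"\nREGION '{region}'"
    return sql + ";"
```

You could run the resulting statement from any SQL client connected to the cluster or through the Redshift Data API. `JSON 'auto'` suits API payloads whose fields map one-to-one to columns; nested payloads may need a JSONPaths file or a `SUPER` column instead.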
As noted, there may be value in Kinesis, especially if you're working with real-time data streams (Amazon Redshift recently introduced streaming ingestion, which reads directly from Kinesis and skips S3).
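If you go the Kinesis route, the producer side can be as small as pushing each API event onto a Kinesis data stream; on the Redshift side, streaming ingestion then reads the stream through a materialized view. The sketch assumes events are JSON objects with an `id` field used as the partition key (a hypothetical choice) and a stream name you supply. `put_record` with `StreamName`, `Data`, and `PartitionKey` is the real boto3 Kinesis call; the helper names are illustrative.

```python
import json


def kinesis_record(event, partition_field="id"):
    """Build keyword arguments for a Kinesis Data Streams put_record call.

    The payload is compact UTF-8 JSON. The partition key decides which shard
    receives the record, so events sharing a key stay in order on one shard.
    """
    return {
        "Data": json.dumps(event, separators=(",", ":")).encode("utf-8"),
        "PartitionKey": str(event[partition_field]),
    }


def send(event, stream_name):
    """Push one API event to a Kinesis data stream (needs boto3 and AWS credentials)."""
    import boto3  # lazy import so kinesis_record works without boto3

    client = boto3.client("kinesis")
    return client.put_record(StreamName=stream_name, **kinesis_record(event))
```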
S3 is probably the easier approach if you're not trying to analyze real-time streams.