Microsoft Azure architecture: CSV to SQL
I'm an intern in charge of researching an Azure project.
At the moment I'm devising an architecture for part of the project.
The goal is to convert multiple CSV files into a SQL database in the cloud. The CSVs will be sent from random locations around the country and need to be processed so that the database can eventually be accessed through a web service.
I'm totally new to the Azure scene and have been schooling myself, but it's all still a bit fuzzy in my head.
Some info:
The CSVs are small files, but about 20,000 would be received daily.
Yes, it needs to be SQL storage, because we need to aggregate the data easily.
What will be in the CSVs and needs to be stored?
a unique key value (string)
a consumption value (double)
a datetime stamp (datetime/string)
a quality value (int)
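In SQL terms, I imagine the table looking roughly like this (a sketch only; the table and column names are my own placeholders):

    -- Rough guess at the table; names and exact types are placeholders.
    CREATE TABLE Readings (
        ReadingKey  NVARCHAR(64) NOT NULL,  -- the unique key value (string)
        Consumption FLOAT        NOT NULL,  -- the consumption value (double)
        Stamp       DATETIME2    NOT NULL,  -- the datetime stamp
        Quality     INT          NOT NULL   -- the quality value (int)
    );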
The architecture I had in mind would be:
HTTP requests to the cloud (does the cloud need a listener service?)
A queue service that holds the CSVs before they are processed
The SQL storage (direct import, or do I need some kind of worker role in between? see the rough sketch after this list)
A web service that will take requests from an external AOS or a client application and query the data in the SQL DB.
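To make that worker step concrete, here is a minimal sketch of what I picture it doing, written in Python against the azure-storage-queue and pyodbc packages. Everything in it (the connection strings, the queue name, the Readings table from the sketch above) is a placeholder of mine, and it assumes each queue message carries one small CSV payload; for bigger files you would presumably store the file in blob storage and queue only a reference to it.

    # Sketch of the worker step: pull a queued CSV, parse it, insert the rows.
    # Assumes the azure-storage-queue and pyodbc packages; all names are placeholders.
    import csv
    import io

    import pyodbc
    from azure.storage.queue import QueueClient

    QUEUE_CONN = "<storage-account-connection-string>"
    SQL_CONN = "<sql-azure-odbc-connection-string>"

    queue = QueueClient.from_connection_string(QUEUE_CONN, queue_name="incoming-csv")
    db = pyodbc.connect(SQL_CONN)

    def process_message(body: str) -> None:
        """Parse one CSV payload (assumed four columns per row) and insert its rows."""
        rows = []
        for row in csv.reader(io.StringIO(body)):
            if not row:
                continue  # skip blank lines
            key, consumption, stamp, quality = row
            rows.append((key, float(consumption), stamp, int(quality)))
        cur = db.cursor()
        cur.executemany(
            "INSERT INTO Readings (ReadingKey, Consumption, Stamp, Quality) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
        db.commit()

    for msg in queue.receive_messages():
        process_message(msg.content)
        queue.delete_message(msg)  # remove only after a successful insert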
Am I correct in assuming this problem can be solved with standard components, or do I need to implement a VM role? How would you set this up?
Any input would be much appreciated, because I really feel lost in the clouds :)
I hope I gave a clear overview of the requirements...
It's not easy explaining something you don't fully grasp yourself.
Comments (2)
You don't need a VM role at all. Here's a strawman idea:
Absolutely no need for a VM role.
Is there a reason why you cannot just use BCP (Bulk Copy) to import the data directly into SQL Azure? BCP supports CSV files, and I suspect you could create a pretty simple process to import the data on a daily basis with this tool. If you do, make sure you read up on the ways you can optimize the data load; that can really make a difference when you have large data sets.
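A hypothetical invocation might look like this (the database, table, file, server, and credential names are all placeholders):

    bcp MyDb.dbo.Readings in readings.csv -c -t, -S myserver.database.windows.net -U myuser@myserver -P mypassword

Here -c loads the file as character data and -t, sets the comma as the field terminator.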