帮助使用导入工具或 SSIS 将每行具有可变列的 CSV 文件导入 SQL 表
我遇到了一个包含超过 100,000 行的 CSV 文件,其中包含来自提供商的产品图像。以下是该问题的详细信息,我非常感谢一些有助于解决此问题的提示。谢谢。
该文件每个产品有 1 行和以下 4 列。 ID、URL、高度、宽度 例如:1,http://i.img.com,100,200
当产品有多个图像时就会出现问题。 文件不是每个图像 1 行,而是在同一行中有更多列。
例子: 1,http://i.img.com,100,200,//i.img.com,20,100,//i.img.com,30,50
请注意,只有第一张图像以“http://”开头,其余图像开始使用“//”
无法得知每个产品有多少图像,因此无法得知每行总共有多少列或最大列数。
如何使用 SSIS 或 sql 导入向导导入它。
我还需要定期这样做。
感谢您的帮助。
I am stuck with a CSV file with over 100,000 rows that contains product images from a provider. Here are the details of the issue, I would really appreciate some tips to help resolve this. Thanks.
The File has 1 Row per product and the following 4 columns.
ID,URL,HEIGHT,WIDTH
example: 1,http://i.img.com,100,200
Problem starts when a product has multiple images.
Instead of having 1 row per image the file has more columns in same row.
example:
1,http://i.img.com,100,200,//i.img.com,20,100,//i.img.com,30,50
Note that only first image has "http://" remaining images start with "//"
There is no telling how many images per product hence no way to tell how many total columns per row or max columns.
How can I import this using SSIS or sql import wizard.
Also I need to do this on regular intervals.
Thank you for your help.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我认为您无法使用任何标准 SSIS 任务或向导来执行此操作。您将必须编写一些解析每一行的自定义代码。您可以使用 VB 代码在 SSIS 中执行此操作,也可以将文件导入到临时表中,该临时表只是一个列来保存每一行并在 SQL 中进行解析。对于这种操作,SSIS 可能会更快。
另一种可能性是使用正则表达式或搜索和替换命令预处理文件。尝试在图像列表周围加上双引号,然后您应该能够正常导入整个文件,并将引用的部分放入单个列中。考虑到您可以搜索的“http:\”,捕获字符串的开头应该很容易。确定最终引用的位置可能是一个更大的问题。
第三种可能的解决方案是获取修复数据的源。即使您无法获取单独行中的图像(或另一个具有单独行的文件,这将是理想的),也许您可以在导出过程中获取从源添加的双引号。与使用搜索和替换方法相比,这可能不太容易出错。
祝你好运!
I don't think that you can use any standard SSIS task or wizard to do this. You're going to have to write some custom code which parses each line. You can do this in SSIS using VB code or you can import the file into a staging table that's just a single column to hold each row and do the parsing in SQL. SSIS will probably be faster for this kind of operation.
Another possibility is to preprocess the file using regex or a search-and-replace command. Try to get double-quotes around the image list then you should be able to import the whole file fine, with the quoted part going into a single column. Catching the start of the string should be easy enough given the "http:\" for which you can search. Determining where the end quote goes might be more of a problem.
A third potential solution would be to get the source to fix the data. Even if you can't get the images in separate rows (or another file with separate rows, which would be ideal), maybe you can get the double-quotes added from the source as part of the export. This would likely be less error-prone than using the search-and-replace method.
Good luck!