Accessing tab-separated values stored in a GAE blob, using universal-newline mode or an equivalent
I'm trying to read the contents of a TSV file as part of a Google App Engine application.
I can read from a file fine by using:
f=csv.reader(open(matrixpath, "rU"),dialect='excel-tab')
However I now need to read the data from the blobstore using blobreader:
blob_key = ...
blobdata = blobstore.BlobReader(blob_key)
f=csv.reader(blobdata,dialect='excel-tab')
(I've uploaded a copy of the entire code that I'm having this issue with here)
Without the rU argument I get a new-line in unquoted field error:
Error: new-line character seen in unquoted field - do you need to open the file in universal-newline mode?
I would like to either fix my file so that I do not get this error, or emulate universal-newline mode when reading from the blobstore.
My file is around 20MB, and a cut down sample of it (that the script still fails on) can be found here.
I cannot reproduce the error directly from the sample file. Can you?
Given
blob = open('sample-file.tsv', 'rb').read()
the call
reader = csv.reader(blob, dialect='excel-tab')
produces a zillion or so one-byte fields, as expected. Substituting
StringIO.StringIO(blob)
or
blob.splitlines()
for blob produces 50 rows, each with about 10000 columns, so it appears to be working correctly.
Unless you show (1) your blob uploading code (and the URL of the relevant docs) and (2) the code that is getting the error on GAE, further assistance doesn't appear to be possible.
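To illustrate the difference described above, here is a minimal sketch. It is written for Python 3 (the original code is Python 2), and the two-row sample string is made up for the example, using old Mac-style "\r" line endings:

```python
import csv

# Hypothetical sample: two tab-separated rows with "\r" line endings.
blob = "a\tb\tc\r1\t2\t3\r"

# Iterating over a string yields single characters, so csv.reader
# treats every character as its own "line".
per_char = list(csv.reader(blob, dialect="excel-tab"))

# splitlines() breaks on \r, \n and \r\n, so csv.reader receives
# one complete line per iteration.
rows = list(csv.reader(blob.splitlines(), dialect="excel-tab"))

print(per_char[0])  # the first "row" holds only the first character
print(rows)
```

Note that in Python 3, str.splitlines() also splits on a few less common boundaries (form feed, for instance), which could matter if the data contains them.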
From Upload and parse csv file with "universal newline" in python on Google App Engine, the following answer worked for me:
csv.reader(blob.open().read().splitlines())
to read a Mac-formatted CSV file on GNU/Linux.
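For the tab-separated case in the question, the same idea might look roughly like the sketch below. It is written for Python 3 and takes any binary file-like object; the helper name read_tsv_rows and the UTF-8 encoding are assumptions made for the example. On App Engine, the object passed in would presumably be the blobstore.BlobReader(blob_key) from the question, and io.BytesIO stands in for it here.

```python
import csv
import io


def read_tsv_rows(reader, encoding="utf-8"):
    """Parse TSV rows from a binary file-like object with any line ending.

    Hypothetical helper: the whole payload is read into memory, which
    should be acceptable for a file of around 20MB.
    """
    data = reader.read()
    # bytes.splitlines() splits only on \r, \n and \r\n, which is what
    # universal-newline mode would recognise as line endings.
    lines = (line.decode(encoding) for line in data.splitlines())
    return list(csv.reader(lines, dialect="excel-tab"))


# Stand-in for a blob: mixed \r, \r\n and \n line endings.
fake_blob = io.BytesIO(b"gene\ts1\ts2\rg1\t0.5\t1.5\r\ng2\t2\t3\n")
rows = read_tsv_rows(fake_blob)
print(rows)
```

One caveat: splitting on line endings up front would also break apart quoted fields that contain embedded newlines, so this only suits files without such fields.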