Creating and loading an Avro file: the file is created, but it is empty
I am reading a CSV file and loading it into an Avro file in a GCS bucket. The Avro file gets created, but it contains no data, even though the records show up when I print them. I checked the write buffer as well, and there is no data in it either.
I tried writer.close(), but then I get this error: io.UnsupportedOperation: Cannot flush without finalizing upload. Use close() instead.
# Assumed imports for this snippet; schema, BUCKET and DESTINATION_FILE
# are module-level globals not shown here.
import json
import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter
from google.cloud import storage

def load_avro_file(records):
    schema_parsed = avro.schema.parse(json.dumps(schema))
    client = storage.Client()
    bucket = client.get_bucket(BUCKET)
    blob = bucket.blob(DESTINATION_FILE)
    with blob.open(mode='wb') as f:
        writer = DataFileWriter(f, DatumWriter(), schema_parsed)
        for record in records:
            # records are namedtuples; convert each one to a dict
            record = dict((f, getattr(record, f)) for f in record._fields)
            print("In here", record)
            writer.append(record)
        # note: writer is never closed here, so the buffered Avro
        # blocks are never written out to the blob

I was facing a similar problem and couldn't find an answer for it. Maybe you have already solved it, but let me share how I got this working.
Reading the Google Cloud docs for the blob.open method, I found the ignore_flush parameter. Avro needs to open the file in binary mode, so when opening the blob we need to set this parameter to True to avoid the flush error. Also, if you don't call the writer's .close() method, the Avro file won't be generated properly, so we need to feed the IO object to the writer without wrapping it in a context manager; closing will be handled by Avro itself. The final solution looks like this:
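A minimal sketch of that solution, assuming the same globals as the question (schema, BUCKET, DESTINATION_FILE) and the namedtuple records; the original answer's code block did not survive, so this reconstruction just follows the steps described above:

import json
import avro.schema
from avro.datafile import DataFileWriter
from avro.io import DatumWriter
from google.cloud import storage

def load_avro_file(records):
    schema_parsed = avro.schema.parse(json.dumps(schema))
    client = storage.Client()
    bucket = client.get_bucket(BUCKET)
    blob = bucket.blob(DESTINATION_FILE)

    # ignore_flush=True makes the blob writer's flush() a no-op,
    # so the DataFileWriter can flush without finalizing the upload.
    # No context manager: writer.close() closes the underlying blob
    # writer itself, which finalizes the upload.
    f = blob.open(mode='wb', ignore_flush=True)
    writer = DataFileWriter(f, DatumWriter(), schema_parsed)
    for record in records:
        record = dict((field, getattr(record, field)) for field in record._fields)
        writer.append(record)
    writer.close()  # writes the Avro blocks and finalizes the GCS upload

The key difference from the question's version is that the file handle is closed by writer.close() rather than by the with block, so the Avro footer is written before the upload is finalized.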