Streaming a CSV file in Django
I am attempting to stream a csv file as an attachment download. The CSV files are getting to be 4MB in size or more, and I need a way for the user to actively download the files without waiting for all of the data to be created and committed to memory first.
I first used my own file wrapper based on Django's FileWrapper class. That failed. Then I saw a method here for using a generator to stream the response: How to stream an HttpResponse with Django
When I raise an error within the generator, I can see that I am creating the proper data with the get_row_data() function, but when I try to return the response it comes back empty. I've also disabled the Django GZipMiddleware. Does anyone know what I'm doing wrong?
Edit: The issue I was having was with the ConditionalGetMiddleware. I had to replace it; the code is in an answer below.
Here is the view:
from django.db import models
from django.http import HttpResponse
from django.views.decorators.http import condition

@condition(etag_func=None)
def csv_view(request, app_label, model_name):
    """ Based on the filters in the query, return a csv file for the given model """
    #Get the model
    model = models.get_model(app_label, model_name)
    #if there are filters in the query
    if request.method == 'GET':
        #if the query is not empty
        if request.META['QUERY_STRING'] != None:
            keyword_arg_dict = {}
            for key, value in request.GET.items():
                #get the query filters
                keyword_arg_dict[str(key)] = str(value)
            #generate a list of row objects, based on the filters
            objects_list = model.objects.filter(**keyword_arg_dict)
        else:
            #get all the model's objects
            objects_list = model.objects.all()
    else:
        #get all the model's objects
        objects_list = model.objects.all()
    #create the response object with a csv mimetype
    response = HttpResponse(
        stream_response_generator(model, objects_list),
        mimetype='text/plain',
        )
    response['Content-Disposition'] = "attachment; filename=foo.csv"
    return response
Here is the generator I use to stream the response:
import time

def stream_response_generator(model, objects_list):
    """Streaming function to return data iteratively """
    for row_item in objects_list:
        yield get_row_data(model, row_item)
        time.sleep(1)
And here is how I create the csv row data:
import csv
import cStringIO

from django.db.models.fields.related import RelatedField

def get_row_data(model, row):
    """Get a row of csv data from an object"""
    #Create a temporary csv handle
    csv_handle = cStringIO.StringIO()
    #create the csv output object
    csv_output = csv.writer(csv_handle)
    value_list = []
    for field in model._meta.fields:
        #if the field is a related field (ForeignKey, ManyToMany, OneToOne)
        if isinstance(field, RelatedField):
            #get the related model from the field object
            related_model = field.rel.to
            for key in row.__dict__.keys():
                #find the field in the row that matches the related field
                if key.startswith(field.name):
                    #Get the unicode version of the row in the related model, based on the id
                    try:
                        entry = related_model.objects.get(
                            id__exact=int(row.__dict__[key]),
                            )
                    except:
                        pass
                    else:
                        value = entry.__unicode__().encode("utf-8")
                        break
        #if it isn't a related field
        else:
            #get the value of the field
            if isinstance(row.__dict__[field.name], basestring):
                value = row.__dict__[field.name].encode("utf-8")
            else:
                value = row.__dict__[field.name]
        value_list.append(value)
    #add the row of csv values to the csv file
    csv_output.writerow(value_list)
    #Return the string value of the csv output
    return csv_handle.getvalue()
Comments (3)
Here's some simple code that'll stream a CSV; you can probably go from this to whatever you need to do:
This simply writes each row to an in-memory file, reads the row and yields it.
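The answer's own snippet is not preserved in this copy. As a rough sketch of the per-row approach just described, assuming Python 3 and only the standard library (the name `iter_csv_rows` and the sample rows are made up for illustration), something like this could serve as the generator handed to the response:

```python
import csv
import io


def iter_csv_rows(rows):
    """Yield one CSV-encoded line per input row."""
    for row in rows:
        # A fresh in-memory file per row: write the row, read it back, yield it.
        buffer = io.StringIO(newline="")
        csv.writer(buffer).writerow(row)
        yield buffer.getvalue()


# Hypothetical sample data standing in for model rows.
sample_rows = ([i, "a", "b", "c"] for i in range(3))
chunks = list(iter_csv_rows(sample_rows))
```

Each yielded chunk is a complete CSV line, so nothing beyond the current row is held in memory at any time.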
This version is more efficient for generating bulk data, but be sure to understand the above before using it:
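The code for this version is also missing here. One way a bulk-oriented generator might look, again as a standard-library sketch in Python 3 (`iter_csv_batches` and `batch_size` are hypothetical names, not taken from the answer), is to reuse a single buffer and flush it every few rows:

```python
import csv
import io


def iter_csv_batches(rows, batch_size=100):
    """Yield CSV text in chunks of up to `batch_size` rows."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    pending = 0
    for row in rows:
        writer.writerow(row)
        pending += 1
        if pending >= batch_size:
            yield buffer.getvalue()
            # Reset the shared buffer so it never grows past one batch.
            buffer.seek(0)
            buffer.truncate(0)
            pending = 0
    if pending:
        # Flush whatever is left after the last full batch.
        yield buffer.getvalue()


batches = list(iter_csv_batches(([i, "a"] for i in range(5)), batch_size=2))
```

Fewer, larger chunks mean fewer buffer allocations and fewer writes to the client, at the cost of holding one batch in memory.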
The middleware issue has been solved as of Django 1.5, and StreamingHttpResponse has been introduced. The following should do:
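The snippet that followed is not included in this copy. A minimal sketch of a streaming CSV view with `StreamingHttpResponse`, assuming Python 3 and Django 1.5 or later, could look like the following. The `Echo` pseudo-buffer pattern, the `iter_csv` helper, and the sample rows are illustrative choices, not the answer's original code:

```python
import csv


class Echo:
    """Pseudo-buffer whose write() returns the value instead of storing it."""

    def write(self, value):
        return value


def iter_csv(rows):
    """Yield each row as a CSV-encoded string."""
    writer = csv.writer(Echo())
    for row in rows:
        # In Python 3, writerow() returns whatever the underlying write() returns.
        yield writer.writerow(row)


def csv_view(request):
    # Imported here only so the generator above can be tried without Django installed.
    from django.http import StreamingHttpResponse

    rows = ([i, "a", "b", "c"] for i in range(10))  # hypothetical sample data
    response = StreamingHttpResponse(iter_csv(rows), content_type="text/csv")
    response["Content-Disposition"] = 'attachment; filename="foo.csv"'
    return response
```

The generator is consumed lazily as the response is sent, so rows are produced only as fast as the client downloads them.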
There's some documentation on how to output CSV from Django, but it doesn't take advantage of StreamingHttpResponse, so I went ahead and opened a ticket in order to track it.
The problem I was having was with the ConditionalGetMiddleware. I saw django-piston come up with a replacement middleware for the ConditionalGetMiddleware that allows streaming:
So then you will replace ConditionalGetMiddleware with your ConditionalMiddlewareCompatProxy middleware, and in your view (borrowed code from a clever answer to this question):
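Neither snippet survives in this copy. The idea described above can be sketched as follows. This is not django-piston's actual source: the `streaming` flag check is an assumption about how a view opts out, and `FakeConditionalGetMiddleware` and `FakeResponse` are stand-ins so the example runs without Django.

```python
def compat_middleware_factory(middleware_cls):
    """Return a subclass that leaves responses flagged as streaming untouched."""

    class CompatWrapper(middleware_cls):
        def process_response(self, request, response):
            if getattr(response, "streaming", False):
                # Skip the wrapped middleware so the content iterator is not consumed.
                return response
            return super().process_response(request, response)

    return CompatWrapper


class FakeConditionalGetMiddleware:
    """Stand-in for a middleware that reads the whole response body."""

    def process_response(self, request, response):
        response.body = "".join(response.content)
        return response


class FakeResponse:
    """Stand-in for a response object; a view would set streaming=True."""

    def __init__(self, content, streaming=False):
        self.content = content
        self.streaming = streaming
        self.body = None


ConditionalMiddlewareCompatProxy = compat_middleware_factory(FakeConditionalGetMiddleware)

middleware = ConditionalMiddlewareCompatProxy()
streamed = middleware.process_response(
    None, FakeResponse(iter(["a\r\n", "b\r\n"]), streaming=True)
)
buffered = middleware.process_response(None, FakeResponse(iter(["a\r\n"])))
```

In a real project of that era the factory would presumably wrap Django's `ConditionalGetMiddleware`, the resulting class would take its place in the middleware settings, and the view would set `response.streaming = True` on the generator-backed response before returning it.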