How to use magic to verify file types in a Django form's clean method?

Posted 2024-12-23 04:42:32

I have written an email form class in Django with a FileField. I want to check the type of the uploaded file by checking its MIME type, and then limit the accepted file types to PDF, Word, and OpenOffice documents.

To this end, I have installed python-magic and would like to check file types as follows per the specs for python-magic:

mime = magic.Magic(mime=True)
file_mime_type = mime.from_file('address/of/file.txt')

However, freshly uploaded files do not have a path on my server. I also do not know of any method of the mime object akin to "from_file_content" that determines the MIME type from the content of the file.

What is an effective way to use magic to verify file types of uploaded files in Django forms?

Comments (5)

百变从容 2024-12-30 04:42:32

Stan described a good variant that uses a buffer. Unfortunately, the weakness of that method is that it reads the whole file into memory. Another option is to write the upload to a temporary file:

import tempfile
import magic
with tempfile.NamedTemporaryFile() as tmp:
    for chunk in form.cleaned_data['file'].chunks():
        tmp.write(chunk)
    tmp.flush()  # make sure buffered data is on disk before magic reads the file
    print(magic.from_file(tmp.name, mime=True))

Also, you might want to check the file size:

if form.cleaned_data['file'].size < ...:
    print(magic.from_buffer(form.cleaned_data['file'].read()))
else:
    pass  # store to disk (the code above)

Additionally:

Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows NT or later).

So you might want to handle it like so:

import os
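tmp = tempfile.NamedTemporaryFile(delete=False)
try:
    for chunk in form.cleaned_data['file'].chunks():
        tmp.write(chunk)
    tmp.close()  # close first so the file can be reopened by name on Windows
    print(magic.from_file(tmp.name, mime=True))
finally:
    tmp.close()  # no-op if the file is already closed
    os.unlink(tmp.name)  # delete only after closing; Windows cannot remove an open file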
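
The write, close, inspect, delete sequence above can be wrapped in a context manager so the cleanup is not repeated at each call site. This is only a sketch: the name `chunks_on_disk` is hypothetical, and the list of byte strings stands in for `form.cleaned_data['file'].chunks()`.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def chunks_on_disk(chunks):
    """Write an iterable of byte chunks to a temp file and yield its path."""
    tmp = tempfile.NamedTemporaryFile(delete=False)
    try:
        for chunk in chunks:
            tmp.write(chunk)
        tmp.close()  # closed before use, so the path can be reopened on Windows
        yield tmp.name
    finally:
        tmp.close()  # no-op if already closed
        os.unlink(tmp.name)

# Stand-in for form.cleaned_data['file'].chunks()
fake_chunks = [b'%PDF-1.4\n', b'rest of the file']
with chunks_on_disk(fake_chunks) as path:
    with open(path, 'rb') as fh:
        data = fh.read()
    # With python-magic installed: magic.from_file(path, mime=True)
```

The temporary file is removed when the `with` block exits, even if the inspection raises.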

Also, you might want to seek(0) after read():

if hasattr(f, 'seek') and callable(f.seek):
    f.seek(0)
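
To avoid reading the whole upload into memory, it may be enough to hand magic only the head of the file. A minimal sketch, assuming a hypothetical helper named `read_head` and a 2048-byte sample (the python-magic README suggests at least that many bytes for reliable identification). It rewinds the file afterwards so later code can still read it from the start:

```python
import io

def read_head(f, size=2048):
    """Return the first `size` bytes of file-like `f`, then rewind it."""
    if hasattr(f, 'seek') and callable(f.seek):
        f.seek(0)
    head = f.read(size)
    if hasattr(f, 'seek') and callable(f.seek):
        f.seek(0)
    return head

# Stand-in for an uploaded file; in a form it would be form.cleaned_data['file'].
upload = io.BytesIO(b'%PDF-1.4\n' + b'x' * 10000)
head = read_head(upload)
# With python-magic installed: magic.from_buffer(head, mime=True)
```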

See also "Where uploaded data is stored" in the Django documentation.

心房敞 2024-12-30 04:42:32

Why not try something like this in your view:

import magic

m = magic.Magic(mime=True)  # mime=True returns a MIME type instead of a textual description
m.from_buffer(request.FILES['my_file_field'].read())

Or use request.FILES in place of form.cleaned_data if django.forms.Form is really not an option.

陌若浮生 2024-12-30 04:42:32

import magic

mime = magic.Magic(mime=True)

attachment = form.cleaned_data['attachment']

if hasattr(attachment, 'temporary_file_path'):
    # The upload was written to a temporary file on disk, so its full path is available.
    mime_type = mime.from_file(attachment.temporary_file_path())
else:
    # The upload is held in memory.
    mime_type = mime.from_buffer(attachment.read())

Also, you might want to seek(0) after read():

if hasattr(f, 'seek') and callable(f.seek):
    f.seek(0)

The seek(0) snippet is taken from Django's own code, where it is performed for image fields during validation.
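
The question also asks to restrict uploads to PDF, Word, and OpenOffice documents, so the detected type still has to be compared against an allowlist. A sketch under assumptions: the names `ALLOWED_MIME_TYPES` and `check_mime_type` are hypothetical, the listed strings are the registered MIME types for these formats, and the exact string libmagic reports for Office files can differ between versions (some builds report `application/zip` for .docx, for example), so the set may need adjusting after testing with real files.

```python
ALLOWED_MIME_TYPES = {
    'application/pdf',
    'application/msword',  # .doc
    'application/vnd.openxmlformats-officedocument.wordprocessingml.document',  # .docx
    'application/vnd.oasis.opendocument.text',  # .odt
}

def check_mime_type(mime_type):
    """Return mime_type if it is allowed, otherwise raise ValueError."""
    if mime_type not in ALLOWED_MIME_TYPES:
        raise ValueError('Unsupported file type: %s' % mime_type)
    return mime_type

# In a form this could be called from a clean_attachment() method, catching
# ValueError and re-raising it as django.forms.ValidationError.
result = check_mime_type('application/pdf')
```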

暖伴 2024-12-30 04:42:32

You can use the django-safe-filefield package to validate that the extension of an uploaded file matches its MIME type.

from django import forms
from safe_filefield.forms import SafeFileField

class MyForm(forms.Form):

    attachment = SafeFileField(
        allowed_extensions=('xls', 'xlsx', 'csv')
    )
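
The idea of checking that an extension matches a detected MIME type can be illustrated with the standard library alone. This is not how django-safe-filefield is implemented, only a sketch of the concept; `extension_matches` is a hypothetical name, and `detected_mime` stands for whatever magic reported for the file content.

```python
import mimetypes

def extension_matches(filename, detected_mime):
    """True if the MIME type implied by the file extension equals detected_mime."""
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed is not None and guessed == detected_mime

ok = extension_matches('report.pdf', 'application/pdf')
mismatch = extension_matches('report.pdf', 'image/png')
```

A mismatch suggests the file was renamed to disguise its real type.
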
冬天旳寂寞 2024-12-30 04:42:32

In case you're handling a file upload and concerned only about images,
Django will set content_type for you (or rather for itself?):

from django.forms import ModelForm
from django.core.files import File
from django.db import models

class MyPhoto(models.Model):
    # photo_upload_to is defined elsewhere (not shown in the original).
    photo = models.ImageField(upload_to=photo_upload_to, max_length=1000)

class MyForm(ModelForm):
    class Meta:
        model = MyPhoto
        fields = ['photo']

photo = File(open('1.jpeg', 'rb'))
form = MyForm(files={'photo': photo})
if form.is_valid():
    print(form.instance.photo.file.content_type)

It doesn't rely on content type provided by the user. But
django.db.models.fields.files.FieldFile.file is an undocumented
property.

Actually, initially content_type is set from the request, but when
the form gets validated, the value is updated.

Regarding non-images, calling request.FILES['name'].read() seems okay to me.
First, that's what Django does. Second, files larger than 2.5 MB are stored
on disk by default. So let me point you to the other answer
here.
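
The 2.5 MB threshold corresponds to Django's `FILE_UPLOAD_MAX_MEMORY_SIZE` setting, whose documented default is 2621440 bytes. A settings fragment, shown only as an illustration (the raised value is an arbitrary example, not a recommendation):

```python
# settings.py (fragment)
# Uploads larger than this many bytes are streamed to a temporary file on disk
# instead of being held in memory. Django's default is 2621440 (2.5 MiB).
FILE_UPLOAD_MAX_MEMORY_SIZE = 5 * 1024 * 1024  # example: raise the limit to 5 MiB
```

Uploads above the threshold are the ones that expose temporary_file_path(), so the threshold decides which branch of a path-or-buffer check is taken.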


For the curious, here's the stack trace that leads to updating
content_type:

django.forms.forms.BaseForm.is_valid: self.errors
django.forms.forms.BaseForm.errors: self.full_clean()
django.forms.forms.BaseForm.full_clean: self._clean_fields()
django.forms.forms.BaseForm._clean_fields: field.clean()
django.forms.fields.FileField.clean: super().clean()
django.forms.fields.Field.clean: self.to_python()
django.forms.fields.ImageField.to_python
