Golang: checking the file type of an XLSX file

Posted on 2025-02-08 11:07:47

Using the Go standard library to check the file type of an XLSX file looks something like this:

package main

import (
    "fmt"
    "log"
    "net/http"
    "os"
)

func main() {
    f, err := os.Open("file_c.xlsx")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()

    // DetectContentType considers at most the first 512 bytes.
    buf := make([]byte, 512)
    n, err := f.Read(buf)
    if err != nil {
        log.Fatal(err)
    }
    // Pass only the bytes that were actually read.
    contentType := http.DetectContentType(buf[:n])
    fmt.Println(contentType)
}

and that prints:

application/zip

Using the package https://github.com/h2non/filetype instead, this code

package main

import (
    "fmt"
    "log"
    "os"

    "github.com/h2non/filetype"
)

func main() {
    // Read the whole file so the matcher can inspect all of it.
    buf, err := os.ReadFile("file_c.xlsx")
    if err != nil {
        log.Fatal(err)
    }
    kind, err := filetype.Match(buf)
    if err != nil {
        log.Fatal(err)
    }
    if kind == filetype.Unknown {
        fmt.Println("unknown")
        return
    }
    fmt.Printf("file type %s. MIME %s\n", kind.Extension, kind.MIME.Value)
}

prints:

file type xlsx. MIME application/vnd.openxmlformats-officedocument.spreadsheetml.sheet

However, when I have code like this:

// where file is of type *multipart.FileHeader
mpf, err := file.Open()
if err != nil {
    wlog.Errorf("could not open %s file", file.Filename)
    return // assumes the enclosing function returns nothing; mpf is unusable here
}
defer mpf.Close()

// Only the first 512 bytes are handed to the matcher.
buf := make([]byte, 512)
if _, err = mpf.Read(buf); err != nil {
    wlog.Error("failed to read file")
    return
}
kind, _ := filetype.Match(buf)
if kind == filetype.Unknown {
    wlog.Info("unknown file type")
} else {
    wlog.Infof("file type %s. MIME %s\n", kind.Extension, kind.MIME.Value)
}

it prints:

file type zip. MIME application/zip

So the information that this is an XLSX file is lost somewhere along the way, even though I use the external package https://github.com/h2non/filetype.
Do you have any idea why, or what I am doing wrong?

Comments (1)

悟红尘 2025-02-15 11:07:47

This

buf, _ := ioutil.ReadFile("file_c.xlsx")
kind, _ := filetype.Match(buf)

works because it gets to scan the entire file. This

buf := make([]byte, 512)
// ...
kind, _ := filetype.Match(buf)

does not because it only gets to see the first 512 bytes, which is not enough to identify the file definitively as XLSX. An XLSX file is just a zip file with a certain pattern of contents, so it defaults to the more generic ZIP type (which is technically also correct).

You can view the implementation to see just how much data it's scanning through to detect file type - up to several kilobytes.
