How do I store uploaded files in the file system?

Posted on 2024-07-08 09:44:16

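I'm trying to work out the best way to store user-uploaded files in a file system. The files range from personal files to wiki files. The database will point to these files in some way, of course, but I haven't worked out how yet.

Basic requirements:

Reasonably good security, so that people can't guess file names (Picture001.jpg, Picture002.jpg, Music001.mp3 is a big no-no)
Easy to back up and mirror (I'd prefer an approach where I don't have to copy the entire HDD every time I want a backup. I like the idea of backing up only the newest items, but I'm flexible on the options here.)
Scalable to millions of files, across multiple servers if needed.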

心头的小情儿 2024-07-15 09:44:16

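One technique is to store the data in files named after the hash (SHA1) of their contents. Such names are not easy to guess, any backup program should be able to handle them, and the store is easy to shard (by keeping hashes that start with 0 on one machine, hashes that start with 1 on the next, and so on).

The database would contain a mapping between the user's assigned name and the SHA1 hash of the contents.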
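
To make the idea concrete, here is a minimal Python sketch of content-addressed storage. It is an illustration under assumptions, not a finished implementation: the function names, the `catalog` dictionary standing in for the database table, and the choice of one shard directory per leading hex digit (which could be mounted from different machines) are all hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def store_upload(data: bytes, storage_root: Path) -> str:
    """Write data to a file named after its SHA-1 digest and return the digest."""
    digest = hashlib.sha1(data).hexdigest()
    # Assumption: the first hex digit selects the shard (0..f); each shard
    # directory could live on a different machine.
    shard_dir = storage_root / digest[0]
    shard_dir.mkdir(parents=True, exist_ok=True)
    target = shard_dir / digest
    if not target.exists():  # identical content is stored only once
        target.write_bytes(data)
    return digest


def register_upload(catalog: dict, user_name: str, data: bytes, storage_root: Path) -> None:
    """Record the mapping from the user's file name to the content hash."""
    # Assumption: `catalog` is a stand-in for the database mapping.
    catalog[user_name] = store_upload(data, storage_root)


def load_upload(catalog: dict, user_name: str, storage_root: Path) -> bytes:
    """Look up the hash for a user-assigned name and read the stored bytes."""
    digest = catalog[user_name]
    return (storage_root / digest[0] / digest).read_bytes()


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    catalog = {}
    register_upload(catalog, "holiday.jpg", b"fake image bytes", root)
    print(catalog["holiday.jpg"], load_upload(catalog, "holiday.jpg", root))
```

Two users who upload the same bytes end up pointing at the same stored file, so only the database rows differ.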

情栀口红 2024-07-15 09:44:16

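Guids for filenames, with an automatically expanding folder hierarchy that holds no more than a couple of thousand files/folders in each folder. Backing up new files is done by backing up new folders.

You haven't indicated what environment and/or programming language you are using, but here's a C# / .NET / Windows example: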

using System;
using System.IO;
using System.Xml.Serialization;

/// <summary>
/// Class for generating storage structure and file names for document storage.
/// Copyright (c) 2008, Huagati Systems Co.,Ltd.
/// GuidEx, ConfigSettings and Document are helper types that are not shown here.
/// </summary>

public class DocumentStorage
{
    private static StorageDirectory _StorageDirectory = null;

    public static string GetNewUNCPath()
    {
        string storageDirectory = GetStorageDirectory();
        if (!storageDirectory.EndsWith("\\"))
        {
            storageDirectory += "\\";
        }
        return storageDirectory + GuidEx.NewSeqGuid().ToString() + ".data";
    }

    public static void SaveDocumentInfo(string documentPath, Document documentInfo)
    {
        //the FileStream object doesn't like NTFS streams so this is disabled for now...
        return;

        //stores a document object in a separate "docinfo" stream attached to the file it belongs to
        //XmlSerializer ser = new XmlSerializer(typeof(Document));
        //string infoStream = documentPath + ":docinfo";
        //FileStream fs = new FileStream(infoStream, FileMode.Create);
        //ser.Serialize(fs, documentInfo);
        //fs.Flush();
        //fs.Close();
    }

    private static string GetStorageDirectory()
    {
        string storageRoot = ConfigSettings.DocumentStorageRoot;
        if (!storageRoot.EndsWith("\\"))
        {
            storageRoot += "\\";
        }

        //get storage directory if not set
        if (_StorageDirectory == null)
        {
            _StorageDirectory = new StorageDirectory();
            lock (_StorageDirectory)
            {
                string path = ConfigSettings.ReadSettingString("CurrentDocumentStoragePath");
                if (path == null)
                {
                    //no storage tree created yet, create first set of subfolders
                    path = CreateStorageDirectory(storageRoot, 1);
                    _StorageDirectory.FullPath = path.Substring(storageRoot.Length);
                    ConfigSettings.WriteSettingString("CurrentDocumentStoragePath", _StorageDirectory.FullPath);
                }
                else
                {
                    _StorageDirectory.FullPath = path;
                }
            }
        }

        int fileCount = (new DirectoryInfo(storageRoot + _StorageDirectory.FullPath)).GetFiles().Length;
        if (fileCount > ConfigSettings.FolderContentLimitFiles)
        {
            //if the directory has exceeded number of files per directory, create a new one...
            lock (_StorageDirectory)
            {
                string path = GetNewStorageFolder(storageRoot + _StorageDirectory.FullPath, ConfigSettings.DocumentStorageDepth);
                _StorageDirectory.FullPath = path.Substring(storageRoot.Length);
                ConfigSettings.WriteSettingString("CurrentDocumentStoragePath", _StorageDirectory.FullPath);
            }
        }

        return storageRoot + _StorageDirectory.FullPath;
    }

    private static string GetNewStorageFolder(string currentPath, int currentDepth)
    {
        string parentFolder = currentPath.Substring(0, currentPath.LastIndexOf("\\"));
        int parentFolderFolderCount = (new DirectoryInfo(parentFolder)).GetDirectories().Length;
        if (parentFolderFolderCount < ConfigSettings.FolderContentLimitFolders)
        {
            return CreateStorageDirectory(parentFolder, currentDepth);
        }
        else
        {
            return GetNewStorageFolder(parentFolder, currentDepth - 1);
        }
    }

    private static string CreateStorageDirectory(string currentDir, int currentDepth)
    {
        string storageDirectory = null;
        string directoryName = GuidEx.NewSeqGuid().ToString();
        if (!currentDir.EndsWith("\\"))
        {
            currentDir += "\\";
        }
        Directory.CreateDirectory(currentDir + directoryName);

        if (currentDepth < ConfigSettings.DocumentStorageDepth)
        {
            storageDirectory = CreateStorageDirectory(currentDir + directoryName, currentDepth + 1);
        }
        else
        {
            storageDirectory = currentDir + directoryName;
        }
        return storageDirectory;
    }

    private class StorageDirectory
    {
        public string DirectoryName { get; set; }
        public StorageDirectory ParentDirectory { get; set; }
        public string FullPath
        {
            get
            {
                if (ParentDirectory != null)
                {
                    return ParentDirectory.FullPath + "\\" + DirectoryName;
                }
                else
                {
                    return DirectoryName;
                }
            }
            set
            {
                if (value.Contains("\\"))
                {
                    DirectoryName = value.Substring(value.LastIndexOf("\\") + 1);
                    ParentDirectory = new StorageDirectory { FullPath = value.Substring(0, value.LastIndexOf("\\")) };
                }
                else
                {
                    DirectoryName = value;
                }
            }
        }
    }
}

谁许谁一生繁华 2024-07-15 09:44:16

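Use the SHA1 hash of the filename plus a salt (or, if you prefer, of the file contents; that makes detecting duplicate files easier, but also puts a lot more stress on the server). The input may need some tweaking to be unique (e.g. add the uploader's user ID or a timestamp), and the salt is there to make the result hard to guess.

The folder structure is then built from parts of the hash.

For example, if the hash is "2fd4e1c67a2d28fced849ee1bb76e7391b93eb12", then the folders could be:

/2
/2/2f/
/2/2f/2fd/
/2/2f/2fd/2fd4e1c67a2d28fced849ee1bb76e7391b93eb12

This prevents very large folders: some operating systems have trouble enumerating a folder with a million files, so you create a few subfolders from parts of the hash. How many levels? That depends on how many files you expect, but 2 or 3 is usually reasonable.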
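
The naming and folder scheme can be sketched in a few lines of Python. This is only an illustration: the function names, the way the salt, user ID and timestamp are joined, and the default of three levels are assumptions rather than anything prescribed above.

```python
import hashlib
from pathlib import PurePosixPath


def storage_name(filename: str, salt: bytes, user_id: int, timestamp: float) -> str:
    """Derive a hard-to-guess stored name from the filename plus a secret salt."""
    # Assumption: user ID and timestamp are mixed in so that two uploads
    # with the same filename do not collide.
    material = salt + f"{user_id}:{timestamp}:{filename}".encode("utf-8")
    return hashlib.sha1(material).hexdigest()


def sharded_path(digest: str, levels: int = 3) -> PurePosixPath:
    """Build the nested folder path from growing prefixes of the hash."""
    prefixes = [digest[:n] for n in range(1, levels + 1)]
    return PurePosixPath("/", *prefixes, digest)


if __name__ == "__main__":
    print(sharded_path("2fd4e1c67a2d28fced849ee1bb76e7391b93eb12"))
    print(storage_name("report.pdf", b"secret-salt", 42, 1720000000.0))
```

With the example hash, `sharded_path` reproduces the layout listed above; passing `levels=2` drops the deepest folder.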

九局 2024-07-15 09:44:16

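On just one aspect of your question (security): the best way to store uploaded files safely in a file system is to make sure they are kept outside the web root (i.e. they can't be reached directly by URL; every download has to go through a script).

This gives you complete control over what people can download (security) and lets you do things such as logging. Of course, you have to make sure the script itself is secure, but it means that only the people you allow can download certain files.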
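
As a rough sketch of what such a download script has to check, the Python function below resolves a requested file inside a storage directory outside the web root. The names `resolve_download`, `DownloadDenied` and the `allowed` mapping (stored name to set of permitted users) are hypothetical; a real application would take the permissions from its database and stream the returned path through its web framework.

```python
import logging
from pathlib import Path

log = logging.getLogger("downloads")


class DownloadDenied(Exception):
    """Raised when a user may not download the requested file."""


def resolve_download(storage_root: Path, stored_name: str, user: str, allowed: dict) -> Path:
    """Return the on-disk path for stored_name if user is allowed to fetch it."""
    # Assumption: `allowed` maps each stored name to the users who may read it.
    if user not in allowed.get(stored_name, set()):
        log.warning("denied %s for %s", stored_name, user)
        raise DownloadDenied(stored_name)

    root = storage_root.resolve()
    target = (root / stored_name).resolve()
    # Reject names such as "../secret" that would escape the storage root.
    if root not in target.parents:
        log.warning("path escape attempt %r by %s", stored_name, user)
        raise DownloadDenied(stored_name)
    if not target.is_file():
        raise FileNotFoundError(stored_name)

    log.info("serving %s to %s", stored_name, user)
    return target
```

Because every request passes through this one function, access rules and logging live in a single place, which is the control described above.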

紫南 2024-07-15 09:44:16

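Expanding on Phill Sacre's answer, another aspect of security is to serve uploaded files from a separate domain name (for instance, Wikipedia uses upload.wikimedia.org) and to make sure that domain cannot read any of your site's cookies. This stops people from uploading an HTML file containing a script that steals your users' session cookies. Simply setting the Content-Type header isn't enough, because some browsers will ignore it and guess from the file's contents; HTML can also be embedded in other kinds of files, so checking for HTML and disallowing it isn't trivial.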
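
One way to keep the session cookie away from the upload domain is to issue it as a host-only cookie, i.e. without a Domain attribute, so the browser sends it only to the host that set it. The sketch below uses Python's standard `http.cookies` module; the host names and the cookie name `session` are placeholders chosen for illustration.

```python
from http.cookies import SimpleCookie

# Hypothetical hosts: the main site and a separate domain used only for uploads.
SITE_HOST = "www.example.com"
UPLOAD_HOST = "uploads.example.net"


def session_cookie_header(session_id: str) -> str:
    """Build the value of a Set-Cookie header for the main site."""
    cookie = SimpleCookie()
    cookie["session"] = session_id
    morsel = cookie["session"]
    morsel["path"] = "/"
    morsel["secure"] = True
    morsel["httponly"] = True
    # No Domain attribute is set, so the cookie stays bound to the host
    # that issued it and is never sent to UPLOAD_HOST.
    return morsel.OutputString()


def upload_url(stored_name: str) -> str:
    """Link to an uploaded file on the separate upload domain."""
    return f"https://{UPLOAD_HOST}/{stored_name}"


if __name__ == "__main__":
    print("Set-Cookie:", session_cookie_header("abc123"))
    print(upload_url("2fd4e1c67a2d28fced849ee1bb76e7391b93eb12"))
```

A script inside an uploaded HTML file then runs on the upload host, where the site's session cookie is not available to it.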