S3 is an interesting idea here. Use cron to sync files that haven't been accessed in over a month up to Amazon S3, then create a web interface that lets users restore the synced files back to the server. Send an email before you move a file to S3 and another after it is restored.
Limitless storage, only pay for what you use. Not quite an existing open-source project, but not too tough to assemble.
If you need good security, wrap the files in GPG encryption before pushing them to Amazon. GPG is very, very safe.
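A minimal, untested sketch of the archival half (the restore web interface is left out), assuming the aws CLI is installed and configured with credentials; DATA_DIR, BUCKET, and GPG_RECIPIENT below are hypothetical placeholders, not anything from an existing project:

    #!/bin/sh
    # Sketch: run daily from cron. Finds files untouched for 30+ days,
    # mails the owner, GPG-encrypts each file, pushes the ciphertext to
    # S3, and removes the local copies. All paths and addresses are
    # hypothetical placeholders.
    DATA_DIR=/srv/userfiles
    BUCKET=s3://example-archive
    GPG_RECIPIENT=admin@example.com

    find "$DATA_DIR" -type f -atime +30 | while IFS= read -r f; do
        # Warn the owner before the file leaves the box (GNU stat).
        owner=$(stat -c %U "$f")
        echo "Archiving $f to S3" | mail -s "File archived" "$owner"
        # Encrypt first so only the key holder can read what lands on S3.
        gpg --batch --yes --recipient "$GPG_RECIPIENT" \
            --output "$f.gpg" --encrypt "$f"
        # Upload the ciphertext; delete local copies only if that succeeded.
        aws s3 cp "$f.gpg" "$BUCKET${f#$DATA_DIR}.gpg" && rm -f "$f" "$f.gpg"
    done

The restore path would be the reverse: pull the object back down with aws s3 cp and run gpg --decrypt, which is what the web interface would drive.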
A more expensive alternative is to store all the data locally. If you don't want to buy a large disk cluster or a big NAS, you could use HDFS (the Hadoop Distributed File System) and sync to a cluster that behaves similarly to S3. You can scale HDFS with commodity hardware. Especially if you already have a couple of old machines and a fast network lying around, this could be much cheaper than a serious NAS, as well as more scalable in size.
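The sync step for that route might look like the following; a sketch assuming a running Hadoop cluster reachable through the hdfs command, with DATA_DIR and HDFS_ROOT as placeholders:

    #!/bin/sh
    # Sketch: mirror stale files into HDFS instead of S3. Placeholder
    # paths; assumes the hdfs CLI is on PATH and the cluster is up.
    DATA_DIR=/srv/userfiles
    HDFS_ROOT=/archive

    find "$DATA_DIR" -type f -atime +30 | while IFS= read -r f; do
        rel=${f#$DATA_DIR/}
        # Recreate the directory layout on the cluster, then copy up.
        hdfs dfs -mkdir -p "$HDFS_ROOT/$(dirname "$rel")"
        hdfs dfs -put -f "$f" "$HDFS_ROOT/$rel" && rm -f "$f"
    done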
Good luck! I look forward to seeing more answers on this.
-Please- do not upload patient data to S3 (at least not mine).
Google 'open source "file lifecycle management"'. I'm sorry, I'm only aware of commercial SAN apps; I don't know whether there are F/OSS alternatives.
The way the commercial apps work is the filesystem appears normal -- all files are present. However, if the file has not been accessed in a certain period (for us, this is 90 days), the file is moved to secondary storage. That is, all but the first 4094 bytes are moved. After a file is archived, if you seek (read) past byte 4094 there is a slight delay while the file is pulled back in from secondary storage. I'm guessing files smaller than 4094 bytes are never sent to secondary storage, but I'd never thought about it.
The only problem with this scheme is if you happen to have something that tries to scan all of your files (a web search indexer, for example). That tends to pull everything back from secondary storage and fill up primary, and the IT folks start giving you the hairy eyeball. (I'm, ahem, speaking from some slight experience.)
You might try asking this over on ServerFault.com.
If you're handy, you might be able to come up with a similar approach using cron and shell scripts. You'd have to replace the 4094-byte trick with symlinks (and note, the sketch below is untested).
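One way that could look, assuming secondary storage is a volume mounted at /mnt/archive (a placeholder) and using the same 90-day threshold:

    #!/bin/sh
    # Sketch: move files not accessed in 90 days to the slow tier and
    # leave a symlink behind, so the filesystem still appears normal.
    # /mnt/archive is a hypothetical mount point for secondary storage.
    PRIMARY=/srv/userfiles
    ARCHIVE=/mnt/archive

    find "$PRIMARY" -type f -atime +90 | while IFS= read -r f; do
        rel=${f#$PRIMARY/}
        mkdir -p "$ARCHIVE/$(dirname "$rel")"
        # Move the data, then point a symlink at its new home.
        mv "$f" "$ARCHIVE/$rel" && ln -s "$ARCHIVE/$rel" "$f"
    done

Unlike the commercial apps, nothing migrates a file back to primary on access; reads simply follow the symlink to the slow tier, so the slight delay becomes permanent until you move the file back by hand.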