Ideal choice for archiving flat files
We currently receive several thousand flat files per week, and I have a system that runs reports on them and exports them to PDF for our people to process and reference.
I currently bulk load these into a database, make sure all fields and formatting are valid, export them, and truncate the tables on the next run.
What I'm wondering is what everyone thinks would be the most space efficient way to store possibly 6 months of this bulk load plain text data?
Either in the form of daily SQL backups, zipped archives, or something else, so that I always have the ability to reload old data for troubleshooting.
Any ideas are welcome, I'm open to any suggestions.
So, you bulk-load flat files of raw data, you use SQL Server 2005 to process them and get a separate bunch of processed flat files, and then dump the data?
Well, if this is correct, SQL backups won't help since you seem to be saying the data doesn't stay in the DB. Your only option is efficient compression of the input and/or output files coupled with good organization of the batches in directories.
I would recommend an aggressive compression program that has scheduled batch functionality, but be careful not to get too esoteric with the program you use, so that you avoid being locked into one program...
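A minimal sketch of the compress-and-organize approach in Python, using only the standard library. The archive_batch name, the YYYY/MM directory layout and the batch-YYYY-MM-DD.tar.xz file name are illustrative assumptions, not part of the answer; tar.xz (LZMA) stands in for whatever aggressive compressor you pick, and scheduling is assumed to be left to the OS scheduler.

```python
import tarfile
from datetime import date
from pathlib import Path


def archive_batch(batch_dir: Path, archive_root: Path, batch_date: date) -> Path:
    """Compress one day's batch into archive_root/YYYY/MM/batch-YYYY-MM-DD.tar.xz."""
    # Year/month subdirectories keep each folder small and make a
    # given day's batch easy to locate later.
    target_dir = archive_root / f"{batch_date:%Y}" / f"{batch_date:%m}"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"batch-{batch_date:%Y-%m-%d}.tar.xz"
    # "w:xz" writes an LZMA-compressed tarball; preset=9 is the strongest level.
    with tarfile.open(target, "w:xz", preset=9) as tar:
        # arcname keeps paths inside the archive relative to the batch folder.
        tar.add(batch_dir, arcname=batch_dir.name)
    return target
```

A call such as archive_batch(Path("incoming"), Path("archive"), date.today()) could run after each load. Since .tar.xz is an open format that 7-Zip, tar and many other tools can read, it also sidesteps the lock-in the answer warns about.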
Use a recent-generation compression utility (7z and RAR compression are great) and compress into bundles after organizing everything so it's easy to find.
There are SDKs for 7-Zip that work with .NET to make this easy.
-Adam
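To get a feel for how much a newer algorithm gains, a small Python comparison of DEFLATE (the classic zip algorithm) against LZMA, the algorithm family used by 7-Zip's .7z format, might look like this. Note that Python's lzma module writes .xz rather than .7z containers and is not the 7-Zip SDK the answer mentions; the sample records are hypothetical, and real ratios depend entirely on your data, so no sizes are shown.

```python
import lzma
import zlib


def compare_compression(data: bytes) -> dict:
    """Return compressed sizes for DEFLATE (zip-style) and LZMA (xz)."""
    deflated = zlib.compress(data, 9)
    # PRESET_EXTREME spends extra CPU time for a somewhat smaller output.
    xz = lzma.compress(data, preset=9 | lzma.PRESET_EXTREME)
    # Round-trip both outputs to confirm nothing was lost.
    if zlib.decompress(deflated) != data or lzma.decompress(xz) != data:
        raise ValueError("round-trip mismatch")
    return {"original": len(data), "deflate": len(deflated), "lzma": len(xz)}


if __name__ == "__main__":
    # Hypothetical sample: repetitive fixed-width records, typical of flat files.
    sample = b"".join(
        f"{i:08d}ACME CORP      2009-03-14 PAID    {i % 97:05d}\n".encode()
        for i in range(20000)
    )
    print(compare_compression(sample))
```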
There are two types of data post-analysis: your original data and the data derived from it.
In your case, the derived data might be the data that goes into your reports. For your original data I'd just make a huge, compressed archive file of it with a systematic name based on the date and the type of data. The value of this is that if some newbie on your team somehow totally obliterates the code that imports your original data into the database, you can recover from it. If the derived data is small, you might think about copying that to either another database table, or keeping it in a separate flat file because some of your problems could be solved by just getting to the derived data.
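One way to apply the systematic-name idea is sketched below in Python. The <type>_<YYYY-MM-DD>.tar.xz pattern, the function names and the 183-day window (roughly the six months from the question) are assumptions for illustration. Because the date lives in the file name, finding a day's data or working out what has aged past retention needs no separate index.

```python
import re
from datetime import date, timedelta
from pathlib import Path

# Hypothetical naming scheme: <type>_<YYYY-MM-DD>.tar.xz, e.g. "invoices_2009-03-14.tar.xz".
# The type part may contain lowercase letters, digits and hyphens, but no underscores.
NAME_RE = re.compile(r"^(?P<kind>[a-z0-9-]+)_(?P<day>\d{4}-\d{2}-\d{2})\.tar\.xz$")


def archive_name(kind: str, day: date) -> str:
    """Build a systematic archive file name from the data type and batch date."""
    return f"{kind}_{day.isoformat()}.tar.xz"


def expired_archives(folder: Path, today: date, keep_days: int = 183) -> list[Path]:
    """List archives older than keep_days, judged by the date in the file name."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for path in sorted(folder.iterdir()):
        match = NAME_RE.match(path.name)
        # Files that do not follow the naming scheme are never treated as expired.
        if match and date.fromisoformat(match["day"]) < cutoff:
            expired.append(path)
    return expired
```

expired_archives only lists candidates; actually deleting them (for example with Path.unlink) is deliberately left as a separate step.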
Backing up your data in general is a tricky problem, because it depends on things like:
What's your setup like? Will hard drives grow fast enough to hold the compressed version of your data? Have you thought about off-site backups?
Construct a file hierarchy that organizes the files appropriately, zip the whole directory, and use the -u flag on zip to add new files. After you archive them, you can delete the files but preserve the directory structure for the next batch to be added. If the file names encode the version somehow (dates or whatever) or are otherwise unique, it doesn't need to be anything fancier than a single directory. If not, you need to set up your directories to let you recover versions.
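The same incremental workflow can be sketched with Python's zipfile module if the zip command-line tool isn't available. add_new_files and clear_files_keep_dirs are hypothetical helper names. The sketch only reproduces the add-new-files part of zip -u (it does not refresh entries whose files changed on disk), and it assumes the archive is stored outside the batch directory so it never adds itself.

```python
import zipfile
from pathlib import Path


def add_new_files(archive: Path, root: Path) -> list[str]:
    """Append files under root that are not yet in the archive, keeping relative paths."""
    added = []
    # Mode "a" creates the archive if it is missing, otherwise appends to it.
    with zipfile.ZipFile(archive, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        existing = set(zf.namelist())
        for path in sorted(root.rglob("*")):
            if not path.is_file():
                continue
            # as_posix() gives the forward-slash names the ZIP format uses.
            name = path.relative_to(root).as_posix()
            if name not in existing:
                zf.write(path, arcname=name)
                added.append(name)
    return added


def clear_files_keep_dirs(root: Path) -> None:
    """Delete archived files but keep the directory tree for the next batch."""
    # Materialize the listing first so we are not deleting while iterating.
    for path in list(root.rglob("*")):
        if path.is_file():
            path.unlink()
```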
Compress them and save them in a binary field in the database. Then you can build a "reload data-set" button to bring in your dataset (I'm assuming you keep track of each dataset you import so that you can replace it, etc.)
This way, everything's stored in the database, and backed up with the database, indexed and linked correctly, and compressed at the same time.
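A minimal sketch of the compressed-blob approach, using Python's built-in sqlite3 as a stand-in for the real server (on SQL Server 2005 the comparable column type would be varbinary(max)). The raw_datasets table, its columns and both function names are hypothetical; the dataset id is whatever key you already use to track imports.

```python
import lzma
import sqlite3


def store_dataset(conn: sqlite3.Connection, dataset_id: str, raw: bytes) -> None:
    """Compress a raw flat file and save it in a BLOB column keyed by dataset id."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_datasets ("
        " dataset_id TEXT PRIMARY KEY,"
        " original_size INTEGER NOT NULL,"
        " payload BLOB NOT NULL)"
    )
    # INSERT OR REPLACE overwrites an earlier copy of the same dataset.
    conn.execute(
        "INSERT OR REPLACE INTO raw_datasets VALUES (?, ?, ?)",
        (dataset_id, len(raw), lzma.compress(raw)),
    )
    conn.commit()


def reload_dataset(conn: sqlite3.Connection, dataset_id: str) -> bytes:
    """Fetch and decompress a stored dataset, e.g. behind a "reload data-set" button."""
    row = conn.execute(
        "SELECT payload FROM raw_datasets WHERE dataset_id = ?", (dataset_id,)
    ).fetchone()
    if row is None:
        raise KeyError(dataset_id)
    return lzma.decompress(row[0])
```

The trade-off is that the compressed payloads grow the database and every backup of it, in exchange for keeping everything in one place.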
You've indicated that you'd like to avoid SDKs and installing software on remote systems.
Your options are pretty limited.
Since you are using Windows computers, why not use a simple script?
This question offers several suggestions on how to use Windows VBScript to compress and decompress files:
Can Windows' built-in ZIP compression be scripted?
Nothing to 'install', no SDKs. Just copy the script over, call it via the scheduler, and you're all set.
-Adam