Any real-world, enterprise-grade experience with Transactional NTFS (TxF)?
Background:
I am aware of this SO question about Transactional NTFS (TxF) and this article describing how to use it, but I am looking for real-world experience with a reasonably high-volume enterprise system where lots of blob data (say documents and/or photos) needs to be persisted once transactionally and read many times.
- We are expecting a few tens of thousands of documents written per day and reads of several tens of thousands per hour.
- We could either store indexes within the file system or in SQL Server but must be able to scale this out over several boxes.
- We must retain the ability to back up and restore the data easily for disaster recovery.
The Question:
- Any real-world, enterprise-grade experience with Transactional NTFS (TxF)?
Related questions:
- Anyone tried distributed transactions using TxF where the same file is committed to two mirror servers at once?
- Anyone tried a distributed transaction with the file system and a database?
- Any performance concerns/reliability concerns/performance data you can share?
Has anyone even done something on this scale before where transactions are a concern?
Edit: To be more clear, I have researched other technologies, including SQL Server 2008's new FILESTREAM data type, but this question is specifically targeted at the transactional file system only.
More Resources:
- An MSDN Magazine article on TxF called "Enhance Your Apps With File System Transactions".
- A webcast called "Transactional Vista: Kernel Transaction Manager and friends (TxF, TxR)". This video quotes an overhead from using TxF of 2-5%, with the performance discussion starting about 25 minutes in. This is the first set of hard numbers I've found. The video is also a very good overview of how this works under the hood. At about 34:30, the speaker describes a scenario very similar to the one in this question.
- A Channel 9 screencast called "Surendra Verma: Vista Transactional File System". He talks about performance starting around 35 minutes in. No hard numbers.
- A list of TxF articles on the B# .NET Blog.
- A Channel 9 screencast called "Transactional NTFS".
Comments (5)
I suppose "real-world, enterprise-grade" experience is more subjective than it sounds.
Windows Update uses TXF. So it is being used quite heavily in terms of frequency. Now, it isn't doing any multi-node work and it isn't going through DTC or anything fancy like that, but it is using TXF to manipulate file state. It coordinates these changes with changes to the registry (TXR). Does that count?
A colleague of mine presented this talk to SNIA, which is pretty frank about a lot of the work around TXF and might shed a little more light. If you're thinking of using TXF, it's worth a read.
Unfortunately, it appears that the answer is "No."
In nearly two weeks (one week with a 100 point bounty) and 156 views, no one has answered that they have used TxF for any high-volume applications as I described. I can't say this was unexpected, and of course I cannot prove a negative, but it appears this feature of Windows is not well known or frequently used, at least by active members of the SO community at the time of writing.
If I ever get around to writing some kind of proof of concept, I'll post here what I learn.
Have you considered filestream support in SQL Server 2008 (if you're using SQL Server 2008 of course)? I'm not sure about performance, but it offers transactionality and supports backup/restore.
While I don't have extensive experience with TxF, I do have experience with MS DTC. TxF itself is fairly performant. When you throw in the MS DTC to handle multiple resource managers across multiple machines, performance takes a considerable hit.
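To make the cost of coordinating several resource managers a little more concrete, here is a minimal sketch of two-phase commit, the general protocol that a distributed transaction coordinator such as MS DTC is built around. This is not the DTC or TxF API: `ResourceManager`, `two_phase_commit`, and the `fail_on_prepare` flag are hypothetical names invented for illustration, and the in-memory dictionaries stand in for a real file store and index database.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceManager:
    """Hypothetical participant, e.g. a file store or an index database."""
    name: str
    fail_on_prepare: bool = False          # illustrative switch to force a "no" vote
    committed: dict = field(default_factory=dict)
    pending: dict = field(default_factory=dict)

    def prepare(self, txid, key, value):
        # Phase 1: stage the change and vote. A real manager would also
        # write a durable log record before voting yes.
        if self.fail_on_prepare:
            return False
        self.pending[txid] = (key, value)
        return True

    def commit(self, txid):
        # Phase 2: make the staged change visible.
        key, value = self.pending.pop(txid)
        self.committed[key] = value

    def rollback(self, txid):
        # Discard the staged change, if there is one.
        self.pending.pop(txid, None)


def two_phase_commit(txid, key, value, managers):
    """Return True if every manager committed, False if all were rolled back."""
    prepared = []
    for rm in managers:
        if rm.prepare(txid, key, value):
            prepared.append(rm)
        else:
            # One "no" vote aborts the whole transaction.
            for p in prepared:
                p.rollback(txid)
            return False
    for rm in managers:
        rm.commit(txid)
    return True


if __name__ == "__main__":
    files = ResourceManager("files")
    index = ResourceManager("index")
    print(two_phase_commit(1, "doc1", b"contents", [files, index]))
```

Every write needs a prepare round and a commit round with each participant. When the participants sit on different machines, each of those calls becomes a network round trip plus a log flush, which is one plausible reason for the slowdown described above.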
From your description, it sounds like you are storing and indexing very large volumes of unstructured data. I assume that you also need the ability to search for this data. As such, I would highly recommend looking into something like Microsoft's Dryad or Google's MapReduce and a high performance distributed file system to handle your unstructured data storage and indexing. The best examples of high-volume enterprise systems that store and index massive volumes of blob data are Internet search engines like Bing and Google.
There are quite a few resources available for managing high-throughput unstructured data, and they would probably solve your problem more effectively than SQL Server and NTFS.
I know it's a bit farther out of the box than you were probably looking for... but you did mention that you had already exhausted all other search avenues around the NTFS/TxF/SQL box. ;)
Ronald: FileStream is layered on top of TxF.
JR: While Windows Update uses TxF/KTM and demonstrates its utility, it is not a high-throughput application.