C# DataTable (DataRowCollection) stored in a temporary file instead of in memory?
I would like to replace a DataTable with a custom class that implements DataRowCollection by storing the rows in a temporary data file instead of keeping them in memory.
I understand that this will be slow compared to in-memory tables, but I occasionally need to work with tables that simply will not fit in RAM (more than 4 GB of data). I will discard the table and delete the temporary file at the end of the run.
The table data is coming from a database query. I know that I can change queries to reduce the size of the data set I get back. That is not the point. The point is there will always be some limit on memory and I would like to have the option of using a slow temporary file rather than just saying "you can't do that".
Is there a pre-written class or method of doing this? It seems like I am reinventing the wheel here...
Here is my skeleton so far:
using System;
using System.Collections;
using System.Data;

/// <summary>
/// Like DataTable, but storing data in a file instead of memory.
/// </summary>
public class FileBackedDataTable : DataTable, IIntegrationTest
{
    // Hides DataTable.Rows so that callers get the file-backed collection.
    public new FileBackedDataRowCollection Rows = null;

    /// <summary>
    /// Initializes a new instance with an empty file-backed row collection.
    /// </summary>
    public FileBackedDataTable()
    {
        Rows = new FileBackedDataRowCollection(this);
    }
}
/// <summary>
/// Like a DataRowCollection, but data is stored in a file, not in memory.
/// Only an index of record positions is kept in memory.
/// </summary>
public class FileBackedDataRowCollection : ICollection, IEnumerable, IDisposable
{
    /// <summary>
    /// Internally tracks each file record.
    /// </summary>
    class recordInfo
    {
        public long recordPosition;  // offset of the record in the backing file
        public int recordLength;     // bytes currently used by the record
        public int recordMaxLength;  // bytes reserved for the record
        public long hash;            // hash code of the row when it was written
    }

    DataTable table;
    ArrayList rows = new ArrayList();  // one recordInfo per row

    public FileBackedDataRowCollection(DataTable table)
    {
        this.table = table;
        openBackingFile(table);
    }

    public int Count
    {
        get { return rows.Count; }
    }

    public void Clear()
    {
        rows.Clear();
        truncateBackingFile();
    }

    public DataRow this[int index]
    {
        get
        {
            recordInfo info = (recordInfo)rows[index];
            return readRow(info);
        }
        set
        {
            writeRow(index, value);
        }
    }

    private void writeRow(int index, DataRow value)
    {
        byte[] bytes = rowToBytes(value);
        recordInfo info = (recordInfo)rows[index];
        if (bytes.Length <= info.recordMaxLength)
        {
            // The new data fits in the existing slot: overwrite in place.
            info.recordLength = bytes.Length;
            info.hash = value.GetHashCode();
            writeBytes(info.recordPosition, bytes);
        }
        else
        {
            // Too large for the old slot: append and repoint the index entry.
            rows[index] = appendRow(bytes, value.GetHashCode());
        }
    }

    private DataRow readRow(recordInfo recordInfo)
    {
        byte[] bytes = readBytes(recordInfo.recordPosition, recordInfo.recordLength);
        DataRow row = bytesToRow(bytes);
        return row;
    }

    public void Add(DataRow r)
    {
        byte[] bytes = rowToBytes(r);
        recordInfo info = appendRow(bytes, r.GetHashCode());
        rows.Add(info);
    }

    private recordInfo appendRow(byte[] bytes, long hash)
    {
        recordInfo info = new recordInfo();
        info.recordLength = bytes.Length;
        info.recordMaxLength = info.recordLength;
        info.recordPosition = appendBytes(bytes);
        info.hash = hash;
        return info;
    }

    // Not written yet: openBackingFile, truncateBackingFile, rowToBytes,
    // bytesToRow, readBytes, writeBytes, appendBytes, and the remaining
    // ICollection, IEnumerable and IDisposable members.
}
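The skeleton calls appendBytes, readBytes, writeBytes and truncateBackingFile without showing them. As a rough sketch of what those byte-level helpers could look like, assuming a single FileStream over a temporary file: the BackingFile class and its method names below are hypothetical and are not part of the question. Row serialization (rowToBytes, bytesToRow) is not covered.

```csharp
using System;
using System.IO;

// Hypothetical helper: the method names mirror the calls in the skeleton,
// but this implementation is an assumption, not the asker's code.
public sealed class BackingFile : IDisposable
{
    private readonly FileStream stream;

    public BackingFile()
    {
        // DeleteOnClose removes the temporary file when the stream is closed.
        stream = new FileStream(
            Path.GetTempFileName(), FileMode.Create, FileAccess.ReadWrite,
            FileShare.None, 64 * 1024, FileOptions.DeleteOnClose);
    }

    // Appends a record and returns the offset at which it starts.
    public long AppendBytes(byte[] bytes)
    {
        long position = stream.Seek(0, SeekOrigin.End);
        stream.Write(bytes, 0, bytes.Length);
        return position;
    }

    // Overwrites a record in place; the caller checks that it fits.
    public void WriteBytes(long position, byte[] bytes)
    {
        stream.Seek(position, SeekOrigin.Begin);
        stream.Write(bytes, 0, bytes.Length);
    }

    // Reads exactly `length` bytes starting at `position`.
    public byte[] ReadBytes(long position, int length)
    {
        byte[] buffer = new byte[length];
        stream.Seek(position, SeekOrigin.Begin);
        int read = 0;
        while (read < length)
        {
            int n = stream.Read(buffer, read, length - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        return buffer;
    }

    // Discards all records.
    public void Truncate()
    {
        stream.SetLength(0);
    }

    public void Dispose()
    {
        stream.Dispose();
    }
}
```

With something like this in place, the collection would keep one BackingFile instance, forward its byte-level calls to it, and dispose it from its own Dispose method so the temporary file goes away at the end of the run.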
Recently, I've been looking at System.Data.SQLite to persist some application data instead of writing the persistence layer myself.
How about creating a temp file with SQLite and loading your legacy data into it? Then you can use it like a local file and delete it once you are done processing.
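A minimal sketch of that idea, assuming the System.Data.SQLite NuGet package is referenced. The SqliteSpill class, the table name "spill" and the untyped columns are illustrative choices, not something the comment specifies. Rows are streamed from an IDataReader, so the full result set is never held in memory.

```csharp
using System;
using System.Data;
using System.Data.SQLite; // assumption: System.Data.SQLite package is installed
using System.IO;

public static class SqliteSpill
{
    // Copies every row from `source` into a table named "spill" in a new
    // temporary SQLite file and returns the path of that file.
    public static string CopyToTempFile(IDataReader source)
    {
        string path = Path.GetTempFileName();
        using (var conn = new SQLiteConnection("Data Source=" + path + ";Version=3;"))
        {
            conn.Open();

            var names = new string[source.FieldCount];
            var marks = new string[source.FieldCount];
            for (int i = 0; i < names.Length; i++)
            {
                // Quote column names so arbitrary names are accepted.
                names[i] = "\"" + source.GetName(i).Replace("\"", "\"\"") + "\"";
                marks[i] = "@p" + i;
            }

            using (var create = conn.CreateCommand())
            {
                create.CommandText = "CREATE TABLE spill (" + string.Join(", ", names) + ")";
                create.ExecuteNonQuery();
            }

            // One transaction around all inserts avoids a commit per row.
            using (var tx = conn.BeginTransaction())
            using (var insert = conn.CreateCommand())
            {
                insert.Transaction = tx;
                insert.CommandText = "INSERT INTO spill VALUES (" + string.Join(", ", marks) + ")";
                for (int i = 0; i < marks.Length; i++)
                    insert.Parameters.Add(new SQLiteParameter(marks[i]));

                while (source.Read())
                {
                    for (int i = 0; i < marks.Length; i++)
                        insert.Parameters[i].Value = source.GetValue(i);
                    insert.ExecuteNonQuery();
                }
                tx.Commit();
            }
        }
        return path;
    }
}
```

The caller would pass the reader returned by the original query's ExecuteReader call, query the temporary database with ordinary SQL, and delete the file when the run ends.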
Your plan is almost certainly bad design. Spend some time on a redesign and use your database instead of a file; databases were created to manipulate large chunks of data. If needed, you can write stored procedures in C# or another language, if your database allows that.
Describe how you want to manipulate your data and you will get a real answer to your real problem. It will either require an SQL query or, if it cannot be done in SQL, it can almost certainly be done in some kind of loop that works with a smaller amount of data at a time.
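One way to read the "loop working with smaller data" suggestion is to stream rows through an IDataReader instead of filling a DataTable. The sketch below is only an illustration under that assumption: the StreamingExample class, the "amount" column and the summing task are made up, and a small in-memory table stands in for the real query.

```csharp
using System;
using System.Data;

public static class StreamingExample
{
    // Sums one numeric column while holding only the current row in memory.
    public static decimal SumColumn(IDataReader reader, string columnName)
    {
        int ordinal = reader.GetOrdinal(columnName);
        decimal total = 0m;
        while (reader.Read())
        {
            if (!reader.IsDBNull(ordinal))
                total += Convert.ToDecimal(reader.GetValue(ordinal));
        }
        return total;
    }

    public static void Main()
    {
        // Stand-in data; with a database, pass the reader returned by
        // IDbCommand.ExecuteReader() instead.
        var table = new DataTable("demo");
        table.Columns.Add("amount", typeof(decimal));
        table.Rows.Add(1.5m);
        table.Rows.Add(DBNull.Value);
        table.Rows.Add(2.5m);

        using (IDataReader reader = table.CreateDataReader())
        {
            Console.WriteLine(SumColumn(reader, "amount"));
        }
    }
}
```

Whether this applies depends on the processing: it works when each row can be handled on its own or folded into a running result, and not when the algorithm needs random access to all rows at once.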
You can use DataTable.WriteXml. But I agree with the others: it is better to limit the records you get from the database in the first place.
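For reference, a small sketch of a WriteXml and ReadXml round trip; the XmlRoundTrip class and the sample table are made up for illustration. Note that WriteXml serializes a table that is already in memory, so on its own it does not lift the RAM limit. It would have to be applied to one chunk of rows at a time.

```csharp
using System;
using System.Data;
using System.IO;

public static class XmlRoundTrip
{
    // Writes `table` to a temporary XML file and reads it back into a new table.
    public static DataTable RoundTrip(DataTable table)
    {
        string path = Path.GetTempFileName();
        try
        {
            // WriteSchema embeds the column definitions, so the file can be
            // loaded into a DataTable that has no schema of its own.
            table.WriteXml(path, XmlWriteMode.WriteSchema);

            var restored = new DataTable();
            restored.ReadXml(path);
            return restored;
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        // WriteXml requires the table to have a name.
        var table = new DataTable("orders");
        table.Columns.Add("id", typeof(int));
        table.Columns.Add("customer", typeof(string));
        table.Rows.Add(1, "alice");
        table.Rows.Add(2, "bob");

        DataTable restored = RoundTrip(table);
        Console.WriteLine(restored.Rows.Count);
    }
}
```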