How do I serialize a large graph of .NET objects into a SQL Server BLOB without creating a large buffer?

Posted 2024-08-18 11:30:51


We have code like:

ms = New IO.MemoryStream
bin = New System.Runtime.Serialization.Formatters.Binary.BinaryFormatter
bin.Serialize(ms, largeGraphOfObjects)
dataToSaveToDatabase = ms.ToArray()
' put dataToSaveToDatabase in a SQL Server BLOB

But the memory stream allocates a large buffer from the large object heap, and that is giving us problems. So how can we stream the data without needing enough free memory to hold the serialized objects?

I am looking for a way to get a Stream from SQL Server that can then be passed to bin.Serialize(), thus avoiding keeping all the data in my process's memory.

Likewise for reading the data back...


Some more background.

This is part of a complex numerical processing system that processes data in near real time looking for equipment problems, etc. The serialization is done to allow a restart when there is a problem with data quality from a data feed, etc. (We store the data feeds and can rerun them after the operator has edited out bad values.)

Therefore we serialize the objects a lot more often than we deserialize them.

The objects we are serializing include very large arrays, mostly of doubles, as well as a lot of small "more normal" objects. We are pushing the memory limit on 32-bit systems and making the garbage collector work very hard. (Efforts are being made elsewhere in the system to improve this, e.g. reusing large arrays rather than creating new ones.)

Often the serialization of the state is the last straw that causes an out-of-memory exception; the peak of our memory usage is always during this serialization step.

I think we get large object heap fragmentation when we deserialize the object, and I expect there are also other problems with large object heap fragmentation given the size of the arrays. (This has not yet been investigated, as the person that first looked at this is a numerical processing expert, not a memory management expert.)

Our customers use a mix of SQL Server 2000, 2005, and 2008, and we would rather not have different code paths for each version of SQL Server if possible.

We can have many active models at a time (in different processes, across many machines), and each model can have many saved states. Hence the saved state is stored in a database BLOB rather than a file.

As the saved states are spread across many machines and processes, I would rather not serialize the object to a file and then put the file into a BLOB one block at a time.

Other related questions I have asked


Comments (7)

和影子一齐双人舞 2024-08-25 11:30:51


There is no built-in ADO.NET functionality to handle this gracefully for large data. The problem is twofold:

  • there is no API to 'write' into a SQL command or parameter as into a stream. The parameter types that accept a stream (like FileStream) accept the stream to READ from it, which does not agree with the serialization semantics of writing into a stream. No matter which way you turn this, you end up with an in-memory copy of the entire serialized object, which is bad.
  • even if the point above were solved (and it cannot be), the TDS protocol and the way SQL Server accepts parameters do not work well with large parameters, as the entire request has to be received before it is launched into execution, and this would create additional copies of the object inside SQL Server.

So you really have to approach this from a different angle. Fortunately, there is a fairly easy solution. The trick is to use the highly efficient UPDATE .WRITE syntax and pass in the chunks of data one by one, in a series of T-SQL statements. This is the MSDN recommended way, see Modifying Large-Value (max) Data in ADO.NET. This looks complicated, but is actually trivial to do and plug into a Stream class.


The BlobStream class

This is the bread and butter of the solution: a Stream-derived class that implements the Write method as a call to the T-SQL BLOB WRITE syntax. It is straightforward; the only interesting thing about it is that it has to keep track of the first update, because the UPDATE ... SET blob.WRITE(...) syntax would fail on a NULL field:

class BlobStream: Stream
{
    private SqlCommand cmdAppendChunk;
    private SqlCommand cmdFirstChunk;
    private SqlConnection connection;
    private SqlTransaction transaction;

    private SqlParameter paramChunk;
    private SqlParameter paramLength;

    private long offset;

    public BlobStream(
        SqlConnection connection,
        SqlTransaction transaction,
        string schemaName,
        string tableName,
        string blobColumn,
        string keyColumn,
        object keyValue)
    {
        this.transaction = transaction;
        this.connection = connection;
        cmdFirstChunk = new SqlCommand(String.Format(@"
UPDATE [{0}].[{1}]
    SET [{2}] = @firstChunk
    WHERE [{3}] = @key"
            ,schemaName, tableName, blobColumn, keyColumn)
            , connection, transaction);
        cmdFirstChunk.Parameters.AddWithValue("@key", keyValue);
        cmdAppendChunk = new SqlCommand(String.Format(@"
UPDATE [{0}].[{1}]
    SET [{2}].WRITE(@chunk, NULL, NULL)
    WHERE [{3}] = @key"
            , schemaName, tableName, blobColumn, keyColumn)
            , connection, transaction);
        cmdAppendChunk.Parameters.AddWithValue("@key", keyValue);
        paramChunk = new SqlParameter("@chunk", SqlDbType.VarBinary, -1);
        cmdAppendChunk.Parameters.Add(paramChunk);
    }

    public override void Write(byte[] buffer, int index, int count)
    {
        byte[] bytesToWrite = buffer;
        if (index != 0 || count != buffer.Length)
        {
            bytesToWrite = new MemoryStream(buffer, index, count).ToArray();
        }
        if (offset == 0)
        {
            cmdFirstChunk.Parameters.AddWithValue("@firstChunk", bytesToWrite);
            cmdFirstChunk.ExecuteNonQuery();
            offset = count;
        }
        else
        {
            paramChunk.Value = bytesToWrite;
            cmdAppendChunk.ExecuteNonQuery();
            offset += count;
        }
    }

    // Rest of the abstract Stream implementation
 }
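
For completeness, here is one possible shape of the "rest of the abstract Stream implementation" the comment above refers to. This is not from the original answer, just a minimal sketch for a write-only, forward-only stream: everything except writing reports itself as unsupported, and these members would be pasted into BlobStream in place of the placeholder comment.

    public override bool CanRead { get { return false; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return true; } }

    // Every Write already executes a round trip to the server,
    // so there is nothing left to flush.
    public override void Flush() { }

    // Report the number of bytes written so far; seeking is not supported.
    public override long Length { get { return offset; } }
    public override long Position
    {
        get { return offset; }
        set { throw new NotSupportedException(); }
    }

    public override int Read(byte[] buffer, int index, int count)
    {
        throw new NotSupportedException();
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        throw new NotSupportedException();
    }

    public override void SetLength(long value)
    {
        throw new NotSupportedException();
    }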

Using the BlobStream

To use this newly created blob stream class, you plug it into a BufferedStream. The class has a trivial design that handles only writing the stream into a column of a table. I'll reuse a table from another example:

CREATE TABLE [dbo].[Uploads](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [FileName] [varchar](256) NULL,
    [ContentType] [varchar](256) NULL,
    [FileData] [varbinary](max) NULL)

I'll add a dummy object to be serialized:

[Serializable]
class HugeSerialized
{
    public byte[] theBigArray { get; set; }
}

Finally, the actual serialization. We'll first insert a new record into the Uploads table, then create a BlobStream on the newly inserted Id and call the serialization straight into this stream:

using (SqlConnection conn = new SqlConnection(Settings.Default.connString))
{
    conn.Open();
    using (SqlTransaction trn = conn.BeginTransaction())
    {
        SqlCommand cmdInsert = new SqlCommand(
@"INSERT INTO dbo.Uploads (FileName, ContentType)
VALUES (@fileName, @contentType);
SET @id = SCOPE_IDENTITY();", conn, trn);
        cmdInsert.Parameters.AddWithValue("@fileName", "Demo");
        cmdInsert.Parameters.AddWithValue("@contentType", "application/octet-stream");
        SqlParameter paramId = new SqlParameter("@id", SqlDbType.Int);
        paramId.Direction = ParameterDirection.Output;
        cmdInsert.Parameters.Add(paramId);
        cmdInsert.ExecuteNonQuery();

        BlobStream blob = new BlobStream(
            conn, trn, "dbo", "Uploads", "FileData", "Id", paramId.Value);
        BufferedStream bufferedBlob = new BufferedStream(blob, 8040);

        HugeSerialized big = new HugeSerialized { theBigArray = new byte[1024 * 1024] };
        BinaryFormatter bf = new BinaryFormatter();
        bf.Serialize(bufferedBlob, big);

        trn.Commit();
    }
}

If you monitor the execution of this simple sample, you'll see that a large serialization stream is never created. The sample allocates an array of 1024*1024 bytes, but that is only for demo purposes, so there is something to serialize. This code serializes in a buffered manner, chunk by chunk, using the recommended SQL Server BLOB update size of 8040 bytes at a time.
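
The answer above only covers writing. For reading the state back without materializing the whole BLOB, a symmetric approach is a small read-only Stream over SqlDataReader.GetBytes, with the reader opened using CommandBehavior.SequentialAccess so the column is streamed from the server as it is consumed. The sketch below is not from the original answer: the BlobReaderStream name is made up, it assumes the same dbo.Uploads table, the usual System.Data / System.Data.SqlClient / System.IO / BinaryFormatter namespaces, and it omits error handling.

class BlobReaderStream : Stream
{
    private readonly SqlDataReader reader;
    private readonly int ordinal;
    private long position;

    public BlobReaderStream(SqlDataReader reader, int ordinal)
    {
        this.reader = reader;
        this.ordinal = ordinal;
    }

    public override int Read(byte[] buffer, int index, int count)
    {
        // Under SequentialAccess, GetBytes pulls only the requested range
        // off the wire; it returns 0 once the column is exhausted.
        long read = reader.GetBytes(ordinal, position, buffer, index, count);
        position += read;
        return (int)read;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override void Flush() { }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { return position; }
        set { throw new NotSupportedException(); }
    }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int index, int count) { throw new NotSupportedException(); }
}

Deserializing then looks like this (id being the key of the saved row):

using (SqlCommand cmdRead = new SqlCommand(
    "SELECT FileData FROM dbo.Uploads WHERE Id = @id", conn))
{
    cmdRead.Parameters.AddWithValue("@id", id);
    // SequentialAccess streams the BLOB instead of buffering the whole row.
    using (SqlDataReader reader = cmdRead.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        if (reader.Read())
        {
            var blob = new BufferedStream(new BlobReaderStream(reader, 0), 8040);
            HugeSerialized big = (HugeSerialized)new BinaryFormatter().Deserialize(blob);
        }
    }
}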

生生漫 2024-08-25 11:30:51


All you need is .NET Framework 4.5 and streaming. Let's assume we have a big file on the hard drive and we want to upload this file.

SQL code:

CREATE TABLE BigFiles 
(
    [BigDataID] [int] IDENTITY(1,1) NOT NULL,
    [Data] VARBINARY(MAX) NULL
)

C# code:

using (FileStream sourceStream = new FileStream(filePath, FileMode.Open))
{
    using (SqlCommand cmd = new SqlCommand("UPDATE BigFiles SET Data=@Data WHERE BigDataID = @BigDataID", _sqlConn))
    {
        cmd.Parameters.AddWithValue("@Data", sourceStream);
        cmd.Parameters.AddWithValue("@BigDataID", entryId);

        cmd.ExecuteNonQuery();
    }
}

It works well for me. I have successfully uploaded a file of 400 MB, while MemoryStream threw an exception when I tried to load this file into memory.

Update: This code works on Windows 7, but failed on Windows XP and Windows Server 2003.
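
For reading the data back with the same .NET 4.5 streaming support, SqlDataReader.GetStream combined with CommandBehavior.SequentialAccess avoids materializing the BLOB in memory. A minimal sketch, not part of the original answer, reusing the _sqlConn / entryId names from above and a made-up targetPath:

using (SqlCommand cmd = new SqlCommand(
    "SELECT Data FROM BigFiles WHERE BigDataID = @BigDataID", _sqlConn))
{
    cmd.Parameters.AddWithValue("@BigDataID", entryId);

    // SequentialAccess makes the reader stream the column as it is read
    // instead of buffering the whole row.
    using (SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.SequentialAccess))
    {
        if (reader.Read())
        {
            using (Stream dbStream = reader.GetStream(0))
            using (FileStream target = new FileStream(targetPath, FileMode.Create))
            {
                dbStream.CopyTo(target, 8040); // copy in small chunks
            }
        }
    }
}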

被翻牌 2024-08-25 11:30:51


You can always write to SQL Server at a lower level using the over-the-wire protocol TDS (Tabular Data Stream) that Microsoft has used since day one. They are unlikely to change it any time soon, as even SQL Azure uses it!

You can see source code showing how this works in the Mono project and in the FreeTDS project.

Check out the tds_blob.

吻安 2024-08-25 11:30:51


What does the graph look like?

One problem here is the stream; the SQL 2005 requirement is a pain, as otherwise you could write directly to SqlFileStream. However, I don't think it would be too hard to write your own Stream implementation that buffers 8040 (or some multiple of) bytes and writes incrementally. That said, I'm not sure it is worth this extra complexity - I would be hugely tempted to just use a file as the scratch buffer and then (once serialized) loop over the file, inserting/appending chunks. I don't think the file system is going to hurt your overall performance here, and it will save you from starting to write doomed data - i.e. you don't talk to the database until you already know what data you want to write. It will also help you minimise the time the connection is open.

The next problem is the serialization itself. Personally, I don't recommend using BinaryFormatter to write to persistent stores (only for transport), since it is implementation-specific both in the encoder itself and in your types (i.e. it is brittle if you make innocent-looking changes to your data types).

If your data can be represented sufficiently as a tree (rather than a full graph), I would be very tempted to try protocol buffers / protobuf-net. This encoding (devised by Google) is smaller than the BinaryFormatter output, faster both for read and write, and is contract-based rather than field-based, so you can reliably rehydrate it again later (even if you switch platform entirely).

The default options mean that it has to write the object length before each object (which might be expensive in your case), but if you have nested lists of large (deep) objects, you can use grouped encoding to avoid this need, allowing it to write the stream in a forwards-only, single-pass way. Here's a brief, simple example using grouped encoding, but if you want to throw a more complex scenario at me, just let me know...

using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;
[ProtoContract]
public class Foo {
    private readonly List<Bar> bars = new List<Bar>();
    [ProtoMember(1, DataFormat = DataFormat.Group)]
    public List<Bar> Bars { get { return bars;}}
}
[ProtoContract]
public class Bar {
    [ProtoMember(1)]
    public int Id { get; set; }
    [ProtoMember(2)]
    public string Name { get; set; }
}
static class Program {
    static void Main() {
        var obj = new Foo { Bars = {
            new Bar { Id = 123, Name = "abc"},
            new Bar { Id = 456, Name = "def"},
        } };
        // write it and show it
        using (MemoryStream ms = new MemoryStream()) {
            Serializer.Serialize(ms, obj);
            Console.WriteLine(BitConverter.ToString(ms.ToArray()));
        }
    }
}

Note: I do have some theories on how to hack Google's wire format to support full graphs, but it is going to need some time to try it. Oh, regarding the "very large arrays" - for primitive types (not objects) you can use "packed" encoding for this; [DataMember(..., Options = MemberSerializationOptions.Packed)] - it might be useful, but hard to say without visibility of your model.
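
To make the "packed" hint concrete, here is a minimal sketch that is not from the original answer: the ModelState / Samples names are made up, and IsPacked = true is the protobuf-net v2 spelling of the packed option mentioned above (treat the exact attribute property as something to verify against your protobuf-net version). Serializer.Serialize writes straight to the destination stream, so no intermediate byte[] of the whole payload is ever built:

using System.IO;
using ProtoBuf;

[ProtoContract]
public class ModelState {
    // Packed encoding stores the raw values back to back behind a single
    // length prefix, which is much denser for big primitive arrays.
    [ProtoMember(1, IsPacked = true)]
    public double[] Samples { get; set; }
}

static class PackedDemo {
    static void Main() {
        var state = new ModelState { Samples = new double[1024 * 1024] };
        using (var file = File.Create("state.bin")) {
            Serializer.Serialize(file, state);
        }
    }
}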

寂寞陪衬 2024-08-25 11:30:51


Why not implement your own System.IO.Stream derived class? That would allow you to attach it to the SQL column directly via UPDATETEXT for writing.

e.g. (pseudo-code):

  1. Insert a DB record with the blob column 'initialized' (see the UPDATETEXT article above).
  2. Create your stream type and associate the DB connection with the stream.
  3. Pass the stream to the serialize call.

It could chunk up the calls to it (in multiples of 8040 bytes at a time, I presume) and, on each full buffer, pass that to the DB UPDATETEXT call with the proper offset.

On close of the stream, you'd flush whatever was left that didn't fill the buffer entirely via UPDATETEXT.

Likewise you could use the same/similar derived stream to allow reading from a DB column, passing that to be deserialized.

Creating a derived Stream is not all that much work - I've done it in C++/CLI to provide interoperability with IStreams - and if I can do it :)... (I can provide you the C++/CLI stream code I've done as a sample if that would be helpful.)

If you put the entire operation (insert of the initial row, calls to update the blob via the stream) into a transaction, you would avoid any potential DB inconsistencies if the serialization step fails.
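
The per-chunk append this answer sketches could look roughly like the following. This is not from the original answer: the dbo.SavedStates table, Id key and StateData column are made up, and it assumes an image column (UPDATETEXT only works on text/ntext/image, which is what the SQL Server 2000 requirement leaves you) whose value was already initialized to 0x so that TEXTPTR returns a valid pointer:

void AppendChunk(SqlConnection conn, SqlTransaction trn, int id, byte[] chunk, int offset)
{
    using (SqlCommand cmd = new SqlCommand(@"
DECLARE @ptr binary(16);
SELECT @ptr = TEXTPTR(StateData) FROM dbo.SavedStates WHERE Id = @id;
UPDATETEXT dbo.SavedStates.StateData @ptr @offset 0 @chunk; -- delete_length 0 = pure insert", conn, trn))
    {
        cmd.Parameters.AddWithValue("@id", id);
        cmd.Parameters.AddWithValue("@offset", offset); // byte position at which to insert this chunk
        cmd.Parameters.Add("@chunk", SqlDbType.Image).Value = chunk;
        cmd.ExecuteNonQuery();
    }
}

A Stream subclass's Write method would call a helper like this once per buffered block (e.g. via a BufferedStream with an 8040-byte buffer), exactly as in the BlobStream answer above.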

夏日浅笑〃 2024-08-25 11:30:51


I would go with files. Basically, use the file system as an intermediary between SQL Server and your application.

  1. When serializing a large object, serialize it into a FileStream.
  2. To import it into the database, instruct the database to use the file directly when saving the data. It would probably look something like this (a C# sketch of steps 1 and 2 follows at the end of this answer):

    INSERT INTO MyTable
    ( [MyColumn] )
    SELECT b.BulkColumn FROM OPENROWSET(BULK N'C:\Path To My File\File.ext', SINGLE_BLOB) AS b

  3. When reading back the data, instruct SQL Server to save the big column back into the file system as a temporary file, which you will delete after deserializing it into memory (no need to delete it immediately, as some caching can be done here). I am not really sure what the SQL command for that is, as I am certainly no DB expert, but I am pretty sure there must be one.

  4. Use a FileStream object again to deserialize it back into memory.

This procedure can be generalized into a helper class, which will know when to delete those temporary files; you can reuse them if you know for sure that the value of the SQL data record hasn't changed.
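
A rough C# sketch of steps 1 and 2 above, not from the original answer. Note that the OPENROWSET(BULK ...) path must be a literal in the statement and is read by the SQL Server service account, so the file has to be visible to the server; largeGraphOfObjects, conn and the MyTable/MyColumn names are assumed from the question and the snippet above:

// Step 1: serialize to a temp file instead of a MemoryStream.
string tempPath = Path.Combine(Path.GetTempPath(), Guid.NewGuid() + ".bin");
using (var fs = new FileStream(tempPath, FileMode.Create))
{
    new BinaryFormatter().Serialize(fs, largeGraphOfObjects);
}

// Step 2: let SQL Server pull the file in as a single BLOB.
// The path is generated by our own code, so embedding it in the SQL text
// is acceptable here; escape quotes anyway.
string sql = string.Format(
    @"INSERT INTO MyTable ([MyColumn])
      SELECT b.BulkColumn FROM OPENROWSET(BULK N'{0}', SINGLE_BLOB) AS b",
    tempPath.Replace("'", "''"));

using (var cmd = new SqlCommand(sql, conn))
{
    cmd.ExecuteNonQuery();
}

File.Delete(tempPath);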

酷到爆炸 2024-08-25 11:30:51


Note that since SQL Server 2012 there is also FileTable, which is similar to FILESTREAM except that it also allows non-transactional access.

https://msdn.microsoft.com/en-us/library/hh403405.aspx#CompareFileTable
