Binary file format with 1000s of records in C#

Posted on 2024-10-23 18:26:05

I would like to serialize an array of model objects to a binary stream. The model class will mainly have string and integer properties.

I believe that I can mark the class as [Serializable] and use the binary formatter. However, I'd be interested to know whether you think this is the best way, bearing in mind that my priority is to have as small a file as possible for transfer over a low-bandwidth connection (I can zip/unzip the file too).

The file could have 1000s of records, so ideally I'd like to be able to append to disk and read from disk record by record, without ever having to have the entire file in memory at once.

So my priorities are: small file size and efficient memory use.

Maybe there is a pre-written framework for this? It seems easy to do with XML and CSV files! Hopefully it is with a custom binary format too.

thanks

Comments (6)

蓝眼睛不忧郁 2024-10-30 18:26:05

I suggest protobuf-net, which is very efficient.

Having said that, this will not handle serialising/deserialising individual objects in your collection. That part you need to implement yourself.

  • One solution is to store the objects as individual files in a folder. Each file name contains a reference, so that you can find the object you need by name.

  • Another is to have one data file plus an index file that lists all objects and their positions in the data file. This is a lot more complicated: when you save an object that sits in the middle of the file, you have to move all the following addresses, so a B-tree may be more effective.
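A minimal sketch of the second idea, assuming the file is append-only (so no existing address ever moves) and that each record has already been serialized to a byte array by whatever serializer you pick. The class name `IndexedRecordFile` and both method names are made up for illustration; the index is simply a second file holding one 8-byte offset per record.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: an append-only data file plus a side index of 8-byte offsets.
public static class IndexedRecordFile
{
    // Appends one record and returns its zero-based record number.
    public static long Append(string dataPath, string indexPath, byte[] payload)
    {
        using (var data = new FileStream(dataPath, FileMode.Append, FileAccess.Write))
        using (var index = new FileStream(indexPath, FileMode.Append, FileAccess.Write))
        using (var dataWriter = new BinaryWriter(data))
        using (var indexWriter = new BinaryWriter(index))
        {
            long offset = data.Length;                        // where this record will start
            long recordNumber = index.Length / sizeof(long);  // one 8-byte entry per record
            dataWriter.Write(payload.Length);                 // 4-byte length prefix
            dataWriter.Write(payload);
            indexWriter.Write(offset);
            return recordNumber;
        }
    }

    // Reads a single record without touching the rest of the data file.
    public static byte[] Read(string dataPath, string indexPath, long recordNumber)
    {
        long offset;
        using (var index = new BinaryReader(File.OpenRead(indexPath)))
        {
            index.BaseStream.Seek(recordNumber * sizeof(long), SeekOrigin.Begin);
            offset = index.ReadInt64();
        }
        using (var data = new BinaryReader(File.OpenRead(dataPath)))
        {
            data.BaseStream.Seek(offset, SeekOrigin.Begin);
            int length = data.ReadInt32();
            return data.ReadBytes(length);
        }
    }
}
```

Updating or deleting a record in place is where the address-shifting problem comes back; this sketch deliberately avoids it by only ever appending.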

冰火雁神 2024-10-30 18:26:05

Another option is to just serialize to a fixed-width text file format and let ZIP handle the compression. Fixed-width means you can easily use a MemoryMappedFile to walk through each record without needing to load the entire file into memory.
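Roughly what that could look like, as a sketch only. The layout (a 20-character name followed by a 5-character right-aligned number, ASCII, no line breaks) and the names `FixedWidthFile`, `Append` and `ReadRecord` are assumptions for the example, not anything prescribed above. Values wider than their field are not handled here and would break the layout.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

// Hypothetical fixed-width layout: 20 chars of name + 5 chars of number = 25 bytes per record.
public static class FixedWidthFile
{
    public const int NameWidth = 20;
    public const int NumberWidth = 5;
    public const int RecordWidth = NameWidth + NumberWidth;

    // Appends one record; the name is padded or truncated to exactly NameWidth characters.
    public static void Append(string path, string name, int number)
    {
        string record = name.PadRight(NameWidth).Substring(0, NameWidth)
                      + number.ToString(CultureInfo.InvariantCulture).PadLeft(NumberWidth);
        File.AppendAllText(path, record, Encoding.ASCII);
    }

    // Maps only the requested record into memory and returns it as text.
    public static string ReadRecord(string path, long recordNumber)
    {
        var buffer = new byte[RecordWidth];
        using (var mmf = MemoryMappedFile.CreateFromFile(path, FileMode.Open))
        using (var view = mmf.CreateViewAccessor(recordNumber * RecordWidth, RecordWidth))
        {
            view.ReadArray(0, buffer, 0, RecordWidth);
        }
        return Encoding.ASCII.GetString(buffer);
    }
}
```

Because every record has the same width, record n always starts at byte n * RecordWidth, so no index is needed. The cost is padding, which compression should mostly absorb.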

潇烟暮雨 2024-10-30 18:26:05

I would recommend using SQL Server Compact to store your objects without serializing them. It's quite lightweight and extremely fast; I have used it under high load, serving a lot of requests on a server.

I also don't recommend storing your data in a serialized binary format, because it becomes a terrible pain when you need to change the objects you are storing. It's also painful if you need to see what you are storing, because you have to deserialize the whole collection.

As for sending I prefer using XML-serialization with zip-compression if necessary. XML format makes debugging much easier if you need to take a look at what you are sending or make some tests.
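A small sketch of the XML-plus-compression route. Two substitutions to be aware of: it uses `XmlSerializer` and `GZipStream` (gzip rather than a .zip container), simply because both ship with the framework. `PersonRecord` and `CompressedXml` are hypothetical names standing in for your own model class.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Xml.Serialization;

// Hypothetical model class with string and integer properties.
public class PersonRecord
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class CompressedXml
{
    // Serializes the array to XML and gzips it in one pass.
    public static byte[] Save(PersonRecord[] records)
    {
        var serializer = new XmlSerializer(typeof(PersonRecord[]));
        using (var buffer = new MemoryStream())
        {
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress))
            {
                serializer.Serialize(gzip, records);
            }
            // ToArray still works after the GZipStream has closed the buffer.
            return buffer.ToArray();
        }
    }

    // Reverses Save: gunzips and deserializes the XML.
    public static PersonRecord[] Load(byte[] compressed)
    {
        var serializer = new XmlSerializer(typeof(PersonRecord[]));
        using (var gzip = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        {
            return (PersonRecord[])serializer.Deserialize(gzip);
        }
    }
}
```

Note that this reads and writes the whole collection at once, so it addresses the transfer-size goal but not the record-by-record one.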

泼猴你往哪里跑 2024-10-30 18:26:05

You can use the BinaryFormatter. It's a good solution for wanting a small file, but only you know if it's the best solution for your domain. I don't think you can read one record at a time, though.

The only example code I have at this time is for a DataSet. These extension methods will (de)serialize a custom DataSet, which, if I recall correctly, was the easiest way to have a type that can use the BinaryFormatter.

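using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

// Illustrative wrapper: extension methods must be declared in a static class.
// BinaryFormatter is obsolete in current .NET and unsafe for untrusted input.
public static class DataSetBinaryExtensions
{
    // Deserializes a typed DataSet from the stream.
    public static TDataSet LoadBinary<TDataSet>(Stream stream) where TDataSet : DataSet
    {
        var formatter = new BinaryFormatter();
        return (TDataSet)formatter.Deserialize(stream);
    }

    // Serializes the DataSet using its compact binary remoting format.
    public static void WriteBinary<TDataSet>(this TDataSet dataSet, Stream stream) where TDataSet : DataSet
    {
        dataSet.RemotingFormat = SerializationFormat.Binary;
        var formatter = new BinaryFormatter();
        formatter.Serialize(stream, dataSet);
    }
}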

You might also take a look at the DataContractSerializer, which is .NET's new 'standard' way of dealing with serialization (according to C# 4.0 In A Nutshell, Albahari & Albahari). In that case, you'll also want to read Best Practices: Data Contract Versioning. Below are examples of how to (de)serialize in XML and JSON, even though they wouldn't be directly applicable to your situation (since you wanted small files). But you could compress the files.

/// <summary>
/// Converts this instance to XML using the <see cref="DataContractSerializer"/>.
/// </summary>
/// <typeparam name="TSerializable">
/// A type that is serializable using the <see cref="DataContractSerializer"/>.
/// </typeparam>
/// <param name="value">
/// The object to be serialized to XML.
/// </param>
/// <returns>
/// Formatted XML representing this instance. Does not include the XML declaration.
/// </returns>
public static string ToXml<TSerializable>(this TSerializable value)
{
    var serializer = new DataContractSerializer(typeof(TSerializable));
    var output = new StringWriter();
    using (var writer = new XmlTextWriter(output) { Formatting = Formatting.Indented })
    {
        serializer.WriteObject(writer, value);
    }
    return output.GetStringBuilder().ToString();
}

/// <summary>
/// Converts this instance to XML using the <see cref="DataContractSerializer"/> and writes it to the specified file.
/// </summary>
/// <typeparam name="TSerializable">
/// A type that is serializable using the <see cref="DataContractSerializer"/>.
/// </typeparam>
/// <param name="value">
/// The object to be serialized to XML.
/// </param>
/// <param name="filePath">Path of the file to write to.</param>
public static void WriteXml<TSerializable>(this TSerializable value, string filePath)
{
    var serializer = new DataContractSerializer(typeof(TSerializable));
    using (var writer = XmlWriter.Create(filePath, new XmlWriterSettings { Indent = true }))
    {
        serializer.WriteObject(writer, value);
    }
}

/// <summary>
/// Creates an instance of the specified class from XML.
/// </summary>
/// <typeparam name="TSerializable">The type of the serializable object.</typeparam>
/// <param name="xml">The XML representation of the instance.</param>
/// <returns>An instance created from the XML input.</returns>
public static TSerializable CreateFromXml<TSerializable>(string xml)
{
    var serializer = new DataContractSerializer(typeof(TSerializable));

    using (var stringReader = new StringReader(xml))
    using (var reader = XmlReader.Create(stringReader))
    {
        return (TSerializable)serializer.ReadObject(reader);
    }
}

/// <summary>
/// Creates an instance of the specified class from the specified XML file.
/// </summary>
/// <param name="filePath">
/// Path to the XML file.
/// </param>
/// <typeparam name="TSerializable">
/// The type of the serializable object.
/// </typeparam>
/// <returns>
/// An instance created from the XML input.
/// </returns>
public static TSerializable CreateFromXmlFile<TSerializable>(string filePath)
{
    var serializer = new DataContractSerializer(typeof(TSerializable));

    using (var reader = XmlReader.Create(filePath))
    {
        return (TSerializable)serializer.ReadObject(reader);
    }
}

public static T LoadJson<T>(Stream stream) where T : class
{
    var serializer = new DataContractJsonSerializer(typeof(T));
    object readObject = serializer.ReadObject(stream);
    return (T)readObject;
}

public static void WriteJson<T>(this T value, Stream stream) where T : class
{
    var serializer = new DataContractJsonSerializer(typeof(T));
    serializer.WriteObject(stream, value);
}

不顾 2024-10-30 18:26:05

If you want it to be small, do it yourself. Make sure to store only the data you need. For example, if a field has at most 256 distinct values, use a byte.

http://msdn.microsoft.com/en-us/library/system.bitconverter.aspx

I almost always use a simple structure like this to store the data

id (ushort)

data_size (uint)

data of size data_size

Store only the information you must have, and don't think about how it is going to be used. When you load it, you can then consider how you want to use the data.
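The id / data_size / data layout above could be written and read along these lines. This is only a sketch: `ChunkFile`, `Append` and `ReadAll` are invented names, the payload is treated as opaque bytes, and `ReadAll` assumes a seekable stream so it can detect the end of the file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical reader/writer for records laid out as: id (ushort), data_size (uint), data.
public static class ChunkFile
{
    // Writes one record at the stream's current position and leaves the stream open.
    public static void Append(Stream output, ushort id, byte[] data)
    {
        var writer = new BinaryWriter(output);
        writer.Write(id);                 // 2 bytes
        writer.Write((uint)data.Length);  // 4 bytes
        writer.Write(data);               // data_size bytes
        writer.Flush();
    }

    // Lazily yields records one at a time, so only the current record is held in memory.
    public static IEnumerable<KeyValuePair<ushort, byte[]>> ReadAll(Stream input)
    {
        var reader = new BinaryReader(input);
        while (input.Position < input.Length)
        {
            ushort id = reader.ReadUInt16();
            uint size = reader.ReadUInt32();
            byte[] data = reader.ReadBytes((int)size);
            yield return new KeyValuePair<ushort, byte[]>(id, data);
        }
    }
}
```

Opening a `FileStream` with `FileMode.Append` and passing it to `Append` gives the append-to-disk behaviour from the question, and iterating `ReadAll` with `foreach` reads record by record.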

最近可好 2024-10-30 18:26:05

I'd be tempted to stick with BinaryFormatter for the objects themselves, or perhaps protobuf-net as suggested elsewhere.

If the random access aspect of this is very important (reading and appending record by record) you might want to look at creating a zip file (or similar) containing an index file and each object serialized to its own file in the zip (or perhaps in small collections).

This way, you can effectively have a mini file system which is compressed and gives you access to your records individually.
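As a rough illustration of the zip-as-mini-file-system idea, here is a sketch using `System.IO.Compression.ZipArchive` with one entry per record. The class and method names are hypothetical, and the entry name doubles as the index here rather than a separate index file. One caveat as far as I know: in `ZipArchiveMode.Update` the archive contents may be held in memory while it is open, so this suits modest archive sizes.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical store that keeps each serialized record as its own compressed zip entry.
public static class ZipRecordStore
{
    // Adds one record to the archive, creating the archive if it does not exist yet.
    public static void AddRecord(string zipPath, string entryName, byte[] payload)
    {
        using (var file = new FileStream(zipPath, FileMode.OpenOrCreate, FileAccess.ReadWrite))
        using (var zip = new ZipArchive(file, ZipArchiveMode.Update))
        {
            ZipArchiveEntry entry = zip.CreateEntry(entryName, CompressionLevel.Optimal);
            using (Stream entryStream = entry.Open())
            {
                entryStream.Write(payload, 0, payload.Length);
            }
        }
    }

    // Reads a single record by entry name; returns null if there is no such entry.
    public static byte[] ReadRecord(string zipPath, string entryName)
    {
        using (var file = File.OpenRead(zipPath))
        using (var zip = new ZipArchive(file, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry(entryName);
            if (entry == null) return null;
            using (Stream entryStream = entry.Open())
            using (var buffer = new MemoryStream())
            {
                entryStream.CopyTo(buffer);
                return buffer.ToArray();
            }
        }
    }
}
```

Compressing each record separately usually gives a worse ratio than compressing the whole file, which is why grouping records into small collections per entry, as suggested above, can be a reasonable middle ground.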
