.NET binary XML with a pre-shared dictionary

Posted on 2024-10-22 09:25:03

I'm using XmlDictionaryWriter to serialize objects to a database with the data contract serializer.
It works great: both size and speed are 2 times better than with text/xml.

However, I'll have to deal with an enormous number of records in my database, where every extra byte per record translates directly into gigabytes of database size.
That's why I'd love to reduce the size further by using an XML dictionary.

How do I do that?

I see that the static method XmlDictionaryWriter.CreateBinaryWriter accepts a second parameter of type IXmlDictionary. MSDN describes it as "The XmlDictionary to use as the shared dictionary".

First I tried the system-supplied implementation:

XmlDictionary dict = new XmlDictionary();
string[] dictEntries = new string[]
{
    "http://schemas.datacontract.org/2004/07/MyContracts",
    "http://www.w3.org/2001/XMLSchema-instance",
    "MyElementName1",
    "MyElementName2",
    "MyElementName3",
};
foreach ( string s in dictEntries )
    dict.Add( s );

The result is that the .NET Framework completely ignores the dictionary and still writes the above strings as plain text instead of just referencing the corresponding dictionary entries.
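
For comparison, here is a minimal sketch of a case where the system-supplied XmlDictionary does seem to be used: the element name and namespace are passed to the writer as the XmlDictionaryString objects returned by dict.Add, rather than as plain strings. The class and method names (DictionaryWriteDemo, Measure) are made up for illustration; the point is only to compare the output size of the two call styles.

```csharp
using System.IO;
using System.Xml;

public static class DictionaryWriteDemo
{
    // Writes one small element as binary XML and returns the number of bytes produced.
    public static long Measure(bool useDictionaryStrings)
    {
        var dict = new XmlDictionary();
        XmlDictionaryString name = dict.Add("MyElementName1");
        XmlDictionaryString ns = dict.Add("http://schemas.datacontract.org/2004/07/MyContracts");

        using (var stream = new MemoryStream())
        using (XmlDictionaryWriter writer = XmlDictionaryWriter.CreateBinaryWriter(stream, dict))
        {
            if (useDictionaryStrings)
                writer.WriteStartElement(name, ns);             // XmlDictionaryString overload
            else
                writer.WriteStartElement(name.Value, ns.Value); // plain string overload
            writer.WriteString("x");
            writer.WriteEndElement();
            writer.Flush(); // without this the stream may still be empty
            return stream.Length;
        }
    }
}
```

If Measure(true) comes out smaller than Measure(false), the dictionary itself is working, and the question becomes which strings the serializer hands to the writer.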

Then I created my own implementation of IXmlDictionary:

class MyDictionary : IXmlDictionary
{
    Dictionary<int, string> values = new Dictionary<int, string>();
    Dictionary<string, int> keys = new Dictionary<string, int>();

    MyDictionary()
    {
        string[] dictEntries = new string[]
        {
            "http://schemas.datacontract.org/2004/07/MyContracts",
            "http://www.w3.org/2001/XMLSchema-instance",
            "MyElementName1",
            "MyElementName2",
            "MyElementName3",
        };

        foreach ( var s in dictEntries )
            this.Add( s );
    }

    static IXmlDictionary s_instance = new MyDictionary();
    public static IXmlDictionary instance { get { return s_instance; } }

    void Add( string val )
    {
        if ( keys.ContainsKey( val ) )
            return;
        int id = values.Count + 1;
        values.Add( id, val );
        keys.Add( val, id );
    }

    bool IXmlDictionary.TryLookup( XmlDictionaryString value, out XmlDictionaryString result )
    {
        if ( value.Dictionary == this )
        {
            result = value;
            return true;
        }
        return this.TryLookup( value.Value, out result );
    }

    bool IXmlDictionary.TryLookup( int key, out XmlDictionaryString result )
    {
        string res;
        if ( !values.TryGetValue( key, out res ) )
        {
            result = null;
            return false;
        }
        result = new XmlDictionaryString( this, res, key );
        return true;
    }

    public bool /* IXmlDictionary. */ TryLookup( string value, out XmlDictionaryString result )
    {
        int key;
        if ( !keys.TryGetValue( value, out key ) )
        {
            result = null;
            return false;
        }

        result = new XmlDictionaryString( this, value, key );
        return true;
    }
}

The result: my TryLookup methods are called as expected, but DataContractSerializer.WriteObject produces an empty document.

How do I use a pre-shared dictionary?

Thanks in advance!

P.S. I don't want to mess with XmlBinaryReaderSession/XmlBinaryWriterSession: I don't have "sessions". Instead I have a 10 GB+ database accessed by many threads at once. All I want is a static, pre-defined dictionary.

Update: OK, I've figured out that I just need to call XmlDictionaryWriter.Flush. The only remaining question is: why doesn't the system-supplied IXmlDictionary implementation work as expected?
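
A likely explanation, offered as an assumption rather than something confirmed in this thread: XmlDictionary.TryLookup(XmlDictionaryString, ...) appears to succeed only for XmlDictionaryString instances that the same dictionary issued, while DataContractSerializer passes the writer strings from its own internal dictionaries. The custom MyDictionary above falls back to matching by string value, which would explain why it works. The sketch below checks the cross-dictionary lookup and applies the same fallback in a small subclass. ValueMatchingDictionary, Person, PresharedDictionaryDemo and the http://example.com/demo namespace are hypothetical names used only for illustration.

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Xml;

// Hypothetical subclass: if the string was issued by another dictionary,
// fall back to looking it up by value.
public sealed class ValueMatchingDictionary : XmlDictionary
{
    public override bool TryLookup(XmlDictionaryString value, out XmlDictionaryString result)
    {
        return base.TryLookup(value, out result) || TryLookup(value.Value, out result);
    }
}

[DataContract(Name = "Person", Namespace = "http://example.com/demo")]
public class Person
{
    [DataMember] public string FirstName { get; set; }
}

public static class PresharedDictionaryDemo
{
    // Keys are assigned in insertion order, so this list must stay stable
    // for as long as stored records need to be readable.
    static readonly string[] Entries =
    {
        "Person", "FirstName", "http://example.com/demo",
        "http://www.w3.org/2001/XMLSchema-instance",
    };

    public static T Fill<T>(T dict) where T : XmlDictionary
    {
        foreach (string s in Entries) dict.Add(s);
        return dict;
    }

    // Looks up a string issued by one XmlDictionary in a different XmlDictionary.
    public static bool CrossDictionaryLookup()
    {
        XmlDictionaryString foreign = new XmlDictionary().Add("Person");
        XmlDictionaryString ignored;
        return Fill(new XmlDictionary()).TryLookup(foreign, out ignored);
    }

    public static byte[] Serialize(Person p, IXmlDictionary dict)
    {
        var serializer = new DataContractSerializer(typeof(Person));
        using (var stream = new MemoryStream())
        using (var writer = XmlDictionaryWriter.CreateBinaryWriter(stream, dict))
        {
            serializer.WriteObject(writer, p);
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static Person Deserialize(byte[] data, IXmlDictionary dict)
    {
        var serializer = new DataContractSerializer(typeof(Person));
        using (var reader = XmlDictionaryReader.CreateBinaryReader(
            data, 0, data.Length, dict, XmlDictionaryReaderQuotas.Max))
        {
            return (Person)serializer.ReadObject(reader);
        }
    }
}
```

Comparing Serialize(p, Fill(new XmlDictionary())).Length with Serialize(p, Fill(new ValueMatchingDictionary())).Length shows whether the fallback makes a difference for a given contract. The reader must be given a dictionary with the same entries in the same order, because the binary output stores only the numeric keys.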

Comments (2)

你列表最软的妹 2024-10-29 09:25:03

For the XmlDictionaryWriter you need to use a session.
Example:

   private static Stream SerializeBinaryWithDictionary(Person person,DataContractSerializer serializer)
    {
        var stream = new MemoryStream();
        var dictionary = new XmlDictionary();
        var session = new XmlBinaryWriterSession();
        var key = 0;
        session.TryAdd(dictionary.Add("FirstName"), out key);
        session.TryAdd(dictionary.Add("LastName"), out key);
        session.TryAdd(dictionary.Add("Birthday"), out key);
        session.TryAdd(dictionary.Add("Person"), out key);
        session.TryAdd(dictionary.Add("http://www.friseton.com/Name/2010/06"),out key);
        session.TryAdd(dictionary.Add("http://www.w3.org/2001/XMLSchema-instance"),out key);

        var writer = XmlDictionaryWriter.CreateBinaryWriter(stream, dictionary, session);
        serializer.WriteObject(writer, person);
        writer.Flush();
        return stream;
    }
遥远的绿洲 2024-10-29 09:25:03

The only way I was able to reproduce the issue of the IXmlDictionary not being used was when my class wasn't decorated with a DataContract attribute. The following app displays the difference in serialized size between the decorated and undecorated classes.

using System;
using System.Runtime.Serialization;
using System.Xml;

namespace XmlPresharedDictionary
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Serialized sizes");
            Console.WriteLine("-------------------------");
            TestSerialization<MyXmlClassUndecorated>("Undecorated: ");
            TestSerialization<MyXmlClassDecorated>("Decorated:   ");
            Console.ReadLine();
        }

        private static void TestSerialization<T>(string lineComment) where T : new()
        {
            XmlDictionary xmlDict = new XmlDictionary();
            xmlDict.Add("MyElementName1");

            DataContractSerializer serializer = new DataContractSerializer(typeof(T));

            using (System.IO.MemoryStream stream = new System.IO.MemoryStream())
            using (var writer = XmlDictionaryWriter.CreateBinaryWriter(stream, xmlDict))
            {
                serializer.WriteObject(writer, new T());
                writer.Flush();
                Console.WriteLine(lineComment + stream.Length.ToString());
            }
        }
    }

    //[DataContract]
    public class MyXmlClassUndecorated
    {
        public MyElementName1[] MyElementName1 { get; set; }

        public MyXmlClassUndecorated()
        {
            MyElementName1 = new MyElementName1[] { new MyElementName1("A A A A A"), new MyElementName1("A A A A A") };
        }
    }

    [DataContract]
    public class MyXmlClassDecorated
    {
        // Note: this property has no [DataMember], so under [DataContract] it is not serialized.
        public MyElementName1[] MyElementName1 { get; set; }

        public MyXmlClassDecorated()
        {
            MyElementName1 = new MyElementName1[] { new MyElementName1("A A A A A"), new MyElementName1("A A A A A") };
        }
    }

    [DataContract]
    public class MyElementName1
    {
        [DataMember]
        public string Value { get; set; }

        public MyElementName1(string value) { Value = value; }
    }
}
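
One caveat when reading the sizes from the program above: in a class marked [DataContract], only members marked [DataMember] are serialized, so a decorated class without [DataMember] on its property can come out smaller simply because the property is not written at all. A minimal sketch of that effect, using text XML so the output is easy to inspect (WithMember, WithoutMember and DataMemberDemo are hypothetical names):

```csharp
using System.IO;
using System.Runtime.Serialization;
using System.Text;

[DataContract(Namespace = "")]
public class WithMember
{
    [DataMember]
    public string Value { get; set; } = "A A A A A";
}

[DataContract(Namespace = "")]
public class WithoutMember
{
    // No [DataMember]: this property is not part of the data contract.
    public string Value { get; set; } = "A A A A A";
}

public static class DataMemberDemo
{
    // Serializes obj as text XML and returns it as a string.
    public static string ToXml<T>(T obj)
    {
        var serializer = new DataContractSerializer(typeof(T));
        using (var stream = new MemoryStream())
        {
            serializer.WriteObject(stream, obj);
            return Encoding.UTF8.GetString(stream.ToArray());
        }
    }
}
```

Printing DataMemberDemo.ToXml(new WithMember()) and DataMemberDemo.ToXml(new WithoutMember()) side by side makes it easy to see whether a size difference comes from the dictionary or from members being left out.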