.NET 二进制 XML 与预共享字典
我正在使用 XmlDictionaryWriter 通过数据协定序列化器将对象序列化到数据库。 它效果很好,大小和速度都是使用 text/xml 的 2 倍。
但是,我必须处理数据库中的大量记录,其中任何额外的字节都会直接转换为数据库大小的千兆字节。 这就是为什么我希望通过使用 XML 字典来进一步减小大小。
我该怎么做?
我看到 XmlDictionaryWriter.CreateBinaryWriter 静态方法接受 IXmlDictionary 类型的第二个参数。 MSDN 说“用作共享字典的 XmlDictionary”。
首先,我尝试使用系统提供的实现:
XmlDictionary dict = new XmlDictionary();
string[] dictEntries = new string[]
{
"http://schemas.datacontract.org/2004/07/MyContracts",
"http://www.w3.org/2001/XMLSchema-instance",
"MyElementName1",
"MyElementName2",
"MyElementName3",
};
foreach ( string s in dictEntries )
dict.Add( s );
结果是.NET框架完全忽略字典,并且仍然将上述字符串作为纯文本插入,而不是仅仅引用相应的字典条目。
然后我创建了自己的 IXmlDictionary 实现:
class MyDictionary : IXmlDictionary
{
Dictionary<int, string> values = new Dictionary<int, string>();
Dictionary<string, int> keys = new Dictionary<string, int>();
MyDictionary()
{
string[] dictEntries = new string[]
{
"http://schemas.datacontract.org/2004/07/MyContracts",
"http://www.w3.org/2001/XMLSchema-instance",
"MyElementName1",
"MyElementName2",
"MyElementName3",
};
foreach ( var s in dictEntries )
this.Add( s );
}
static IXmlDictionary s_instance = new MyDictionary();
public static IXmlDictionary instance { get { return s_instance; } }
void Add( string val )
{
if ( keys.ContainsKey( val ) )
return;
int id = values.Count + 1;
values.Add( id, val );
keys.Add( val, id );
}
bool IXmlDictionary.TryLookup( XmlDictionaryString value, out XmlDictionaryString result )
{
if ( value.Dictionary == this )
{
result = value;
return true;
}
return this.TryLookup( value.Value, out result );
}
bool IXmlDictionary.TryLookup( int key, out XmlDictionaryString result )
{
string res;
if ( !values.TryGetValue( key, out res ) )
{
result = null;
return false;
}
result = new XmlDictionaryString( this, res, key );
return true;
}
public bool /* IXmlDictionary. */ TryLookup( string value, out XmlDictionaryString result )
{
int key;
if ( !keys.TryGetValue( value, out key ) )
{
result = null;
return false;
}
result = new XmlDictionaryString( this, value, key );
return true;
}
}
结果是 - 我的 TryLookup 方法调用正常,但是 DataContractSerializer.WriteObject 生成一个空文档。
如何使用预共享词典?
提前致谢!
PS 我不想搞乱 XmlBinaryReaderSession/XmlBinaryWriterSession:我没有“会话”,而是有一个 10 GB 以上的数据库,可以同时由多个线程访问。我想要的只是静态的预定义字典。
更新: 好的,我发现我只需要调用“XmlDictionaryWriter.Flush”。唯一剩下的问题是 - 为什么系统提供的 IXmlDictionary 实现不能按预期工作?
I'm using XmlDictionaryWriter to serialize objects to a database with data contract serializer.
It works great, both size and speed are 2 times better then using text/xml.
However, I'll have to deal with enormous count of records in my database, where any extra bytes are directly translated into the gigabytes of the DB size.
That's why I'd love to reduce the size further, by using an XML dictionary.
How do I do that?
I see that XmlDictionaryWriter.CreateBinaryWriter static method accepts the 2-nd parameter of type IXmlDictionary. The MSDN says "The XmlDictionary to use as the shared dictionary".
First I've tried to use the system-supplied implementation:
XmlDictionary dict = new XmlDictionary();
string[] dictEntries = new string[]
{
"http://schemas.datacontract.org/2004/07/MyContracts",
"http://www.w3.org/2001/XMLSchema-instance",
"MyElementName1",
"MyElementName2",
"MyElementName3",
};
foreach ( string s in dictEntries )
dict.Add( s );
The result is .NET framework completely ignores the dictionary, and still inserts the above strings as plain text instead of just referencing a corresponding dictionary entry.
Then I've created my own implementation of IXmlDictionary:
class MyDictionary : IXmlDictionary
{
Dictionary<int, string> values = new Dictionary<int, string>();
Dictionary<string, int> keys = new Dictionary<string, int>();
MyDictionary()
{
string[] dictEntries = new string[]
{
"http://schemas.datacontract.org/2004/07/MyContracts",
"http://www.w3.org/2001/XMLSchema-instance",
"MyElementName1",
"MyElementName2",
"MyElementName3",
};
foreach ( var s in dictEntries )
this.Add( s );
}
static IXmlDictionary s_instance = new MyDictionary();
public static IXmlDictionary instance { get { return s_instance; } }
void Add( string val )
{
if ( keys.ContainsKey( val ) )
return;
int id = values.Count + 1;
values.Add( id, val );
keys.Add( val, id );
}
bool IXmlDictionary.TryLookup( XmlDictionaryString value, out XmlDictionaryString result )
{
if ( value.Dictionary == this )
{
result = value;
return true;
}
return this.TryLookup( value.Value, out result );
}
bool IXmlDictionary.TryLookup( int key, out XmlDictionaryString result )
{
string res;
if ( !values.TryGetValue( key, out res ) )
{
result = null;
return false;
}
result = new XmlDictionaryString( this, res, key );
return true;
}
public bool /* IXmlDictionary. */ TryLookup( string value, out XmlDictionaryString result )
{
int key;
if ( !keys.TryGetValue( value, out key ) )
{
result = null;
return false;
}
result = new XmlDictionaryString( this, value, key );
return true;
}
}
The result is - my TryLookup methods are called OK, however DataContractSerializer.WriteObject produces an empty document.
How do I use a pre-shared dictionary?
Thanks in advance!
P.S. I don't want to mess with XmlBinaryReaderSession/XmlBinaryWriterSession: I don't have "sessions", instead I have a 10 GB+ database accessed by many threads at once. What I want is just static pre-defined dictionary.
Update: OK I've figured out that I just need to call "XmlDictionaryWriter.Flush". The only remaining question is - why doesn't the system-supplied IXmlDictionary implementation work as expected?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
对于 XmlDictionaryWriter,您需要使用会话。
示例:
for the XmlDictionaryWriter you need to use session.
example:
在未使用
IXmlDictionary
的情况下,我能够复制问题的唯一方法是当我的类未使用DataContract
属性进行修饰时。以下应用程序显示了装饰类和未装饰类的大小差异。The only way I way able to replicate the issue with the
IXmlDictionary
not being used was when my class wasn't decorated with aDataContract
attribute. The following app displays the difference in sizes with decorated and undecorated classes.