protobuf-csharp-port - streaming records from a file, a bit like the axis functions in LINQ to XML
I have built the standard address book tutorial that comes with protobuf-csharp-port and my code is as follows:
class Program
{
    static void Main(string[] args)
    {
        CreateData();
        ShowData();
    }

    private static void CreateData()
    {
        AddressBook.Builder abb = new AddressBook.Builder();
        for (int i = 0; i < 2000000; i++)
        {
            Person.Builder pb = new Person.Builder();
            pb.Id = i;
            pb.Email = "[email protected]";
            pb.Name = "John" + i;
            abb.AddPerson(pb.Build());
        }
        var ab = abb.Build();
        var fs = File.Create("c:\\testaddressbook.bin");
        ab.WriteTo(fs);
        fs.Close();
        fs.Dispose();
    }

    private static void ShowData()
    {
        var fs = File.Open("c:\\testaddressbook.bin", FileMode.Open, FileAccess.Read, FileShare.Read);
        CodedInputStream cis = CodedInputStream.CreateInstance(fs);
        cis.SetSizeLimit(Int32.MaxValue);
        AddressBook ab = AddressBook.ParseFrom(cis);
        Console.WriteLine("Person count: {0}", ab.PersonCount);
        for (int i = 0; i < ab.PersonCount; i++)
            Console.WriteLine("Name: " + ab.GetPerson(i).Name);
        Console.WriteLine("Person count: {0}", ab.PersonCount);
        fs.Close();
    }
}
Writing the data takes up 300 MB of RAM for 2 million records. Reading it takes up about 415 MB of RAM.
In the XML world, I would stream the elements using an axis function. Is it possible to stream the records inside the address book model object? Or is there another way to implement this that uses memory more efficiently?
thanks
Yes, you can stream both reading and writing.
There's a version supported by the official Java API and also in my C# API, using WriteDelimitedTo/ParseDelimitedFrom. Alternatively, you can use MessageStreamWriter and MessageStreamIterator, which I introduced into my API before the delimited API came along.
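The WriteDelimitedTo/ParseDelimitedFrom approach from the protobuf-csharp-port answer could look roughly like the sketch below. It is an illustration under assumptions, not code from the answer: it assumes the generated tutorial classes live in the Google.ProtocolBuffers.Examples.AddressBook namespace, that the input stream is seekable (so Position and Length can be used to detect the end of the data), and it uses made-up names (DelimitedStreamingSketch, a temp-file path, a placeholder e-mail value). It also changes the file format: the file becomes a sequence of length-prefixed Person messages rather than a single AddressBook message.

```csharp
using System;
using System.IO;
// Assumed namespace of the classes generated from the tutorial's addressbook.proto.
using Google.ProtocolBuffers.Examples.AddressBook;

public static class DelimitedStreamingSketch
{
    // Writes each Person as its own length-prefixed message, so no AddressBook
    // holding every record is ever built in memory.
    public static void CreateData(string path, int count)
    {
        using (Stream output = File.Create(path))
        {
            for (int i = 0; i < count; i++)
            {
                Person person = new Person.Builder
                {
                    Id = i,
                    Name = "John" + i,
                    Email = "john" + i + "@example.com" // placeholder value
                }.Build();
                person.WriteDelimitedTo(output);
            }
        }
    }

    // Reads one Person at a time and returns how many records were seen.
    public static int ShowData(string path)
    {
        int count = 0;
        using (Stream input = File.OpenRead(path))
        {
            // Assumption: the stream is seekable, so Position/Length mark the end.
            while (input.Position < input.Length)
            {
                Person person = Person.ParseDelimitedFrom(input);
                Console.WriteLine("Name: " + person.Name);
                count++;
            }
        }
        return count;
    }

    public static void Main()
    {
        // Hypothetical location; the question used c:\testaddressbook.bin.
        string path = Path.Combine(Path.GetTempPath(), "testaddressbook-delimited.bin");
        CreateData(path, 1000);
        Console.WriteLine("Person count: {0}", ShowData(path));
    }
}
```

Only one Person is alive at a time on either side, so memory use should stay roughly flat regardless of the record count. The trade-off is that the total count is no longer available up front and has to be accumulated while reading.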
I can't comment on that implementation, but in protobuf-net streaming is fully possible. If all the objects you want to stream are first-level children of the root object, then you can simply iterate over the outer sequence, using Serializer.DeserializeItems<T> if they are all the same type, or Serializer.NonGeneric.TryDeserializeWithLengthPrefix if there are different types of objects involved.
If the item you want to treat as a stream is in the middle of the tree, you can provide an alternative receiving model: by implementing IEnumerable and Add() on a fake collection, it can push data through any API you want (event-based, for example, SAX-like).
I should also note that you can serialize streaming data in exactly the same way. You never need to have the complete object model in memory.
If you want a more complete example, let me know.
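As a rough illustration of the first case in the protobuf-net answer (all items of the same type, directly under the root), the sketch below writes and reads Person records one at a time. It is a minimal sketch under assumptions: the Person class here is a hand-written stand-in with [ProtoContract] attributes rather than the tutorial's generated type, the class and method names are hypothetical, and field number 1 with PrefixStyle.Base128 is chosen because, as far as I understand the wire format, that matches a repeated field numbered 1 on a root message.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using ProtoBuf;

// Hand-written stand-in for the tutorial's Person message (assumption).
[ProtoContract]
public class Person
{
    [ProtoMember(1)] public string Name { get; set; }
    [ProtoMember(2)] public int Id { get; set; }
    [ProtoMember(3)] public string Email { get; set; }
}

public static class ProtobufNetStreamingSketch
{
    // Serializes each item separately with a length prefix; no outer list is built.
    public static void Write(Stream output, int count)
    {
        for (int i = 0; i < count; i++)
        {
            var person = new Person { Id = i, Name = "John" + i, Email = "john" + i + "@example.com" };
            Serializer.SerializeWithLengthPrefix(output, person, PrefixStyle.Base128, 1);
        }
    }

    // Returns a lazy sequence: items are deserialized as the caller enumerates.
    public static IEnumerable<Person> Read(Stream input)
    {
        return Serializer.DeserializeItems<Person>(input, PrefixStyle.Base128, 1);
    }

    public static void Main()
    {
        // Hypothetical file location for the demo.
        string path = Path.Combine(Path.GetTempPath(), "testaddressbook-net.bin");
        using (Stream output = File.Create(path))
        {
            Write(output, 1000);
        }

        int count = 0;
        using (Stream input = File.OpenRead(path))
        {
            foreach (Person person in Read(input))
            {
                Console.WriteLine("Name: " + person.Name);
                count++;
            }
        }
        Console.WriteLine("Person count: {0}", count);
    }
}
```

Because the sequence is lazy, the stream has to stay open for as long as it is being enumerated, which is why the foreach sits inside the using block.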