Cassandra System.OutOfMemoryException: is this a Thrift bug?
I am using Cassandra 0.8.7, Aquiles as the C# client, and Thrift 0.7. I am trying to read a fairly large amount of data from a SuperColumnFamily with the following definition:
create column family SCF with column_type=Super and comparator=TimeUUIDType and subcomparator=AsciiType;
I want to insert the data fetched from Cassandra into a DataTable so that I can filter the rows and generate some reports from them, but I always get an OutOfMemoryException.
[OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.]
Thrift.Transport.TFramedTransport.ReadFrame() +191
Thrift.Transport.TFramedTransport.Read(Byte[] buf, Int32 off, Int32 len) +101
Thrift.Transport.TTransport.ReadAll(Byte[] buf, Int32 off, Int32 len) +76
Thrift.Protocol.TBinaryProtocol.ReadAll(Byte[] buf, Int32 off, Int32 len) +66
Thrift.Protocol.TBinaryProtocol.ReadI32() +47
Thrift.Protocol.TBinaryProtocol.ReadMessageBegin() +75
Apache.Cassandra.Client.recv_multiget_slice() in D:\apache-cassandra-0.8.0-beta2\interface\gen-csharp\Apache\Cassandra\Cassandra.cs:304
Apache.Cassandra.Client.multiget_slice(List`1 keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level) in D:\apache-cassandra-0.8.0-beta2\interface\gen-csharp\Apache\Cassandra\Cassandra.cs:286
I tried several approaches to optimize my code. My final version splits the time period used to slice the SuperColumn into smaller ranges (and splits the list of keys when it exceeds a preset number), but it made no difference: eventually I always get the same exception.
Could this be a bug in the Thrift library? When I get the exception, it always points to the following portion of the code inside Thrift.Transport.TFramedTransport:
private void ReadFrame()
{
    byte[] i32rd = new byte[header_size];
    transport.ReadAll(i32rd, 0, header_size);
    int size =
        ((i32rd[0] & 0xff) << 24) |
        ((i32rd[1] & 0xff) << 16) |
        ((i32rd[2] & 0xff) << 8) |
        ((i32rd[3] & 0xff));
    byte[] buff = new byte[size]; // Here the exception is thrown
    transport.ReadAll(buff, 0, size);
    readBuffer = new MemoryStream(buff);
}
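To make the failing line easier to reason about, here is a minimal sketch of the same header decoding in isolation. The class name `FrameHeader`, the method names, and the `MaxFrameBytes` cap are hypothetical and are not part of Thrift; the cap simply mirrors the 15 MB figure mentioned in one of the answers. The point is that the 4-byte big-endian prefix alone decides how large a contiguous array `new byte[size]` requests.

```csharp
using System;

public static class FrameHeader
{
    // Hypothetical cap, used only for this illustration (not a Thrift setting).
    public const int MaxFrameBytes = 15 * 1024 * 1024;

    // Decodes a 4-byte big-endian length prefix, the same arithmetic ReadFrame uses.
    public static int DecodeSize(byte[] header)
    {
        if (header == null || header.Length < 4)
            throw new ArgumentException("Header must contain at least 4 bytes.", "header");

        return ((header[0] & 0xff) << 24) |
               ((header[1] & 0xff) << 16) |
               ((header[2] & 0xff) << 8) |
               (header[3] & 0xff);
    }

    // Returns true only for sizes that are non-negative and within the cap,
    // so a caller could reject a frame before allocating a buffer for it.
    public static bool IsPlausible(int size)
    {
        return size >= 0 && size <= MaxFrameBytes;
    }
}
```

Logging the decoded size just before the allocation would show whether a single response frame is unexpectedly large, or whether a modest frame fails only because the process is already close to its memory limit.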
This is the code I am trying to run:
string columnFamily = "SCF";
ICluster cluster = AquilesHelper.RetrieveCluster(ConfigurationManager.AppSettings["CLUSTERNAME"].ToString());
ColumnParent columnParent = new ColumnParent()
{
    Column_family = columnFamily
};
List<byte[]> keys = // function that returns the list of keys I want to query
SlicePredicate predicate = new SlicePredicate();
foreach (DateTime[] dates in dateList)
{
    // Each entry holds the start and end of one time range
    from = GuidGenerator.GenerateTimeBasedGuid(dates[0]);
    to = GuidGenerator.GenerateTimeBasedGuid(dates[1]);
    predicate = new SlicePredicate()
    {
        Slice_range = new SliceRange()
        {
            Count = int.MaxValue, // no limit on the number of supercolumns per key
            Reversed = false,
            Start = Aquiles.Helpers.Encoders.ByteEncoderHelper.UUIDEnconder.ToByteArray(from),
            Finish = Aquiles.Helpers.Encoders.ByteEncoderHelper.UUIDEnconder.ToByteArray(to)
        },
    };
    cluster.Execute(new ExecutionBlock(delegate(CassandraClient client)
    {
        int maxKeys = Convert.ToInt32(ConfigurationManager.AppSettings["maxKeys"]);
        CassandraMethods.TableCreator(ref dt, columnParent, predicate, keys, client, maxKeys);
        return null;
    }), ConfigurationManager.AppSettings["KEYSPACE"].ToString());
}
And this is the function that is supposed to insert the data from Cassandra into the DataTable:
public static DataTable TableCreator(ref DataTable dt, ColumnParent columnParent, SlicePredicate predicate, List<byte[]> keys, CassandraClient client, int maxKeys)
{
    int keyCount = keys.Count;
    if (keyCount < maxKeys)
        CassandraMethods.CassandraToDataTable(ref dt, client.multiget_slice(keys, columnParent, predicate, ConsistencyLevel.ONE));
    else
    {
        int counter = 0;
        while (counter < keyCount)
        {
            if (counter + maxKeys <= keyCount)
                CassandraMethods.CassandraToDataTable(ref dt, client.multiget_slice(keys.GetRange(counter, maxKeys), columnParent, predicate, ConsistencyLevel.ONE));
            else
                CassandraMethods.CassandraToDataTable(ref dt, client.multiget_slice(keys.GetRange(counter, keyCount - counter), columnParent, predicate, ConsistencyLevel.ONE));
            counter += maxKeys;
        }
    }
    return dt;
}
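The key-splitting logic in TableCreator can be expressed more compactly. The following is only a sketch: `KeyBatcher` and `Batch` are hypothetical names, not part of Aquiles or Thrift, and the helper covers only the chunking, not the Cassandra call itself.

```csharp
using System;
using System.Collections.Generic;

public static class KeyBatcher
{
    // Yields consecutive sub-lists of at most batchSize items.
    // The last batch holds whatever remains and may be shorter.
    // Because this is an iterator, the argument checks run when enumeration starts.
    public static IEnumerable<List<T>> Batch<T>(List<T> items, int batchSize)
    {
        if (items == null) throw new ArgumentNullException("items");
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        for (int offset = 0; offset < items.Count; offset += batchSize)
        {
            int count = Math.Min(batchSize, items.Count - offset);
            yield return items.GetRange(offset, count);
        }
    }
}
```

With such a helper, the body of TableCreator would reduce to a single `foreach (List<byte[]> batch in KeyBatcher.Batch(keys, maxKeys))` loop around the existing multiget_slice call. Note that batching keys bounds only the number of rows per request; with Count set to int.MaxValue, each row can still return every supercolumn in the time range.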
Am I missing anything? What am I doing wrong?
Update 1: I also tried Cassandra 1.0 and Aquiles 1.0, with both Thrift 0.6 and 0.7, but nothing changed: still the same exception.
Update 2: Problem solved, read my answer below.
Comments (2)
Problem solved :)
I experimented with memory usage and the garbage collector and fixed the problem.
The exception was thrown whenever my application reached 1.5 GB of RAM, because Visual Studio had compiled it as a 32-bit application.
Compiling and running as x64 solved the issue. To make sure I do not use too much memory, I have now added the following 3 lines of code before each Cassandra multiget_slice call.
Thanks, N.
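A quick way to confirm the diagnosis above is to check at runtime whether the process is actually running as 32-bit or 64-bit. This sketch is not the three lines the answer refers to; `ProcessInfo` and its methods are hypothetical names, and only `IntPtr.Size` and `GC.GetTotalMemory` are real .NET APIs.

```csharp
using System;

public static class ProcessInfo
{
    // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
    public static bool Is64Bit()
    {
        return IntPtr.Size == 8;
    }

    // Reports the process bitness and the current managed heap size.
    // GC.GetTotalMemory(false) returns an estimate without forcing a collection.
    public static string Describe()
    {
        long managedBytes = GC.GetTotalMemory(false);
        return string.Format("{0}-bit process, {1} MB managed heap",
            IntPtr.Size * 8, managedBytes / (1024 * 1024));
    }
}
```

Logging `ProcessInfo.Describe()` before each multiget_slice call would show whether the application is approaching the address-space limit of a 32-bit process when the exception occurs.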
How big is the data in your SuperColumnFamily? Thrift has a default maximum frame size of 15 MB. This is set in /etc/cassandra/conf/cassandra.yaml; you could try increasing it. Note that it is not possible to split your data into pieces smaller than a single supercolumn.
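For reference, the relevant entries in cassandra.yaml look roughly like this. The values are the defaults as I recall them for the 0.8 and 1.0 releases, so treat them as an assumption and check your own file; the maximum message length is expected to be larger than the frame size.

```yaml
# cassandra.yaml (server side); assumed defaults for Cassandra 0.8 / 1.0
# Maximum size of a single Thrift frame, in megabytes.
thrift_framed_transport_size_in_mb: 15
# Maximum length of a Thrift message, including all fields and framing overhead.
thrift_max_message_length_in_mb: 16
```

Cassandra has to be restarted for changes to these settings to take effect.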