Offsets of slices in downcast_iter and Polars
In Polars, I'm getting a different result than I expect when I slice a Series and then try to get its offsets.
I'm creating a Series, then slicing it:
```rust
use polars::prelude::*;

// Make a vec of 3 items: foo, bar, baz
let string_values: Vec<&str> = vec!["foo", "bar", "baz"];
// Add it to a Series (no DataFrame involved)
let series = Series::new("string_values", string_values);
// Prints:
// shape: (3,)
// Series: 'string_values' [str]
// [
//     "foo"
//     "bar"
//     "baz"
// ]
println!("{:?}", series);
```
This prints the expected Series.
I can then use `downcast_iter()` to get the offsets:
```rust
// Now we should be able to downcast iter to get the offsets.
// returns [0, 3, 6, 9]
// 0-3 = foo
// 3-6 = bar
// 6-9 = baz
series.utf8().unwrap().downcast_iter().for_each(|array| {
    println!("{:?}", array.offsets());
});
```
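The comments above rely on Arrow's layout for variable-length strings: all strings sit back to back in one values buffer, and each consecutive pair of offsets marks where one string starts and ends. A minimal std-only sketch of that decoding follows; `decode_strings` is a hypothetical helper for illustration, not a Polars or arrow2 function.

```rust
/// Hypothetical helper (not part of Polars/arrow2): split a concatenated
/// values buffer into strings, where string k spans offsets[k]..offsets[k + 1].
fn decode_strings<'a>(values: &'a str, offsets: &[usize]) -> Vec<&'a str> {
    offsets
        .windows(2) // consecutive (start, end) pairs
        .map(|w| &values[w[0]..w[1]])
        .collect()
}

fn main() {
    let values = "foobarbaz";
    let offsets: [usize; 4] = [0, 3, 6, 9];
    let strings = decode_strings(values, &offsets);
    println!("{:?}", strings); // ["foo", "bar", "baz"]
}
```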
Great so far.
I then slice it:
```rust
let series_slice = series.slice(1, 2);
// shape: (2,)
// Series: 'string_values' [str]
// [
//     "bar"
//     "baz"
// ]
println!("{:?}", series_slice);
```
This returns the correct values.
I then try to use `downcast_iter()` again:
```rust
// Now we should be able to downcast iter to get the offsets for the slice.
// This returns [3, 6, 9]
// Is "foo" still referenced?
series_slice.utf8().unwrap().downcast_iter().for_each(|array| {
    println!("{:?}", array.offsets());
});
```
It returns `[3, 6, 9]`. Why is 9 returned? The slice only holds "bar" and "baz", so its string data is only 6 bytes long.
Buffers in Arrow can be shared. Besides the data, they also have an `offset` and a `length`. Your original Arrow string array consists of the following data: a values buffer containing the concatenated bytes `foobarbaz` (offset 0, length 9), and an offsets buffer `[0, 3, 6, 9]` (offset 0, length 4).
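The lookup of element `i` in that layout can be sketched in plain Rust. This is a simplified model under stated assumptions, not arrow2's implementation: the hypothetical `value_at` takes the shared values and offsets plus an array-level `offset`, so the same buffers can back both the original array (offset 0) and a slice that starts at row 1.

```rust
/// Hypothetical model of Arrow string lookup (not arrow2's code):
/// element `i` of an array view that starts `offset` rows into the shared buffers.
fn value_at<'a>(values: &'a str, offsets: &[usize], offset: usize, i: usize) -> &'a str {
    let j = offset + i; // position of element i in the shared offsets buffer
    let start = offsets[j];
    let end = offsets[j + 1];
    &values[start..end]
}

fn main() {
    let values = "foobarbaz";
    let offsets: [usize; 4] = [0, 3, 6, 9];

    // Original array: offset 0, length 3.
    assert_eq!(value_at(values, &offsets, 0, 0), "foo");
    // Sliced view: offset 1, length 2; its elements are "bar" and "baz".
    assert_eq!(value_at(values, &offsets, 1, 0), "bar");
    assert_eq!(value_at(values, &offsets, 1, 1), "baz");
    println!("ok");
}
```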
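Retrieving element `i` works like this: take `start = offsets[i]` and `end = offsets[i + 1]`, then read `data[start..end]` from the values buffer. When you slice an array, we don't copy any data. We only update the `offset` and the `length` so that we have all the information needed to represent the sliced array. For `series.slice(1, 2)`, the offsets buffer becomes a view with offset 1 and length 3, so `offsets()` returns `[3, 6, 9]`, while the values buffer is still the full `foobarbaz`.

The offsets are positions in that shared values buffer, not in a new `barbaz` buffer: `bar` is bytes 3..6 and `baz` is bytes 6..9, which is why 9 is returned. The bytes for `foo` are still in memory because the buffer is shared with the original array, but the slice never reads them.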
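The zero-copy slicing described above can be modeled with reference-counted buffers. In this sketch the `StringArray` type and its methods are hypothetical stand-ins, not arrow2's `Utf8Array`: slicing only clones two `Rc` handles and adjusts `offset`/`length`, and the offsets visible through the slice come out as the `[3, 6, 9]` seen in the question.

```rust
use std::rc::Rc;

/// Hypothetical model of an Arrow string array (not arrow2's Utf8Array):
/// both buffers are reference-counted and shared, and the array itself is
/// a window (`offset`, `length`) into the offsets buffer.
#[derive(Debug)]
struct StringArray {
    values: Rc<str>,
    offsets: Rc<[usize]>,
    offset: usize, // first row of this view
    length: usize, // number of rows in this view
}

impl StringArray {
    fn new(strings: &[&str]) -> Self {
        let mut offsets = vec![0];
        let mut values = String::new();
        for s in strings {
            values.push_str(s);
            offsets.push(values.len()); // end of this string = start of the next
        }
        StringArray {
            values: Rc::from(values),
            offsets: Rc::from(offsets),
            offset: 0,
            length: strings.len(),
        }
    }

    /// Zero-copy slice: clone the Rc handles and adjust offset/length only.
    fn slice(&self, offset: usize, length: usize) -> Self {
        assert!(offset + length <= self.length, "slice out of bounds");
        StringArray {
            values: Rc::clone(&self.values),
            offsets: Rc::clone(&self.offsets),
            offset: self.offset + offset,
            length,
        }
    }

    /// The offsets visible to this view: length + 1 entries.
    fn offsets(&self) -> &[usize] {
        &self.offsets[self.offset..self.offset + self.length + 1]
    }

    /// Element i of this view, read from the shared values buffer.
    fn value(&self, i: usize) -> &str {
        let offs = self.offsets();
        &self.values[offs[i]..offs[i + 1]]
    }
}

fn main() {
    let array = StringArray::new(&["foo", "bar", "baz"]);
    let sliced = array.slice(1, 2);

    println!("{:?}", array.offsets()); // [0, 3, 6, 9]
    println!("{:?}", sliced.offsets()); // [3, 6, 9]
    println!("{} {}", sliced.value(0), sliced.value(1)); // bar baz

    // No data was copied: both arrays point at the same "foobarbaz" buffer.
    assert!(Rc::ptr_eq(&array.values, &sliced.values));
}
```

In the answer's description the `offset` and `length` belong to the shared buffers themselves; the model keeps them on the array for brevity, which does not change which offsets the slice exposes.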