Offsets with downcast_iter and slicing in Polars

Posted on 2025-02-13 03:55:24

In Polars, slicing a Series and then reading its offsets gives me a result I did not expect.

I first create a Series:

use polars::prelude::*;

// Make a vec of 3 items: foo, bar, baz
let string_values: Vec<&str> = vec!["foo", "bar", "baz"];
// Build a Series from it directly, without a DataFrame
let series = Series::new("string_values", string_values);

//shape: (3,)
// Series: 'string_values' [str]
// [
//  "foo"
//  "bar"
//  "baz"
// ]
println!("{:?}", series);

This prints the new Series.

I can then use downcast_iter() to get the offsets:

// Now we should be able to downcast iter to get the offsets.
// returns [0, 3, 6, 9]
// 0-3 = foo
// 3-6 = bar
// 6-9 = baz
series.utf8().unwrap().downcast_iter().for_each(|array| {
    println!("{:?}", array.offsets());
});

Great so far.

I then slice it:

//shape: (2,)
// Series: 'string_values' [str]
// [
//  "bar"
//  "baz"
// ]
let series_slice = series.slice(1, 2);
println!("{:?}", series_slice);

This returns the correct values.

I then try to use downcast_iter() again:

// Now we should be able to downcast iter to get the offsets for the slice.
// This returns [3, 6, 9]
// Is "foo" still referenced?
series_slice.utf8().unwrap().downcast_iter().for_each(|array| {
    println!("{:?}", array.offsets());
});

It prints 3, 6, 9. Why is 9 returned? The strings in the slice are only 6 bytes long in total.


Comments (1)

羁〃客ぐ 2025-02-20 03:55:24

Buffers in Arrow can be shared. Besides the data, they also have an offset and a length.

Your original Arrow string array consists of the following data:

data:     foobarbaz
offsets:  0, 3, 6, 9
offset:   0
length:   3

Retrieving element i uses the following algorithm in pseudocode:

let offset = array.offset
let start_index = offsets[offset + i]
let end_index = offsets[offset + i + 1]

let string_value = data[start_index..end_index]
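
The pseudocode above can be turned into a small runnable Rust sketch. The `StringArray` struct and its `get` method are hypothetical names used only for illustration; they mirror the fields listed in this answer and are not the actual Arrow or Polars API.

```rust
/// Hypothetical, simplified stand-in for an Arrow string array.
/// Field names follow the pseudocode above, not a real library type.
struct StringArray<'a> {
    data: &'a str,        // all string bytes, concatenated
    offsets: &'a [usize], // boundaries into `data`
    offset: usize,        // index of the first logical element
    length: usize,        // number of logical elements
}

impl<'a> StringArray<'a> {
    /// Returns element `i`, or `None` if `i` is out of bounds.
    fn get(&self, i: usize) -> Option<&'a str> {
        if i >= self.length {
            return None;
        }
        let data: &'a str = self.data;
        // Every lookup is shifted by the array's own `offset`.
        let start_index = self.offsets[self.offset + i];
        let end_index = self.offsets[self.offset + i + 1];
        Some(&data[start_index..end_index])
    }
}
```

With `data = "foobarbaz"`, `offsets = [0, 3, 6, 9]`, `offset = 1` and `length = 2`, calling `get(0)` reads `offsets[1]` and `offsets[2]`, so it yields "bar" without "foo" ever being removed from the buffer.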

When you slice an array, we don't copy any data. We only update the offset and the length so that we have all the information needed to represent the sliced array:

data:     foobarbaz
offsets:  0, 3, 6, 9
offset:   1
length:   2
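
To connect this back to the question, here is a minimal sketch of zero-copy slicing, assuming a simplified model in which both buffers sit behind `Arc`. The type `SharedStringArray` and the methods `from_strs`, `slice`, `visible_offsets` and `rebased_offsets` are all hypothetical names made up for this illustration, not Polars or Arrow functions.

```rust
use std::sync::Arc;

/// Hypothetical model of a sliceable string array; not the Polars/Arrow API.
struct SharedStringArray {
    data: Arc<str>,        // shared string bytes
    offsets: Arc<[usize]>, // shared boundaries into `data`
    offset: usize,         // first logical element
    length: usize,         // number of logical elements
}

impl SharedStringArray {
    /// Builds the buffers by concatenating the input strings.
    fn from_strs(values: &[&str]) -> Self {
        let mut offsets = vec![0];
        let mut data = String::new();
        for v in values {
            data.push_str(v);
            offsets.push(data.len());
        }
        Self {
            data: Arc::from(data),
            offsets: Arc::from(offsets),
            offset: 0,
            length: values.len(),
        }
    }

    /// Zero-copy slice: shares both buffers and only adjusts the bookkeeping.
    fn slice(&self, start: usize, len: usize) -> Self {
        assert!(start + len <= self.length, "slice out of bounds");
        Self {
            data: Arc::clone(&self.data),
            offsets: Arc::clone(&self.offsets),
            offset: self.offset + start,
            length: len,
        }
    }

    /// The window of offsets this view uses: `length + 1` entries.
    fn visible_offsets(&self) -> &[usize] {
        &self.offsets[self.offset..=self.offset + self.length]
    }

    /// The same window shifted so that it starts at zero.
    fn rebased_offsets(&self) -> Vec<usize> {
        let window = self.visible_offsets();
        window.iter().map(|o| o - window[0]).collect()
    }
}
```

In this model, slicing ["foo", "bar", "baz"] with `slice(1, 2)` leaves the visible offsets at 3, 6, 9, because they still index into the full shared "foobarbaz" buffer; 9 is simply where "baz" ends in that buffer. The byte length of the slice is the last visible offset minus the first, 9 - 3 = 6, and subtracting the first offset from each entry gives the 0, 3, 6 the question expected.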