How to pivot/cross-tab data in Python 3?

Posted on 2024-12-28 01:45:02

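What is the best way to pivot/cross-tabulate data in Python 3? Is there a built-in function that does this? Ideally, I'm looking for a Python 3 solution with no external dependencies. For example, given a nested list:

nl = [["apples", 2, "New York"],
      ["peaches", 6, "New York"],
      ["apples", 6, "New York"],
      ["peaches", 1, "Vermont"]]

I would like to rearrange the row data and group by fields:

             apples    peaches
New York        2         6
Vermont         6         1

The above is a trivial example, but is there a solution that would be easier than using itertools.groupby every time a pivot is needed? Ideally, the solution would allow the row data to be pivoted on any column. I was considering pandas, but it is an external library and only has limited Python 3 support.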
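For context, the itertools.groupby approach mentioned in the question might look roughly like the sketch below. The helper name pivot_with_groupby and its index parameters are made up for illustration; the point is that groupby only merges adjacent items, so the data has to be sorted by the grouping key on every pivot.

```python
from itertools import groupby
from operator import itemgetter

nl = [["apples", 2, "New York"],
      ["peaches", 6, "New York"],
      ["apples", 6, "New York"],
      ["peaches", 1, "Vermont"]]


def pivot_with_groupby(records, row_idx, col_idx, val_idx):
    """Sum the value field for each (row, column) pair (hypothetical helper)."""
    key = itemgetter(row_idx, col_idx)
    table = {}
    # groupby only groups consecutive items, so sort by the same key first.
    for (row, col), group in groupby(sorted(records, key=key), key=key):
        table.setdefault(row, {})[col] = sum(rec[val_idx] for rec in group)
    return table


result = pivot_with_groupby(nl, row_idx=2, col_idx=0, val_idx=1)
print(result)
```

Cells with no data are simply absent from the nested dictionaries, so any caller still has to decide how to fill them.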

瑶笙 2025-01-04 01:45:02

Here is some simple code. Providing row/column/grand totals is left as an exercise for the reader.

class CrossTab(object):
    """Accumulate values into a table keyed by row key and column key."""

    def __init__(
        self,
        missing=0, # what to return for an empty cell.
                   # Alternatives: '', 0.0, None, 'NULL'
        ):
        self.missing = missing
        self.col_key_set = set()
        self.cell_dict = {}  # row_key -> {col_key: accumulated value}
        self.headings_OK = False

    def add_item(self, row_key, col_key, value):
        self.col_key_set.add(col_key)
        # New data may add keys, so any cached headings must be rebuilt.
        self.headings_OK = False
        try:
            self.cell_dict[row_key][col_key] += value
        except KeyError:
            try:
                # The row exists but this column has no value yet.
                self.cell_dict[row_key][col_key] = value
            except KeyError:
                # The row itself does not exist yet.
                self.cell_dict[row_key] = {col_key: value}

    def _process_headings(self):
        if self.headings_OK:
            return
        self.row_headings = sorted(self.cell_dict.keys())
        self.col_headings = sorted(self.col_key_set)
        self.headings_OK = True

    def get_col_headings(self):
        self._process_headings()
        return self.col_headings

    def generate_row_info(self):
        """Yield (row_key, row_values) with one value per column heading."""
        self._process_headings()
        for row_key in self.row_headings:
            row_dict = self.cell_dict[row_key]
            row_vals = [
                row_dict.get(col_key, self.missing)
                for col_key in self.col_headings
                ]
            yield row_key, row_vals

if __name__ == "__main__":

    data = [["apples", 2, "New York"],
            ["peaches", 6, "New York"],
            ["apples", 6, "New York"],
            ["peaches", 1, "Vermont"]]

    ctab = CrossTab(missing='uh-oh')
    for s in data:
        ctab.add_item(row_key=s[2], col_key=s[0], value=s[1])
    print()
    print('Column headings:', ctab.get_col_headings())
    for row_heading, row_values in ctab.generate_row_info():
        print(repr(row_heading), row_values)

Output:

Column headings: ['apples', 'peaches']
'New York' [8, 6]
'Vermont' ['uh-oh', 1]
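
The totals left as an exercise could be approached along the following lines. This is only a sketch: add_totals is a hypothetical helper, not part of the answer's class. It assumes rows shaped like the (row_key, row_values) pairs that generate_row_info() yields, and it treats any cell equal to the missing placeholder as zero.

```python
def add_totals(col_headings, rows, missing):
    """Return (rows_with_totals, col_totals, grand_total).

    rows is an iterable of (row_key, row_vals) pairs; cells equal to
    `missing` contribute zero to every total (an assumption of this sketch).
    """
    col_totals = [0] * len(col_headings)
    rows_with_totals = []
    for row_key, row_vals in rows:
        numeric = [0 if v == missing else v for v in row_vals]
        for i, v in enumerate(numeric):
            col_totals[i] += v
        # Keep the original cells for display and append the row total.
        rows_with_totals.append((row_key, row_vals, sum(numeric)))
    return rows_with_totals, col_totals, sum(col_totals)


# Sample values copied from the output shown above.
cols = ['apples', 'peaches']
rows = [('New York', [8, 6]), ('Vermont', ['uh-oh', 1])]
rows_out, col_totals, grand = add_totals(cols, rows, missing='uh-oh')
print(rows_out, col_totals, grand)
```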

See also this answer.

And this one, which I had forgotten about.
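
For the question's requirement that the data be pivotable on any column, a more compact standard-library variant is sketched below. The names pivot and format_table and the width parameter are illustrative assumptions, not an existing API; records may be lists (indexed by position) or dicts (indexed by key), and values for the same cell are summed.

```python
from collections import defaultdict


def pivot(records, row, col, value, missing=0):
    """Return (col_headings, {row_key: [cell, ...]}), summing `value` per cell."""
    cells = defaultdict(dict)
    col_keys = set()
    for rec in records:
        r, c, v = rec[row], rec[col], rec[value]
        col_keys.add(c)
        cells[r][c] = cells[r].get(c, 0) + v
    headings = sorted(col_keys)
    table = {r: [cells[r].get(c, missing) for c in headings]
             for r in sorted(cells)}
    return headings, table


def format_table(headings, table, width=10):
    """Render the pivot as fixed-width text, one line per row key."""
    lines = [" " * width + "".join(f"{h:>{width}}" for h in headings)]
    for r, vals in table.items():
        lines.append(f"{r:<{width}}" + "".join(f"{str(v):>{width}}" for v in vals))
    return "\n".join(lines)


nl = [["apples", 2, "New York"],
      ["peaches", 6, "New York"],
      ["apples", 6, "New York"],
      ["peaches", 1, "Vermont"]]

headings, table = pivot(nl, row=2, col=0, value=1)
print(format_table(headings, table))

# Swapping the row and col indices pivots on the other field instead.
headings_by_state, table_by_fruit = pivot(nl, row=0, col=2, value=1)
```

Using a plain dict of dicts keeps the result easy to post-process; switching the aggregation from a sum to something else would only require changing the line that updates cells.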

About the author

把昨日还给我
