How do I apply a function (in this case, keyword extraction) to every row of a column in a df and put the result in a new column?

Posted on 2025-01-16 06:52:49


I have a dataframe about homologous proteins (almost 3000 of them!), which includes a description of each protein's function. From this description I want to grab a keyword from each cell and put it in a separate column, in order to create a classification of the proteins.

I am creating a function to extract keywords from the text of each individual row of the 'description' column, using yake!:

def generate_keyword():
    kw_extractor = yake.KeywordExtractor(n=2, top=40)
    keywords = kw_extractor.extract_keywords(data["description"])

    for kw in keywords:
        print(kw)

And then I am trying to put this information into a new column ('keyword') in the dataframe like so:

data["keyword"] = data["description"].apply(generate_keyword())

It then gives these two messages when I try to run it:

Warning! Exception: 'Series' object has no attribute 'split' generated by the following text: '0       Mitochondrial malate dehydrogenase;catalyzes i...

.......

TypeError: 'NoneType' object is not callable

I think the mistake is somewhere in how I'm labelling the parameters for my function, but I have no clue how to fix it. Any help is greatly appreciated!
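
Both messages can be traced to the code as posted. The warning appears because the function hands the whole `data["description"]` Series to `extract_keywords`, which expects a single string. The TypeError appears because `generate_keyword()` is called immediately, returns `None` (it only prints), and `apply` then tries to call that `None`. A minimal sketch of one possible fix follows: the function takes one cell's text as a parameter, returns a value, and is passed to `apply` without being called first. To keep the sketch runnable without yake installed, `StandInExtractor` is a hypothetical stand-in, and it assumes that yake's `extract_keywords` returns `(keyword, score)` pairs where a lower score means a more relevant keyword. The sample rows are made up for illustration.

```python
import pandas as pd


class StandInExtractor:
    """Hypothetical stand-in that mimics the assumed (keyword, score) output."""

    def extract_keywords(self, text):
        # Score each word by its position; lower score = better (assumption).
        words = [w.strip(";,.") for w in text.split()]
        return [(w, float(i)) for i, w in enumerate(words) if w]


def generate_keyword(text, extractor):
    """Return the best-scoring keyword for one description string."""
    keywords = extractor.extract_keywords(text)
    if not keywords:
        return None
    # Pick the lowest score and return it instead of printing it.
    best_keyword, _score = min(keywords, key=lambda pair: pair[1])
    return best_keyword


# To use yake itself, swap in: extractor = yake.KeywordExtractor(n=2, top=40)
extractor = StandInExtractor()

# Made-up sample rows standing in for the real protein dataframe.
data = pd.DataFrame({"description": [
    "Mitochondrial malate dehydrogenase; catalyzes interconversion",
    "Cytoplasmic ribosomal protein",
]})

# apply receives a callable and invokes it once per cell of the column.
data["keyword"] = data["description"].apply(
    lambda text: generate_keyword(text, extractor)
)
print(data["keyword"].tolist())
```

Building the extractor once outside the function avoids re-creating it for each of the roughly 3000 rows. Cells that are missing (NaN) are not handled in this sketch and would need a check before the text is passed on.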
