Classification based on categorical data

Posted 2025-01-25 05:39:25

I have the following dataset:

Inp1    Inp2        Output
A,B,C   AI,UI,JI    Animals
L,M,N   LI,DO,LI    Noun
X,Y     AI,UI       Extras

I need to apply an ML algorithm to these values. Which algorithm would be best suited to find the relationships between these groups, so that it can assign an output class to each row?

冰雪梦之恋 2025-02-01 05:39:25

Assuming each cell is a list (since you store multiple strings in each) and that you are not looking for a specific encoding, the following should work. It can also be adapted to other encodings.

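import pandas as pd

# First inner list holds the column names; each remaining entry is one row,
# with one list of strings per cell.
A = [["Inp1", "Inp2", "Inp3", "Output"],
     [["A", "B", "C"], ["AI", "UI", "JI"], ["Apple", "Bat", "Dog"], ["Animals"]],
     [["L", "M", "N"], ["LI", "DO", "LI"], ["Lawn", "Moon", "Noon"], ["Noun"]]]

dataframe = pd.DataFrame(A[1:], columns=A[0])

def my_encoding(col):
    # DataFrame.apply (default axis=0) calls this once per column,
    # so `col` is a Series whose values are the per-row lists of strings.
    encoded_col = []
    for ls in col:
        encoded_ls = []
        for s in ls:
            # Read the string's UTF-8 bytes as one little-endian integer.
            sbytes = s.encode('utf-8')
            sint = int.from_bytes(sbytes, 'little')
            encoded_ls.append(sint)
        encoded_col.append(encoded_ls)
    return encoded_col

print(dataframe.apply(my_encoding))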

Output:

           Inp1  ...               Output
0  [65, 66, 67]  ...  [32488788024979009]
1  [76, 77, 78]  ...         [1853189966]
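As a quick check of what `my_encoding` produces, the snippet below isolates the per-string step, which is `s.encode('utf-8')` followed by `int.from_bytes(..., 'little')`. It also shows that `int.to_bytes` reverses that step. The helper names `encode_str` and `decode_int` are hypothetical and exist only for this illustration.

One caveat: the integers are reversible IDs, but their magnitude reflects string length and byte values rather than any similarity between tokens. A model that treats them as ordinary numeric features may therefore pick up meaningless patterns.

```python
def encode_str(s: str) -> int:
    # Same packing as my_encoding: UTF-8 bytes read as a little-endian integer.
    return int.from_bytes(s.encode("utf-8"), "little")


def decode_int(n: int) -> str:
    # Inverse step: compute how many bytes n needs, then turn them back into text.
    # Caveat: trailing NUL characters would be lost, because they become
    # high-order zero bytes that do not affect the integer value.
    length = (n.bit_length() + 7) // 8
    return n.to_bytes(length, "little").decode("utf-8")


for word in ["A", "Noun", "Animals"]:
    code = encode_str(word)
    # Prints the word, its integer code, and the decoded word.
    print(word, code, decode_int(code))
```

For `Noun` this gives 1853189966, which matches the `Output` column shown above.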

If my assumptions are incorrect, or this is not what you're looking for, let me know.

恋你朝朝暮暮 2025-02-01 05:39:25

Since you mentioned that you are going to apply an ML algorithm (say, classification), I think one-hot encoding (OHE) is what you are looking for.

Requested format:

Inp1     Inp2    Inp3      Output
7,44,87  4,65,2  47,36,20  45

This format won't help you train your model, because each cell still holds multiple labels, so you would have to preprocess it again anyway (for example with OHE).

Suggested format (one column per distinct token, set to 1 if the token appears in that row):

A  B  C  L  M  N  X  Y  AI  DO  JI  LI  UI  Apple  Bat  Dog  Lawn  Moon  Noon  Yemen  Zombie
1  1  1  0  0  0  0  0   1   0   1   0   1      1    1    1     0     0     0      0       0
0  0  0  1  1  1  0  0   0   1   0   1   0      0    0    0     1     1     1      0       0
0  0  0  0  0  0  1  1   1   0   0   0   1      0    0    0     0     0     0      1       1

After that, you can label-encode or one-hot encode the Output field, depending on what your model requires.

Happy learning!
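Below is a minimal plain-Python sketch of the suggested format, which is more precisely a multi-hot encoding because a row can have several 1s. It makes these assumptions:

- The raw cells are comma-separated strings, as in the question.
- Only the question's columns `Inp1` and `Inp2` are used.
- A token that appears in more than one column maps to a single feature column, as in the table above.

The names `rows`, `tokens`, `multi_hot`, `vocab`, and `classes` are hypothetical. scikit-learn's `MultiLabelBinarizer` does the same token-set-to-indicator conversion if you prefer a library call.

```python
# Raw rows as in the question: each input cell is a comma-separated string.
rows = [
    {"Inp1": "A,B,C", "Inp2": "AI,UI,JI", "Output": "Animals"},
    {"Inp1": "L,M,N", "Inp2": "LI,DO,LI", "Output": "Noun"},
    {"Inp1": "X,Y", "Inp2": "AI,UI", "Output": "Extras"},
]
input_cols = ["Inp1", "Inp2"]


def tokens(row):
    # Set of all tokens across the input cells; duplicates (LI twice) collapse,
    # and a token shared by two columns maps to one feature (assumption).
    return {tok.strip() for col in input_cols for tok in row[col].split(",")}


# Vocabulary: one feature column per distinct token, sorted for a stable order.
vocab = sorted(set().union(*(tokens(r) for r in rows)))


def multi_hot(row):
    # 1 if the token occurs anywhere in the row's input cells, else 0.
    present = tokens(row)
    return [1 if tok in present else 0 for tok in vocab]


X = [multi_hot(r) for r in rows]

# Label-encode the Output field: one integer per distinct class.
classes = sorted({r["Output"] for r in rows})
y = [classes.index(r["Output"]) for r in rows]

print(vocab)
for vec, label in zip(X, y):
    print(vec, label)
```

Each vector has one entry per token in `vocab`. The integer printed next to it is the label-encoded Output class. A 2D feature matrix `X` paired with a 1D label list `y` is the input shape typical classifiers expect.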

动次打次papapa 2025-02-01 05:39:25

BCE (binary cross-entropy) is used for multi-label classification, whereas categorical CE (cross-entropy) is used for multi-class classification, where each example belongs to exactly one class. For your task, you need to determine whether a single example ends up in only one class (CE) or can belong to several classes at once (BCE). The second is probably the case, since an animal can also be a noun. ;)
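To make the distinction concrete, here is a small plain-Python sketch of both losses for a single example. The class list and logit values are made up for illustration, and the functions are hypothetical helpers rather than a library API.

- **Categorical CE:** passes the scores through a softmax, so the classes compete and only one target index is used.
- **BCE:** applies an independent sigmoid to each class, so the target vector may contain several 1s (for instance, both `Animals` and `Noun`).

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def categorical_ce(logits, target_index):
    # Multi-class: softmax couples the classes; exactly one class is correct.
    probs = softmax(logits)
    return -math.log(probs[target_index])


def binary_ce(logits, targets):
    # Multi-label: one independent sigmoid per class; targets may hold several 1s.
    # Averaged over classes.
    loss = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        loss += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return loss / len(logits)


classes = ["Animals", "Noun", "Extras"]
logits = [2.0, 1.5, -1.0]  # made-up model scores for one example

print(categorical_ce(logits, target_index=0))  # only "Animals" is correct
print(binary_ce(logits, targets=[1, 1, 0]))    # "Animals" and "Noun" both correct
```

In practice, deep-learning frameworks provide both losses directly, for example PyTorch's `CrossEntropyLoss` and `BCEWithLogitsLoss`. Those versions also handle numerical edge cases more robustly than this sketch.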
