Generating a string representation of a one-hot encoding

Posted on 2024-08-07 13:39:41

In Python, I need to generate a dict that maps a letter to a pre-defined "one-hot" representation of that letter. By way of illustration, the dict should look like this:

{ 'A': '1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0',
  'B': '0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0', # ...
}

There is one bit (represented as a character) per letter of the alphabet. Hence each string will contain 25 zeros and one 1. The position of the 1 is determined by the position of the corresponding letter in the alphabet.

I came up with some code that generates this:

# Character set is explicitly specified for fine grained control
_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
n = len(_letters)
one_hot = [' '.join(['0']*a + ['1'] + ['0']*b)
            for a, b in zip(range(n), range(n-1, -1, -1))]
outputs = dict(zip(_letters, one_hot))

Is there a more efficient, cleaner, or more Pythonic way to do the same thing?

Comments (4)

述情 2024-08-14 13:39:41

I find this to be more readable:

from string import ascii_uppercase

one_hot = {}
for i, l in enumerate(ascii_uppercase):
    bits = ['0']*26
    bits[i] = '1'
    one_hot[l] = ' '.join(bits)

If you need a more general alphabet, just enumerate over a string of the characters, and replace ['0']*26 with ['0']*len(alphabet).
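
As a rough sketch of that generalization, the loop can be wrapped in a function so the width of each string follows the alphabet it is given. The function name `one_hot_strings` and its `alphabet` parameter are illustrative assumptions, not part of the original answer, and the sketch assumes the characters in the alphabet are unique.

```python
from string import ascii_uppercase


def one_hot_strings(alphabet):
    """Map each character of `alphabet` to a space-separated one-hot string.

    Hypothetical helper: assumes the characters in `alphabet` are unique.
    """
    n = len(alphabet)
    table = {}
    for i, ch in enumerate(alphabet):
        bits = ['0'] * n   # start with all zeros, one slot per character
        bits[i] = '1'      # set the slot that belongs to this character
        table[ch] = ' '.join(bits)
    return table


one_hot = one_hot_strings(ascii_uppercase)
print(one_hot['B'])
```

Passing a different string, such as a lowercase or reduced character set, should produce strings whose length matches that set without any other change.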

你是我的挚爱i 2024-08-14 13:39:41

In Python 2.5 and up you can use the conditional operator:

from string import ascii_uppercase

one_hot = {}
for i, c in enumerate(ascii_uppercase):
    one_hot[c] = ' '.join('1' if j == i else '0' for j in range(26))

似最初 2024-08-14 13:39:41

one_hot = [' '.join(['0']*a + ['1'] + ['0']*b)
            for a, b in zip(range(n), range(n-1, -1, -1))]
outputs = dict(zip(_letters, one_hot))

In particular, there's a lot of code packed into these two lines. You might try the Introduce Explaining Variable refactoring, or perhaps Extract Method.

Here's one example:

def single_onehot(a, b):
    return ' '.join(['0']*a + ['1'] + ['0']*b)

range_zip = zip(range(n), range(n-1, -1, -1))
one_hot = [single_onehot(a, b) for a, b in range_zip]
outputs = dict(zip(_letters, one_hot))

Although you might disagree with my naming.
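
One possible variation on the extracted helper, offered only as a sketch: since `b` is always `n - 1 - a`, the helper can take the position and the total width, which removes the need for the reversed range. The parameter names `index` and `size` are assumptions for illustration, and the dict comprehension requires Python 2.7 or later.

```python
_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def single_onehot(index, size):
    """Return a one-hot string of `size` bits with the single 1 at `index`."""
    # `index` zeros before the 1, and the remaining slots after it
    return ' '.join(['0'] * index + ['1'] + ['0'] * (size - index - 1))


n = len(_letters)
# enumerate pairs each letter with its position, so no zip of two ranges is needed
outputs = {letter: single_onehot(i, n) for i, letter in enumerate(_letters)}
```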

梦过后 2024-08-14 13:39:41

That seems pretty clear, concise, and Pythonic to me.
