Python library for converting plain text (ASCII) to the GSM 7-bit character set?

Posted on 2024-08-24 21:56:55

Is there a python library for encoding ascii data to 7-bit GSM character set (for sending SMS)?

吹梦到西洲 2024-08-31 21:56:55

There is now :)

Thanks to Chad for pointing out that this wasn't quite right.

Python2 version

# -*- coding: utf8 -*- 
gsm = (u"@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>"
       u"?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
ext = (u"````````````````````^```````````````````{}`````\\````````````[~]`"
       u"|````````````````````````````````````€``````````````````````````")

def gsm_encode(plaintext):
    res = ""
    for c in plaintext:
        idx = gsm.find(c)
        if idx != -1:
            res += chr(idx)
            continue
        idx = ext.find(c)
        if idx != -1:
            res += chr(27) + chr(idx)
    return res.encode('hex')

print gsm_encode(u"Hello World")

The output is hex-encoded. You can skip that step if you want the raw binary stream.

Python3 version

# -*- coding: utf8 -*- 
import binascii
gsm = ("@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?"
       "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
ext = ("````````````````````^```````````````````{}`````\\````````````[~]`"
       "|````````````````````````````````````€``````````````````````````")

def gsm_encode(plaintext):
    res = ""
    for c in plaintext:
        idx = gsm.find(c)
        if idx != -1:
            res += chr(idx)
            continue
        idx = ext.find(c)
        if idx != -1:
            res += chr(27) + chr(idx)
    return binascii.b2a_hex(res.encode('utf-8'))

print(gsm_encode("Hello World"))

无人问我粥可暖 2024-08-31 21:56:55

I got tips from gnibbler's answer. Here is a Python 2 script I put together after looking at an online converter: http://smstools3.kekekasvi.com/topic.php?id=288. It works correctly for me, for both encoding and decoding.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

gsm = (u"@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>"
   u"?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
ext = (u"````````````````````^```````````````````{}`````\\````````````[~]`"
   u"|````````````````````````````````````€``````````````````````````")

def get_encode(currentByte, index, bitRightCount, position, nextPosition, leftShiftCount, bytesLength, bytes):
    if index < 8:
        byte = currentByte >> bitRightCount
        if nextPosition < bytesLength:
            idx2 = bytes[nextPosition]
            byte = byte | ((idx2) << leftShiftCount)
            byte = byte & 0x000000FF
        else:
            byte = byte & 0x000000FF
        return chr(byte).encode('hex').upper()
    return ''

def getBytes(plaintext):
    if type(plaintext) != str:
         plaintext = str(plaintext)
    bytes = []
    for c in plaintext.decode('utf-8'):
        idx = gsm.find(c)
        if idx != -1:
            bytes.append(idx)
        else:
            idx = ext.find(c)
            if idx != -1:
                bytes.append(27)
                bytes.append(idx)
    return bytes

def gsm_encode(plaintext):
    res = ""
    f = -1
    t = 0
    bytes = getBytes(plaintext)
    bytesLength = len(bytes)
    for b in bytes:
        f = f+1
        t = (f%8)+1
        res += get_encode(b, t, t-1, f, f+1, 8-t, bytesLength, bytes)

    return res


def chunks(l, n):
    if n < 1:
        n = 1
    return [l[i:i + n] for i in range(0, len(l), n)]

def gsm_decode(codedtext):
    hexparts = chunks(codedtext, 2)
    number   = 0
    bitcount = 0
    output   = ''
    found_external = False
    for byte in hexparts:
        byte = int(byte, 16)
        # add data on to the end
        number = number + (byte << bitcount)
        # increase the counter
        bitcount = bitcount + 1
        # output the first 7 bits
        if number % 128 == 27:
             '''skip'''
             found_external = True
        else:
            if found_external == True:                
                 character = ext[number % 128]
                 found_external = False
            else:
                 character = gsm[number % 128]
            output = output + character

        # then throw them away
        number = number >> 7
        # every 7th letter you have an extra one in the buffer
        if bitcount == 7:
            if number % 128 == 27:
                '''skip'''
                found_external = True
            else:
                if found_external == True:                
                    character = ext[number % 128]
                    found_external = False
                else:
                    character = gsm[number % 128]
                output = output + character

            bitcount = 0
            number = 0
    return output

纵性 2024-08-31 21:56:55

None of the above solutions is correct. GSM 03.38 uses only 7 bits per character, while all the solutions above produce byte-aligned output, which in most cases is identical to ASCII. Here is a proper solution that packs the septets into a bit string.

I'm using an additional Python module:

pip3 install gsm0338

gsmencode.py:

import sys

import gsm0338


def __create_septets__(octets: bytes) -> (bytes, int):
    num_bits = 0
    data = 0
    septets = bytearray()
    for i in range(len(octets)):
        gsm_char = octets[i]
        data |= (gsm_char << num_bits)
        num_bits += 7
        while num_bits >= 8:
            septets.append(data & 0xff)
            data >>= 8
            num_bits -= 8
    if num_bits > 0:
        septets.append(data & 0xff)
    return bytes(septets), len(octets) % 8


if __name__ == '__main__':
    octets = sys.argv[1].encode('gsm03.38')
    septets, sparse = __create_septets__(octets)
    print("sparse bits: %d" % sparse)
    print("encoded (hex): %s" % septets.hex())

python3 gsmencode.py Sample

Output:

sparse bits: 6
encoded (hex): d3701bce2e03
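
Going the other way can be sketched in the same style. This is only an illustration of reversing the packing loop above: `unpack_septets` is a hypothetical helper, not part of the `gsm0338` module. The number of septets has to be passed in, because when seven bits are left over in the last octet the padding cannot be told apart from a trailing 0x00 (`@`).

```python
def unpack_septets(packed: bytes, count: int) -> bytes:
    """Recover `count` 7-bit values (one per byte) from packed octets."""
    data = 0       # bit buffer, least significant bits are consumed first
    num_bits = 0   # number of valid bits currently in the buffer
    septets = bytearray()
    for octet in packed:
        data |= octet << num_bits
        num_bits += 8
        # Emit septets while enough bits are buffered and more are expected.
        while num_bits >= 7 and len(septets) < count:
            septets.append(data & 0x7F)
            data >>= 7
            num_bits -= 7
    return bytes(septets)


# The six characters of "Sample" have the same codes in GSM 03.38 and ASCII,
# so the unpacked septets can be compared with an ASCII byte string here.
print(unpack_septets(bytes.fromhex("d3701bce2e03"), 6))  # b'Sample'
```

With the `gsm0338` module installed, the returned bytes could presumably be turned back into text with `.decode('gsm03.38')`, mirroring the `encode` call above.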

哥,最终变帅啦 2024-08-31 21:56:55

I could not find any library, but I don't think this needs one. It's fairly easy to do.

Here is Jon Skeet himself on the same topic.

Example:

s = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

def ascii_to_gsm(ch):
    return bin(65 + s.index(ch))

print ascii_to_gsm('A')
print '--'

binary_stream = ''.join([str(ascii_to_gsm(ch))[2:] for ch in s])
print binary_stream

You can also use a dict to store the mapping between ASCII and the GSM 7-bit character set.
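
A rough sketch of that dict idea, in Python 3. The table is deliberately partial and `ASCII_TO_GSM` is a made-up name: it covers letters, digits, the punctuation whose GSM code equals its ASCII code, and a handful of characters that move. Characters that need the escape prefix (such as `[` or `{`) are left out.

```python
import string

# Characters whose GSM 03.38 code is the same as their ASCII code.
SAME_AS_ASCII = string.ascii_letters + string.digits + " !\"#%&'()*+,-./:;<=>?"

# Hypothetical, partial mapping from ASCII characters to GSM 7-bit codes.
ASCII_TO_GSM = {ch: ord(ch) for ch in SAME_AS_ASCII}
# ASCII characters that sit at a different position in the GSM table.
ASCII_TO_GSM.update({"@": 0x00, "$": 0x02, "\n": 0x0A, "\r": 0x0D, "_": 0x11})


def ascii_to_gsm(ch):
    """Return the GSM code of ch; raises KeyError for unmapped characters."""
    return ASCII_TO_GSM[ch]


# format(..., "07b") pads to seven bits, which bin()[2:] does not do for
# codes below 64 such as '@' or '$'.
binary_stream = "".join(format(ascii_to_gsm(ch), "07b") for ch in "A@$")
print(binary_stream)  # 100000100000000000010
```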

空城缀染半城烟沙 2024-08-31 21:56:55

I faced a similar issue recently: we were receiving GSM 7-bit encoded text messages from the aggregator, mostly from the Verizon carrier and containing Spanish characters, and we were not able to decode them successfully.
Here is what I wrote with the help of the other answers here. This is for Python 2.7.x.

def gsm7bitdecode(text):
    gsm = (u"@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞ\x1bÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>"
           u"?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà")
    ext = (u"````````````````````^```````````````````{}`````\\````````````[~]`"
           u"|````````````````````````````````````€``````````````````````````")

    text = ''.join(["{0:08b}".format(int(text[i:i+2], 16)) for i in range(0, len(text), 2)][::-1])

    text = [(int(text[::-1][i:i+7][::-1], 2)) for i in range(0, len(text), 7)]
    text = text[:len(text)-1] if text[-1] == 0 else text
    text = iter(text)

    result = []
    for i in text:
        if i == 27:
            i = next(text)
            result.append(ext[i])
        else:
            result.append(gsm[i])

    return "".join(result).rstrip()
