检测发送文本所需的短信数量的最佳方法

发布于 2024-12-18 20:43:13 字数 2623 浏览 2 评论 0原文

我正在 php 中寻找一个代码/lib,我将调用它并向其传递文本,它会告诉我:

  1. 我需要使用什么编码才能将此文本作为短信发送(7,8,16位)
  2. 我将使用多少条短信来发送此文本(计算“分段信息”必须很聪明,如 http://ozekisms.com/index.php?owpn=612

您知道是否存在可以为我执行此操作的代码/lib?

再说一次,我不是在寻找发送短信或转换短信,只是为了向我提供有关文本的信息

更新:

好的,我执行了以下代码,它似乎工作正常,如果您有,请告诉我更好/优化的代码/解决方案/lib

$text = '\@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ -./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€' ; //"\\". //'"';//' ';

print $text . "\n";
print isGsm7bit($text). "\n";
print getNumberOfSMSsegments($text). "\n";




function getNumberOfSMSsegments($text,$MaxSegments=6){
/*
http://en.wikipedia.org/wiki/SMS

Larger content (concatenated SMS, multipart or segmented SMS, or "long SMS") can be sent using multiple messages, 
in which case each message will start with a user data header (UDH) containing segmentation information. 
Since UDH is part of the payload, the number of available characters per segment is lower: 
153 for 7-bit encoding, 
134 for 8-bit encoding and 
67 for 16-bit encoding. 
The receiving handset is then responsible for reassembling the message and presenting it to the user as one long message. 
While the standard theoretically permits up to 255 segments,[35] 6 to 8 segment messages are the practical maximum, 
and long messages are often billed as equivalent to multiple SMS messages. See concatenated SMS for more information. 
Some providers have offered length-oriented pricing schemes for messages, however, the phenomenon is disappearing.
*/
$TotalSegment=0;
$textlen = mb_strlen($text);
if($textlen==0) return false; //I can see most mobile devices will not allow you to send empty sms, with this check we make sure we don't allow empty SMS

if(isGsm7bit($text)){ //7-bit
    $SingleMax=160;
    $ConcatMax=153;
}else{ //UCS-2 Encoding (16-bit)
    $SingleMax=70;
    $ConcatMax=67;
}

if($textlen<=$SingleMax){
    $TotalSegment = 1;
}else{
    $TotalSegment = ceil($textlen/$ConcatMax);
}

if($TotalSegment>$MaxSegments) return false; //SMS is very big.
return $TotalSegment;
}

function isGsm7bit($text){
$gsm7bitChars = "\\\@£\$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€";
$textlen = mb_strlen($text);
for ($i = 0;$i < $textlen; $i++){
    if ((strpos($gsm7bitChars, $text[$i])==false) && ($text[$i]!="\\")){return false;} //strpos not able to detect \ in string
}
return true;
}

I'm looking for a code/lib in php that I will call it and pass a text to it and it will tell me:

  1. What is the encode I need to use in order to send this text as SMS (7,8,16 bit)
  2. How many SMS message I will use to send this text (it must be smart to count "segmenation information" like in http://ozekisms.com/index.php?owpn=612)

do you have any idea of any code/lib exists that will do this for me?

Again I'm not looking for sending SMS or converting SMS, just to give me information about the text

Update:

Ok I did the below code and it seems to be working fine, let me know if you have better/optimized code/solution/lib

$text = '\@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ -./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€' ; //"\\". //'"';//' ';

print $text . "\n";
print isGsm7bit($text). "\n";
print getNumberOfSMSsegments($text). "\n";




function getNumberOfSMSsegments($text,$MaxSegments=6){
/*
http://en.wikipedia.org/wiki/SMS

Larger content (concatenated SMS, multipart or segmented SMS, or "long SMS") can be sent using multiple messages, 
in which case each message will start with a user data header (UDH) containing segmentation information. 
Since UDH is part of the payload, the number of available characters per segment is lower: 
153 for 7-bit encoding, 
134 for 8-bit encoding and 
67 for 16-bit encoding. 
The receiving handset is then responsible for reassembling the message and presenting it to the user as one long message. 
While the standard theoretically permits up to 255 segments,[35] 6 to 8 segment messages are the practical maximum, 
and long messages are often billed as equivalent to multiple SMS messages. See concatenated SMS for more information. 
Some providers have offered length-oriented pricing schemes for messages, however, the phenomenon is disappearing.
*/
$TotalSegment=0;
$textlen = mb_strlen($text);
if($textlen==0) return false; //I can see most mobile devices will not allow you to send empty sms, with this check we make sure we don't allow empty SMS

if(isGsm7bit($text)){ //7-bit
    $SingleMax=160;
    $ConcatMax=153;
}else{ //UCS-2 Encoding (16-bit)
    $SingleMax=70;
    $ConcatMax=67;
}

if($textlen<=$SingleMax){
    $TotalSegment = 1;
}else{
    $TotalSegment = ceil($textlen/$ConcatMax);
}

if($TotalSegment>$MaxSegments) return false; //SMS is very big.
return $TotalSegment;
}

function isGsm7bit($text){
$gsm7bitChars = "\\\@£\$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€";
$textlen = mb_strlen($text);
for ($i = 0;$i < $textlen; $i++){
    if ((strpos($gsm7bitChars, $text[$i])==false) && ($text[$i]!="\\")){return false;} //strpos not able to detect \ in string
}
return true;
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

灰色世界里的红玫瑰 2024-12-25 20:43:13

我在这里添加一些额外的信息,因为之前的答案不太正确。

这些问题是:

  • 需要将当前字符串编码指定为 mb_string,否则可能会错误地收集
  • 在 7 位 GSM 编码中,基本字符集扩展字符 (^{}\[~] |€) 每个都需要 14 位进行编码,因此它们各算作两个字符。
  • 在 UCS-2 编码中,您必须警惕 16 位 BMP 之外的表情符号和其他字符,因为...
  • 带有 UCS-2 的 GSM 计算 16 位字符,因此如果您有

I'm adding some extra information here because the previous answer isn't quite correct.

These are the issues:

  • You need to be specifying the current string encoding to mb_string, otherwise this may be incorrectly gathered
  • In 7-bit GSM encoding, the Basic Charset Extended characters (^{}\[~]|€) require 14-bits each to encode, so they count as two characters each.
  • In UCS-2 encoding, you have to be wary of emoji and other characters outside the 16-bit BMP, because...
  • GSM with UCS-2 counts 16-bit characters, so if you have a ???? character (U+1F4A9), and your carrier and phone sneakily support UTF-16 and not just UCS-2, it will be encoded as a surrogate pair of 16-bit characters in UTF-16, and thus be counted as TWO 16-bit characters toward your string length. mb_strlen will count this as a single character only.

How to count 7-bit characters:

What I've come up with so far is the following to count 7-bit characters:

// Internal encoding must be set to UTF-8,
// and the input string must be UTF-8 encoded for this to work correctly
protected function count_gsm_string($str)
{
    // Basic GSM character set (one 7-bit encoded char each)
    $gsm_7bit_basic = "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà";

    // Extended set (requires escape code before character thus 2x7-bit encodings per)
    $gsm_7bit_extended = "^{}\\[~]|€";

    $len = 0;

    for($i = 0; $i < mb_strlen($str); $i++) {
        $c = mb_substr($str, i, 1);
        if(mb_strpos($gsm_7bit_basic, $c) !== FALSE) {
            $len++;
        } else if(mb_strpos($gsm_7bit_extended, $c) !== FALSE) {
            $len += 2;
        } else {
            return -1; // cannot be encoded as GSM, immediately return -1
        }
    }

    return $len;
}

How to count 16-bit characters:

  • Convert the string into UTF-16 representation (to preserve the emoji characters with mb_convert_encoding($str, 'UTF-16', 'UTF-8').
  • do not convert into UCS-2 as this is lossy with mb_convert_encoding)
  • Count bytes with count(unpack('C*', $utf16str)) and divide by two to get the number of UCS-2 16-bit characters that count toward the GSM multipart length

*caveat emptor, a word on counting bytes:

  • Do not use strlen to count the number of bytes. While it may work, strlen is often overloaded in PHP installations with a multibyte-capable version, and is also a candidate for API change in the future
  • Avoid mb_strlen($str, 'UCS-2'). While it does currently work, and will return, correctly, 2 for a pile of poo character (as it looks like two 16-bit UCS-2 characters), its stablemate mb_convert_encoding is lossy when converting from >16-bit to UCS-2. Who's to say that mb_strlen won't be lossy in the future?
  • Avoid mb_strlen($str, '8bit') / 2. It also currently works, and is recommended in a PHP docs comment as a way to count bytes. But IMO it suffers from the same issue as the above UCS-2 technique.
  • That leaves the safest current way (IMO) as unpacking into a byte array, and counting that.

So, what does this look like?

// Internal encoding must be set to UTF-8,
// and the input string must be UTF-8 encoded for this to work correctly
protected function count_ucs2_string($str)
{
    $utf16str = mb_convert_encoding($str, 'UTF-16', 'UTF-8');
    // C* option gives an unsigned 16-bit integer representation of each byte
    // which option you choose doesn't actually matter as long as you get one value per byte
    $byteArray = unpack('C*', $utf16str);
    return count($byteArray) / 2;
}

Putting it all together:

function multipart_count($str)
{
    $one_part_limit = 160; // use a constant i.e. GSM::SMS_SINGLE_7BIT
    $multi_limit = 153; // again, use a constant
    $max_parts = 3; // ... constant

    $str_length = count_gsm_string($str);
    if($str_length === -1) {
        $one_part_limit = 70; // ... constant
        $multi_limit = 67; // ... constant
        $str_length = count_ucs2_string($str);
    }

    if($str_length <= $one_part_limit) {
        // fits in one part
        return 1;
    }

    if($str_length > ($max_parts * $multi_limit)) {
            // too long
        return -1; // or throw exception, or false, etc.
    }

    // divide the string length by multi_limit and round up to get number of parts
    return ceil($str_length / $multi_limit);
}

Turned this into a library...

https://bitbucket.org/solvam/smstools

快乐很简单 2024-12-25 20:43:13

迄今为止我拥有的最佳解决方案:

$text = '\@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ -./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€' ; //"\\". //'"';//' ';

print $text . "\n";
print isGsm7bit($text). "\n";
print getNumberOfSMSsegments($text). "\n";

function getNumberOfSMSsegments($text,$MaxSegments=6){
/*
http://en.wikipedia.org/wiki/SMS

Larger content (concatenated SMS, multipart or segmented SMS, or "long SMS") can be sent using multiple messages, 
in which case each message will start with a user data header (UDH) containing segmentation information. 
Since UDH is part of the payload, the number of available characters per segment is lower: 
153 for 7-bit encoding, 
134 for 8-bit encoding and 
67 for 16-bit encoding. 
The receiving handset is then responsible for reassembling the message and presenting it to the user as one long message. 
While the standard theoretically permits up to 255 segments,[35] 6 to 8 segment messages are the practical maximum, 
and long messages are often billed as equivalent to multiple SMS messages. See concatenated SMS for more information. 
Some providers have offered length-oriented pricing schemes for messages, however, the phenomenon is disappearing.
*/
$TotalSegment=0;
$textlen = mb_strlen($text);
if($textlen==0) return false; //I can see most mobile devices will not allow you to send empty sms, with this check we make sure we don't allow empty SMS

if(isGsm7bit($text)){ //7-bit
    $SingleMax=160;
    $ConcatMax=153;
}else{ //UCS-2 Encoding (16-bit)
    $SingleMax=70;
    $ConcatMax=67;
}

if($textlen<=$SingleMax){
    $TotalSegment = 1;
}else{
    $TotalSegment = ceil($textlen/$ConcatMax);
}

if($TotalSegment>$MaxSegments) return false; //SMS is very big.
return $TotalSegment;
}

function isGsm7bit($text){
$gsm7bitChars = "\\\@£\$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€";
$textlen = mb_strlen($text);
for ($i = 0;$i < $textlen; $i++){
    if ((strpos($gsm7bitChars, $text[$i])==false) && ($text[$i]!="\\")){return false;} //strpos not     able to detect \ in string
}
return true;
}

The best solution I have so far:

$text = '\@£$¥èéùìòÇØøÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ -./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€' ; //"\\". //'"';//' ';

print $text . "\n";
print isGsm7bit($text). "\n";
print getNumberOfSMSsegments($text). "\n";

function getNumberOfSMSsegments($text,$MaxSegments=6){
/*
http://en.wikipedia.org/wiki/SMS

Larger content (concatenated SMS, multipart or segmented SMS, or "long SMS") can be sent using multiple messages, 
in which case each message will start with a user data header (UDH) containing segmentation information. 
Since UDH is part of the payload, the number of available characters per segment is lower: 
153 for 7-bit encoding, 
134 for 8-bit encoding and 
67 for 16-bit encoding. 
The receiving handset is then responsible for reassembling the message and presenting it to the user as one long message. 
While the standard theoretically permits up to 255 segments,[35] 6 to 8 segment messages are the practical maximum, 
and long messages are often billed as equivalent to multiple SMS messages. See concatenated SMS for more information. 
Some providers have offered length-oriented pricing schemes for messages, however, the phenomenon is disappearing.
*/
$TotalSegment=0;
$textlen = mb_strlen($text);
if($textlen==0) return false; //I can see most mobile devices will not allow you to send empty sms, with this check we make sure we don't allow empty SMS

if(isGsm7bit($text)){ //7-bit
    $SingleMax=160;
    $ConcatMax=153;
}else{ //UCS-2 Encoding (16-bit)
    $SingleMax=70;
    $ConcatMax=67;
}

if($textlen<=$SingleMax){
    $TotalSegment = 1;
}else{
    $TotalSegment = ceil($textlen/$ConcatMax);
}

if($TotalSegment>$MaxSegments) return false; //SMS is very big.
return $TotalSegment;
}

function isGsm7bit($text){
$gsm7bitChars = "\\\@£\$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑܧ¿abcdefghijklmnopqrstuvwxyzäöñüà^{}[~]|€";
$textlen = mb_strlen($text);
for ($i = 0;$i < $textlen; $i++){
    if ((strpos($gsm7bitChars, $text[$i])==false) && ($text[$i]!="\\")){return false;} //strpos not     able to detect \ in string
}
return true;
}
阳光的暖冬 2024-12-25 20:43:13
  • 第 1 页:160 字节
  • 第 2 页:146 个字节
  • 第 3 页:153 个字节
  • 第 4 页:153 个字节
  • 第 5 页:153 个字节,....

所以无论语言如何:

// strlen($text) show bytes 

           $count = 0;
           $len = strlen($text);
                if ($len > 306) {
                    $len = $len - 306;
                    $count = floor($len / 153) + 3;
                } else if($len>160){
                    $count = 2;
                }else{
                    $count = 1;
                }
  • page 1 : 160 byte
  • page 2 : 146 byte
  • page 3 : 153 byte
  • page 4 : 153 byte
  • page 5 : 153 byte, ....

So regardless of language :

// strlen($text) show bytes 

           $count = 0;
           $len = strlen($text);
                if ($len > 306) {
                    $len = $len - 306;
                    $count = floor($len / 153) + 3;
                } else if($len>160){
                    $count = 2;
                }else{
                    $count = 1;
                }
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文