Encoding/compressing repeated integer sequences

Posted on 2024-08-27 16:09:22

I have very long integer sequences that look like this (arbitrary length!):

0000000001110002220033333

Now I need some algorithm to convert this string into something compressed like

a9b3a3c3a2d5

This means "a 9 times, then b 3 times, then a 3 times" and so on, where "a" stands for 0, "b" for 1, "c" for 2, and "d" for 3.

How would you do that? So far nothing suitable has come to mind, and I had no luck with Google because I didn't really know what to search for. What is this kind of encoding/compression called?

PS: I am going to do the encoding with PHP, and the decoding in JavaScript.

Edit: Thank you all!

I ended up with this function for encoding:

protected function numStringToRle($s){
    $rle   = '';
    $count = 1;
    $len   = strlen($s);
    for($i = 0; $i < $len; $i++){
        // Still inside a run while the next character exists and matches.
        if(isset($s[$i+1]) && $s[$i] == $s[$i+1]){
            $count++;
        } else {
            // End of a run: emit its letter ('0' -> 'a', ...) and omit a count of 1.
            $rle .= chr($s[$i] + 97).( $count == 1 ? '' : $count);
            $count = 1;
        }
    }
    return $rle;
}

And that for decoding:

var decodeCoords = function(str) {

   str = str.replace(/(.)(\d+)/g, function(_, x, n) {
       return new Array(parseInt(n, 10) + 1).join(x);
   });

   return str.
     replace(/a/g, '0').
     replace(/b/g, '1').
     replace(/c/g, '2').
     replace(/d/g, '3');     
};
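
The decoder above only maps the letters a to d back to digits. As a minimal sketch of how the same idea could cover every digit, the version below derives the digit from the letter's character code instead of chaining one replace call per letter. The name decodeRle is hypothetical, and the sketch assumes the encoder uses 'a' to 'j' for 0 to 9 and leaves out a count of 1, as the encoding function above does.

```javascript
// Sketch only: decodeRle is a hypothetical name, not part of the original post.
// Assumes letters 'a'..'j' stand for digits 0..9 and that a letter with no
// count after it stands for a run of length 1.
function decodeRle(encoded) {
  return encoded.replace(/([a-j])(\d*)/g, function (_, letter, n) {
    var digit = String(letter.charCodeAt(0) - 97); // 'a' -> '0', 'b' -> '1', ...
    var count = n === '' ? 1 : parseInt(n, 10);
    return digit.repeat(count);
  });
}

console.log(decodeRle('a9b3a3c3a2d5')); // 0000000001110002220033333
console.log(decodeRle('abc2'));         // 0122
```

Making the count optional in the regular expression handles a bare letter in the same pass as a counted run.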

Comments (6)

怕倦 2024-09-03 16:09:22

It is called run-length encoding (RLE).

Basic encoder in PHP:

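function numStringToRle($s){
    $rle = '';
    $count = 1;
    $len = strlen($s);
    for ( $i = 0; $i < $len; $i++ ){
        // Compare with the next character only if there is one.
        if ( $i + 1 < $len && $s[$i] == $s[$i+1] ){
            $count++;
        }else{
            // End of a run: emit the letter for the digit ('0' -> 'a') and the run length.
            $rle .= chr($s[$i] + 97).$count;
            $count = 1;
        }
    }
    return $rle;
}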

Be warned: it will perform badly on a string like

 123456789123456789

because every run has length 1, so the output is twice as long as the input. If you are going to handle strings that may contain many isolated single characters, you are better off adding a little complexity and not writing the run length when it is 1.

//change
$rle .= chr($s[$i] + 97).$count;

//to
$rle .= chr($s[$i] + 97).( $count == 1 ? '' : $count );

//or
$rle .= chr($s[$i] + 97);
if ( $count != 1 ){
    $rle .= $count;
}
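
To make the difference concrete, here is a small JavaScript sketch (the question decodes in JavaScript) that runs both variants on the worst-case string. The function name encodeRle and its omitOnes flag are hypothetical, introduced only for this comparison, and the sketch assumes the input contains nothing but the digits 0 to 9.

```javascript
// Sketch only: encodeRle and its omitOnes flag are hypothetical names.
// Assumes the input contains only the digits 0-9.
function encodeRle(digits, omitOnes) {
  var out = '';
  var i = 0;
  while (i < digits.length) {
    var j = i;
    // Advance j to the first position past the current run.
    while (j < digits.length && digits[j] === digits[i]) {
      j++;
    }
    var count = j - i;
    var letter = String.fromCharCode(97 + Number(digits[i])); // '0' -> 'a'
    out += letter + (omitOnes && count === 1 ? '' : count);
    i = j;
  }
  return out;
}

var worstCase = '123456789123456789';
console.log(encodeRle(worstCase, false)); // b1c1d1e1f1g1h1i1j1b1c1d1e1f1g1h1i1j1
console.log(encodeRle(worstCase, true));  // bcdefghijbcdefghij
```

With the count always written, the 18-character input grows to 36 characters; with counts of 1 left out, it stays at 18.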
长安忆 2024-09-03 16:09:22

Here is a naive implementation of what you want.

$toEncode = '0000000001110002220033333';
$currentChar = '-1';
$length = strlen($toEncode);
$encoded = '';
$currentNbrChar = 0;
for($i = 0; $i < $length; $i++){
  if($toEncode[$i] != $currentChar){
    if($currentChar != '-1'){
      $encoded .= chr(97 + $currentChar).$currentNbrChar;
    }
    $currentNbrChar = 0;
    $currentChar = $toEncode[$i];
  }
  $currentNbrChar ++;
}
if($currentChar != '-1'){
  $encoded .= chr(97 + $currentChar).$currentNbrChar;
}
echo $encoded;
花辞树 2024-09-03 16:09:22

Here's a shorter version:

function smush(str) {
  return str.replace(/((.)\2*)/g, function(_, w, x) {
    return x + w.length;
  });
}

Edit: oh, I see you want to encode with PHP; sorry, I don't know PHP. Here's a decoder in a similar spirit:

function unsmush(str) {
  return str.replace(/(.)(\d+)/g, function(_, x, n) {
    return new Array(parseInt(n, 10) + 1).join(x);
  });
}
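
One caveat when smush is applied directly to a string of digits: the characters and the counts are then both digits, so the decoder cannot tell them apart. A minimal sketch of the same regex approach with the question's letter mapping folded in follows; smushDigits is a hypothetical name, and the sketch assumes the input contains only the digits 0 to 9.

```javascript
// Sketch only: smushDigits is a hypothetical name; assumes digits-only input.
function smushDigits(str) {
  return str.replace(/(\d)\1*/g, function (run, digit) {
    // Map the digit to a letter ('0' -> 'a') so it cannot be confused with the count.
    return String.fromCharCode(97 + Number(digit)) + run.length;
  });
}

console.log(smushDigits('0000000001110002220033333')); // a9b3a3c3a2d5
```

Passing that result to unsmush yields runs of letters, which the replace calls at the end of the question's decoder map back to digits.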
夜吻♂芭芘 2024-09-03 16:09:22

Just FYI, you could probably gzip your data, and the browser will automatically decompress it. For most implementations this is going to work better than RLE. But less fun, obviously.

姜生凉生 2024-09-03 16:09:22
$str = "0000000001110002220033333";

// $c counts the length of the current run.
$c = 1;
$lastInt = substr($str, 0, 1);
$resultStr = '';
$len = strlen($str);

for ($i = 1; $i <= $len; $i++) {
    // Past the end of the string there is no next character; null acts as a sentinel
    // so that the final run is flushed inside the loop.
    $nowInt = $i < $len ? $str[$i] : null;
    if ($lastInt === $nowInt) {
        $c++;
    } else {
        // The run ended: emit its letter ('0' -> 'a', '1' -> 'b', ...) and its length.
        $resultStr .= chr((int)$lastInt + 97) . $c;
        $c = 1;
        $lastInt = $nowInt;
    }
}

echo $resultStr;
仙女山的月亮 2024-09-03 16:09:22
function compress($str) {
    // Append a sentinel that cannot occur in a digit string so the final run is flushed.
    $strArr = str_split($str . "\0");
    $count = 0;
    $resStr = '';
    $strCheck = $strArr[0];
    foreach ($strArr as $key => $value) {
        if ($strCheck === $value) {
            $count++;
            continue;
        }
        // The run of $strCheck just ended; $count holds its length.
        if ($count <= 2) {
            // Runs of length 1 or 2 are written out literally.
            $resStr .= str_repeat($strCheck, $count);
        } else {
            // Longer runs are written as the character followed by the count.
            $resStr .= $strCheck . $count;
        }
        $strCheck = $value;
        $count = 1;
    }
    return $resStr;
}
