Encoding / compressing repeated integer sequences
I have very long integer sequences that look like this (arbitrary length!):
0000000001110002220033333
Now I need some algorithm to convert this string into something compressed like
a9b3a3c3a2d5
Which means "a 9 times, then b 3 times, then a 3 times" and so on, where "a" stands for 0, "b" for 1, "c" for 2 and "d" for 3.
How would you do that?
So far nothing suitable has come to mind, and I had no luck with Google because I didn't really know what to search for. What is this kind of encoding / compression called?
PS: I am going to do the encoding with PHP, and the decoding in JavaScript.
Edit: Thank you all!
I ended up with this function for encoding:
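// Run-length encode a string of digits 0-9, e.g. "0001" becomes "a3b".
protected function numStringToRle($s){
    $rle = '';
    $count = 1;
    $len = strlen($s);
    for($i = 0; $i < $len; $i++){
        // Still inside a run: the next character exists and matches this one.
        if(isset($s[$i+1]) && $s[$i] === $s[$i+1]){
            $count++;
        } else {
            // End of a run: emit the letter (0 -> 'a', 1 -> 'b', ...),
            // followed by the count unless the run has length 1.
            $rle .= chr((int)$s[$i] + 97).($count == 1 ? '' : $count);
            $count = 1;
        }
    }
    return $rle;
}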
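And this one for decoding:
var decodeCoords = function(str) {
    // Expand every "letter + count" pair; letters without a count stay as they are.
    str = str.replace(/(.)(\d+)/g, function(_, x, n) {
        return new Array(parseInt(n, 10) + 1).join(x);
    });
    // Map the letters back to digits (only 0-3 occur in this data).
    return str.
        replace(/a/g, '0').
        replace(/b/g, '1').
        replace(/c/g, '2').
        replace(/d/g, '3');
};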
It is called run-length encoding (RLE).
Basic encoder in PHP:
Be warned: it will perform badly on a string with few repeated characters, since a run of length 1 still gets its count written.
If you are going to be handling a string that may have a lot of individual single characters, you would be better off adding some complexity and not writing the length of the run when that length is 1.
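To make the trade-off concrete, here is a minimal sketch in JavaScript (the question encodes in PHP, so treat this as an illustration only). The function name `rleEncode` and its `omitSingles` flag are made up for this example; digits are mapped to letters as in the question.

```javascript
// Hypothetical helper: run-length encode a string of digits 0-9.
// Digits map to letters (0 -> 'a', 1 -> 'b', ...) as in the question.
function rleEncode(digits, omitSingles) {
  let out = '';
  let i = 0;
  while (i < digits.length) {
    // Advance j to the first position past the current run.
    let j = i;
    while (j < digits.length && digits[j] === digits[i]) j++;
    const count = j - i;
    const letter = String.fromCharCode(97 + Number(digits[i]));
    // Optionally skip the count for runs of length 1.
    out += letter + (omitSingles && count === 1 ? '' : count);
    i = j;
  }
  return out;
}

console.log(rleEncode('0000000001110002220033333', false)); // a9b3a3c3a2d5
console.log(rleEncode('0123', false)); // a1b1c1d1
console.log(rleEncode('0123', true));  // abcd
```

With the count always written, input without repeats doubles in size; omitting counts of 1 keeps it at the original length.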
Here is a naive implementation of what you want.
Here's a shorter version:
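One way to get a short encoder, sketched here in JavaScript and not necessarily what this answer originally contained, is to let a regular expression with a backreference find each run. The name `rleShort` is an assumption made for this example.

```javascript
// Hypothetical short encoder: each regex match is one run of a repeated digit.
// (\d)\1* matches a digit followed by zero or more copies of that same digit.
const rleShort = (digits) =>
  digits.replace(/(\d)\1*/g, (run, d) =>
    String.fromCharCode(97 + Number(d)) + (run.length > 1 ? run.length : ''));

console.log(rleShort('0000000001110002220033333')); // a9b3a3c3a2d5
```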
Edit: oh, I see you want to encode with PHP; sorry, I don't know that. Here's a decoder in a similar spirit:
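A decoder along these lines could look like the following sketch. It is not necessarily the answerer's code: the name `rleDecode` is hypothetical, and it assumes the format from the question, where a letter `a` to `j` stands for a digit 0 to 9 and a missing count means a run of length 1.

```javascript
// Hypothetical decoder: a letter followed by an optional repeat count.
function rleDecode(encoded) {
  return encoded.replace(/([a-j])(\d*)/g, (match, letter, count) => {
    const digit = String(letter.charCodeAt(0) - 97);  // 'a' -> '0', 'b' -> '1', ...
    const n = count === '' ? 1 : parseInt(count, 10); // a missing count means 1
    return digit.repeat(n);
  });
}

console.log(rleDecode('a9b3a3c3a2d5')); // 0000000001110002220033333
```

Doing the letter-to-digit mapping inside the same callback avoids one `replace` call per letter and handles multi-digit counts such as `a12`.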
Just FYI, you could probably gzip your data and the browser will automatically unzip it. For most implementations this is going to work better than RLE. But it's less fun, obviously.
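To get a feel for how well gzip handles this kind of data, here is a small sketch that assumes Node.js and its built-in `zlib` module. The sample data is invented for illustration. In the setup from the question, the server would do the compressing and the browser would decompress the response on its own when it is sent with a `Content-Encoding: gzip` header.

```javascript
const zlib = require('zlib');

// Illustrative data: long runs of the digits 0-3, similar to the question's input.
const data = '0'.repeat(5000) + '1'.repeat(3000) + '2'.repeat(2000) + '3'.repeat(4000);

const compressed = zlib.gzipSync(data);                         // returns a Buffer
const restored = zlib.gunzipSync(compressed).toString('utf8');  // back to the original string

console.log(data.length, compressed.length); // original size vs. gzipped size in bytes
console.log(restored === data);              // true
```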