就地游程长度解码?

发布于 2024-12-26 02:04:20 字数 343 浏览 0 评论 0原文

给定一个游程长度编码的字符串,例如“A3B1C2D1E1”,就地解码该字符串。 编码字符串的答案是“AAABCCDE”。假设编码数组足够大以容纳解码字符串,即您可以假设数组大小 = MAX[length(encodedstirng),length(decodedstring)]。

这看起来并不简单,因为仅仅将 A3 解码为“AAA”就会导致覆盖原始字符串的“B”。

此外,不能假设解码的字符串总是大于编码的字符串。 例如:编码字符串 - 'A1B1',解码字符串为 'AB'。有什么想法吗?

并且它始终是字母数字对,即不会要求您将 0515 转换为 0000055555

Given a run length encoded string, say "A3B1C2D1E1", decode the string in-place.
The answer for the encoded string is "AAABCCDE". Assume that the encoded array is large enough to accommodate the decoded string, i.e. you may assume that the array size = MAX[length(encodedstirng),length(decodedstring)].

This does not seem trivial, since merely decoding A3 as 'AAA' will lead to over-writing 'B' of the original string.

Also, one cannot assume that the decoded string is always larger than the encoded string.
Eg: Encoded string - 'A1B1', Decoded string is 'AB'. Any thoughts?

And it will always be a letter-digit pair, i.e. you will not be asked to converted 0515 to 0000055555

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

决绝 2025-01-02 02:04:20

如果我们还不知道,我们应该首先扫描,将数字相加,以计算解码字符串的长度。

它始终是字母数字对,因此您可以从字符串中删除 1 而不会产生任何混淆。

A3B1C2D1E1

变成

A3BC2DE

这里是一些 C++ 代码,用于从字符串中删除 1(复杂度为 O(n))。

// remove 1s
int i = 0; // read from here
int j = 0; // write to here
while(i < str.length) {
    assert(j <= i); // optional check
    if(str[i] != '1') {
        str[j] = str[i];
        ++ j;
    }
    ++ i;
}
str.resize(j); // to discard the extra space now that we've got our shorter string

现在,该字符串保证比最终解码的字符串短或长度相同。我们不能对原始字符串做出这样的声明,但我们可以对这个修改后的字符串做出这样的声明。

(现在一个可选的、简单的步骤是将每个 2 替换为前一个字母。A3BCCDE,但我们不需要这样做)。

现在我们可以从最后开始工作了。我们已经计算了解码字符串的长度,因此我们确切地知道最终字符将在哪里。我们可以简单地将字符从短字符串的末尾复制到它们的最终位置。

在从右到左的复制过程中,如果我们遇到一个数字,我们必须复制该数字左侧的字母的多个副本。您可能担心这可能会覆盖太多数据。但我们之前证明了我们的编码字符串或其任何子字符串永远不会比其相应的解码字符串长;这意味着总会有足够的空间。

If we don't already know, we should scan through first, adding up the digits, in order to calculate the length of the decoded string.

It will always be a letter-digit pair, hence you can delete the 1s from the string without any confusion.

A3B1C2D1E1

becomes

A3BC2DE

Here is some code, in C++, to remove the 1s from the string (O(n) complexity).

// remove 1s
int i = 0; // read from here
int j = 0; // write to here
while(i < str.length) {
    assert(j <= i); // optional check
    if(str[i] != '1') {
        str[j] = str[i];
        ++ j;
    }
    ++ i;
}
str.resize(j); // to discard the extra space now that we've got our shorter string

Now, this string is guaranteed to be shorter than, or the same length as, the final decoded string. We can't make that claim about the original string, but we can make it about this modified string.

(An optional, trivial, step now is to replace every 2 with the previous letter. A3BCCDE, but we don't need to do that).

Now we can start working from the end. We have already calculated the length of the decoded string, and hence we know exactly where the final character will be. We can simply copy the characters from the end of our short string to their final location.

During this copy process from right-to-left, if we come across a digit, we must make multiple copies of the letter that is just to the left of the digit. You might be worried that this might risk overwriting too much data. But we proved earlier that our encoded string, or any substring thereof, will never be longer than its corresponding decoded string; this means that there will always be enough space.

羅雙樹 2025-01-02 02:04:20

以下解决方案是 O(n) 且就地解决。该算法不应该访问它不应该访问的内存,无论是读还是写。我做了一些调试,对于我提供的示例测试来说,它似乎是正确的。


高级概述:

  • 确定编码长度。
  • 通过读取所有数字并将其相加来确定解码长度。
  • 缓冲区末尾为 MAX(解码长度,编码长度)。
  • 从字符串末尾开始解码字符串。从缓冲区末尾开始写入。
  • 由于解码的长度可能大于编码的长度,因此解码的字符串可能不会从缓冲区的开头开始。如果需要,请通过将字符串移至开头来纠正此问题。

int isDigit (char c) {
    return '0' <= c && c <= '9';
}

unsigned int toDigit (char c) {
    return c - '0';
}

unsigned int intLen (char * str) {
    unsigned int n = 0;
    while (isDigit(*str++)) {
        ++n;
    }
    return n;
}

unsigned int forwardParseInt (char ** pStr) {
    unsigned int n = 0;
    char * pChar = *pStr;
    while (isDigit(*pChar)) {
        n = 10 * n + toDigit(*pChar);
        ++pChar;
    }
    *pStr = pChar;
    return n;
}

unsigned int backwardParseInt (char ** pStr, char * beginStr) {
    unsigned int len, n;
    char * pChar = *pStr;
    while (pChar != beginStr && isDigit(*pChar)) {
        --pChar;
    }
    ++pChar;
    len = intLen(pChar);
    n = forwardParseInt(&pChar);
    *pStr = pChar - 1 - len;
    return n;
}

unsigned int encodedSize (char * encoded) {
    int encodedLen = 0;
    while (*encoded++ != '\0') {
        ++encodedLen;
    }
    return encodedLen;
}

unsigned int decodedSize (char * encoded) {
    int decodedLen = 0;
    while (*encoded++ != '\0') {
        decodedLen += forwardParseInt(&encoded);
    }
    return decodedLen;
}

void shift (char * str, int n) {
    do {
        str[n] = *str;
    } while (*str++ != '\0');
}

unsigned int max (unsigned int x, unsigned int y) {
    return x > y ? x : y;
}

void decode (char * encodedBegin) {
    int shiftAmount;
    unsigned int eSize = encodedSize(encodedBegin);
    unsigned int dSize = decodedSize(encodedBegin);
    int writeOverflowed = 0;
    char * read = encodedBegin + eSize - 1;
    char * write = encodedBegin + max(eSize, dSize);
    *write-- = '\0';
    while (read != encodedBegin) {
        unsigned int i;
        unsigned int n = backwardParseInt(&read, encodedBegin);
        char c = *read;
        for (i = 0; i < n; ++i) {
            *write = c;
            if (write != encodedBegin) {
                write--;
            }
            else {
                writeOverflowed = 1;
            }
        }
        if (read != encodedBegin) {
            read--;
        }
    }
    if (!writeOverflowed) {
        write++;
    }
    shiftAmount = encodedBegin - write;
    if (write != encodedBegin) {
        shift(write, shiftAmount);
    }
    return;
}

int main (int argc, char ** argv) {
    //char buff[256] = { "!!!A33B1C2D1E1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
    char buff[256] = { "!!!A2B12C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
    //char buff[256] = { "!!!A1B1C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
    char * str = buff + 3;
    //char buff[256] = { "A1B1" };
    //char * str = buff;
    decode(str);
    return 0;
}

The following solution is O(n) and in-place. The algorithm should not access memory it shouldn't, both read and write. I did some debugging, and it appears correct to the sample tests I fed it.


High level overview:

  • Determine the encoded length.
  • Determine the decoded length by reading all the numbers and summing them up.
  • End of buffer is MAX(decoded length, encoded length).
  • Decode the string by starting from the end of the string. Write from the end of the buffer.
  • Since the decoded length might be greater than the encoded length, the decoded string might not start at the start of the buffer. If needed, correct for this by shifting the string over to the start.

int isDigit (char c) {
    return '0' <= c && c <= '9';
}

unsigned int toDigit (char c) {
    return c - '0';
}

unsigned int intLen (char * str) {
    unsigned int n = 0;
    while (isDigit(*str++)) {
        ++n;
    }
    return n;
}

unsigned int forwardParseInt (char ** pStr) {
    unsigned int n = 0;
    char * pChar = *pStr;
    while (isDigit(*pChar)) {
        n = 10 * n + toDigit(*pChar);
        ++pChar;
    }
    *pStr = pChar;
    return n;
}

unsigned int backwardParseInt (char ** pStr, char * beginStr) {
    unsigned int len, n;
    char * pChar = *pStr;
    while (pChar != beginStr && isDigit(*pChar)) {
        --pChar;
    }
    ++pChar;
    len = intLen(pChar);
    n = forwardParseInt(&pChar);
    *pStr = pChar - 1 - len;
    return n;
}

unsigned int encodedSize (char * encoded) {
    int encodedLen = 0;
    while (*encoded++ != '\0') {
        ++encodedLen;
    }
    return encodedLen;
}

unsigned int decodedSize (char * encoded) {
    int decodedLen = 0;
    while (*encoded++ != '\0') {
        decodedLen += forwardParseInt(&encoded);
    }
    return decodedLen;
}

void shift (char * str, int n) {
    do {
        str[n] = *str;
    } while (*str++ != '\0');
}

unsigned int max (unsigned int x, unsigned int y) {
    return x > y ? x : y;
}

void decode (char * encodedBegin) {
    int shiftAmount;
    unsigned int eSize = encodedSize(encodedBegin);
    unsigned int dSize = decodedSize(encodedBegin);
    int writeOverflowed = 0;
    char * read = encodedBegin + eSize - 1;
    char * write = encodedBegin + max(eSize, dSize);
    *write-- = '\0';
    while (read != encodedBegin) {
        unsigned int i;
        unsigned int n = backwardParseInt(&read, encodedBegin);
        char c = *read;
        for (i = 0; i < n; ++i) {
            *write = c;
            if (write != encodedBegin) {
                write--;
            }
            else {
                writeOverflowed = 1;
            }
        }
        if (read != encodedBegin) {
            read--;
        }
    }
    if (!writeOverflowed) {
        write++;
    }
    shiftAmount = encodedBegin - write;
    if (write != encodedBegin) {
        shift(write, shiftAmount);
    }
    return;
}

int main (int argc, char ** argv) {
    //char buff[256] = { "!!!A33B1C2D1E1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
    char buff[256] = { "!!!A2B12C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
    //char buff[256] = { "!!!A1B1C1\0!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!" };
    char * str = buff + 3;
    //char buff[256] = { "A1B1" };
    //char * str = buff;
    decode(str);
    return 0;
}
落花随流水 2025-01-02 02:04:20

这是一个很模糊的问题,不过仔细想想也不是特别困难。正如您所说,将 A3 解码为 AAA 并将其写入到位将覆盖字符 B1,那么为什么不先将它们沿着阵列移得更远呢?

例如,一旦您读过 A3,您就知道需要为一个额外字符腾出空间,如果是 A4,则需要两个,依此类推。为了实现这一点,您需要在数组中找到字符串的末尾(预先执行此操作并存储它的索引)。

然后循环,将字符移动到新的位置:

开始:A|3|B|1|C|2|||||||
有一个名为 end 的变量来存储索引 5,即最后一个非空白条目。

您将读取第一对,使用名为 cursor 的变量来存储您当前的位置 - 因此在读取 A3 后它将被设置为 1(带有 3 的插槽)。

移动的伪代码:

var n = array[cursor] - 2; // n = 1,A3 中的 3,然后减去 2 以允许配对。

for(i = 结束; i > 光标; i++)
{
数组[i + n] = 数组[i];

会给你留下:

A|3|A|3|B|1|C|2|||||

现在 A 已经在那里了,所以现在你想从 cursor 中存储的索引开始写入 n + 1 A

for(i = cursor; i < cursor + n + 1; i++)
{
  array[i] = array[cursor - 1];
}

// increment the cursor afterwards!
cursor += n + 1;

A|A|A|A|B|1|C|2|||||

然后,您将指向下一对值的开头,准备再次开始。我意识到这个答案有一些漏洞,尽管这是故意的,因为这是一个面试问题!例如,在您指定 A1B1 的边缘情况下,您将需要一个不同的循环来向后而不是向前移动后续字符。

This is a very vague question, though it's not particularly difficult if you think about it. As you say, decoding A3 as AAA and just writing it in place will overwrite the chars B and 1, so why not just move those farther along the array first?

For instance, once you've read A3, you know that you need to make space for one extra character, if it was A4 you'd need two, and so on. To achieve this you'd find the end of the string in the array (do this upfront and store it's index).

Then loop though, moving the characters to their new slots:

To start: A|3|B|1|C|2|||||||
Have a variable called end storing the index 5, i.e. the last, non-blank, entry.

You'd read in the first pair, using a variable called cursor to store your current position - so after reading in the A and the 3 it would be set to 1 (the slot with the 3).

Pseudocode for the move:

var n = array[cursor] - 2; // n = 1, the 3 from A3, and then minus 2 to allow for the pair.

for(i = end; i > cursor; i++)
{
array[i + n] = array[i];
}

This would leave you with:

A|3|A|3|B|1|C|2|||||

Now the A is there once already, so now you want to write n + 1 A's starting at the index stored in cursor:

for(i = cursor; i < cursor + n + 1; i++)
{
  array[i] = array[cursor - 1];
}

// increment the cursor afterwards!
cursor += n + 1;

Giving:

A|A|A|A|B|1|C|2|||||

Then you're pointing at the start of the next pair of values, ready to go again. I realise there are some holes in this answer, though that is intentional as it's an interview question! For instance, in the edge cases you specified A1B1, you'll need a different loop to move subsequent characters backwards rather than forwards.

时光与爱终年不遇 2025-01-02 02:04:20

接下来是另一个 O(n^2) 解决方案。

鉴于答案的复杂性没有限制,这个简单的解决方案似乎工作得很好。

while ( there is an expandable element ):
    expand that element
    adjust (shift) all of the elements on the right side of the expanded element

其中:

  • 可用空间大小是数组中剩余的空元素数量。

  • 可扩展元素是这样的元素:

    扩展大小 - 编码大小 <= 可用空间大小
    

要点是,在从游程码到达扩展字符串的过程中,每一步至少有
一个可以扩展的元素(易于证明)。

Another O(n^2) solution follows.

Given that there is no limit on the complexity of the answer, this simple solution seems to work perfectly.

while ( there is an expandable element ):
    expand that element
    adjust (shift) all of the elements on the right side of the expanded element

Where:

  • Free space size is the number of empty elements left in the array.

  • An expandable element is an element that:

    expanded size - encoded size <= free space size
    

The point is that in the process of reaching from the run-length code to the expanded string, at each step, there is at least
one element that can be expanded (easy to prove).

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文