NSTask output formatting

Posted on 2024-10-14 09:54:51

I'm using an NSTask to grab the output from /usr/bin/man. I'm getting the output but without formatting (bold, underline). Something that should appear like this:

Bold text with underline

(note the italic text is actually underlined, there's just no formatting for it here)

Instead gets returned like this:

BBoolldd text with _u_n_d_e_r_l_i_n_e

I have a minimal test project at http://cl.ly/052u2z2i2R280T3r1K3c that you can download and run; note the window does nothing; the output gets logged to the Console.

I presume I need to somehow interpret the NSData object manually but I have no idea where to start on that. I'd ideally like to translate it to an NSAttributedString but the first order of business is actually eliminating the duplicates and underscores. Any thoughts?

Comments (3)

李白 2024-10-21 09:54:51

What is your actual purpose? If you want to show a man page, one option is to convert it to HTML and render it with a Web view.

Parsing man’s output can be tricky because it is processed by groff using a terminal processor by default. This means that the output is tailored to be shown on terminal devices.

One alternative solution is to determine the actual location of the man page source file, e.g.

$ man -w bash
/usr/share/man/man1/bash.1.gz

and manually invoke groff on it with -a (ASCII approximation) and -c (disable colour output), e.g.

$ gunzip -c /usr/share/man/man1/bash.1.gz | groff -c -a -Tascii -man

This will result in an ASCII file without most of the formatting. To generate HTML output,

$ gunzip -c /usr/share/man/man1/bash.1.gz | groff -Thtml -man
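
The commands above assume the page is gzipped. On systems that store man pages uncompressed, gunzip would fail, so a small wrapper can pick the right reader first. This is only a sketch: manpage_source is a hypothetical helper name, and the usage lines assume man and groff are installed.

```shell
#!/bin/sh
# Sketch: print a man page's roff source, whether or not the file is gzipped.
# manpage_source is a hypothetical helper name, not part of man or groff.
manpage_source() {
    case "$1" in
        *.gz) gunzip -c "$1" ;;   # compressed page, as in the bash.1.gz example
        *)    cat "$1" ;;         # uncompressed page
    esac
}

# Usage (requires man and groff):
#   manpage_source "$(man -w bash)" | groff -c -a -Tascii -man   # plain ASCII
#   manpage_source "$(man -w bash)" | groff -Thtml -man          # HTML
```

From Cocoa, the same pipeline could be run through NSTask with /bin/sh as the launch path, so that the pipe is handled by the shell rather than by wiring several tasks together.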

You can also specify these options in a custom configuration file for man, e.g. parseman.conf, and tell man to use that configuration file with the -C option instead of invoking man -w, gunzip, and groff. The default configuration file is /private/etc/man.conf.

Also, you can probably tailor the output of the terminal device processor by passing appropriate options to grotty.

请远离我 2024-10-21 09:54:51

Okay, here's the start of my solution, though I would be interested in any additional (easier?) ways to do this.

The output that man produces is UTF-8 text, but decoding it with NSUTF8StringEncoding doesn't give a clean string. The reason isn't NSTask itself but the way man formats its output for a terminal: bold and underline are encoded as overstrike sequences built around the backspace character.

The letter N is 0x4e in UTF-8, but the NSData for a bold N contains 0x4e 0x08 0x4e, where 0x08 is a Backspace. So for a bold letter, man emits letter-backspace-letter.

An italic c would be 0x63 in UTF-8, but the NSData contains 0x5f 0x08 0x63, with 0x5f corresponding to an underscore. So for italics (shown underlined in a terminal), man emits underscore-backspace-letter.

I really don't see any way around this at this point besides just scanning the raw NSData for these sequences. I'll probably post the source to my parser here once I finish it, unless anybody has any existing code. As the common programming phrase goes, never write yourself what you can copy. :)
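
If the goal is only to get rid of the duplicates and underscores, the sequences described above can also be stripped before the data reaches Cocoa. The following is a minimal shell sketch, not the parser from this answer: strip_overstrike is a hypothetical helper name, and the sample input is hand-written to mimic what man emits.

```shell
#!/bin/sh
# Sketch: strip nroff-style overstrike sequences from text on stdin.
# strip_overstrike is a hypothetical helper name, not part of man or groff.
BS=$(printf '\b')   # a literal backspace byte (0x08)

strip_overstrike() {
    # Delete every "any character + backspace" pair. This keeps only the
    # last character of each bold (X, BS, X) or underline (_, BS, X) sequence.
    sed "s/.${BS}//g"
}

# Hand-written sample: bold "Bold" and underlined "underline".
printf 'B\bBo\bol\bld\bd text with _\bu_\bn_\bd_\be_\br_\bl_\bi_\bn_\be\n' | strip_overstrike
# prints: Bold text with underline
```

The traditional tool for the same job is col with the -b option, as in `man bash | col -b`. Either way the formatting information is thrown away, so this only helps when plain text is enough; keeping bold and underline still needs a parser that records where the sequences were.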

Follow-Up:

I've got a good, fast parser together for taking man output and replacing the bold/underlined output with bold/underlined formatting in an NSMutableAttributedString. Here's the code if anybody else needs to solve the same problem:

NSMutableIndexSet *boldChars = [[NSMutableIndexSet alloc] init];
NSMutableIndexSet *underlineChars = [[NSMutableIndexSet alloc] init];

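// inputData is assumed to be an NSMutableData holding the raw bytes read from the task's pipe.
// bData is a one-byte NSData containing the backspace character (0x08) to search for.
const char backspace = 0x08;
NSData *bData = [NSData dataWithBytes:&backspace length:1];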
NSRange testRange = NSMakeRange(1, [inputData length] - 1);
NSRange bRange = NSMakeRange(0, 0);

do {
    bRange = [inputData rangeOfData:bData options:0 range:testRange];
    // Stop when no backspace is left, or when no byte follows it (written to avoid unsigned underflow).
    if (bRange.location == NSNotFound || bRange.location + 2 > [inputData length]) break;
    const char * buff = [inputData bytes];

    if (buff[bRange.location - 1] == 0x5f) {

        // it's an underline
        //NSLog(@"Undr %c\n", buff[bRange.location + 1]);
        [inputData replaceBytesInRange:NSMakeRange(bRange.location - 1, 2) withBytes:NULL length:0];
        [underlineChars addIndex:bRange.location - 1];
        testRange = NSMakeRange(bRange.location, [inputData length] - (bRange.location));

    } else if (buff[bRange.location - 1] == buff[bRange.location + 1]) {

        // It's a bold
        //NSLog(@"Bold %c\n", buff[bRange.location + 1]);
        [inputData replaceBytesInRange:NSMakeRange(bRange.location - 1, 2) withBytes:NULL length:0];
        [boldChars addIndex:bRange.location - 1];
        testRange = NSMakeRange(bRange.location, [inputData length] - (bRange.location));

    } else {

        testRange.location = bRange.location + 1;
        testRange.length = [inputData length] - testRange.location;
    }
} while (testRange.location + 3 <= [inputData length]);

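// Note: the index sets hold byte offsets, which match string indices only for single-byte (ASCII) text.
NSMutableAttributedString *str = [[NSMutableAttributedString alloc] initWithString:[[NSString alloc] initWithData:inputData encoding:NSUTF8StringEncoding]];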

NSFont *font = [NSFont fontWithName:@"Menlo" size:12];
NSFont *boldFont = [[NSFontManager sharedFontManager] convertFont:font toHaveTrait:NSBoldFontMask];

[str addAttribute:NSFontAttributeName value:font range:NSMakeRange(0, [str length])];

__block NSUInteger begin = [underlineChars firstIndex];
__block NSUInteger end = begin;
[underlineChars enumerateIndexesUsingBlock:^(NSUInteger idx, BOOL *stop) {
    if (idx - end < 2) {
        // it's the next item to the previous one
        end = idx;
    } else {
        // it's a split, so drop in the accumulated range and reset
        [str addAttribute:NSUnderlineStyleAttributeName value:[NSNumber numberWithInt:NSSingleUnderlineStyle] range:NSMakeRange(begin, (end-begin)+1)];
        begin = idx;
        end = begin;
    }
    if (idx == [underlineChars lastIndex]) {
        [str addAttribute:NSUnderlineStyleAttributeName value:[NSNumber numberWithInt:NSSingleUnderlineStyle] range:NSMakeRange(begin, (end-begin)+1)];
    }
}];

begin = [boldChars firstIndex];
end = begin;
[boldChars enumerateIndexesUsingBlock:^(NSUInteger idx, BOOL *stop) {
    if (idx - end < 2) {
        // it's the next item to the previous one
        end = idx;
    } else {
        // it's a split, so drop in the accumulated range and reset
        [str addAttribute:NSFontAttributeName value:boldFont range:NSMakeRange(begin, (end-begin)+1)];
        begin = idx;
        end = begin;
    }
    if (idx == [boldChars lastIndex]) {
        [str addAttribute:NSFontAttributeName value:boldFont range:NSMakeRange(begin, (end-begin)+1)];
    }
}];

罪歌 2024-10-21 09:54:51

Another method would be to convert the man page to PostScript source code, run that through the PostScript-to-PDF converter, and put that into a PDFView.

The implementation would be similar to Bavarious's answer, just with different arguments to groff (-Tps instead of -Thtml).

This would be the slowest solution, but also probably the best for printing.
