使用 Pygments 过滤空格和换行符

发布于 2024-10-25 05:42:34 字数 1030 浏览 6 评论 0原文

我一直在尝试向我的 django 站点添加语法突出显示。问题是我也对  
字符进行了格式化。有没有办法保留这些字符?这是我正在使用的代码:

from BeautifulSoup  import BeautifulSoup
from django import template
from django.template.defaultfilters import stringfilter
import pygments
import pygments.formatters
import pygments.lexers


register = template.Library()

@register.filter
@stringfilter
def pygmentized(html):
    soup = BeautifulSoup(html)
    codeblocks = soup.findAll('code')
    for block in codeblocks:
        if block.has_key('class'):
            try:
                code = ''.join([unicode(item) for item in block.contents])
                lexer = pygments.lexers.get_lexer_by_name(block['class'], stripall=True)
                formatter = pygments.formatters.HtmlFormatter()
                code_hl = pygments.highlight(code, lexer, formatter)
                block.contents = [BeautifulSoup(code_hl)]
                block.name = 'code'
            except:
                raise
    return unicode(soup)

I've been trying to add syntax highlighting to my django site. The problem is I'm getting the   and <br /> characters formatted as well. Is there a way to preserve these characterss? Here is the code I'm using:

from BeautifulSoup  import BeautifulSoup
from django import template
from django.template.defaultfilters import stringfilter
import pygments
import pygments.formatters
import pygments.lexers


register = template.Library()

@register.filter
@stringfilter
def pygmentized(html):
    soup = BeautifulSoup(html)
    codeblocks = soup.findAll('code')
    for block in codeblocks:
        if block.has_key('class'):
            try:
                code = ''.join([unicode(item) for item in block.contents])
                lexer = pygments.lexers.get_lexer_by_name(block['class'], stripall=True)
                formatter = pygments.formatters.HtmlFormatter()
                code_hl = pygments.highlight(code, lexer, formatter)
                block.contents = [BeautifulSoup(code_hl)]
                block.name = 'code'
            except:
                raise
    return unicode(soup)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

娇纵 2024-11-01 05:42:34

嗯,Petri 是对的,pre 用于代码块。在他指出我刚刚编写了一个函数来清理第一个输出之前,它很混乱,但也许只需要从最终输出中删除某些内容的人可能会发现它没问题:

from BeautifulSoup  import BeautifulSoup
from django import template
from django.template.defaultfilters import stringfilter
import pygments
import pygments.formatters
import pygments.lexers


register = template.Library()
wanted = {'br': '<br />', 'BR': '<BR />', 'nbsp': ' ', 'NBSP': '&NBSP;', '/>': ''}

def uglyfilter(html):
    content = BeautifulSoup(html)
    for node in content.findAll('span'):
        data = ''.join(node.findAll(text=True))
        if wanted.has_key(data):
            node.replaceWith(wanted.get(data))
    return unicode(content)     


@register.filter
@stringfilter
def pygmentized(html):
    soup = BeautifulSoup(html)
    codeblocks = soup.findAll('pre')
    for block in codeblocks:
        if block.has_key('class'):
            try:
                code = ''.join([unicode(item) for item in block.contents])
                lexer = pygments.lexers.get_lexer_by_name(block['class'], stripall=True)
                formatter = pygments.formatters.HtmlFormatter()
                code_hl = pygments.highlight(code, lexer, formatter)
                clean = uglyfilter(code_hl)
                block.contents = [BeautifulSoup(clean)]
                block.name = 'pre'
            except:
                raise
    return unicode(soup)

Well, Petri is right, pre is for code blocks. Before he pointed that out I just wrote a function to clean the first output, it's messy but maybe someone who just needs to remove certain things from the final output might find it ok:

from BeautifulSoup  import BeautifulSoup
from django import template
from django.template.defaultfilters import stringfilter
import pygments
import pygments.formatters
import pygments.lexers


register = template.Library()
wanted = {'br': '<br />', 'BR': '<BR />', 'nbsp': ' ', 'NBSP': '&NBSP;', '/>': ''}

def uglyfilter(html):
    content = BeautifulSoup(html)
    for node in content.findAll('span'):
        data = ''.join(node.findAll(text=True))
        if wanted.has_key(data):
            node.replaceWith(wanted.get(data))
    return unicode(content)     


@register.filter
@stringfilter
def pygmentized(html):
    soup = BeautifulSoup(html)
    codeblocks = soup.findAll('pre')
    for block in codeblocks:
        if block.has_key('class'):
            try:
                code = ''.join([unicode(item) for item in block.contents])
                lexer = pygments.lexers.get_lexer_by_name(block['class'], stripall=True)
                formatter = pygments.formatters.HtmlFormatter()
                code_hl = pygments.highlight(code, lexer, formatter)
                clean = uglyfilter(code_hl)
                block.contents = [BeautifulSoup(clean)]
                block.name = 'pre'
            except:
                raise
    return unicode(soup)
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文