Overloading Jinja2 autoescape for (La)TeX

Posted on 2024-12-13 14:02:47

Is it possible to overload Jinja2's autoescape so that it escapes something in a user-specified way (i.e. something other than HTML such as LaTeX)?

Here's an example trying to escape TeX.

import jinja2

class MyEnv(jinja2.Environment):
    def __init__(self, filters={}, globals={}, tests={},
        loader=None, extensions=[], **kwargs):

        super(MyEnv, self).__init__(
            autoescape          = True,
        )

template = MyEnv().from_string("""\\documentclass[{{ class }}]
         \\begin{document}
         {{ content }}
         \\end{document}
     """)

print(template.render({
    'class':'memoir',
    'content': '{bob} <-- is escaped',
    }))

When you run the above, it outputs:

 \documentclass[memoir]
     \begin{document}
     {bob} &lt;-- is escaped
     \end{document}

The problem here is that HTML escaping is used. So { and } should be escaped, but they're not, and < is converted to &lt; when it should not be.
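
To illustrate the mismatch, here is a minimal sketch that uses the standard library's html.escape as a stand-in for the HTML escaping Jinja2 applies. Jinja2 itself relies on MarkupSafe, so this is an approximation and not its actual code path. It shows that braces pass through untouched while < is rewritten:

```python
import html

value = "{bob} <-- is escaped"

# HTML escaping only knows about &, < and > (plus quotes when quote=True);
# TeX specials such as { and } are left alone.
escaped = html.escape(value, quote=False)
print(escaped)  # {bob} &lt;-- is escaped
```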

I'd like to overload the escape function that Jinja2 uses to escape variables.

My first thought was to override finalize and disable autoescape, e.g.:

import jinja2

class MyEnv(jinja2.Environment):
    def __init__(self, filters={}, globals={}, tests={},
        loader=None, extensions=[], **kwargs):

        super(MyEnv, self).__init__(
            autoescape          = False, # turn off autoescape
            finalize            = self.finalize,
        )

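    def finalize(self, s):
        import re
        # Strings already flagged as safe pass through untouched.
        if isinstance(s, jinja2.Markup):
            return s
        s = s.replace('\\', '')  # drop backslashes outright
        s = s.replace('~', '\\textasciitilde')
        # Backslash-escape TeX specials; '|' is not needed inside a character class.
        s = re.sub(r'([#^$&%{}])', r'\\\1', s)
        s = re.sub(r'_', r'\\_', s)

        return jinja2.Markup(s)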


template = MyEnv().from_string("""\\documentclass[{{ class }}]
         \\begin{document}
         {{ content }}
         \\end{document}
     """)

print(template.render({
    'class':'memoir',
    'content': '{bob} <-- is escaped',
    }))

The output is incorrect, because the main text isn't made into Markup (i.e. a string flagged as safe):

documentclass[memoir]
     begin\{document\}
     \{bob\} <-- is escaped
     end\{document\}

If I set autoescape to True and leave finalize in, it almost works (and in this example, it does work):

\documentclass[memoir]
     \begin{document}
     \{bob\} <-- is escaped
     \end{document}

Turning autoescape on works because it marks the main body of the template text as Markup (i.e. safe).

However, here is where the problem lies. If I change the input to a list that is joined:

template = MyEnv().from_string("""\\documentclass[{{ class }}]
         \\begin{document}
         {{ content|join("  > a & b > "|safe) }}
         \\end{document}
     """)

print(template.render({
    'class':'memoir',
    'content': ['A&B', 'C<D'],
    }))

When I run this I get:

\documentclass[memoir]
     \begin{document}
     A&amp;B  > a & b > C&lt;D
     \end{document}

It would seem HTML autoescape is being run on the elements of 'content', rather than finalize. The simplest solution, provided Jinja2 and its autoescaping are loosely coupled, would seem to be to overload an autoescape function. I can't figure out how to do that, and the best I've come up with is the finalize function.
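
My guess at the mechanism, as a sketch: a string flagged as safe escapes whatever it joins, so the separator marked with |safe would HTML-escape the list items before finalize ever sees them. The SafeStr class and the escape function below are hypothetical stand-ins written with the standard library. They are not MarkupSafe's real implementation, only a model of the behaviour I think I am seeing:

```python
import html


class SafeStr(str):
    """Hypothetical stand-in for a Markup-like string flagged as safe."""

    def __html__(self):
        return self

    def join(self, items):
        # A safe separator escapes every item that is not itself safe.
        return SafeStr(str.join(self, (escape(item) for item in items)))


def escape(value):
    """HTML-escape value unless it is already flagged as safe."""
    if hasattr(value, "__html__"):
        return value
    return SafeStr(html.escape(str(value), quote=False))


# The separator is safe, so it is kept verbatim; the items are escaped.
joined = SafeStr("  > a & b > ").join(["A&B", "C<D"])
print(joined)  # A&amp;B  > a & b > C&lt;D
```

If that model is right, the HTML escaping happens inside the join itself, which would explain why finalize receives an already-safe string and returns it unchanged.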

Is there a better way to handle escaping of TeX than overloading the finalize function? Can one overload autoescape?
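
For concreteness, the escape function I would want to plug in might look roughly like the sketch below. The name escape_tex and the replacement table are my own assumptions, and the table is illustrative rather than complete. It works in a single pass so that the braces introduced by one replacement are not escaped again by another:

```python
# Illustrative mapping of TeX special characters to replacements
# (an assumption for this sketch, not an exhaustive list).
_TEX_ESCAPES = {
    "\\": r"\textbackslash{}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
    "#": r"\#",
    "$": r"\$",
    "%": r"\%",
    "&": r"\&",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
}
_TEX_TABLE = str.maketrans(_TEX_ESCAPES)


def escape_tex(value):
    """Escape TeX special characters in one pass over the string."""
    return str(value).translate(_TEX_TABLE)


print(escape_tex("{bob} <-- is escaped"))  # \{bob\} <-- is escaped
print(escape_tex("50% of A&B_C"))          # 50\% of A\&B\_C
```

Failing anything better, a function like this could presumably be registered as an ordinary filter (env.filters['tex'] = escape_tex) and applied explicitly with autoescape turned off, but that means remembering the filter on every variable, which is what I was hoping autoescape would save me from.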

For example, could one install a custom Markup package? (a choice I'd prefer to avoid)

Thank you for reading.
