How to get a substring of a given length for Chinese and other special characters

Posted on 2024-12-09 04:15:07

For example, if description is in English, ${description?substring(0, 80)} gives me 80 characters. For Chinese text, however, I get only about 10 characters, and there is always a garbage character at the end.

How can I get 80 characters in any language?

盛夏已如深秋| 2024-12-16 04:15:07

FreeMarker relies on String#substring for the actual substring calculation. That method counts UTF-16 code units (Java chars) rather than Unicode code points, so it can cut a character that is stored as a surrogate pair in half, which doesn't work well with some Chinese characters. Instead, one should use Unicode code points. Based on this post and FreeMarker's own substring built-in, I hacked together a FreeMarker TemplateMethodModelEx implementation that operates on code points:

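import java.util.List;

import freemarker.template.SimpleScalar;
import freemarker.template.TemplateMethodModelEx;
import freemarker.template.TemplateModelException;
import freemarker.template.TemplateNumberModel;
import freemarker.template.TemplateScalarModel;

public class CodePointSubstring implements TemplateMethodModelEx {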

    @Override
    public Object exec(List args) throws TemplateModelException {
        int argCount = args.size(), left = 0, right = 0;
        String s = "";
        if (argCount != 3) {
            throw new TemplateModelException(
                    "Error: Expecting 1 string and 2 numerical arguments here");
        }
        try {
            TemplateScalarModel tsm = (TemplateScalarModel) args.get(0);
            s = tsm.getAsString();
        } catch (ClassCastException cce) {
            String mess = "Error: Expecting string argument here";
            throw new TemplateModelException(mess);
        }

        try {
            TemplateNumberModel tnm = (TemplateNumberModel) args.get(1);
            left = tnm.getAsNumber().intValue();

            tnm = (TemplateNumberModel) args.get(2);
            right = tnm.getAsNumber().intValue();

        } catch (ClassCastException cce) {
            String mess = "Error: Expecting numerical argument here";
            throw new TemplateModelException(mess);
        }
        return new SimpleScalar(getSubstring(s, left, right));
    }

    // start and end are code point indexes (end exclusive), not UTF-16 offsets.
    // Out-of-range values are clamped to the string instead of throwing.
    private String getSubstring(String s, int start, int end) {
        int total = s.codePointCount(0, s.length());
        int from = Math.max(0, Math.min(start, total));
        int to = Math.max(from, Math.min(end, total));
        // Translate code point indexes into UTF-16 offsets before cutting.
        int beginOffset = s.offsetByCodePoints(0, from);
        int endOffset = s.offsetByCodePoints(beginOffset, to - from);
        return s.substring(beginOffset, endOffset);
    }
}

You can put an instance of it into your data model root, e.g.

SimpleHash root = new SimpleHash();
root.put("substring", new CodePointSubstring());
template.process(root, ...);

and use the custom substring method in FTL:

${substring(description, 0, 80)}

I tested it with non-Chinese characters, which still worked, but so far I haven't tried it with Chinese characters. Maybe you want to give it a try.
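
Since the answer leaves Chinese input untested, the following is a minimal sketch for trying the code point logic on its own, without FreeMarker. The class name CodePointSubstringDemo and the helper codePointSubstring are hypothetical names chosen for this illustration; the helper is one possible way to cut by code points using only java.lang.String methods, not necessarily what the answer's class does internally.

```java
public class CodePointSubstringDemo {

    // Hypothetical helper: substring by code point index (end exclusive).
    // Out-of-range indexes are clamped rather than rejected.
    static String codePointSubstring(String s, int start, int end) {
        int total = s.codePointCount(0, s.length());
        int from = Math.max(0, Math.min(start, total));
        int to = Math.max(from, Math.min(end, total));
        int beginOffset = s.offsetByCodePoints(0, from);
        int endOffset = s.offsetByCodePoints(beginOffset, to - from);
        return s.substring(beginOffset, endOffset);
    }

    public static void main(String[] args) {
        // Six common Chinese characters, each stored as a single Java char.
        String bmp = "中文字符测试";
        // 'a', then U+20BB7 (stored as a surrogate pair), then 'b'.
        String mixed = "a\uD842\uDFB7b";

        // Code units versus code points for the mixed string.
        System.out.println(mixed.length());
        System.out.println(mixed.codePointCount(0, mixed.length()));

        // Plain substring stops in the middle of the surrogate pair.
        String broken = mixed.substring(0, 2);
        System.out.println(Character.isHighSurrogate(broken.charAt(1)));

        // The code point version keeps the pair intact.
        System.out.println(codePointSubstring(mixed, 0, 2));
        System.out.println(codePointSubstring(bmp, 0, 3));
    }
}
```

For text made up only of characters like those in the first string, String#substring and the code point version return the same result; they differ once a surrogate pair is involved, as in the second string.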
