How to get a substring of a certain length for special characters such as Chinese

Posted 2024-12-09 04:15:07

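For example, if the description is in English, I can use ${description?substring(0, 80)} to get 80 characters, but for Chinese characters I only get about 10 characters, and there is always a garbage char at the end.

How can I get 80 characters in any language?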

盛夏已如深秋| 2024-12-16 04:15:07

FreeMarker relies on String#substring to do the actual substring calculation. That method counts UTF-16 chars (code units) rather than characters, so it can cut in half any character that is stored as a surrogate pair, which includes some Chinese characters. Instead, one should use Unicode code points. Based on this post and FreeMarker's own substring builtin, I hacked together a FreeMarker TemplateMethodModelEx implementation which operates on code points:

import java.util.List;

import freemarker.template.SimpleScalar;
import freemarker.template.TemplateMethodModelEx;
import freemarker.template.TemplateModelException;
import freemarker.template.TemplateNumberModel;
import freemarker.template.TemplateScalarModel;

/** FreeMarker method that takes a substring by Unicode code points instead of UTF-16 chars. */
public class CodePointSubstring implements TemplateMethodModelEx {

    @Override
    public Object exec(List args) throws TemplateModelException {
        if (args.size() != 3) {
            throw new TemplateModelException(
                    "Error: Expecting 1 string and 2 numerical arguments here");
        }
        String s;
        int left;
        int right;
        try {
            s = ((TemplateScalarModel) args.get(0)).getAsString();
        } catch (ClassCastException cce) {
            // The first argument must be the string to cut.
            throw new TemplateModelException("Error: Expecting string argument here");
        }
        try {
            left = ((TemplateNumberModel) args.get(1)).getAsNumber().intValue();
            right = ((TemplateNumberModel) args.get(2)).getAsNumber().intValue();
        } catch (ClassCastException cce) {
            throw new TemplateModelException("Error: Expecting numerical argument here");
        }
        return new SimpleScalar(getSubstring(s, left, right));
    }

    // start and end are code point indexes (end exclusive), clamped to the string's bounds.
    private String getSubstring(String s, int start, int end) {
        int total = s.codePointCount(0, s.length());
        int from = Math.max(0, Math.min(start, total));
        int to = Math.max(from, Math.min(end, total));
        // Translate code point indexes into UTF-16 offsets so no surrogate pair is split.
        int beginOffset = s.offsetByCodePoints(0, from);
        int endOffset = s.offsetByCodePoints(beginOffset, to - from);
        return s.substring(beginOffset, endOffset);
    }
}
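
To make the difference between chars and code points concrete, here is a minimal standalone sketch in plain Java (no FreeMarker involved). The class name CodeUnitVsCodePoint, the helper endsWithLoneHighSurrogate and the sample strings are made up for illustration. It only shows how a char-based cut can leave half a character behind; it does not claim to reproduce the exact output from the question.

```java
class CodeUnitVsCodePoint {
    // U+20BB7, a CJK ideograph outside the Basic Multilingual Plane:
    // Java stores it as two chars (a surrogate pair).
    static final String RARE = "\uD842\uDFB7";
    // U+4E2D and U+6587, two common Chinese characters that take one char each.
    static final String COMMON = "\u4E2D\u6587";

    // True if the string ends with the first half of a surrogate pair,
    // i.e. the last character was cut in the middle.
    static boolean endsWithLoneHighSurrogate(String s) {
        return !s.isEmpty() && Character.isHighSurrogate(s.charAt(s.length() - 1));
    }

    public static void main(String[] args) {
        String text = COMMON + RARE;
        System.out.println(text.length());                          // 4 chars
        System.out.println(text.codePointCount(0, text.length()));  // 3 code points
        // Cutting after 3 chars keeps only half of the last character.
        String cut = text.substring(0, 3);
        System.out.println(endsWithLoneHighSurrogate(cut));         // true
    }
}
```

A string cut like this typically renders with a replacement glyph at the end, which is the kind of damage a code point based substring avoids.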

You can put an instance of CodePointSubstring into your data model root, e.g.:

SimpleHash root = new SimpleHash();
root.put("substring", new CodePointSubstring());
template.process(root, ...);

and use the custom substring method in FTL:

${substring(description, 0, 80)}

I tested it with non-Chinese characters, which still worked, but so far I haven't tried it with Chinese characters. Maybe you want to give it a try.
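
For anyone who wants to try the idea with Chinese text outside a template first, the following is a rough sketch of a quick check. It is not the class above: CodePointCheck and substringByCodePoints are hypothetical names, and the helper uses String#codePoints() (Java 8+) as an alternative way to count in code points. Negative or reversed bounds are simply treated as an empty range, which is an assumption of this sketch.

```java
class CodePointCheck {
    // Returns the code points in [start, end), counted in code points, not chars.
    static String substringByCodePoints(String s, int start, int end) {
        int from = Math.max(start, 0);
        int count = Math.max(end - from, 0);
        return s.codePoints()
                .skip(from)
                .limit(count)
                .collect(StringBuilder::new,
                         StringBuilder::appendCodePoint,
                         StringBuilder::append)
                .toString();
    }

    // Builds a string made of n copies of the given code point.
    static String repeat(int codePoint, int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.appendCodePoint(codePoint);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 100 copies of U+4E2D, a common Chinese character (one char each).
        String common = substringByCodePoints(repeat(0x4E2D, 100), 0, 80);
        System.out.println(common.codePointCount(0, common.length())); // 80
        System.out.println(common.length());                           // 80

        // 100 copies of U+20BB7, a rarer ideograph (two chars each).
        String rare = substringByCodePoints(repeat(0x20BB7, 100), 0, 80);
        System.out.println(rare.codePointCount(0, rare.length()));     // 80
        System.out.println(rare.length());                             // 160
    }
}
```

In both cases the result holds 80 whole characters; only the number of chars differs.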

About the author: 絕版丫頭
