You may be able to get this information from the GlyphVector class.

For a given String, a Font instance can create a GlyphVector that provides information about how the text is rendered.

The layoutGlyphVector() method on the Font can provide this.

The FLAG_COMPLEX_GLYPHS flag in the GlyphVector's layout flags tells you whether the glyphs lack a one-to-one mapping with the input characters.

The following code shows an example of this:

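import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import javax.swing.JTextField;

JTextField textField = new JTextField();
Font font = textField.getFont(); // any Font works; the component's default is used here
String textToTest = "abcdefg";
char[] chars = textToTest.toCharArray();
FontRenderContext fontRenderContext = textField.getFontMetrics(font).getFontRenderContext();

// Lay out the whole string (start 0, limit chars.length) and inspect the result.
GlyphVector glyphVector = font.layoutGlyphVector(fontRenderContext, chars, 0, chars.length, Font.LAYOUT_LEFT_TO_RIGHT);
int layoutFlags = glyphVector.getLayoutFlags();
boolean hasComplexGlyphs = (layoutFlags & GlyphVector.FLAG_COMPLEX_GLYPHS) != 0;
int numberOfGlyphs = glyphVector.getNumGlyphs();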

numberOfGlyphs should be the number of glyphs used to display the input text; if it differs from the number of input characters, the mapping is not one-to-one.

Unfortunately, this approach creates a Java GUI component just to get the FontRenderContext.
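
It may be possible to avoid the GUI component, since FontRenderContext has a public constructor. The following is only a sketch under that assumption: the class name ComplexGlyphCheck, the helper names, and the choice of anti-aliasing and fractional-metrics settings are illustrative and not part of the original answer.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;

public class ComplexGlyphCheck {

    // Assumption: an identity transform (null) with anti-aliasing and
    // fractional metrics enabled is good enough for inspecting layout flags.
    static FontRenderContext standaloneContext() {
        return new FontRenderContext(null, true, true);
    }

    // True if the laid-out glyphs do not map one-to-one onto the input chars.
    static boolean hasComplexGlyphs(Font font, String text) {
        char[] chars = text.toCharArray();
        GlyphVector gv = font.layoutGlyphVector(
                standaloneContext(), chars, 0, chars.length, Font.LAYOUT_LEFT_TO_RIGHT);
        return (gv.getLayoutFlags() & GlyphVector.FLAG_COMPLEX_GLYPHS) != 0;
    }

    public static void main(String[] args) {
        // Font.DIALOG is a logical font that is always available.
        Font font = new Font(Font.DIALOG, Font.PLAIN, 25);
        System.out.println(hasComplexGlyphs(font, "abcdefg"));
    }
}
```

The result depends on the font and the JDK's layout engine, so the flag should be checked with the font actually used for display.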


假装爱人 2024-09-21 16:41:25

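What you are describing are not ligatures (at least not in Unicode parlance) but grapheme clusters. There is a standard annex on finding text boundaries, including grapheme cluster boundaries:

http://www.unicode.org/reports/tr29/tr29-15.html#Grapheme_Cluster_Boundaries

Also see the description of tailored grapheme clusters in regular expressions:

http://unicode.org/reports/tr18/#Tailored_Graphemes_Clusters

And the definition of collation graphemes:

http://www.unicode.org/reports/tr10/#Collation_Graphemes

I think these are the starting points. The harder part will probably be finding a Java implementation of the Unicode collation algorithm that works for the Devanagari locale. If you find one, you can analyze strings without resorting to OpenType features. That would be somewhat cleaner, because OpenType is concerned purely with presentation details rather than with character or grapheme cluster semantics, whereas the collation algorithm and the tailored grapheme cluster boundary algorithm look as though they can be implemented independently of fonts.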
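
As a font-independent starting point, the JDK's java.text.BreakIterator can split a string at character (grapheme-like) boundaries. This is a minimal sketch, not an implementation of the tailored clusters described above: the class name GraphemeSplit and the method clusters are made up for illustration, and the JDK's rules may not match UAX #29 exactly, so results for Devanagari conjuncts can vary between JDK versions.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;

public class GraphemeSplit {

    // Splits text at the character boundaries reported by BreakIterator,
    // so a base letter and its combining marks stay in one piece.
    static List<String> clusters(String text) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        List<String> out = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.add(text.substring(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // 'e' followed by U+0301 (combining acute accent), then 'a'.
        for (String cluster : clusters("e\u0301a")) {
            System.out.println(cluster.length() + " code unit(s)");
        }
    }
}
```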


莫多说 2024-09-21 16:41:25

While Aaron's answer is not exactly correct, it pushed me in the right direction. After reading through the Java API docs for java.awt.font.GlyphVector and experimenting a lot in the Clojure REPL, I was able to write a function that does what I want.

The idea is to find the width of each glyph in the GlyphVector and merge every zero-width glyph into the most recent glyph with non-zero width. The solution is in Clojure, but it should be translatable to Java if required.

(ns net.abhinavsarkar.unicode
  (:import [java.awt.font TextAttribute GlyphVector]
           [java.awt Font]
           [javax.swing JTextArea]))

(let [^java.util.Map text-attrs {
        TextAttribute/FAMILY "Arial Unicode MS"
        TextAttribute/SIZE 25
        TextAttribute/LIGATURES TextAttribute/LIGATURES_ON}
      font (Font/getFont text-attrs)
      ta (doto (JTextArea.) (.setFont font))
      frc (.getFontRenderContext (.getFontMetrics ta font))]
  (defn unicode-partition
    "Takes a Unicode string and returns a vector of strings, partitioning
    the input so that all code points of a single ligature end up in the
    same partition of the output vector."
    [^String text]
    (let [glyph-vector 
            (.layoutGlyphVector
              font, frc, (.toCharArray text),
              0, (.length text), Font/LAYOUT_LEFT_TO_RIGHT)
          glyph-num (.getNumGlyphs glyph-vector)
          glyph-positions
            (map first (partition 2
                          (.getGlyphPositions glyph-vector 0 glyph-num nil)))
          glyph-widths
            (map -
              (concat (next glyph-positions)
                      [(.. glyph-vector getLogicalBounds width)])
              glyph-positions)
          glyph-indices 
            (seq (.getGlyphCharIndices glyph-vector 0 glyph-num nil))
          glyph-index-width-map (zipmap glyph-indices glyph-widths)
          ;; One width per input char; a char with no glyph of its own
          ;; (absorbed into a ligature) gets width 0.
          corrected-glyph-widths
            (vec (map #(get glyph-index-width-map % 0)
                      (range (count text))))]
      (loop [idx 0 pidx 0 char-seq text acc []]
        (if (nil? char-seq)
          acc
          (if-not (zero? (nth corrected-glyph-widths idx))
            (recur (inc idx) (inc pidx) (next char-seq)
              (conj acc (str (first char-seq))))
            (recur (inc idx) pidx (next char-seq)
              (assoc acc (dec pidx)
                (str (nth acc (dec pidx)) (first char-seq))))))))))

Also posted on Gist.
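
For readers who want the Java translation mentioned above, here is a rough sketch of the same idea. It is not the author's code: the class name GlyphWidthPartition and its method names are invented for illustration, the FontRenderContext is constructed directly instead of coming from a JTextArea, the LIGATURES attribute is not set, and a leading zero-width char simply starts its own partition.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.GlyphVector;
import java.util.ArrayList;
import java.util.List;

public class GlyphWidthPartition {

    // One width per input char, taken from the distance between successive
    // glyph positions; chars with no glyph of their own keep width 0.
    static float[] charWidths(Font font, FontRenderContext frc, String text) {
        char[] chars = text.toCharArray();
        GlyphVector gv = font.layoutGlyphVector(
                frc, chars, 0, chars.length, Font.LAYOUT_LEFT_TO_RIGHT);
        float[] widths = new float[chars.length];
        for (int i = 0; i < gv.getNumGlyphs(); i++) {
            int charIndex = gv.getGlyphCharIndex(i);
            // getGlyphPosition(numGlyphs) is the position after the last glyph.
            double width = gv.getGlyphPosition(i + 1).getX() - gv.getGlyphPosition(i).getX();
            if (charIndex >= 0 && charIndex < widths.length) {
                widths[charIndex] += (float) width;
            }
        }
        return widths;
    }

    // Appends each zero-width char to the most recent partition.
    static List<String> partitionByWidths(String text, float[] widths) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            String ch = String.valueOf(text.charAt(i));
            if (widths[i] != 0f || parts.isEmpty()) {
                parts.add(ch);
            } else {
                int last = parts.size() - 1;
                parts.set(last, parts.get(last) + ch);
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // Falls back to a default font if "Arial Unicode MS" is not installed.
        Font font = new Font("Arial Unicode MS", Font.PLAIN, 25);
        FontRenderContext frc = new FontRenderContext(null, true, true);
        String text = "\u0915\u093F"; // Devanagari KA + vowel sign I
        System.out.println(partitionByWidths(text, charWidths(font, frc, text)));
    }
}
```

Keeping the partitioning separate from the font measurement makes the grouping logic easy to check without any font installed; the actual partitions still depend on the font and on the JDK's layout engine.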
