Where can I find a good introduction to fonts?
I have to write some code working with fonts. Is there a good introduction to the subject to get me started?
There is a very good introduction at "What every developer should know about fonts".
I have copied the post here, but much of it depends on the specific fonts that parts of it are written in, and on pictures, so I strongly recommend the link above.
I originally thought using fonts would be pretty simple. However, proper handling of fonts has ended up being a significant effort in Windward Reports (our XML and SQL Reporting system). If you're going to do much more than place a line of text in a form, then the details start to matter.
Fonts & Glyphs
So what is a font? Fundamentally, a font is a series of glyphs. A glyph is the drawn shape of what you think of as a character, like the letter A. A font is then the set of glyphs for all the characters in that font. In Helvetica, all the glyphs look one way; in Times Roman, they look another. Each is the set of glyphs from that font.
Now we need to introduce the concept of code pages. A code page is a mapping from a character number to a specific glyph. Programs originally stored each character as a byte. Then, for Asian character sets, there were the DBCS (double-byte character set) systems, where some characters were 1 byte and some were 2. Programs today mostly use Unicode, and web pages tend to use UTF-8, a variable-length encoding of Unicode in which a character takes 1 to 4 bytes.
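A quick way to see how these encodings differ is to count bytes. The Java sketch below (class and method names are only illustrative) encodes a few characters with the standard `StandardCharsets` constants: `A` takes 1 byte in UTF-8, `é` takes 2, the Han character 中 takes 3, and an emoji outside the Basic Multilingual Plane takes 4.

```java
import java.nio.charset.StandardCharsets;

class EncodingSizes {
    // Bytes needed to store the string as UTF-8.
    static int utf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Bytes needed as UTF-16 (big-endian, so no byte-order mark is added).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        // 'A', 'é', the Han character for "middle", and an emoji outside the BMP
        String[] samples = {"A", "\u00E9", "\u4E2D", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X: UTF-8 %d bytes, UTF-16 %d bytes, length() %d%n",
                    s.codePointAt(0), utf8Bytes(s), utf16Bytes(s), s.length());
        }
    }
}
```

Note that Java's `String.length()` counts UTF-16 code units, so the emoji reports a length of 2 even though it is a single character.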
Why bring up encoding? Because each font has an encoding, so character number 178 can produce a very different glyph depending on the code page the font uses. Most font files use Unicode, so there is a standard there, but many programs still use specific code pages that are then mapped to the font. This is what happens when you display ABC in the Wingdings font: you get Wingdings symbols instead of the letters. So point one is that the encoding you use must match, or be mapped to, the encoding of the fonts you use.
And it gets even more complex. Characters with values 0xE000–0xF8FF (the Unicode Private Use Area) have no standard meaning. Each font can make them anything it wants (one use is to add the Klingon script). So a character in this range is, by definition, tied to the font file used to display it. This is how most symbol fonts work.
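The same byte value can stand for different characters depending on the code page, which is the mismatch described above. This Java sketch (the class name and the choice of code pages are illustrative) decodes byte 178 with three charsets that standard JDKs provide. It shows only the text side of the problem; a font's own encoding cannot be demonstrated without rendering.

```java
import java.nio.charset.Charset;

class CodePageDemo {
    // Decodes a single byte value with the named code page.
    static String decodeByte(int value, String codePage) {
        byte[] bytes = {(byte) value};
        return new String(bytes, Charset.forName(codePage));
    }

    public static void main(String[] args) {
        String[] codePages = {"ISO-8859-1", "windows-1251", "ISO-8859-5"};
        for (String cp : codePages) {
            String ch = decodeByte(178, cp);
            System.out.printf("%-12s byte 178 -> U+%04X%n", cp, (int) ch.charAt(0));
        }
    }
}
```

Byte 178 comes out as superscript two (U+00B2) in ISO-8859-1, as the Ukrainian letter І (U+0406) in windows-1251, and as the Cyrillic letter В (U+0412) in ISO-8859-5.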
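Because the meaning of these code points depends on the font, it can help to find them before choosing or substituting a font. A minimal Java sketch, relying on `Character.getType` returning `Character.PRIVATE_USE` for private-use code points; this also covers the supplementary private use planes 15 and 16, which lie outside the range quoted above. The helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class PrivateUse {
    // True for private-use code points: U+E000..U+F8FF in the BMP,
    // plus the supplementary private use planes 15 and 16.
    static boolean isPrivateUse(int codePoint) {
        return Character.getType(codePoint) == Character.PRIVATE_USE;
    }

    // UTF-16 offsets of code points whose appearance depends entirely on the font.
    static List<Integer> privateUseOffsets(String text) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isPrivateUse(cp)) {
                offsets.add(i);
            }
            i += Character.charCount(cp);
        }
        return offsets;
    }

    public static void main(String[] args) {
        System.out.println(privateUseOffsets("A\uE000B\uF8FF")); // [1, 3]
    }
}
```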
OK, so you are using Unicode, your font file uses Unicode, you pass it a string and… the string displays blank. What's going on? Well, there's no requirement that a font file have a glyph for any given character. A Symbol font won't have ABC. Most fonts used in Europe and America don't have Chinese, Japanese, or Korean glyphs. It's not an error to use a character the font has no glyph for, but it will display nothing: not a blank space, but nothing at all (i.e. 0 points wide).
You can hit a similar problem with one of the old code pages if you want to display a character that does not exist in that code page. In that case you need to map in a different code page, at least for that character (this is how Word used to handle this case).
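Java's `CharsetEncoder.canEncode` answers "does this code page contain this character?", which is enough to sketch a per-character fallback. The list of code pages and the first-match policy below are assumptions for illustration, not how Word actually did it; the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.ArrayList;
import java.util.List;

class CodePageFallback {
    // For each UTF-16 char, picks the first code page that can encode it,
    // or null if none of them can. Adequate for single-byte code pages and BMP text.
    static List<String> pickCodePages(String text, List<String> codePages) {
        List<CharsetEncoder> encoders = new ArrayList<>();
        for (String name : codePages) {
            encoders.add(Charset.forName(name).newEncoder());
        }
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String pick = null;
            for (int j = 0; j < encoders.size(); j++) {
                if (encoders.get(j).canEncode(c)) {
                    pick = codePages.get(j);
                    break;
                }
            }
            chosen.add(pick);
        }
        return chosen;
    }

    public static void main(String[] args) {
        // 'A' (Latin), 'Ж' (Cyrillic), '中' (in neither code page)
        List<String> pages = List.of("windows-1252", "windows-1251");
        System.out.println(pickCodePages("A\u0416\u4E2D", pages));
    }
}
```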
Font Families
Fonts fall into several different classes. First there are proportional vs. monospaced fonts. In a monospaced font all characters are exactly the same width. And the height is consistent, in that all lowercase letters are the same height, as are all uppercase letters. Avoid monospaced fonts as much as possible because they are much harder to read. Asian fonts are almost all monospaced because the Chinese Han characters all have identical widths and heights, so proportional would make no sense. On the flip side, Hebrew and Arabic pretty much have to be proportional.
Next is the typeface style, which can be serif, where the strokes end in small extra lines; sans serif, where nothing extra is added at the ends; decorative, which goes well beyond the normal; and Symbol, which can contain anything at all, including barcode fonts whose glyphs are mapped to the ASCII digit codes. And this is just for the Western European alphabets.
Fontmetrics
Now we get into measuring fonts, and most (not all) of that is measuring glyphs. The standard unit of measurement for fonts is the point, and while there's a lot of history to what a point originally meant, in the computer world it has been 72 points == 1 inch. You will also sometimes see twips, short for "twentieth of a point", so 1440 twips == 1 inch. And we now have EMUs (English Metric Units), where 914400 EMUs == 1 inch (more here). If you work with points, you need to use floating point variables. Twips are generally OK as integers, and EMUs definitely are.
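The conversion factors above fit in a few constants. In this sketch points are kept as `double` and twips and EMUs as `long`, following the advice above; rounding to the nearest unit is an assumption, and the class name is hypothetical. One twip is exactly 635 EMUs (914400 / 1440), so twip-to-EMU conversion stays in integer arithmetic.

```java
class Units {
    static final long TWIPS_PER_INCH = 1440;
    static final long EMUS_PER_INCH = 914_400;
    static final long TWIPS_PER_POINT = 20;     // 1440 / 72
    static final long EMUS_PER_POINT = 12_700;  // 914400 / 72

    // Points may be fractional (e.g. 10.5pt text), so they come in as double.
    static long pointsToTwips(double points) {
        return Math.round(points * TWIPS_PER_POINT);
    }

    static long pointsToEmus(double points) {
        return Math.round(points * EMUS_PER_POINT);
    }

    // Exact integer conversion: 914400 / 1440 == 635 EMUs per twip.
    static long twipsToEmus(long twips) {
        return twips * (EMUS_PER_INCH / TWIPS_PER_INCH);
    }

    static double emusToPoints(long emus) {
        return emus / (double) EMUS_PER_POINT;
    }

    public static void main(String[] args) {
        for (double pt : new double[] {72.0, 10.5}) {
            System.out.printf("%.1f pt = %d twips = %d EMUs%n",
                    pt, pointsToTwips(pt), pointsToEmus(pt));
        }
    }
}
```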
Then comes the font point size. This is a completely arbitrary number. Think of it like the diagonal size of the old CRT monitors where the actual size was close to what you expected, but was never that number. The point size determines the size of the rendered glyphs, but it has no specific measurement on the page.
Now here's where it starts to get interesting: the fontmetrics. First, everything must be measured from the baseline. Working from any other part of the font won't work; you will hit major problems. So start there. The highest drawn part above the baseline is the ascent and the lowest drawn part below the baseline is the descent, both measured from the baseline.
Then there is the spacing between two lines of text. This is a font setting, because the font designer determines the appropriate spacing for that font. It can be returned in different ways: Windows considers it the spacing you put above the next line and returns a measure from baseline to baseline, while Java views it as the spacing below a line before the next line and returns just that value. This leading is the spacing you place between lines of similar single-spaced text. If the line spacing is greater than single spacing, you add to this value.
You generally want to get these heights for the fonts, not for the glyphs in the string you display. Why? Because if a line is "we were wrox", with no ascenders or descenders, it would be placed closer to the other lines in the paragraph, and that would look weird. You also need to look at all the fonts and point sizes on a line, because if some text is larger you must use the larger ascent/descent/leading values. But only for the line(s) that have the larger text, not for the entire paragraph. And again, all of this is measured from the baseline, which is the only way to handle mixed fonts and sizes.
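A sketch of the line-stacking rule, assuming Java-style leading (space below a line, before the next one). Each line takes the largest ascent, descent and leading among its own runs, and every position is a baseline. In real Java code the per-run values would come from `java.awt.FontMetrics.getAscent()`, `getDescent()` and `getLeading()` for each font; here they are hypothetical hard-coded numbers so the example runs without any fonts installed. Class names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class LineLayout {
    // Font-level metrics (not per-glyph) for one run of text, in points.
    static final class RunMetrics {
        final double ascent, descent, leading;

        RunMetrics(double ascent, double descent, double leading) {
            this.ascent = ascent;
            this.descent = descent;
            this.leading = leading;
        }
    }

    // Baseline y positions, measured down from the top of the paragraph.
    // Only a line containing larger text gets the larger values.
    static List<Double> baselines(List<List<RunMetrics>> lines) {
        List<Double> result = new ArrayList<>();
        double top = 0; // top of the current line
        for (List<RunMetrics> line : lines) {
            double ascent = 0, descent = 0, leading = 0;
            for (RunMetrics r : line) {
                ascent = Math.max(ascent, r.ascent);
                descent = Math.max(descent, r.descent);
                leading = Math.max(leading, r.leading);
            }
            double baseline = top + ascent;
            result.add(baseline);
            top = baseline + descent + leading; // leading sits below this line
        }
        return result;
    }

    public static void main(String[] args) {
        RunMetrics body = new RunMetrics(10, 3, 1);
        RunMetrics big = new RunMetrics(20, 5, 2);
        System.out.println(baselines(List.of(List.of(body), List.of(body, big), List.of(body))));
    }
}
```

With body text of ascent 10, descent 3 and leading 1, plus one larger run (20, 5, 2) on the middle line, the baselines land at 10, 34 and 51: the middle line and the gap below it grow, while the first line keeps the body spacing.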
OK, height takes a bit of work but it's pretty straightforward. The width, however, gets really interesting. And by interesting I mean you have to get everything just right. Fundamentally, except for fixed-width fonts, adding up the width of each glyph will not equal the width of all those glyphs rendered together. Pretty much never. Why? A couple of reasons:
•Kerning adjusts where letters are placed based on the letters they adjoin. That is why AB stays distinct while tt overlaps quite a bit.
•Some character combinations in Latin alphabets are combined, such as ae becoming æ and, in German, ss becoming ß.
•Hebrew and Arabic glyphs differ for the same character depending on whether it is at the start, middle, or end of a word. In Arabic especially, the glyphs used at the ends tend to be wider than the glyphs in the middle. So the width of ﺺ depends on where it is in the string.
◦Bi-directional text has an additional issue, covered below.
•Complex scripts, like Indic (India), build up the glyph at a given position from several characters. So a three-character string can be anything from 1 to 3 glyphs wide.
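To see why summing isolated glyph widths goes wrong, here is a toy model that handles kerning only, for a made-up font: the advance widths and kerning pairs are invented numbers, not real font data, and the class is hypothetical. It also computes where right-aligned text would start. Real code should not do this itself; ligatures, contextual forms and complex scripts are exactly why the whole string has to be measured by the platform (in Java, for example, with `java.awt.font.TextLayout.getAdvance()`).

```java
import java.util.HashMap;
import java.util.Map;

class KerningToy {
    // Invented advance widths (points) for a made-up font; only these four letters exist.
    static final Map<Character, Double> ADVANCE = new HashMap<>();
    // Invented pair adjustments: negative values pull the second glyph closer.
    static final Map<String, Double> KERN = new HashMap<>();

    static {
        ADVANCE.put('A', 7.0);
        ADVANCE.put('V', 7.0);
        ADVANCE.put('T', 6.0);
        ADVANCE.put('o', 5.0);
        KERN.put("AV", -1.0);
        KERN.put("To", -0.75);
    }

    // The naive approach: measure each glyph alone and add up the results.
    static double sumOfGlyphWidths(String s) {
        double w = 0;
        for (char c : s.toCharArray()) {
            w += ADVANCE.get(c);
        }
        return w;
    }

    // Toy "whole string" measurement: advances plus pair kerning between neighbours.
    static double measureWhole(String s) {
        double w = sumOfGlyphWidths(s);
        for (int i = 0; i + 1 < s.length(); i++) {
            w += KERN.getOrDefault(s.substring(i, i + 2), 0.0);
        }
        return w;
    }

    // Right-aligned text starts at the right margin minus the string's width.
    static double leftX(double rightMargin, double width) {
        return rightMargin - width;
    }

    public static void main(String[] args) {
        String s = "AVTo";
        System.out.println("summed: " + sumOfGlyphWidths(s) + ", whole: " + measureWhole(s));
        System.out.println("left x at margin 100: " + leftX(100, measureWhole(s))
                + " (naive: " + leftX(100, sumOfGlyphWidths(s)) + ")");
    }
}
```

For "AVTo" the summed width is 25 points but the kerned width is 23.25, so right-aligned text placed with the naive width would start 1.75 points too far left.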
Very simply, you need to feed the complete, fully formatted string to the fontmetrics API provided by the platform you are running on to get the length of the string. It's an expensive call, because the string will be rendered to memory to determine the length, but there is no alternative that will be accurate. And you must use exactly the same settings when measuring as you do when rendering. Whenever these have not matched, we have found differences large enough for the human eye to notice. The best way to test your code for this is to look at right-aligned text: you generally have to compute the baseline position of the left end of the string when rendering, so if you calculate the length wrong, it will show.
Bi-directional text
Finally we have the issue of bi-directional text (Arabic & Hebrew). Bi-directional text goes right to left, except that numbers and Latin words go left to right. So it is read right to left; then, at a number or a sequence of Latin text, you jump over to its leftmost point, read left to right back to where you finished the previous Hebrew/Arabic, then jump back to the start (left end) of the Latin/number part and continue right to left.
There has been a ton of research on when these switches should take place. There are characters that have a strong direction, characters that have a weak direction, and characters that have no directional preference. You have no prayer of correctly implementing these rules yourself. None. But all is not lost. Pretty much every platform, including Java and Windows, has an API where you provide the string of characters in reading order, and it renders them correctly according to the rules. They also have an API that tells you where each character is located and which character the caret should move to when you move it one character forward or backward.
You can use this API for all font rendering and caret movement regardless of the text, and it will work fine, on complex scripts too. It's a bit of a pain to start with this if you are not targeting bi-di or complex scripts, but if you're going to get there eventually, it's best to start off using it so you don't have to re-architect your code. Trust me, you really, really don't want to have to re-architect (I had to once. OW!).
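In Java, these rules are implemented by `java.text.Bidi`. The sketch below (class and method names are hypothetical) splits a string given in reading order into directional runs and uses `Bidi.reorderVisually` to put those runs in left-to-right screen order. It only orders runs; actual drawing, including reversing the characters inside right-to-left runs, is left to the platform's text rendering.

```java
import java.text.Bidi;
import java.util.ArrayList;
import java.util.List;

class BidiRuns {
    // Splits text (in logical/reading order) into level runs and
    // returns them in visual order, leftmost run first.
    static List<String> visualRuns(String text) {
        // Paragraph direction comes from the first strong character; LTR if there is none.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        int count = bidi.getRunCount();
        byte[] levels = new byte[count];
        Object[] runs = new Object[count];
        for (int i = 0; i < count; i++) {
            levels[i] = (byte) bidi.getRunLevel(i); // odd level = right to left
            runs[i] = text.substring(bidi.getRunStart(i), bidi.getRunLimit(i));
        }
        Bidi.reorderVisually(levels, 0, runs, 0, count);
        List<String> result = new ArrayList<>();
        for (Object run : runs) {
            result.add((String) run);
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "\u05E9\u05DC\u05D5\u05DD abc"; // Hebrew "shalom", then Latin "abc"
        System.out.println(visualRuns(text));
    }
}
```

For this string the paragraph direction is right to left (the first strong character is Hebrew), so the Latin run ends up on the left of the line and the Hebrew run, with its trailing space, on the right.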
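Moving the caret by "one character" means one user-perceived character, not one code unit. A minimal sketch using `java.text.BreakIterator.getCharacterInstance()` (the helper names are hypothetical): it steps over an `e` followed by a combining acute accent, and over an emoji stored as a surrogate pair, as single units. It works in logical order only; mapping arrow keys to visual movement in mixed-direction text is what layout APIs such as `java.awt.font.TextLayout.getNextRightHit` are for.

```java
import java.text.BreakIterator;

class Caret {
    // Next caret offset moving forward in logical order.
    static int next(String text, int offset) {
        if (offset >= text.length()) {
            return text.length();
        }
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        return it.following(offset);
    }

    // Previous caret offset moving backward in logical order.
    static int previous(String text, int offset) {
        if (offset <= 0) {
            return 0;
        }
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(text);
        return it.preceding(offset);
    }

    public static void main(String[] args) {
        // 'a', 'e' + combining acute, 'b', and an emoji (two UTF-16 units)
        String text = "ae\u0301b\uD83D\uDE00";
        for (int pos = 0; pos < text.length(); pos = next(text, pos)) {
            System.out.println("caret stop at " + pos);
        }
    }
}
```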
Warning
Do not copy Windows fonts to Linux or other operating systems. The fontmetrics tend to be off, and the text will look off. I don't know why, as TrueType is supposed to be portable, but in practice, just as Java is "write once, debug everywhere", fonts tend to be "design once, tweak everywhere". Get fonts from a vendor who has optimized them for your platform.