C#: Converting a byte[] to a UTF-8 encoded string
I am using a library called EXIFextractor to extract metadata from images. The library relies in part on System.Drawing.Imaging.PropertyItem to do the hard work. According to the Microsoft documentation, some of the data in a PropertyItem, such as the image details, is returned as an ASCII string stored in a byte[].
My problem is that international characters (å, ä, ö, and so on) are dropped and replaced by question marks. When I debug the code, it is apparent that the byte[] actually holds UTF-8 encoded text.
I'd like to parse the byte[] as a UTF-8 string. How can I do this without losing any information in the process?
Thanks in advance!
Update:
I have been asked to provide snippets of the code.
The first snippet is from the class I use, EXIFextractor.cs, written by Asim Goheer:
    foreach( System.Drawing.Imaging.PropertyItem p in parr )
    {
        string v = "";
        // ...
        else if( p.Type == 0x2 )
        {
            // string
            v = ascii.GetString(p.Value);
        }
        // ...
    }
And this is my own code, where I try my best to handle the result of the above:
    try {
        EXIFextractor exif = new EXIFextractor(ref bmp, "");
        object o;
        if ((o = exif["Image Description"]) != null)
            MediaFile.Description = Tools.UTF8Encode(o.ToString());
        // ...
I have also tried a couple of other ways of getting my precious å, ä, ö out of the data, but nothing seems to do the trick. I am starting to think that Hans Passant's conclusions in his answer below are right.
Comments (4)
Use the GetString method on the Encoding.UTF8 object (http://msdn.microsoft.com/en-us/library/system.text.encoding.utf8.aspx).
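Decoding the raw bytes with Encoding.UTF8.GetString might look roughly like the sketch below. It assumes the byte[] really does contain UTF-8 (as the question reports) and that it comes straight from PropertyItem.Value, before any ASCII decoding has taken place. The names ExifText and DecodeUtf8 are made up for illustration and are not part of EXIFextractor.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of EXIFextractor.
static class ExifText
{
    // Decodes a type-2 (string) property value as UTF-8.
    // These values are NUL-terminated, so trailing '\0' characters are trimmed.
    public static string DecodeUtf8(byte[] value)
    {
        if (value == null) return string.Empty;
        return Encoding.UTF8.GetString(value).TrimEnd('\0');
    }
}

static class Program
{
    static void Main()
    {
        // "åäö" encoded as UTF-8, followed by the NUL terminator.
        byte[] raw = { 0xC3, 0xA5, 0xC3, 0xA4, 0xC3, 0xB6, 0x00 };

        // UTF-8 decoding keeps the three characters intact.
        Console.WriteLine(ExifText.DecodeUtf8(raw));

        // ASCII decoding replaces every byte above 0x7F with '?'.
        Console.WriteLine(Encoding.ASCII.GetString(raw).TrimEnd('\0'));
    }
}
```

In the library snippet from the question, this would mean decoding p.Value with a UTF-8 encoding instead of the ascii object. Once the string has gone through ASCII decoding, the original bytes are gone and no later re-encoding can bring them back.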
Yes, this is a problem with the app or camera that originated the image. The EXIF standard has horrible support for text: it has to be encoded in ASCII. That only ever works out well when the photographer speaks English. No doubt the software that encoded the image is ignoring this requirement. The PropertyItem class does the same thing: it encodes a string to byte[] with Marshal.StringToHGlobalAnsi(), which assumes the system's default code page.
There's no obvious fix for this. You'll get mojibake when the photo was made too far away from your machine, that is, on a system with a different default code page.
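To illustrate the code page point, here is a small sketch of how the same bytes turn into different text depending on which encoding the reader assumes. ISO-8859-1 only stands in for "some single-byte default code page"; the actual default depends on the machine. MojibakeDemo and its members are hypothetical names.

```csharp
using System;
using System.Text;

// Hypothetical demo class; ISO-8859-1 stands in for a system default code page.
static class MojibakeDemo
{
    // ISO-8859-1 maps every byte 0x00-0xFF to the code point with the same value.
    public static readonly Encoding Latin1 = Encoding.GetEncoding("iso-8859-1");

    // Decodes the bytes under whatever encoding the reader assumes.
    public static string DecodeAs(Encoding assumed, byte[] bytes) =>
        assumed.GetString(bytes);

    // Undoes a wrong ISO-8859-1 decode of UTF-8 data. This works only because
    // ISO-8859-1 round-trips every byte; it cannot undo an ASCII decode.
    public static string RepairLatin1AsUtf8(string garbled) =>
        Encoding.UTF8.GetString(Latin1.GetBytes(garbled));

    static void Main()
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("å"); // 0xC3 0xA5

        string right = DecodeAs(Encoding.UTF8, utf8Bytes);  // the intended character
        string wrong = DecodeAs(Latin1, utf8Bytes);         // two characters: mojibake
        string lossy = DecodeAs(Encoding.ASCII, utf8Bytes); // question marks

        Console.WriteLine(right.Length); // 1
        Console.WriteLine(wrong.Length); // 2
        Console.WriteLine(RepairLatin1AsUtf8(wrong) == right); // True
        Console.WriteLine(lossy); // ??
    }
}
```

The repair step only works when the wrong decode happened to be lossless. The ASCII path in the question is lossy, which is why re-encoding the resulting string afterwards has nothing left to recover.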
Maybe you could try another encoding? UTF-16, Unicode?
If you aren't sure whether it was encoded correctly in the first place, try viewing the EXIF metadata with another EXIF reader.
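One way to try encodings without guessing blindly is to decode strictly and fall back only when the bytes are not valid in the first encoding. The sketch below is just an illustration: EncodingProbe and its methods are hypothetical names, and the fallback encoding is whatever you suspect the camera or application used.

```csharp
using System;
using System.Text;

// Hypothetical helper for probing the encoding of a byte[].
static class EncodingProbe
{
    // throwOnInvalidBytes: true makes GetString throw a DecoderFallbackException
    // on malformed input instead of silently inserting U+FFFD.
    static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    // Returns true and the decoded text if the bytes are well-formed UTF-8.
    public static bool TryDecodeUtf8(byte[] bytes, out string text)
    {
        try
        {
            text = StrictUtf8.GetString(bytes);
            return true;
        }
        catch (DecoderFallbackException)
        {
            text = null;
            return false;
        }
    }

    // Prefers UTF-8 and uses the caller-supplied fallback otherwise.
    public static string DecodeWithFallback(byte[] bytes, Encoding fallback)
    {
        return TryDecodeUtf8(bytes, out string text) ? text : fallback.GetString(bytes);
    }

    static void Main()
    {
        byte[] utf8 = { 0xC3, 0xA5, 0xC3, 0xA4, 0xC3, 0xB6 }; // "åäö" as UTF-8
        byte[] latin1 = { 0xE5, 0xE4, 0xF6 };                 // "åäö" as ISO-8859-1
        Encoding fallback = Encoding.GetEncoding("iso-8859-1");

        // Both calls should yield the same three characters.
        Console.WriteLine(DecodeWithFallback(utf8, fallback) == DecodeWithFallback(latin1, fallback));
    }
}
```

Strict UTF-8 validation is a reasonable first test because text in a single-byte code page that contains accented characters is rarely valid UTF-8 by accident. It cannot tell two single-byte code pages apart, so the fallback remains a guess.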