Reading Chinese text characters with iTextSharp in C#
I am using iTextSharp to read a PDF file. I can read the English text, but for Chinese text I get question marks. How can I read Chinese characters using iTextSharp?
coverNoteFilePath = @"D:\Temp\cc8a12e6-399a-4146-81ac-e49eb67e7e1b\CoverNote.pdf";
try
{
    PdfReader reader = new PdfReader(coverNoteFilePath);
    for (int page = 1; page <= reader.NumberOfPages; page++)
    {
        ITextExtractionStrategy its = new iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy();
        String s = PdfTextExtractor.GetTextFromPage(reader, page, its);
        s = Encoding.UTF8.GetString(ASCIIEncoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(s)));
        coverNoteContent = coverNoteContent + s;
    }
    reader.Close();
    Response.Write(coverNoteContent);
}
Comments (1)
Try replacing ASCIIEncoding with one of the other encoding classes (UTF8Encoding, for example). I imagine PDF documents know which encoding they use, so you might be able to find the correct one in the PdfReader object. Worth checking. From the MSDN:
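On the conversion line in the question itself: ASCIIEncoding.Convert is just the static Encoding.Convert method reached through a derived class, so swapping the class name alone may not change the result. A more likely source of the question marks is the call Encoding.Default.GetBytes(s). On .NET Framework, Encoding.Default is the system ANSI code page, and any character that code page cannot represent is replaced with '?' before the conversion to UTF-8 even starts. The sketch below reproduces that effect without iTextSharp. It is an illustration under assumptions, not a confirmed diagnosis: the RoundTrip helper and the sample string are hypothetical, and ISO-8859-1 stands in for a default code page that has no Chinese characters.

```csharp
using System;
using System.Text;

public class EncodingRoundTripDemo
{
    // Hypothetical helper that mirrors the conversion line from the question,
    // with `legacy` standing in for Encoding.Default.
    public static string RoundTrip(string s, Encoding legacy)
    {
        // Characters the legacy code page cannot represent are replaced with '?' here.
        byte[] legacyBytes = legacy.GetBytes(s);
        // Converting to UTF-8 afterwards cannot bring the lost characters back.
        byte[] utf8Bytes = Encoding.Convert(legacy, Encoding.UTF8, legacyBytes);
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // Assumption: a Western code page in place of the machine's default.
        Encoding legacy = Encoding.GetEncoding("iso-8859-1");

        // Hypothetical stand-in for the string returned by GetTextFromPage.
        string extracted = "Cover note 中文";

        string converted = RoundTrip(extracted, legacy);
        Console.WriteLine(converted);              // Chinese characters are now '?'
        Console.WriteLine(converted == extracted); // False: the round trip is lossy
    }
}
```

If this is what happens on your machine, removing the conversion line and using the string from PdfTextExtractor.GetTextFromPage directly is worth trying, since a .NET string is already Unicode. Whether the extracted text is correct to begin with still depends on the PDF itself, for example on whether its fonts carry the mapping information needed for text extraction.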