How do I programmatically read a .pdf file and convert it to audio (.mp3 format)?
I want to parse a PDF file from my C# app and create an audio file from it.
How would I do that?
I'm particularly looking for a good PDF-to-text library or a way to extract the text from a PDF file.
Use Festival for the text-to-speech. Various PDF-to-text APIs exist...
You need the Speech SDK from Microsoft. Read the instructions here
As the other posters outlined, first you have to extract the text from the .pdf file. PDF is an open format now, so you can probably find a parser through Google.
Then you have to pick out the text you actually want converted to speech, ignoring things like figure titles, page headers, tables of contents, etc.
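To make the filtering step concrete, here is a rough sketch in Python (chosen only for brevity; the same idea ports to C#). It assumes you already have one string of extracted text per page from whatever PDF library you end up using. The function name `strip_page_furniture` and the `min_repeat_ratio` parameter are made up for this example, and the heuristic (drop lines that repeat on most pages, drop bare page numbers) is just one plausible approach. It will not catch figure titles or a table of contents.

```python
import re
from collections import Counter


def strip_page_furniture(pages, min_repeat_ratio=0.6):
    """Drop lines that look like page numbers or repeated headers/footers.

    `pages` is a list of strings, one per page (hypothetical input for
    this sketch; it stands in for the output of a PDF text extractor).
    """
    # Split each page into non-empty, trimmed lines.
    page_lines = [
        [ln.strip() for ln in page.splitlines() if ln.strip()]
        for page in pages
    ]

    # Count on how many pages each distinct line appears.
    seen_on = Counter()
    for lines in page_lines:
        seen_on.update(set(lines))

    # A line repeated on most pages (and at least twice) is probably
    # a running header or footer.
    threshold = max(2, int(len(pages) * min_repeat_ratio))
    repeated = {ln for ln, count in seen_on.items() if count >= threshold}

    # Matches "7", "Page 7" and "Page 7 of 30".
    page_number = re.compile(r"^(page\s+)?\d+(\s+of\s+\d+)?$", re.IGNORECASE)

    kept = []
    for lines in page_lines:
        for ln in lines:
            if ln in repeated or page_number.match(ln):
                continue
            kept.append(ln)
    return "\n".join(kept)
```

Real documents will need more rules than this, so treat it as a starting point and check the output against a few of your own PDFs.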
Once you've got the text, you need to convert it to speech. This is probably the hardest part.
A while ago I was fiddling around with generating voice files for a gaming mod, since I'm a rotten voice actor.
Cepstral had the best TTS converters I could find. (The free ones had an annoying tendency to insert Cepstral advertisements in the speech, but I could manually edit this out for what I was doing.)
It turns out that there's a Speech Synthesis Markup Language (SSML) which can be used to give the TTS converter clues about which syllables to stress, etc. Here's a link:
http://www.w3.org/TR/speech-synthesis/
How you go about automatically adding the SSML to the text is a bit beyond me.
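For what it's worth, a very basic form of automatic markup is not hard: wrap paragraphs and sentences in the `<p>` and `<s>` elements from the spec linked above and put a `<break>` between paragraphs. The sketch below does only that. It is Python for brevity, `text_to_ssml` and its parameters are invented for this example, and the sentence splitting is a naive regex, so abbreviations like "e.g." will be split wrongly. It does not attempt stress or pronunciation hints.

```python
import re
from xml.sax.saxutils import escape


def text_to_ssml(text, pause_ms=400, lang="en-US"):
    """Wrap plain text in minimal SSML 1.0 (paragraphs, sentences, pauses)."""
    # Paragraphs are separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]

    parts = [
        '<?xml version="1.0"?>',
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{lang}">',
    ]
    for para in paragraphs:
        # Collapse line breaks and runs of spaces inside the paragraph.
        flat = " ".join(para.split())
        parts.append("<p>")
        # Naive split: whitespace that follows ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", flat):
            # escape() protects &, < and > in the source text.
            parts.append(f"<s>{escape(sentence)}</s>")
        parts.append("</p>")
        parts.append(f'<break time="{pause_ms}ms"/>')
    parts.append("</speak>")
    return "\n".join(parts)
```

Whether your TTS engine accepts this depends on its SSML support, so check its documentation before relying on any particular element.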
Anyway, the TTS converter will produce an audio file, and the final step would be to compress the audio at the desired bit rate in mp3 format.
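For that last step, one option (my assumption, not something named above) is the command-line LAME encoder, where `-b` sets the bit rate in kbps. Here is a small Python wrapper as a sketch; `lame_command` and `encode_mp3` are names invented for this example, and `lame` has to be installed and on your PATH for `encode_mp3` to work.

```python
import subprocess


def lame_command(wav_path, mp3_path, bitrate_kbps=128):
    """Build the argument list for: lame -b <kbps> <input> <output>."""
    if bitrate_kbps <= 0:
        raise ValueError("bitrate must be a positive number of kbps")
    return ["lame", "-b", str(bitrate_kbps), wav_path, mp3_path]


def encode_mp3(wav_path, mp3_path, bitrate_kbps=128):
    """Run LAME on a WAV file; raises CalledProcessError if it fails."""
    subprocess.run(lame_command(wav_path, mp3_path, bitrate_kbps), check=True)
```

Calling `encode_mp3("speech.wav", "speech.mp3", 64)` would run `lame -b 64 speech.wav speech.mp3`. Speech usually tolerates lower bit rates than music, but pick the value by ear.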
If your sole task is to listen to speech-synthesized text from a PDF, how about the Acrobat "Read Out Loud" function at the bottom of the "View" menu?
I guess it's a hard thing to do. Firstly you need to read the text in that pdf, and then use some mechanism of synthetic voice generation to create the audio content. Then you have to store it as an mp3.
On Mac OS X, you can extract the text of the PDF and then pipe it into "say". You should be able to find equivalent synthesizers on other operating systems.
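As a sketch of that pipeline, the Python below extracts the text with `pdftotext` and feeds it to `say`. Using `pdftotext` (from Poppler or Xpdf) is my assumption; any extractor that writes to stdout would do. The trailing `-` tells `pdftotext` to write to stdout, and `say -o` writes the speech to an audio file instead of the speakers, reading its text from stdin when no message is given. The function names are invented for this example.

```python
import subprocess


def pdftotext_command(pdf_path):
    """Arguments for pdftotext; the trailing "-" means write to stdout."""
    return ["pdftotext", pdf_path, "-"]


def say_command(audio_path):
    """Arguments for macOS say; with no message it reads text from stdin."""
    return ["say", "-o", audio_path]


def pdf_to_speech(pdf_path, audio_path):
    """Extract the PDF text and pipe it into say (macOS only)."""
    extracted = subprocess.run(
        pdftotext_command(pdf_path),
        check=True,
        capture_output=True,
        text=True,
    )
    subprocess.run(
        say_command(audio_path),
        input=extracted.stdout,
        text=True,
        check=True,
    )
```

For example, `pdf_to_speech("book.pdf", "book.aiff")` would leave you with an AIFF file that still needs converting to MP3.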
It's not all that complicated to do, provided that you don't reinvent the wheel, but instead simply reuse existing technology (e.g. text-to-speech engines like Festival), as well as OCR engines to process the PDF files.
The most complicated part is probably dealing with different PDF layouts (columns, rows, embedded graphics, footnotes, URLs, etc.), which may confuse the text recognition process.
However, in general (if this is not supposed to be a learning experience), it is certainly easier to just resort to using existing software solutions:
Preferably, your input is a tagged PDF document. This means that the document contains tags that mark up its logical structure (typically a PDF document will only contain visual information).
This PDF could then be converted into DAISY format, which is a standard for digital talking books, i.e. an intermediate XML format storing the text of books along with the logical structure and navigation features.
This DAISY XML format can either be converted to an audio format, or you could use a DAISY reader, a physical device similar to an MP3 player, to listen to the book.
There is a presentation available at the DAISY website explaining the principles of this toolchain: