Can we use multithreading in C# to convert Microsoft Word documents to HTML?
I have a Windows Service that polls the database for uploaded documents of type doc, docx, pdf and rtf, converts them to HTML, and saves them to the local file system. The documents are fetched from the database, queued in memory, and then picked up from the shared queue by multiple threads for processing.
The problem I am facing is that processing becomes slower over time. Conversion is fast for the first few days, say 2 seconds for a 50 KB document, and slower after a few days, say 20 seconds for the same document. All I can see is that processing time keeps getting worse as the days go by. I couldn't pin down what is causing this. Even restarting the Windows Service does not help.
Microsoft Office is installed on the Windows Server for the document conversion, and nearly 2000 documents are converted to HTML per day.
So my question is: can we use multithreading to convert Microsoft Word documents to HTML?
Comments (1)
I think you are already using as much multithreading as is possible. You can't make Word more efficient, you can only run several Word instances in parallel (which you are doing). I'd suggest spending more time on investigation.
Do some logging/tracing and profiling. Find out which lines of code/methods are the ones that are really slow.
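To make the logging suggestion concrete, here is a minimal sketch of per-stage timing with `System.Diagnostics.Stopwatch`. The `StageTimer` class, the stage names and the log line format are all made up for illustration; nothing here comes from the service in the question, and a real service would probably write to its own logger instead of `Trace`.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper (not from the original service): times one named
// stage of the conversion pipeline and writes a single trace line for it.
public static class StageTimer
{
    public static TimeSpan Measure(string stage, string documentId, Action work)
    {
        var stopwatch = Stopwatch.StartNew();
        try
        {
            work();
        }
        finally
        {
            // Logged even if the stage throws, so failed stages show up too.
            stopwatch.Stop();
            Trace.WriteLine(
                $"{DateTime.UtcNow:o} doc={documentId} stage={stage} ms={stopwatch.ElapsedMilliseconds}");
        }
        return stopwatch.Elapsed;
    }
}
```

Wrapping each step separately (fetch from the database, open in Word, save as HTML, close) with a call such as `StageTimer.Measure("open", id, () => { /* open the document */ })` should show, after a few days of logs, which stage is the one that grows from 2 seconds towards 20.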
If it turns out to be Word that is slow, try watching it and the system. Where does the slowness come from? Is it using up all the CPU? Perhaps the disk is being accessed too much? Maybe there are too many temporary files gathered somewhere? Or perhaps you run out of RAM and Windows is swapping like mad? In the last case what is using it all? Maybe you're not closing something properly (like Word itself or the files that you make it open)?
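A couple of those checks can be logged from inside the service itself. The sketch below is only an illustration under assumptions: `HealthSnapshot` is a made-up name, it assumes Word runs under the usual process name `WINWORD`, and it only looks at the temp folder of the account the service runs as. A `WINWORD` count that keeps climbing would point at instances that are never quit, and a steadily growing temp file count would point at files left behind.

```csharp
using System;
using System.ComponentModel;
using System.Diagnostics;
using System.IO;
using System.Linq;

// Hypothetical helper: captures a few numbers worth logging periodically.
public static class HealthSnapshot
{
    public static string Capture()
    {
        // Assumption: Word instances show up under the process name "WINWORD".
        Process[] wordProcesses = Process.GetProcessesByName("WINWORD");
        long workingSetBytes = 0;
        foreach (Process process in wordProcesses)
        {
            try
            {
                workingSetBytes += process.WorkingSet64;
            }
            catch (InvalidOperationException)
            {
                // The process exited between enumeration and the query.
            }
            catch (Win32Exception)
            {
                // Access to the process was denied; skip it.
            }
            finally
            {
                process.Dispose();
            }
        }

        // Top-level files in the temp folder of the current account only.
        int tempFileCount = Directory.EnumerateFiles(Path.GetTempPath()).Count();

        return $"winword_count={wordProcesses.Length} " +
               $"winword_ws_mb={workingSetBytes / (1024 * 1024)} " +
               $"temp_files={tempFileCount}";
    }
}
```

Logging `HealthSnapshot.Capture()` every few minutes next to the conversion times makes it easier to see whether the slowdown tracks leftover Word processes, memory, or temp files. CPU and disk activity are probably easier to watch with Performance Monitor than from code.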