Returning content from a Word macro to PHP

Posted 2024-10-21 00:18:25

The goal is to get an accurate word count for a Microsoft Word file. We have a Windows server running Apache and PHP. A web service on that machine extracts all of the document's text and runs it through `preg_match_all("/\S+/", $string, $matches); return count($matches[0]);`. That works, but it isn't accurate. So we wrote the following macro:

```vba
Sub GetWordCountBreakdown()

    Dim x As Long
    Dim TotalWords As Long
    Dim FieldWords As Long
    Dim ThisFieldWords As Long

    ' Word's own word count for the whole document
    TotalWords = ActiveDocument.ComputeStatistics(wdStatisticWords)

    ' Add up the words in every field result longer than 25 words
    For x = 1 To ActiveDocument.Fields.Count
        ThisFieldWords = ActiveDocument.Fields.Item(x).Result.ComputeStatistics(wdStatisticWords)
        If ThisFieldWords > 25 Then
            FieldWords = FieldWords + ThisFieldWords
        End If
    Next x

    ' Show total, field words, and the difference
    MsgBox TotalWords & " - " & FieldWords & " = " & (TotalWords - FieldWords)

End Sub
```

When I run this macro in Word, it shows a neat little message box that counts up all the words and references in the document. What I can't figure out is how to return those values to PHP so my web service can send them back to me.

Update: I was able to just rewrite the macro in PHP (through COM) and get the correct word count. Basically:

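```php
$word = new COM("Word.Application");
$doc = $word->Documents->Open($file);    // $file: path to the Word document
$wdStatisticWords = 0;                   // Word's WdStatistic value for words
$wordcount = $doc->ComputeStatistics($wdStatisticWords);
$doc->Close(0);                          // 0 = wdDoNotSaveChanges
$word->Quit();
```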

etc.
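The snippet above stops at the document total. A minimal sketch of how the macro's remaining logic (the per-field loop and the 25-word threshold) might be ported, assuming PHP's COM extension (com_dotnet) on the Windows server with Word installed. `wordCountBreakdown` and `wordCountBreakdownForFile` are hypothetical names, and the arithmetic is split into its own function so it can be checked without Word.

```php
<?php
// Pure helper mirroring the macro: only field results longer than
// $threshold words are counted as "field words".
function wordCountBreakdown(int $total, array $fieldCounts, int $threshold = 25): array
{
    $fieldWords = 0;
    foreach ($fieldCounts as $n) {
        if ($n > $threshold) {
            $fieldWords += $n;
        }
    }
    return ['total' => $total, 'fields' => $fieldWords, 'body' => $total - $fieldWords];
}

// Opens $path in Word through COM and returns the same three numbers the
// macro's message box shows. Requires Windows, Word, and ext-com_dotnet.
function wordCountBreakdownForFile(string $path): array
{
    $wdStatisticWords = 0;     // WdStatistic value for words
    $wdDoNotSaveChanges = 0;   // WdSaveOptions value: discard changes
    $word = new COM('Word.Application');
    try {
        // Documents.Open(FileName, ConfirmConversions, ReadOnly)
        $doc = $word->Documents->Open($path, false, true);
        $total = $doc->ComputeStatistics($wdStatisticWords);
        $fieldCounts = [];
        for ($i = 1; $i <= $doc->Fields->Count; $i++) {
            // Field.Result is a Range, so it has its own ComputeStatistics
            $fieldCounts[] = $doc->Fields->Item($i)->Result->ComputeStatistics($wdStatisticWords);
        }
        $doc->Close($wdDoNotSaveChanges);
    } finally {
        $word->Quit($wdDoNotSaveChanges);
    }
    return wordCountBreakdown($total, $fieldCounts);
}
```

Keeping the threshold logic out of the COM function means the web service can be tested on a machine without Word, and only the thin COM wrapper needs the Windows server.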


Comments (2)

脱离于你 2024-10-28 00:18:25

If you can read the OLE streams of the .doc file, an accurate word count for the document should be stored in either the SummaryInformation or the DocumentSummaryInformation stream. I don't have a script that reads the properties from .doc files, but I do have code for reading the metadata properties of Excel .xls files that could be adapted fairly easily.

EDIT

I've just checked, and it's property ID 0x0F (PIDSI_WORDCOUNT) in the SummaryInformation stream.
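A minimal sketch of reading that property in PHP, assuming the raw bytes of the `\005SummaryInformation` stream have already been pulled out of the .doc compound file (that extraction step is not shown). The byte layout follows Microsoft's [MS-OLEPS] property-set format; `readU32` and `summaryInfoWordCount` are hypothetical names. The stored value is whatever the application that last saved the file wrote there, so it is only as current as that save.

```php
<?php
// Little-endian unsigned 32-bit read; null if the 4 bytes are out of range.
function readU32(string $buf, int $pos): ?int
{
    if ($pos < 0 || $pos + 4 > strlen($buf)) {
        return null;
    }
    return unpack('V', $buf, $pos)[1];
}

// Returns PIDSI_WORDCOUNT (property ID 0x0F) from a SummaryInformation
// property-set stream, or null if it is missing or malformed.
function summaryInfoWordCount(string $stream): ?int
{
    // Stream header: byte order (2), version (2), system id (4), CLSID (16),
    // number of property sets (4), then FMTID0 (16) and Offset0 (4) at byte 44.
    $setOffset = readU32($stream, 44);
    if ($setOffset === null) {
        return null;
    }
    // Property set: size (4), property count (4), then (id, offset) pairs,
    // where each offset is relative to the start of the property set.
    $count = readU32($stream, $setOffset + 4);
    for ($i = 0; $count !== null && $i < $count; $i++) {
        $id = readU32($stream, $setOffset + 8 + 8 * $i);
        $offset = readU32($stream, $setOffset + 12 + 8 * $i);
        if ($id === null || $offset === null) {
            return null;
        }
        if ($id !== 0x0F) {
            continue;
        }
        // TypedPropertyValue: type (2) + padding (2) + value; VT_I4 is 0x0003.
        $type = readU32($stream, $setOffset + $offset);
        if ($type === null || ($type & 0xFFFF) !== 0x0003) {
            return null;
        }
        $value = readU32($stream, $setOffset + $offset + 4);
        if ($value === null) {
            return null;
        }
        // VT_I4 is signed; convert from the unsigned read.
        return $value >= 0x80000000 ? $value - 0x100000000 : $value;
    }
    return null;
}
```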

流殇 2024-10-28 00:18:25

Why not simply count the number of spaces in the doc string? Or am I missing something?
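To see why counting raw spaces is fragile, a small comparison on text with a doubled space and a line break. `countSpaces` and `countTokens` are hypothetical helper names; `countTokens` uses the same `\S+` regex the question's web service already uses. Neither count is guaranteed to match Word's own statistic.

```php
<?php
// Counts literal space characters only (tabs and newlines are ignored).
function countSpaces(string $text): int
{
    return substr_count($text, ' ');
}

// Counts runs of non-whitespace characters.
function countTokens(string $text): int
{
    return preg_match_all('/\S+/', $text);
}

$sample = "one  two\nthree";
echo countSpaces($sample), "\n"; // prints 2
echo countTokens($sample), "\n"; // prints 3
```

The doubled space inflates the space count, while the line break between "two" and "three" contains no space at all, so the two approaches disagree even on three words.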
