Python Excel doc python-3.x python-docx

How to count the characters in each .docx and .doc file in a directory, divide each count by 65, and save the results to XLSX

Posted on 2025-02-12 19:20:05

I have a folder containing many Word document files ending in .doc and .docx.

This code works only for .docx files.
I want it to handle .doc files as well.

import docx
import os

charCounts = {}
directory = os.fsencode('.')
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".docx"):
        #filename = os.path.join(directory, filename)
        doc = docx.Document(filename)
        chars = sum(len(p.text) for p in doc.paragraphs)
        charCounts[filename] = chars / 65

# uses openpyxl package
from openpyxl import Workbook
wb = Workbook()
ws = wb.active

ws.cell(row=1, column=2, value='File Name')
ws.cell(row=1, column=4, value='chars/65')
for i, x in enumerate(charCounts):
    ws.cell(row=i + 3, column=2, value=x)
    ws.cell(row=i + 3, column=4, value=charCounts[x])
# Write the grand total once, after all per-file rows have been written
ws.cell(row=len(charCounts) + 3, column=4, value=sum(charCounts.values()))
path = './charCounts.xlsx'
wb.save(path)

Images:

I have files like these.

I want the output to look like this:

Notice two things here.

First, the file names in the Excel sheet are arranged in numerical order.

Second, the file extensions have been removed in the Excel sheet. I want it like that.

Comments (1)

牵你的手，一向走下去 2025-02-19 19:20:05

Here is an update to the code in your question which will do what I believe you have asked:

# uses python-docx package
import docx
import os

# uses pywin32 package
import win32com.client as win32
from win32com.client import constants
app = win32.gencache.EnsureDispatch('Word.Application')

charCounts = {}
fileDir = '.' # Put the path of the directory to be searched here
os.chdir(fileDir)
cwd = os.getcwd()
directory = os.fsencode(cwd)
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.startswith('TEMP_CONVERTED_WORD_FILE_'):
        continue
    filenameOrig = None
    if filename.endswith(".doc"):
        filenameOrig = filename
        src_path = os.path.join(cwd, filename)
        src_path_norm = os.path.normpath(src_path)
        doc = app.Documents.Open(src_path_norm)
        doc.Activate()
        docxPath = 'TEMP_CONVERTED_WORD_FILE_' + filename[:-4] + ".docx"
        dest_path = os.path.join(cwd, docxPath)
        dest_path_norm = os.path.normpath(dest_path)
        app.ActiveDocument.SaveAs(dest_path_norm, FileFormat=constants.wdFormatXMLDocument)
        doc.Close(False)
        filename = docxPath
    if filename.endswith(".docx"):
        src_path = os.path.join(cwd, filename)
        src_path_norm = os.path.normpath(src_path)
        doc = docx.Document(src_path_norm)
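        # Count body text plus the header and footer text of every section
        bodyChars = sum(len(p.text) for p in doc.paragraphs)
        headerFooterChars = sum(
            len(p.text)
            for section in doc.sections
            for hf in (section.header, section.footer)
            for p in hf.paragraphs
        )
        chars = bodyChars + headerFooterChars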
        charCounts[filenameOrig if filenameOrig else filename] = chars / 65
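# Rebuild the dict so its insertion order follows the sorted filenames.
# Note: sorted() compares strings, so '10.docx' is placed before '2.docx'.
charCounts = {k: charCounts[k] for k in sorted(charCounts)}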

# uses openpyxl package
from openpyxl import Workbook
wb = Workbook()
ws = wb.active

ws.cell(row=1, column=2, value='File Name')
ws.cell(row=1, column=4, value='chars/65')
for i, x in enumerate(charCounts):
    ws.cell(row=i + 3, column=2, value=x[:-4] if x.endswith('.doc') else x[:-5])
    ws.cell(row=i + 3, column=4, value=charCounts[x])
ws.cell(row=len(charCounts) + 3, column=3, value='Total')
ws.cell(row=len(charCounts) + 3, column=4, value=sum(charCounts.values()))
path = './charCounts.xlsx'
wb.save(path)
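
The directory scan in the code above can also be pulled out into a small helper, which makes the filtering rules easier to check without Word installed. This is only a sketch under a few assumptions: the function name `list_word_files` is hypothetical, the extension check is made case-insensitive, and names starting with `~$` (the lock files Word creates next to an open document) are skipped in addition to the `TEMP_CONVERTED_WORD_FILE_` prefix used above.

```python
import os

TEMP_PREFIX = 'TEMP_CONVERTED_WORD_FILE_'  # same prefix the code above uses


def list_word_files(folder):
    """Return the sorted names of .doc/.docx files in folder.

    Hypothetical helper: skips the temporary converted copies and
    Word's '~$' lock files, and ignores the case of the extension.
    """
    names = []
    for name in os.listdir(folder):
        if name.startswith(TEMP_PREFIX) or name.startswith('~$'):
            continue
        extension = os.path.splitext(name)[1].lower()
        if extension in ('.doc', '.docx'):
            names.append(name)
    return sorted(names)
```

Calling `list_word_files('.')` would return only the original Word files in the current directory, which could then be fed to the conversion and counting steps.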

Explanation:

For every file whose name ends in .docx, except those starting with TEMP_CONVERTED_WORD_FILE_, store the character count (divided by 65) in a dictionary charCounts, using the filename as the key.
For every file whose name ends in .doc, use the pywin32 package of Win32 extensions to convert it to a .docx file with TEMP_CONVERTED_WORD_FILE_ prepended to the filename, then store the character count (divided by 65) in the same dictionary, using the original filename as the key.
Replace the charCounts dictionary with one whose insertion order follows the sorted filename keys.
Iterate through charCounts and store the contents in an Excel file, taking care to truncate the .doc or .docx suffix from each filename key.
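
The question asks for names arranged number-wise, while plain `sorted()` on strings puts `10.docx` before `2.docx`. A minimal sketch of one way to get numeric ordering is shown below, assuming the names contain digit runs that should compare as integers. The helper `natural_key` and the sample values are hypothetical, and `os.path.splitext` is used here as an alternative to slicing off the suffix.

```python
import os
import re


def natural_key(filename):
    """Build a sort key in which digit runs compare as integers.

    re.split with a capturing group alternates text and digit chunks,
    so odd positions always hold the digit runs.
    """
    stem = os.path.splitext(filename)[0]
    chunks = re.split(r'(\d+)', stem)
    return [int(chunk) if index % 2 else chunk.lower()
            for index, chunk in enumerate(chunks)]


# Made-up sample values standing in for the real charCounts dictionary
char_counts = {'10.docx': 3.0, '2.doc': 1.5, '1.docx': 2.0}

# Rebuild the dict in natural (numeric-aware) filename order
ordered = {name: char_counts[name]
           for name in sorted(char_counts, key=natural_key)}

# Pair each extension-free name with its value, ready to write to the sheet
rows = [(os.path.splitext(name)[0], value) for name, value in ordered.items()]
```

With this key, `rows` lists the files as 1, 2, 10 rather than 1, 10, 2, and each entry already has its extension removed.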
