How to get the character count of files ending in .docx and .doc in a directory, divide each file's count by 65, and save the results to XLSX
I have a folder containing many Word documents ending in .doc and .docx.
This code works only for .docx files.
I want it to handle .doc files as well.
import docx
import os

charCounts = {}
directory = os.fsencode('.')
for file in os.listdir(directory):
    filename = os.fsdecode(file)
    if filename.endswith(".docx"):
        #filename = os.path.join(directory, filename)
        doc = docx.Document(filename)
        chars = sum(len(p.text) for p in doc.paragraphs)
        charCounts[filename] = chars / 65

# uses openpyxl package
from openpyxl import Workbook
wb = Workbook()
ws = wb.active
ws.cell(row=1, column=2, value='File Name')
ws.cell(row=1, column=4, value='chars/65')
for i, x in enumerate(charCounts):
    ws.cell(row=i + 3, column=2, value=x)
    ws.cell(row=i + 3, column=4, value=charCounts[x])
ws.cell(row=len(charCounts) + 3, column=4, value=sum(charCounts.values()))
path = './charCounts.xlsx'
wb.save(path)
Images:
I want the output to look like these.
Notice two things here.
First, the file names in the Excel sheet are arranged in numerical order.
Second, the file extensions have been removed in the Excel sheet. That is what I want.
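To make the two requirements concrete, here is a minimal sketch of numerical ordering and extension stripping. The file names and the helper names `natural_key` and `display_name` are hypothetical, chosen only for illustration; the actual names in the folder are not known.

```python
import os
import re

def natural_key(name):
    """Split a name into text and integer chunks so that '10' sorts after '2'.

    re.split with a capturing group alternates text, digits, text, ...,
    so odd positions always hold digit runs.
    """
    parts = re.split(r'(\d+)', name)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

def display_name(filename):
    """Drop the final extension, e.g. '2.doc' becomes '2'."""
    return os.path.splitext(filename)[0]

# Hypothetical file names, for illustration only.
files = ['10.docx', '2.doc', '1.docx']

plain_order = sorted(files)                     # string order: '10' before '2'
numeric_order = sorted(files, key=natural_key)  # numerical order
names = [display_name(f) for f in numeric_order]
```

A plain `sorted(files)` compares names character by character, which is why a separate key function is needed for numerical order.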
Comments (1)
Here is an update to the code in your question which will do what I believe you have asked:
Explanation:
- For each file ending in .docx, except those starting with TEMP_CONVERTED_WORD_FILE_, store the character count (divided by 65) under the filename as key in a dictionary, charCounts.
- For each file ending in .doc, use the pywin32 package of Win32 extensions to convert it to a .docx file with TEMP_CONVERTED_WORD_FILE_ prepended to the filename, then store the character count (divided by 65) under its original filename as key in the same dictionary as above.
- Replace the charCounts dictionary with one whose insertion order follows the filename key.
- Use the charCounts dictionary when storing the contents in an Excel file, taking care to truncate the .doc or .docx suffix from the filename key.
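The steps above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the original code from this answer: the helper names (`classify`, `count_docx_chars`, `convert_doc_to_docx`, `collect_counts`, `write_xlsx`) are hypothetical, it assumes Windows with Microsoft Word installed along with the python-docx, pywin32, and openpyxl packages, and skipping Word's `~$` lock files and deleting the temporary file after counting are choices made here.

```python
import os

PREFIX = 'TEMP_CONVERTED_WORD_FILE_'
WD_FORMAT_DOCX = 16  # Word's wdFormatDocumentDefault constant (.docx)

def classify(name):
    """Return 'docx', 'doc', or None for names that should be skipped."""
    lower = name.lower()
    if name.startswith(PREFIX) or name.startswith('~$'):
        return None  # leftover temp files and Word lock files (assumption)
    if lower.endswith('.docx'):
        return 'docx'
    if lower.endswith('.doc'):
        return 'doc'
    return None

def count_docx_chars(path):
    """Sum the text length of all paragraphs in a .docx file."""
    import docx  # python-docx, imported lazily
    return sum(len(p.text) for p in docx.Document(path).paragraphs)

def convert_doc_to_docx(word, src, dst):
    """Save a .doc file as .docx through a running Word instance."""
    doc = word.Documents.Open(src)
    try:
        doc.SaveAs(dst, WD_FORMAT_DOCX)
    finally:
        doc.Close()

def collect_counts(folder='.'):
    """Map each original filename to chars / 65, sorted by filename."""
    folder = os.path.abspath(folder)  # Word expects absolute paths
    counts = {}
    word = None
    try:
        for name in os.listdir(folder):
            kind = classify(name)
            path = os.path.join(folder, name)
            if kind == 'docx':
                counts[name] = count_docx_chars(path) / 65
            elif kind == 'doc':
                if word is None:
                    import win32com.client  # pywin32, Windows only
                    word = win32com.client.Dispatch('Word.Application')
                temp = os.path.join(folder, PREFIX + name + 'x')
                convert_doc_to_docx(word, path, temp)
                try:
                    counts[name] = count_docx_chars(temp) / 65
                finally:
                    os.remove(temp)
    finally:
        if word is not None:
            word.Quit()
    # Rebuild the dict so its insertion order follows the filename key.
    return dict(sorted(counts.items()))

def write_xlsx(counts, path='charCounts.xlsx'):
    """Write names (without extension) and counts, plus a total row."""
    from openpyxl import Workbook
    wb = Workbook()
    ws = wb.active
    ws.cell(row=1, column=2, value='File Name')
    ws.cell(row=1, column=4, value='chars/65')
    for i, (name, value) in enumerate(counts.items()):
        ws.cell(row=i + 3, column=2, value=os.path.splitext(name)[0])
        ws.cell(row=i + 3, column=4, value=value)
    ws.cell(row=len(counts) + 3, column=4, value=sum(counts.values()))
    wb.save(path)
```

Calling `write_xlsx(collect_counts('.'))` from the folder holding the documents would produce `charCounts.xlsx`. Note that `sorted` on the filename key uses plain string order, so a name beginning with 10 sorts before one beginning with 2; numerical order would need a different sort key.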