文江博客开发文档 Netkiller Python 手札文章详情

文章来源于网络收集而来，版权归原创者所有，如有侵权请及时联系！

24.2. Word 文字处理

发布于 2024-02-10 15:26:30 字数 13492 浏览 0 评论 0 收藏 0

24.2. Word 文字处理

Python-docx 是一款针对于word文档处理的一个模块，它可以创建word文档，遍历word文档，以及修改word文档。

主要应用场景是文档生成，文档转换，文档分析等等。

举例一、招聘网站自动生成Word文档简历，使用 python-docx 将用户输入的简历内容，自动拼装成 word 简历，并下载。

举例二、Word 文档差异对比，例如合同修改后，产生两个版本的 word 文档，我们不知道那一行，或者那一个字做了修改，人工核对费时费力，我们就可以写一个程序，逐行逐字核对，并将差异文字显示出来。

24.2.1. 安装

pip3 install python-docx

24.2.2. 创建空白文档

from docx import Document
document = Document()
document.save('new.docx')

24.2.3. 添加标题

from docx import Document

# 创建文档对象
document = Document()

# 标题
document.add_heading('标题一', 0)
document.add_heading('标题二', 1)
document.add_heading('标题三', 2)
document.add_heading('标题四', 3)
document.add_heading('标题五', 4)
document.add_heading('标题六', 5)
document.add_heading('标题七', 6)
document.add_heading('标题八', 7)
document.add_heading('标题九', 8)
document.add_heading('标题十', 9)

# 保存 word 文档
document.save('heading.docx')

设置对齐

from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH, WD_LINE_SPACING

document = Document()
head = document.add_heading('Netkiller Python 手札', level=0)
head.alignment = WD_ALIGN_PARAGRAPH.CENTER  # 居中

document.add_heading('Python 入门', 1)
document.add_heading('安装 Python 运行环境', 2)
document.add_heading('Python 语法', 1)
document.add_heading('if 判断', 2)
document.add_heading('for 循环', 2)

document.save('Netkiller Python 手札.docx')

设置字体大小

from docx.enum.text import WD_ALIGN_PARAGRAPH,WD_LINE_SPACING
head = document.add_heading('level=1') #添加一级标题
run = head.add_run('这是标题')
run.font.size = Pt(24)
head.alignment = WD_ALIGN_PARAGRAPH.CENTER #居中

24.2.4. 添加段落

from docx import Document

# 创建文档对象
document = Document()

# 添加标题
document.add_heading('什么是多维度架构', 0)
document.add_heading('架构师的大局观', 1)
# 添加段落
document.add_paragraph(
    '我们从小的教育就是如何拆分问题、解决问题，这样做显然会使复杂的问题变得更容易些。但是这带来一个新问题，我们丧失了如何从宏观角度看问题，分析问题，解决问题，对更大的整体的内在领悟能力。这导致了我们对现有问题提出的解决方案，但无法预计实施该方案后产生的各种后果，为此我们付出了巨大代价。')
document.add_paragraph('而我们试图考虑大局的时候，总要在脑子里重新排序，组合哪些拆分出来问题，给它们编组列单。'
                       '习惯性认为解决了所有微观领域的问题，那么宏观上问题就得到了解决。'
                       '然而，这种做法是徒劳无益的，就好比试图通过重新拼起来的碎镜子来观察真实的影像。')
document.add_paragraph('所以在一段时间后，我们便干脆完全放弃了对整体的关注。')

document.add_heading('面临的问题', 2)

document.add_paragraph(
    '当今的社会，几乎所有的企业情况都是岗位职责清晰，分工明确，员工是企业机器上的一颗螺丝钉，我们在招聘下属的时候也仅仅是用他的一技之长。项目一旦立项，我们就根据项目需求针对性性的招聘，短短半年团队就会膨胀数倍，但效率并不是成正比增长。另一个问题是这个庞大的团队合作起来并不尽人意。结果是 80 % 协调的时间，20 % 实际工作时间。')

# 保存文档
document.save('paragraph.docx')

插入段落

paragraph = document.add_paragraph('明天去上班。')
在此段落之前插入一个段落，如下：
pre_paragraph = paragraph.insert_paragraph_before('今天休息。')

段落对齐

paragraph = document.add_paragraph("段落对齐方式")
paragraph_format = paragraph.paragraph_format
paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER

段落缩进设置

from docx.shared import Inches

paragraph = document.add_paragraph("段落缩进")
paragraph_format = paragraph.paragraph_format
paragraph_format.left_indent = Inches(0.5)

首行缩进：

paragraph_format.first_line_indent = Inches(-0.25)

24.2.5. 列表

from docx import Document

# 创建文档
document = Document()

# 添加标题
document.add_heading('Linux', 0)
document.add_heading('桌面发行版', 1)

# 添加列表
document.add_heading('Desktop: ', 2)
document.add_paragraph('Ubuntu', style='List Bullet')
document.add_paragraph('Debian', style='List Bullet')
document.add_paragraph('Fedora', style='List Bullet')

# 保存文档
document.save('list.docx')

编号

document.add_paragraph(
    'first item in ordered list', style='List Number'
)

编号

from docx.enum.style import WD_STYLE_TYPE
from docx import Document

document = Document()
styles = document.styles

# 遍历所有表样式
for s in styles:
    if s.type == WD_STYLE_TYPE.TABLE:
        document.add_paragraph("表格样式 :  " + s.name)
        table = document.add_table(3, 3, style=s)
        heading_cells = table.rows[0].cells
        heading_cells[0].text = '第一列内容'
        heading_cells[1].text = '第二列内容'
        heading_cells[2].text = '第三列内容'
        document.add_paragraph("\n")

document.save('所有表格样式演示.docx')

24.2.6. 表格

from docx import Document

document = Document()

document.add_heading('表格演示', 1)

# 表格
table = document.add_table(rows=3, cols=2, style='Table Grid')

# 表头
thead = table.rows[0].cells
thead[0].text = '姓名'
thead[1].text = '年龄'

# 表体
tbody = table.rows[1].cells
tbody[0].text = 'Neo'
tbody[1].text = '22'

tbody = table.rows[2].cells
tbody[0].text = 'Jam'
tbody[1].text = '33'

document.add_paragraph('2020年4月8日 netkiller 制表')
# 保存
document.save('table.docx')

24.2.7. 添加图片

document.add_picture('demo.png')

指定图片宽度，高度会自适应

# 图片
document.add_picture('pic.jpg', width=Inches(1))

指定图片宽度和高度

from docx.shared import Inches
document.add_picture('demo.png', width=Inches(1.0), height=Inches(1.0))

24.2.8. 强制分页

# 分页
# document.add_page_break()

24.2.9. 样式

24.2.9.1. 对齐

# LEFT      =>  左对齐
# CENTER    =>  文字居中
# RIGHT     =>  右对齐
# JUSTIFY   =>  文本两端对齐

标题对齐

style = document.styles['Normal']
# 标题
h0 = document.add_heading('标题0', 0)
# 居中
h0.alignment = WD_PARAGRAPH_ALIGNMENT.CENTER

段落对齐

from docx.enum.text import WD_ALIGN_PARAGRAPH
paragraph = document.add_paragraph("Netkiller Python 手札")
paragraph_format = paragraph.paragraph_format
paragraph_format.alignment = WD_ALIGN_PARAGRAPH.CENTER

24.2.9.2. 首行缩进

# 首行缩进两个字符
paragraph_format = style.paragraph_format
paragraph_format.first_line_indent = Cm(0.74)

24.2.9.3. 段落间距

设置值用 Pt 单位是磅，如下：

# 段前
paragraph_format.space_before = Pt(18)
# 段后
paragraph_format.space_after = Pt(12)

24.2.9.4. 行间距

设置值用 Pt 单位是磅，如下：

当行距为最小值和固定值时，设置值单位为磅，需要用 Pt ；

#SINGLE         =>  单倍行距（默认）
#ONE_POINT_FIVE =>  1.5倍行距
#DOUBLE2        =>  倍行距
#AT_LEAST       =>  最小值
#EXACTLY        =>  固定值
#MULTIPLE       =>  多倍行距

当行距为多倍行距时，设置值为数值，如下：

from docx.shared import Length
paragraph.line_spacing_rule = WD_LINE_SPACING.EXACTLY 	#固定值
paragraph_format.line_spacing = Pt(18) 					# 固定值18磅
paragraph.line_spacing_rule = WD_LINE_SPACING.MULTIPLE 	#多倍行距
paragraph_format.line_spacing = 1.75 					# 1.75倍行间距

24.2.9.5. 粗体，斜体

# 段落
p1 = document.add_paragraph('样式演示')
# 字体加粗
p1.add_run('字体加粗').bold = True
# 斜体
p1.add_run('设置斜体').italic = True

24.2.9.6. 字体大小

p2 = document.add_paragraph('正常尺寸')
run = p2.add_run('设置字体大小12Pt ')
# 设置字体大小
run.font.size = Pt(12)

24.2.9.7. 查看段落样式

查看paragraph有哪些样式

from docx import Document
from docx.enum.style import WD_STYLE_TYPE
document = Document()
for i in document.styles:
if i.type == WD_STYLE_TYPE.PARAGRAPH:
print(i.name)
>>>
Normal
Heading 1
Heading 2
Heading 3
Heading 4
Heading 5
Heading 6
Heading 7
Heading 8
Heading 9
No Spacing
Title
Subtitle
List Paragraph
Body Text
Body Text 2
Body Text 3
List
List 2
List 3
List Bullet
List Bullet 2
List Bullet 3
List Number
List Number 2
List Number 3
List Continue
List Continue 2
List Continue 3
macro
Quote
Caption
Intense Quote
TOC Heading

24.2.9.8. 文档样式

查看文档有哪些样式

from docx import Document
from docx.enum.style import WD_STYLE_TYPE
document = Document()
for i in document.styles:
if i.type == WD_STYLE_TYPE.CHARACTER:
print(i.name)
>>>
Default Paragraph Font
Heading 1 Char
Heading 2 Char
Heading 3 Char
Title Char
Subtitle Char
Body Text Char
Body Text 2 Char
Body Text 3 Char
Macro Text Char
Quote Char
Heading 4 Char
Heading 5 Char
Heading 6 Char
Heading 7 Char
Heading 8 Char
Heading 9 Char
Strong
Emphasis
Intense Quote Char
Subtle Emphasis
Intense Emphasis
Subtle Reference
Intense Reference
Book Title

24.2.9.9. 自动分页设置

#widow_control      =>  孤行控制，防止在页面顶端单独打印段落末行或在页面底端单独打印段落首行。
#keep_with_next     =>  与下段同页，防止在选中段落与后面一段间插入分页符。
#page_break_before  =>  段前分页，防止在段落中出现分页符。
#keep_together      =>  段中不分页，在选中段落前插入分页符。

paragraph_format.keep_with_next = True

24.2.9.10. 样式演示

from docx import Document
from docx.shared import Inches
from docx.enum.text import WD_PARAGRAPH_ALIGNMENT
from docx.shared import Cm, Pt

# 创建文档
document = Document()
style = document.styles['Normal']
document.add_heading('标题剧中', 1)
# 首行缩进两个字符
paragraph_format = style.paragraph_format
paragraph_format.first_line_indent = Cm(0.74)
# 段落
p1 = document.add_paragraph(
    '我们从小的教育就是如何拆分问题、解决问题，这样做显然会使复杂的问题变得更容易些。')
# 加粗
p1.add_run('但是这带来一个新问题，我们丧失了如何从宏观角度看问题，分析问题，解决问题，对更大的整体的内在领悟能力。').bold = True
# 斜体
p1.add_run('这导致了我们对现有问题提出的解决方案，但无法预计实施该方案后产生的各种后果，为此我们付出了巨大代价。').italic = True

# 段落
p2 = document.add_paragraph('所以在一段时间后，')
# 字体大小
p2.add_run('我们便干脆完全放弃了对整体的关注。').font.size = Pt(40)

document.save('test.docx')

24.2.10. 演示例子

24.2.10.1. 官方演示例子

from docx import Document
from docx.shared import Inches

document = Document()

document.add_heading('Document Title', 0)

p = document.add_paragraph('A plain paragraph having some ')
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True

document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')

document.add_paragraph(
    'first item in unordered list', style='List Bullet'
)
document.add_paragraph(
    'first item in ordered list', style='List Number'
)

document.add_picture('monty-truth.png', width=Inches(1.25))

records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam')
)

table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for qty, id, desc in records:
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = id
    row_cells[2].text = desc

document.add_page_break()

document.save('demo.docx')

24.2.10.2. 完整的演示例子

from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH, WD_LINE_SPACING

document = Document()
head = document.add_heading('Netkiller Python 手札', level=0)
head.alignment = WD_ALIGN_PARAGRAPH.CENTER  # 居中
document.add_picture('netkiller.png')
document.add_heading('Python 入门', 1)
document.add_paragraph('Python 是脚本语言的一种，可以运行在Linux,Mac,Window操作系统之上。')
document.add_heading('安装 Python 运行环境', 2)
document.add_paragraph('前往 python.org 网站，下载安装包，根据提示安装。')
document.add_paragraph(
    'Linux : https://www.python.org/downloads/release/python-395/', style='List Bullet')
document.add_paragraph(
    'Windows ：https://www.python.org/ftp/python/3.9.5/python-3.9.5-amd64.exe', style='List Bullet')
document.add_paragraph(
    'MacOS : https://www.python.org/ftp/python/3.9.5/python-3.9.5-macosx10.9.pkg', style='List Bullet')
document.add_heading('Python 语法', 1)
document.add_heading('if 判断', 2)
document.add_heading('for 循环', 2)

document.add_heading('文档修订历史记录', 1)
# 表格
table = document.add_table(rows=3, cols=3, style='Table Grid')

# 表头
thead = table.rows[0].cells
thead[0].text = '版本'
thead[1].text = '作者'
thead[2].text = '描述'

# 表体
tbody = table.rows[1].cells
tbody[0].text = 'v1.0'
tbody[1].text = 'Neo'
tbody[2].text = '创建文档'

tbody = table.rows[2].cells
tbody[0].text = 'V2.0'
tbody[1].text = 'Jam'
tbody[2].text = '添加图片'

document.save('Netkiller Python 手札.docx')

24.2.11. 另存操作

from docx import Document
document = Document('原始文档.docx')
document.save('修改后的文档.docx')

24.2.12. 读取 Word 文档

from docx import Document

# 打开已存在文档
document = Document('test.docx')

# 读取标题、段落、列表内容
paras = [ paragraph.text for paragraph in document.paragraphs]
for p in paras:
    print(p)

# 读取表格内容
tables = [table for table in document.tables]
for t in tables:
    for row in t.rows:
        for cell in row.cells:
            print(cell.text, end=' ')
        print()

24.2.12.1. 风格筛选

from docx import Document

document = Document('Netkiller Python 手札.docx')

# 读取文中所有段落内容
for p in document.paragraphs:
    print("paragraphs：", p.text)

# 读取文中所有标题一
for p in document.paragraphs:
    if p.style.name == 'Heading 1':
        print("Heading 1：", p.text)

# 读取文中所有标题二
for p in document.paragraphs:
    if p.style.name == 'Heading 2':
        print("Heading 2：", p.text)

# 读取文中正文
for p in document.paragraphs:
    if p.style.name == 'Normal':
        print("Normal：", p.text)

运行结果

neo@MacBook-Pro-Neo ~/workspace/python % python3.9 /Users/neo/workspace/python/office/netkiller.py
paragraphs： Netkiller Python 手札
paragraphs： Python 入门
paragraphs： Python 是脚本语言的一种，可以运行在Linux,Mac,Window操作系统之上。
paragraphs： 安装 Python 运行环境
paragraphs： 前往 python.org 网站，下载安装包，根据提示安装。
paragraphs： Linux : https://www.python.org/downloads/release/python-395/
paragraphs： Windows ：https://www.python.org/ftp/python/3.9.5/python-3.9.5-amd64.exe
paragraphs： MacOS : https://www.python.org/ftp/python/3.9.5/python-3.9.5-macosx10.9.pkg
paragraphs： Python 语法
paragraphs： if 判断
paragraphs： for 循环
Heading 1： Python 入门
Heading 1： Python 语法
Heading 2： 安装 Python 运行环境
Heading 2： if 判断
Heading 2： for 循环
Normal： Python 是脚本语言的一种，可以运行在Linux,Mac,Window操作系统之上。
Normal： 前往 python.org 网站，下载安装包，根据提示安装。

24.2.13. Word 模版合并

docx-mailmerge

24.2.13.1. 安装 docx-mailmerge

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

列表为空，暂无数据

24.2. Word 文字处理

24.2. Word 文字处理

24.2.1. 安装

24.2.2. 创建空白文档

24.2.3. 添加标题

24.2.4. 添加段落

24.2.5. 列表

24.2.6. 表格

24.2.7. 添加图片

24.2.8. 强制分页

24.2.9. 样式

24.2.9.1. 对齐

24.2.9.2. 首行缩进

24.2.9.3. 段落间距

24.2.9.4. 行间距

24.2.9.5. 粗体，斜体

24.2.9.6. 字体大小

24.2.9.7. 查看段落样式

24.2.9.8. 文档样式

24.2.9.9. 自动分页设置

24.2.9.10. 样式演示

24.2.10. 演示例子

24.2.10.1. 官方演示例子

24.2.10.2. 完整的演示例子

24.2.11. 另存操作

24.2.12. 读取 Word 文档

24.2.12.1. 风格筛选

24.2.13. Word 模版合并

24.2.13.1. 安装 docx-mailmerge

如果你对这篇内容有疑问，欢迎到本站社区发帖提问 参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。