How to convert a Word document with tables to CSV and output it as JSON
I have the following code to extract the tables from a Word document and write each table to its own CSV file:
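from docx import Document
import pandas as pd

document = Document('pathtoFile')

# Collect one DataFrame per table in the document.
tables = []
for table in document.tables:
    # Pre-fill a rows x columns grid with empty strings.
    df = [['' for i in range(len(table.columns))] for j in range(len(table.rows))]
    for i, row in enumerate(table.rows):
        for j, cell in enumerate(row.cells):
            if cell.text:
                df[i][j] = cell.text
    tables.append(pd.DataFrame(df))

# Write each table to table_<n>.csv. index=False and header=False keep pandas
# from adding its own row and column labels, so the first table row becomes
# the CSV header line (this assumes each table starts with a header row).
for nr, i in enumerate(tables):
    i.to_csv("table_" + str(nr) + ".csv", index=False, header=False)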
I also have the following script that reads a CSV file and converts it to JSON:
import csv
import json
import time


def csv_to_json(csvFilePath, jsonFilePath):
    jsonArray = []

    # read csv file
    with open(csvFilePath, encoding='utf-8', errors='ignore') as csvf:
        # load csv file data using csv library's dictionary reader
        csvReader = csv.DictReader(csvf)

        # convert each csv row into python dict
        for row in csvReader:
            # add this python dict to json array
            jsonArray.append(row)

    # convert python jsonArray to JSON String and write to file
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        jsonString = json.dumps(jsonArray, indent=4)
        jsonf.write(jsonString)


csvFilePath = r'pathtoFile'
jsonFilePath = r'pathtoFile'

start = time.perf_counter()
csv_to_json(csvFilePath, jsonFilePath)
finish = time.perf_counter()
print(f"Conversion completed successfully in {finish - start:0.4f} seconds")
The main issue is combining the two: taking the Word document with its tables, extracting each table to a CSV file, and then converting those CSV files to JSON. I may be overcomplicating this, but I am open to suggestions.
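One possible way to combine the two steps, offered as a rough sketch rather than a definitive answer: both scripts only move cell text around, so the tables can be read once and written straight to JSON, with the CSV files kept as an optional side output. The helper names (`table_to_rows`, `rows_to_records`, `write_csv`, `docx_tables_to_json`) and the `csv_dir` parameter are made up for illustration, and the sketch assumes the first row of every table is a header row.

```python
import csv
import json
from pathlib import Path


def table_to_rows(table):
    """Return the cell text of a python-docx table as a list of rows."""
    return [[cell.text.strip() for cell in row.cells] for row in table.rows]


def rows_to_records(rows):
    """Treat the first row as the header and map every later row onto it.

    Assumption: header names are unique; duplicates would overwrite each other.
    """
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def write_csv(rows, csv_path):
    """Optional step: keep a CSV copy of one table."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)


def docx_tables_to_json(docx_path, json_path, csv_dir=None):
    """Extract every table in docx_path and write them all to one JSON file."""
    # Imported here so the helpers above can be used without python-docx.
    from docx import Document

    all_tables = []
    for nr, table in enumerate(Document(docx_path).tables):
        rows = table_to_rows(table)
        if csv_dir is not None:
            # Same file naming as the original script: table_0.csv, table_1.csv, ...
            write_csv(rows, Path(csv_dir) / f"table_{nr}.csv")
        all_tables.append(rows_to_records(rows))

    # The JSON file holds a list of tables, each a list of row dictionaries.
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(all_tables, f, indent=4, ensure_ascii=False)
    return all_tables
```

A call such as `docx_tables_to_json('pathtoFile', 'tables.json', csv_dir='.')` would then produce both outputs in one pass. Leaving `csv_dir` as `None` skips the CSV files entirely, which may be enough if JSON is the only format actually needed. pandas is not required here because nothing is computed on the table contents.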