How to convert Word document tables to CSV and output them as JSON

Posted on 2025-01-30 06:30:21

I have the following code to extract tables from a Word document and write each table to its own CSV file:

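from docx import Document
import pandas as pd

document = Document('pathtoFile')

# Build one DataFrame per table found in the Word document
tables = []
for table in document.tables:
    # Pre-fill a rows x columns grid with empty strings
    df = [['' for _ in range(len(table.columns))] for _ in range(len(table.rows))]
    for i, row in enumerate(table.rows):
        for j, cell in enumerate(row.cells):
            if cell.text:
                df[i][j] = cell.text
    tables.append(pd.DataFrame(df))

# Write the CSV files once, after all tables have been collected
# (the original nested this loop inside the one above, rewriting every file on each pass)
for nr, frame in enumerate(tables):
    frame.to_csv("table_" + str(nr) + ".csv")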

I also have the following script that takes a CSV file and converts it to JSON:

import csv 
import json
import time

def csv_to_json(csvFilePath, jsonFilePath):
    jsonArray = []

    # Read the CSV file; the csv module expects files opened with newline=''
    with open(csvFilePath, newline='', encoding='utf-8', errors='ignore') as csvf:
        # DictReader uses the first CSV row as the keys for every following row
        csvReader = csv.DictReader(csvf)

        # Collect each CSV row as a Python dict
        for row in csvReader:
            jsonArray.append(row)

    # Serialize the list of dicts and write it to the JSON file
    with open(jsonFilePath, 'w', encoding='utf-8') as jsonf:
        json.dump(jsonArray, jsonf, indent=4)
          
csvFilePath = r'pathtoFile'
jsonFilePath = r'pathtoFile'

start = time.perf_counter()
csv_to_json(csvFilePath, jsonFilePath)
finish = time.perf_counter()

print(f"Conversion completed successfully in {finish - start:0.4f} seconds")

The main issue is combining the two: taking the Word document with its tables, extracting each table to a CSV file, and then converting each CSV to JSON. I may be overcomplicating this, but I am open to suggestions.
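
One possible way to combine the two steps is sketched below. It is not a definitive implementation: it assumes the first row of every table is the header row, and the names `table_to_rows`, `rows_to_records`, and `convert_tables` are hypothetical helpers introduced here for illustration. It also writes the CSV with the standard `csv` module instead of pandas, because `DataFrame.to_csv` with default arguments also writes the integer index and column labels, which `csv.DictReader` would then treat as the keys.

```python
import csv
import json
from pathlib import Path


def table_to_rows(table):
    """Read a python-docx table into a list of lists of cell text."""
    return [[cell.text for cell in row.cells] for row in table.rows]


def rows_to_records(rows):
    """Turn a grid of strings into a list of dicts.

    Assumption: the first row holds the column names.
    """
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def convert_tables(docx_path, out_dir="."):
    """Write table_<n>.csv and table_<n>.json for every table in the document."""
    # python-docx is imported here so the helpers above need only the stdlib
    from docx import Document

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    for nr, table in enumerate(Document(docx_path).tables):
        rows = table_to_rows(table)

        # CSV: the raw grid, header row first
        with open(out_dir / f"table_{nr}.csv", "w", newline="", encoding="utf-8") as csvf:
            csv.writer(csvf).writerows(rows)

        # JSON: one object per data row, keyed by the header row
        with open(out_dir / f"table_{nr}.json", "w", encoding="utf-8") as jsonf:
            json.dump(rows_to_records(rows), jsonf, indent=4)
```

Calling `convert_tables('pathtoFile')` would then produce one CSV and one JSON file per table. If the CSV files are not needed for anything else, the CSV step could be dropped entirely, since the JSON is built from the in-memory rows rather than by re-reading the CSV.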
