Python xlrd data extraction

Posted on 2024-09-24 13:42:09, 879 characters, 8 views, 0 comments

I am using Python xlrd (http://scienceoss.com/read-excel-files-from-python/) to read data from an Excel sheet.

My question: suppose I read a row whose first cell is "Employee name", and further down there is another row whose first cell is also "Employee name".

How can I read the last column, starting from the last row that has "Employee name" in its first cell and ignoring the earlier ones?

  import xlrd

  wb = xlrd.open_workbook(file, encoding_override="cp1252")
  wb.sheet_names()
  sh = wb.sheet_by_index(0)
  num_of_rows = sh.nrows
  num_of_cols = sh.ncols
  valid_xl_format = 0
  invalid_xl_format = 0

  if num_of_rows != 0:
      for i in range(num_of_rows):
          questions_dict = {}
          for j in range(num_of_cols):
              xl_data = sh.cell(i, j).value
              if xl_data == "Employee name":
                  # Regardless of how many rows have "Employee name" in the first cell,
                  # read only from the last one
                  pass

Comments (3)

烟花肆意 2024-10-01 13:42:09

"I am using python xlrd http://scienceoss.com/read-excel-files-from-python/ to read data from an excel sheet"

You need to think about what you are doing, rather than grabbing some blog code, leaving in totally irrelevant stuff like wb.sheet_names(), and omitting parts that are very relevant to your requirement, like first_column = sh.col_values(0).

Here's how to find the row_index of the last "whatever" in column A (the first column) -- untested:

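import xlrd

wb = xlrd.open_workbook(file_name)
# Why do you think that you need to use encoding_override?
sheet0 = wb.sheet_by_index(0)
tag = "Employee name"  # or "Emp name" or ...
column_0_values = sheet0.col_values(colx=0)
try:
    # Python lists have no rindex(), so search the reversed list and map the
    # position back to an index in the original list
    max_tag_row_index = len(column_0_values) - 1 - column_0_values[::-1].index(tag)
    print("last tag %r found at row_index %d" % (tag, max_tag_row_index))
except ValueError:  # list.index raises ValueError when the tag is absent
    print("tag %r not found" % tag)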

Now we need to interpret 'How can we read the last column starting with the last row which has "Employee name" in the first cell'.

Assuming that "the last column" means the one with column_index == sheet0.ncols - 1, then:

last_colx = sheet0.ncols - 1
required_values = sheet0.col_values(colx=last_colx, start_rowx=max_tag_row_index)
required_cells = sheet0.col_slice(colx=last_colx, start_rowx=max_tag_row_index)
# choose one of the above 2 lines, depending on what you need to do
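
The two steps above (find the last tagged row, then slice the last column from it) can be combined into one helper. This is only a sketch under assumptions: the function name `last_column_from_last_tag` and the `ListSheet` class are made up for illustration, and `ListSheet` is an in-memory stand-in so the example runs without a workbook. With real data you would pass the object returned by `wb.sheet_by_index(0)` instead.

```python
def last_column_from_last_tag(sheet, tag):
    """Values of the last column, from the last row tagged in column 0 down.

    `sheet` is expected to behave like an xlrd Sheet (ncols, col_values).
    Returns an empty list when the tag does not occur in the first column.
    """
    first_column = sheet.col_values(colx=0)
    # Walk the first column from the bottom up so the first hit is the last tag.
    for rowx in range(len(first_column) - 1, -1, -1):
        if first_column[rowx] == tag:
            return sheet.col_values(colx=sheet.ncols - 1, start_rowx=rowx)
    return []


class ListSheet:
    """Hypothetical in-memory stand-in for an xlrd Sheet, for this demo only."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)
        self.ncols = len(rows[0]) if rows else 0

    def col_values(self, colx, start_rowx=0, end_rowx=None):
        return [row[colx] for row in self._rows[start_rowx:end_rowx]]


# Invented sample data: two "Employee name" rows, only the second one counts.
demo = ListSheet([
    ["Employee name", "Dept", "Score"],
    ["Alice", "Sales", 5.0],
    ["Employee name", "Dept", "Rating"],
    ["Bob", "IT", 7.0],
])
print(last_column_from_last_tag(demo, "Employee name"))  # ['Rating', 7.0]
```

Returning an empty list for a missing tag is a design choice of this sketch; raising an exception would be just as reasonable if a missing tag means the file is malformed.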

If that's not what you mean (which is quite possible, as it ignores a whole bunch of data; why do you want to read only the last column?), please try to explain with examples what you do mean.

Possibly you want to iterate over the remaining cells:

for rowx in range(max_tag_row_index, sheet0.nrows):  # or max_tag_row_index + 1
    for colx in range(sheet0.ncols):
        do_something_with_cell_object(sheet0.cell(rowx, colx))
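
As one possible reading of "do something" with the remaining cells, the sketch below assumes the last tagged row is a header row and turns every row under it into a dict keyed by that header. That interpretation is a guess about the asker's intent, and the name `records_below_header` is hypothetical. It works on a plain list of rows, which with xlrd could be built as `[sheet0.row_values(r) for r in range(sheet0.nrows)]`.

```python
def records_below_header(rows, header_row_index):
    """Turn the rows under a header row into dicts keyed by the header cells."""
    header = rows[header_row_index]
    records = []
    for row in rows[header_row_index + 1:]:
        records.append(dict(zip(header, row)))
    return records


# Invented sample data; the second "Employee name" row (index 2) is the header.
rows = [
    ["Employee name", "Dept"],
    ["Alice", "Sales"],
    ["Employee name", "Score"],
    ["Bob", 7.0],
    ["Carol", 9.0],
]
for record in records_below_header(rows, header_row_index=2):
    print(record)
```
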
你的笑 2024-10-01 13:42:09

It's difficult to understand exactly what you're asking.
Posting sample data might help make your intent clearer.

Have you tried iterating over the dataset in reverse? For example:

for i in reversed(range(num_of_rows)):
    ...
    if xl_data == "Employee name":
        # do something 
        # then break since you've found the final "Employee Name"
        break
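
A fuller version of the reverse scan might look like the sketch below. It assumes the sheet has already been read into a list of rows (for example via `sh.row_values(i)` for each row index), and the helper name `last_row_starting_with` is made up for this example.

```python
def last_row_starting_with(rows, tag):
    """Scan from the bottom up; return the index of the first row whose
    first cell equals tag, or None if no row matches."""
    for i in reversed(range(len(rows))):
        if rows[i] and rows[i][0] == tag:
            return i  # first hit from the bottom is the last one in the sheet
    return None


# Invented sample data standing in for the worksheet contents.
rows = [
    ["Employee name", "Dept"],
    ["Alice", "Sales"],
    ["Employee name", "Score"],
    ["Bob", 7.0],
]
print(last_row_starting_with(rows, "Employee name"))  # 2
```
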
野稚 2024-10-01 13:42:09

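In my case, I didn't use any library other than pandas to read the xls files, and this solved my problem.

import pandas as pd
# read_html parses HTML tables and returns a list of DataFrames, so this only
# works when the .xls file is actually an HTML table saved with an .xls extension
data = pd.read_html('file.xls')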