使用 Excel Interop 删除空行

发布于 2024-10-25 04:05:46 字数 1936 浏览 4 评论 0原文

我有用户提供的 Excel 文件需要转换为 PDF。使用 Excel 互操作，我可以使用 .ExportAsFixedFormat() 来完成此操作。当工作簿有数百万行时，我的问题就出现了。这会变成一个包含 50k+ 页的文件。如果工作簿的所有这些行中都有内容那就太好了。不过，每次出现其中一个文件时，可能有 50 行有内容，其余的都是空白。如何删除空行以便将其导出为大小合适的 PDF？

我尝试从最后一行开始，使用 CountA 逐一检查该行是否有内容，如果有，则将其删除。这不仅需要很长时间，而且在大约 100k 行后似乎会失败，并出现以下错误：
无法计算表达式，因为代码已优化或本机框架位于调用堆栈的顶部。
我尝试使用 SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues) 但如果任何单元格具有格式（如背景颜色），则包含一行。
我尝试过使用 Worksheet.UsedRange ，然后删除之后的所有内容，但 UsedRange 与第二点存在相同的问题。

This is the code I've tried:

for (int i = 0; i < worksheets.Count; i++)
{
    sheet = worksheets[i + 1];
    rows = sheet.Rows;
    currentRowIndex = rows.Count;
    bool contentFound = false;

    while (!contentFound && currentRowIndex > 0)
    {
        currentRow = rows[currentRowIndex];

        if (Application.WorksheetFunction.CountA(currentRow) == 0)
        {
            currentRow.Delete();
        }
        else
        {
            contentFound = true;
        }

        Marshal.FinalReleaseComObject(currentRow);
        currentRowIndex--;
    }

    Marshal.FinalReleaseComObject(rows);
    Marshal.FinalReleaseComObject(sheet);
}

for (int i = 0; i < worksheets.Count; i++)
{
    sheet = worksheets[i + 1];
    rows = sheet.Rows;

    lastCell = rows.SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues);
    int startRow = lastCell.Row;

    Range range = sheet.get_Range(lastCell.get_Address(RowAbsolute: startRow));
    range.Delete();

    Marshal.FinalReleaseComObject(range);
    Marshal.FinalReleaseComObject(lastCell);
    Marshal.FinalReleaseComObject(rows);
    Marshal.FinalReleaseComObject(sheet);
}

我的代码是否有问题？这是互操作问题还是这只是 Excel 功能的限制？有更好的方法来做我正在尝试的事情吗？

原文

I have user supplied excel files that need to be converted to PDF. Using excel interop, I can do this fine with .ExportAsFixedFormat(). My problem comes up when a workbook has millions of rows. This turns into a file that has 50k+ pages. That would be fine if the workbook had content in all of those rows. Every time one of these files shows up though, there are maybe 50 rows that have content and the rest are blank. How can I go about removing the empty rows so I can export it to a decent sized PDF?

I've tried starting at the end row and, one-by-one, using CountA to check if the row has content and if it does, delete it. Not only does this take forever, this seems to fail after about 100k rows with the following error:
Unable to evaluate expression because the code is optimized or a native frame is on top of the call stack.
I've tried using SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues) but that includes a row if any cell has formatting (like a bg color).
I've tried using Worksheet.UsedRange and then deleting everything after that but UsedRange has the same problem as point two.

This is the code I've tried:

for (int i = 0; i < worksheets.Count; i++)
{
    sheet = worksheets[i + 1];
    rows = sheet.Rows;
    currentRowIndex = rows.Count;
    bool contentFound = false;

    while (!contentFound && currentRowIndex > 0)
    {
        currentRow = rows[currentRowIndex];

        if (Application.WorksheetFunction.CountA(currentRow) == 0)
        {
            currentRow.Delete();
        }
        else
        {
            contentFound = true;
        }

        Marshal.FinalReleaseComObject(currentRow);
        currentRowIndex--;
    }

    Marshal.FinalReleaseComObject(rows);
    Marshal.FinalReleaseComObject(sheet);
}

for (int i = 0; i < worksheets.Count; i++)
{
    sheet = worksheets[i + 1];
    rows = sheet.Rows;

    lastCell = rows.SpecialCells(XlCellType.xlCellTypeLastCell, XlSpecialCellsValue.xlTextValues);
    int startRow = lastCell.Row;

    Range range = sheet.get_Range(lastCell.get_Address(RowAbsolute: startRow));
    range.Delete();

    Marshal.FinalReleaseComObject(range);
    Marshal.FinalReleaseComObject(lastCell);
    Marshal.FinalReleaseComObject(rows);
    Marshal.FinalReleaseComObject(sheet);
}

Do I have a problem with my code, is this an interop problem or maybe it's just a limitation on what Excel can do? Is there a better way to do what I'm attempting?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

猫七 2024-11-01 04:05:46

我建议您使用 CountA 获取包含某些值的行数（正如您在第 1 点中尝试过的那样）。然后将这些行复制到新工作表中并从那里导出。将几行复制到新工作表并对其进行处理会更容易，而不是尝试从源工作表中删除大量行。

要创建新工作表并复制行，您可以使用以下代码：

        excel.Worksheet tempSheet = workbook.Worksheets.Add();
        tempSheet.Name = sheetName;
        workbook.Save();

//create a new method for copy new rows

//作为 rowindex，您可以传递使用 CountA 找到的总行数

public void CopyRows(excel.Workbook workbook, string sourceSheetName, string DestSheetName, int rowIndex)
        {
            excel.Worksheet sourceSheet = (excel.Worksheet)workbook.Sheets[sourceSheetName];
            excel.Range source = (excel.Range)sourceSheet.Range["A" + rowIndex.ToString(), Type.Missing].EntireRow;

            excel.Worksheet destSheet = (excel.Worksheet)workbook.Sheets[DestSheetName];
            excel.Range dest = (excel.Range)destSheet.Range["A" + rowIndex.ToString(), Type.Missing].EntireRow;
            source.Copy(dest);

            excel.Range newRow = (excel.Range)destSheet.Rows[rowIndex+1];
            newRow.Insert();
            workbook.Save();
        }

I would suggest you to get the count of rows which contain some values, using CountA (as you have tried in point 1). Then copy those rows into a new sheet and export it from there. It will be easier to copy few rows to new sheet and working on it, rather than trying to delete huge number of rows from source sheet.

For creating new sheet and copying rows you can use the following code:

        excel.Worksheet tempSheet = workbook.Worksheets.Add();
        tempSheet.Name = sheetName;
        workbook.Save();

//create a new method for copy new rows

//as the rowindex you can pass the total no of rows you have found out using CountA

public void CopyRows(excel.Workbook workbook, string sourceSheetName, string DestSheetName, int rowIndex)
        {
            excel.Worksheet sourceSheet = (excel.Worksheet)workbook.Sheets[sourceSheetName];
            excel.Range source = (excel.Range)sourceSheet.Range["A" + rowIndex.ToString(), Type.Missing].EntireRow;

            excel.Worksheet destSheet = (excel.Worksheet)workbook.Sheets[DestSheetName];
            excel.Range dest = (excel.Range)destSheet.Range["A" + rowIndex.ToString(), Type.Missing].EntireRow;
            source.Copy(dest);

            excel.Range newRow = (excel.Range)destSheet.Rows[rowIndex+1];
            newRow.Insert();
            workbook.Save();
        }

回复收藏 0 原文

貪欢 2024-11-01 04:05:46

您是否尝试过 Sheet1.Range("A1").CurrentRegion.ExportAsFixedFormat() 其中 Sheet1 是有效的工作表名称，“A1”是您可以测试的单元格以确保它位于您的范围内想要出口？

问题仍然存在，为什么 Excel 认为那些“空”单元格中有数据？格式化？需要清除预先存在的打印区域？我知道我以前也遇到过类似的情况，这是此刻想到的唯一可能性。

回复收藏 0 原文

冰雪梦之恋 2024-11-01 04:05:46

尝试以下步骤 -

将 Worksheet.UsedRange 复制到单独的工作表 (sheet2)。
使用特殊粘贴，以便保留格式
尝试解析sheet2中未使用的行

如果这没有帮助，请尝试重复步骤2，清除格式信息，然后解析sheet2。您可以稍后复制格式信息（如果它们足够简单）

回复收藏 0 原文

余厌 2024-11-01 04:05:46

如果您可以首先通过 OleDBAdapter 将 Excel 文件加载到数据集中，那么在导入时删除空白行相对容易...
试试这个我通过堆栈溢出发布的OleDBAdapter Excel QA。

然后将数据集导出到新的 Excel 文件并将该文件转换为 PDF。当然，这可能是一个很大的“IF”，具体取决于 Excel 布局（或缺乏布局）。

回复收藏 0 原文

掌心的温暖 2024-11-01 04:05:46

我今天必须解决这个问题，因为这可能是您可能出现的情况的一部分。

如果您的电子表格满足以下条件：

所有包含数据的列在第 1 行中都有标题文本。
所有包含数据的行都按顺序排列，直到出现第一个空白行。

那么，下面的代码可能会有所帮助：

    private static string[,] LoadCellData(Excel.Application excel, dynamic sheet)
    {
        int countCols = CountColsToFirstBlank(excel, sheet);
        int countRows = CountRowsToFirstBlank(excel, sheet);
        cellData = new string[countCols, countRows];
        string datum;

        for (int i = 0; i < countCols; i++)
        {
            for (int j = 0; j < countRows; j++)
            {
                try
                {
                    if (null != sheet.Cells[i + 1, j + 1].Value)
                    {
                        datum = excel.Cells[i + 1, j + 1].Value.ToString();
                        cellData[i, j] = datum;
                    }
                }
                catch (Exception ex)
                {
                    lastException = ex;
                    //Console.WriteLine(String.Format("LoadCellData [{1}, {2}] reported an error: [{0}]", ex.Message, i, j));
                }
            }
        }

        return cellData;
    }

    private static int CountRowsToFirstBlank(Excel.Application excel, dynamic sheet)
    {
        int count = 0;

        for (int j = 0; j < sheet.UsedRange.Rows.Count; j++)
        {
            if (IsBlankRow(excel, sheet, j + 1))
                break;

            count++;
        }
        return count;
    }
    private static int CountColsToFirstBlank(Excel.Application excel, dynamic sheet)
    {
        int count = 0;

        for (int i = 0; i < sheet.UsedRange.Columns.Count; i++)
        {
            if (IsBlankCol(excel, sheet, i + 1))
                break;

            count++;
        }
        return count;
    }

    private static bool IsBlankCol(Excel.Application excel, dynamic sheet, int col)
    {
        for (int i = 0; i < sheet.UsedRange.Rows.Count; i++)
        {
            if (null != sheet.Cells[i + 1, col].Value)
            {
                return false;
            }
        }

        return true;
    }
    private static bool IsBlankRow(Excel.Application excel, dynamic sheet, int row)
    {
        for (int i = 0; i < sheet.UsedRange.Columns.Count; i++)
        {
            if (null != sheet.Cells[i + 1, row].Value)
            {
                return false;
            }
        }

        return true;
    }

I had to solve this problem today for what might be a subset of your possible cases.

If your spreadsheet meets the following conditions:

All columns with data have header text in line 1.
All rows with data are in sequence until the first BLANK row.

Then, the following code may help:

    private static string[,] LoadCellData(Excel.Application excel, dynamic sheet)
    {
        int countCols = CountColsToFirstBlank(excel, sheet);
        int countRows = CountRowsToFirstBlank(excel, sheet);
        cellData = new string[countCols, countRows];
        string datum;

        for (int i = 0; i < countCols; i++)
        {
            for (int j = 0; j < countRows; j++)
            {
                try
                {
                    if (null != sheet.Cells[i + 1, j + 1].Value)
                    {
                        datum = excel.Cells[i + 1, j + 1].Value.ToString();
                        cellData[i, j] = datum;
                    }
                }
                catch (Exception ex)
                {
                    lastException = ex;
                    //Console.WriteLine(String.Format("LoadCellData [{1}, {2}] reported an error: [{0}]", ex.Message, i, j));
                }
            }
        }

        return cellData;
    }

    private static int CountRowsToFirstBlank(Excel.Application excel, dynamic sheet)
    {
        int count = 0;

        for (int j = 0; j < sheet.UsedRange.Rows.Count; j++)
        {
            if (IsBlankRow(excel, sheet, j + 1))
                break;

            count++;
        }
        return count;
    }
    private static int CountColsToFirstBlank(Excel.Application excel, dynamic sheet)
    {
        int count = 0;

        for (int i = 0; i < sheet.UsedRange.Columns.Count; i++)
        {
            if (IsBlankCol(excel, sheet, i + 1))
                break;

            count++;
        }
        return count;
    }

    private static bool IsBlankCol(Excel.Application excel, dynamic sheet, int col)
    {
        for (int i = 0; i < sheet.UsedRange.Rows.Count; i++)
        {
            if (null != sheet.Cells[i + 1, col].Value)
            {
                return false;
            }
        }

        return true;
    }
    private static bool IsBlankRow(Excel.Application excel, dynamic sheet, int row)
    {
        for (int i = 0; i < sheet.UsedRange.Columns.Count; i++)
        {
            if (null != sheet.Cells[i + 1, row].Value)
            {
                return false;
            }
        }

        return true;
    }

回复收藏 0 原文

弥繁 2024-11-01 04:05:46

您可以尝试使用以下代码吗：

for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)
                    {
                        //Assume the first row is the header. Then use the column match ups by name to determine the index.
                        //This will allow you to have the order of the header.Keys change without any affect.
                        var row = workSheet.Cells[string.Format("{0}:{0}", rowIndex)];
                        // check if the row and column cells are empty
                        bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
                        if (allEmpty)
                            continue; // skip this row
                        else{
                               //here read header
                               if()
                                 {
                                  //some code
                                 }
                               else
                                  {
                                   //some code to read body
                                  }
                            }
                    }

希望有所帮助，否则如果您需要有关代码的描述，请告诉我。

更新：

下面的代码用于检查工作表中有多少行。 for 循环将遍历工作表的行末尾。

for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)

这里我们使用 linq 检查行和列单元格是否为空：

bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
if (全部为空)
继续; // 如果为 true 则跳过此行
别的
// 读取标题（假设它出现在工作表中）
// 否则读取行数据
然后执行必要的步骤。

希望现在一切都清楚了。

Can you try with below code :

for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)
                    {
                        //Assume the first row is the header. Then use the column match ups by name to determine the index.
                        //This will allow you to have the order of the header.Keys change without any affect.
                        var row = workSheet.Cells[string.Format("{0}:{0}", rowIndex)];
                        // check if the row and column cells are empty
                        bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
                        if (allEmpty)
                            continue; // skip this row
                        else{
                               //here read header
                               if()
                                 {
                                  //some code
                                 }
                               else
                                  {
                                   //some code to read body
                                  }
                            }
                    }

Hope this help,else let me know if you need description about code.

Updated :

below code is used to check how many rows are in the worksheet. a for loop will traverse untill end of row of the worksheet.

for (int rowIndex = workSheet.Dimension.Start.Row; rowIndex <= workSheet.Dimension.End.Row; rowIndex++)

here we are checking if the row and column cells are empty using linq:

bool allEmpty = row.All(c => string.IsNullOrWhiteSpace(c.Text));
if (allEmpty)
continue; // if true then skip this row
else
// read headers(assuming it is presented in worksheet)
// else read row wise data
and then do necessary steps.

hoping this clears now.

回复收藏 0 原文

眸中客 2024-11-01 04:05:46

我遇到了同样的问题，并设法使用 CurrentRegion:

                    var lastcell = sheet.Cells.SpecialCells(XlCellType.xlCellTypeLastCell);
                    var filledcells = sheet.Cells.Range[sheet.Cells.Item[1, 1],
                            sheet.Cells[lastcell.Row - 1, lastcell.Column]]
                        .CurrentRegion;
                    filledcells.ExportAsFixedFormat(

等修复它。据说 CurrentRegion 会扩展到单元格为空的边界，显然这意味着如果它包含许多空单元格，它也会缩小。

I had the same problem and managed to fix it using the CurrentRegion:

                    var lastcell = sheet.Cells.SpecialCells(XlCellType.xlCellTypeLastCell);
                    var filledcells = sheet.Cells.Range[sheet.Cells.Item[1, 1],
                            sheet.Cells[lastcell.Row - 1, lastcell.Column]]
                        .CurrentRegion;
                    filledcells.ExportAsFixedFormat(

and so on. The CurrentRegion is said to expand to the borders where cells are empty, and apparently that means it also shrinks if it contains many empty cells.

回复收藏 0 原文

沫尐诺 2024-11-01 04:05:46

请尝试以下代码：

for (int i = 0; i < worksheets.Count; i++)
{
    sheet = worksheets[i + 1];
    sheet.Columns("A:A").SpecialCells(XlCellType.xlCellTypeBlanks).EntireRow.Delete
    sheet.Rows("1:1").SpecialCells(XlCellType.xlCellTypeBlanks).EntireColumn.Delete
    Marshal.FinalReleaseComObject(sheet);
}

Please try the following code:

for (int i = 0; i < worksheets.Count; i++)
{
    sheet = worksheets[i + 1];
    sheet.Columns("A:A").SpecialCells(XlCellType.xlCellTypeBlanks).EntireRow.Delete
    sheet.Rows("1:1").SpecialCells(XlCellType.xlCellTypeBlanks).EntireColumn.Delete
    Marshal.FinalReleaseComObject(sheet);
}

回复收藏 0 原文

~没有更多了~