Is there a way to get the worksheet names out of 20MB+ Excel files with PHPExcel?

Posted on 2024-10-09 14:17:55

I am using PHPExcel to read data out of Excel files.

With the following code, I am able to read one particular worksheet out of a 3MB Excel file in just a couple of seconds. Works nicely.

However, I now have 27MB and 88MB Excel files which I need to get data out of. They are so large that even OpenOffice cannot open them.

I have found that I can use an index number instead of a name when I load a sheet, but this seems inconsistent, e.g. in one particular Excel file setLoadSheetsOnly(0) gave me the third sheet while setLoadSheetsOnly(1) gave me an error even though there were four worksheets in the file. So that seems unreliable for some reason.

Is there a way I can read out the names of the worksheets from a large file so that I can access only one of its worksheets at a time?

$objReader = PHPExcel_IOFactory::createReaderForFile("data/" . $file_name);
$objReader->setLoadSheetsOnly(array($sheet_name));
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load("data/" . $file_name);

// Only the requested sheet was loaded, so it is sheet 0; fetch it once
// instead of calling setActiveSheetIndex() for every cell.
$objWorksheet = $objPHPExcel->setActiveSheetIndex(0);

// $number_of_rows and $number_of_columns are assumed to be defined earlier
echo '<table border="1">';
for ($row = 1; $row < $number_of_rows; $row++) {
    echo '<tr>';
    for ($column = 0; $column < $number_of_columns; $column++) {
        $value = $objWorksheet->getCellByColumnAndRow($column, $row)->getValue();
        echo '<td>';
        echo $value . '&nbsp;';
        echo '</td>';
    }
    echo '</tr>';
}
echo '</table>';
die;

ADDENDUM:

I found some code that gets close, but it isn't always accurate: on a 27MB file it missed the second worksheet (screenshot omitted), and on another file it found only the third worksheet and missed the other three (screenshot omitted).

$objReader = PHPExcel_IOFactory::createReaderForFile("data/" . $file_name);
$objReader->setLoadSheetsOnly(0);
$objReader->setReadDataOnly(true);
$objPHPExcel = $objReader->load("data/" . $file_name);

echo $objPHPExcel->getSheetCount(), ' worksheets<hr/>';
$loadedSheetNames = $objPHPExcel->getSheetNames();
foreach ($loadedSheetNames as $sheetIndex => $loadedSheetName) {
    echo $sheetIndex, ' -> ', $loadedSheetName, '<br />';
}
die;

Answers (2)

丢了幸福的猪 2024-10-16 14:17:55

Unfortunately, it isn't possible to read the names of the worksheets without loading the whole file.

Using an index number rather than a name when calling setLoadSheetsOnly() won't give a predictable result: the code that performs that check uses in_array() to test whether the name of the sheet it is about to read is in the array of sheet names to load, e.g.:

// check if sheet should be skipped
if (isset($this->_loadSheetsOnly) && !in_array($sheet['name'], $this->_loadSheetsOnly)) {
    continue;
}

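I suspect that, when this test runs, comparing a string against a numeric value gives a true result for 0 == "mySheetName" (based on PHP's loose typing and comparison casting rules). This holds for PHP 5 and 7; PHP 8.0 changed the rule so that 0 == "mySheetName" is false, but an integer index still never selects a sheet by position.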
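To see why the result depends on the sheet name and the PHP version, here is a minimal stand-alone sketch of that check. It assumes setLoadSheetsOnly(0) stores its scalar argument as array(0); the helper name isSheetSelected() is hypothetical and not part of PHPExcel. With loose comparison, a non-numeric name such as "Summary" matches 0 under PHP 5 and 7 but not under PHP 8, while a numeric-looking name such as "2005" never matches 0. Strict mode (the third argument to in_array()) never lets an integer match a string name, which is why passing sheet names as strings is the reliable option.

```php
<?php
// Sketch of the membership test PHPExcel runs before reading each sheet.
// Assumption: setLoadSheetsOnly(0) stores its argument as array(0).
// isSheetSelected() is a hypothetical helper for illustration only.
function isSheetSelected($sheetName, array $loadSheetsOnly, $strict = false)
{
    // With $strict = false, in_array() compares with loose ==.
    return in_array($sheetName, $loadSheetsOnly, $strict);
}

$loadSheetsOnly = array(0);
foreach (array('Summary', 'Data', '2005') as $name) {
    printf(
        "%s: loose=%s strict=%s (PHP %s)\n",
        $name,
        var_export(isSheetSelected($name, $loadSheetsOnly), true),
        var_export(isSheetSelected($name, $loadSheetsOnly, true), true),
        PHP_VERSION
    );
}
```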

I could probably provide a Reader method that would return a list of worksheet names without actually loading the entire file, although there would be a performance hit involved.

EDIT

If you add the following method to Classes/PHPExcel/Reader/Excel2007.php:

/**
 * Reads names of the worksheets from a file, without loading the whole file to a PHPExcel object
 *
 * @param   string      $pFilename
 * @return  string[]    Worksheet names, in workbook order
 * @throws  Exception
 */
public function listWorksheetNames($pFilename)
{
    // Check if file exists
    if (!file_exists($pFilename)) {
        throw new Exception("Could not open " . $pFilename . " for reading! File does not exist.");
    }

    $worksheetNames = array();

    $zip = new ZipArchive;
    if ($zip->open($pFilename) !== true) {
        throw new Exception("Could not open " . $pFilename . " as a zip archive.");
    }

    // Package relationships (namespace http://schemas.openxmlformats.org/package/2006/relationships)
    $rels = simplexml_load_string($this->_getFromZipArchive($zip, "_rels/.rels"));
    foreach ($rels->Relationship as $rel) {
        switch ((string) $rel["Type"]) {
            case "http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument":
                // Workbook part (namespace http://schemas.openxmlformats.org/spreadsheetml/2006/main)
                $xmlWorkbook = simplexml_load_string($this->_getFromZipArchive($zip, "{$rel['Target']}"));

                if ($xmlWorkbook->sheets) {
                    foreach ($xmlWorkbook->sheets->sheet as $eleSheet) {
                        // Only the name attribute is needed; no cell data is read
                        $worksheetNames[] = (string) $eleSheet["name"];
                    }
                }
                break;
        }
    }

    $zip->close();

    return $worksheetNames;
}

You can call it by using:

$inputFileType = 'Excel2007';
$inputFileName = 'biostat-behfisk-2005.xlsx';
$objReader = PHPExcel_IOFactory::createReader($inputFileType);
$worksheetNames = $objReader->listWorksheetNames($inputFileName);

foreach ($worksheetNames as $sheetName) {
    echo $sheetName, '<br />';
}

The returned $worksheetNames should contain an array of all worksheet names as UTF-8 strings. Because it reads only the absolute minimum from the .xlsx file to retrieve these names, it should be fairly fast. I'll do some more tests before checking it into the PHPExcel SVN, but (for now) it seems to do what you need.

EDIT2

Equivalent method for the Excel5 Reader:

/**
 * Reads names of the worksheets from a file, without loading the whole file to a PHPExcel object
 *
 * @param   string      $pFilename
 * @throws  Exception
 */
public function listWorksheetNames($pFilename)
{
    // Check if file exists
    if (!file_exists($pFilename)) {
        throw new Exception("Could not open " . $pFilename . " for reading! File does not exist.");
    }

    $worksheetNames = array();

    // Read the OLE file
    $this->_loadOLE($pFilename);

    // total byte size of Excel data (workbook global substream + sheet substreams)
    $this->_dataSize = strlen($this->_data);

    $this->_pos     = 0;
    $this->_sheets  = array();

    // Parse Workbook Global Substream
    while ($this->_pos < $this->_dataSize) {
        $code = self::_GetInt2d($this->_data, $this->_pos);

        switch ($code) {
            case self::XLS_Type_BOF:    $this->_readBof();      break;
            case self::XLS_Type_SHEET:  $this->_readSheet();    break;
            case self::XLS_Type_EOF:    $this->_readDefault();  break 2;
            default:                    $this->_readDefault();  break;
        }
    }

    foreach ($this->_sheets as $sheet) {
        if ($sheet['sheetType'] != 0x00) {
            // 0x00: Worksheet, 0x02: Chart, 0x06: Visual Basic module
            continue;
        }

        $worksheetNames[] = $sheet['name'];
    }

    return $worksheetNames;
}

It's not as efficient as the Excel2007 Reader version, but it should still be faster than parsing the whole .xls file just for the sheet names, because it only parses the workbook globals substream (the OLE workbook stream itself is still read into memory).
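To illustrate what parsing only the globals substream involves, here is a minimal sketch that walks BIFF records and reads BOUNDSHEET entries directly. It assumes BIFF8 (Excel 97-2003) data, that $stream already holds the raw Workbook stream extracted from the OLE container (the sketch does not do that step), and that the mbstring extension is available. biffWorksheetNames() is a hypothetical name and not the Excel5 reader's own _readSheet() code; like the method above, it keeps only entries whose sheet type is 0x00.

```php
<?php
// Walks the workbook-globals substream of a BIFF8 Workbook stream and
// collects BOUNDSHEET (0x0085) names, stopping at the first EOF (0x000A).
// Assumptions: $stream is the raw Workbook stream (OLE extraction not shown),
// and mbstring is available for the character-set conversion.
function biffWorksheetNames($stream)
{
    $names = array();
    $pos = 0;
    $size = strlen($stream);

    while ($pos + 4 <= $size) {
        // Every BIFF record starts with a 2-byte type and a 2-byte length.
        $header = unpack('vtype/vlength', substr($stream, $pos, 4));
        $body = (string) substr($stream, $pos + 4, $header['length']);
        $pos += 4 + $header['length'];

        if ($header['type'] === 0x000A) {   // EOF: end of the globals substream
            break;
        }
        if ($header['type'] !== 0x0085 || strlen($body) < 8) {
            continue;                       // not a (well-formed) BOUNDSHEET record
        }

        // Body: 4-byte stream offset, 1-byte visibility, 1-byte sheet type,
        // then a short Unicode string: 1-byte char count, 1-byte option flags.
        $sheetType = ord($body[5]);
        $charCount = ord($body[6]);
        $isUtf16 = (ord($body[7]) & 0x01) === 0x01;

        if ($sheetType !== 0x00) {          // 0x02: chart, 0x06: VBA module
            continue;
        }
        $raw = substr($body, 8, $isUtf16 ? 2 * $charCount : $charCount);
        $names[] = mb_convert_encoding($raw, 'UTF-8', $isUtf16 ? 'UTF-16LE' : 'ISO-8859-1');
    }

    return $names;
}
```

Because the globals substream ends at the first EOF record, the loop never looks at the sheet substreams that make up most of a large file's bytes.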

独木成林 2024-10-16 14:17:55

I didn't want to amend PHPExcel, so I went with this:

public function getWorksheetNames($pFilename) {

    $worksheetNames = array ();

    $zip = zip_open ( $pFilename );
    if (!is_resource ( $zip )) {
        // zip_open() returns an error code instead of a resource on failure
        return $worksheetNames;
    }
    while ( $entry = zip_read ( $zip ) ) {

        $entry_name = zip_entry_name ( $entry );
        if ($entry_name == 'xl/workbook.xml') {
            if (zip_entry_open ( $zip, $entry, "r" )) {
                $buf = zip_entry_read ( $entry, zip_entry_filesize ( $entry ) );
                $workbook = simplexml_load_string ( $buf );
                foreach ( $workbook->sheets as $sheets ) {
                    foreach( $sheets as $sheet) {
                        $attributes=$sheet->attributes();
                        // cast, otherwise a SimpleXMLElement is stored instead of the name
                        $worksheetNames[]=(string) $attributes['name'];
                    }
                }
                zip_entry_close ( $entry );
            }
            break;
        }

    }
    zip_close ( $zip );
    return $worksheetNames;
}

It only works on Excel 2007 or later (.xlsx) files, but it did what I needed.
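For anyone who would also rather not patch PHPExcel, the ideas from both answers can be combined into one standalone function. This is a minimal sketch, assuming the zip and SimpleXML extensions are available; the name xlsxWorksheetNames() is hypothetical. It follows _rels/.rels to find the workbook part (as the Excel2007 reader method does) instead of hard-coding xl/workbook.xml, casts attribute values to strings, and uses the ZipArchive class rather than the procedural zip_* functions, which are deprecated as of PHP 8.0.

```php
<?php
// Lists worksheet names from an .xlsx file using only the zip and SimpleXML
// extensions. Reads _rels/.rels and the workbook part, never the sheet data.
// xlsxWorksheetNames() is a hypothetical helper, not a PHPExcel API.
function xlsxWorksheetNames($filename)
{
    $officeDocument = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument';

    $zip = new ZipArchive();
    if ($zip->open($filename) !== true) {
        throw new RuntimeException("Could not open $filename as a zip archive");
    }

    // Locate the workbook part through the package relationships,
    // falling back to the conventional path if none is declared.
    $workbookPath = 'xl/workbook.xml';
    $relsXml = $zip->getFromName('_rels/.rels');
    $rels = ($relsXml === false) ? false : simplexml_load_string($relsXml);
    if ($rels !== false) {
        foreach ($rels->Relationship as $rel) {
            if ((string) $rel['Type'] === $officeDocument) {
                $workbookPath = ltrim((string) $rel['Target'], '/');
                break;
            }
        }
    }

    $workbookXml = $zip->getFromName($workbookPath);
    $zip->close();
    if ($workbookXml === false) {
        throw new RuntimeException("No workbook part found in $filename");
    }

    $names = array();
    $workbook = simplexml_load_string($workbookXml);
    if ($workbook !== false && isset($workbook->sheets)) {
        foreach ($workbook->sheets->sheet as $sheet) {
            // Attribute values are SimpleXMLElement objects; cast to get the string.
            $names[] = (string) $sheet['name'];
        }
    }
    return $names;
}
```

Each returned name can then be passed to setLoadSheetsOnly(array($name)) so that only one worksheet is loaded at a time.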
