Testing Python scripts that generate files with pandas

Posted 2025-01-25 20:15:56

I have a series of Python scripts for which I am finally building unit tests. The scripts generally read a bunch of Excel files, do some processing in pandas, and then generate one or more output files.

The scripts generally look like this:

import datetime

import pandas as pd

import paths # contains common pathlib.Path objects for all scripts

NOW = datetime.datetime.now()

INPUT_DATA = pd.read_excel(paths.data_filepath, ...)

def main():
    # ... do a bunch of stuff with INPUT_DATA to get MUNGED_DATA
    report_path = paths.output_dir / f"report {NOW:%Y-%m-%d %I:%M}.xlsx"
    # pd.ExcelWriter opens the file for writing (pd.ExcelFile is only a reader)
    with pd.ExcelWriter(report_path) as fp:
        MUNGED_DATA.to_excel(fp)

Sometimes I generate two files with one script.

In the testing script, I import the script as a module and force the data I want to test by overriding the imported module's global variables, but I don't know how to capture the output. Generating real output files and then deleting them again seems risky. Is there any way to capture the files generated by DataFrame.to_excel and DataFrame.to_csv for testing purposes?
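pandas writers accept an open file-like object as well as a path. One option is therefore to let the caller choose the destination, so that a test can pass an in-memory buffer instead of a real file. The sketch below shows this with DataFrame.to_csv and io.StringIO. The function export_report is hypothetical and is not part of the original script. DataFrame.to_excel can target an io.BytesIO in the same way through pd.ExcelWriter, but that path also needs an Excel engine such as openpyxl installed.

```python
import io

import pandas as pd


def export_report(df: pd.DataFrame, destination) -> None:
    """Hypothetical writer: `destination` may be a path or any text file-like object."""
    df.to_csv(destination, index=False)


# In a test, write into memory instead of onto the (OneDrive-synced) disk.
buffer = io.StringIO()
export_report(pd.DataFrame({"Target Column Name": ["x", "y"]}), buffer)

buffer.seek(0)  # rewind before reading back what was "written"
captured = pd.read_csv(buffer)
print(sorted(captured["Target Column Name"]))  # ['x', 'y']
```

In production the same function would simply receive a pathlib.Path built from paths.output_dir.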

import datetime
import pathlib
import unittest

import pandas as pd

import paths # my predetermined path library

mock_dir = pathlib.Path(".").absolute() / "mocks"
paths.data_filepath = mock_dir / "mockdata.xlsx" # this is a stub file to speed up testing

# Import only after patching paths: the script calls read_excel at import time
import data_processing_script as script

script.NOW = datetime.datetime(1999, 12, 31, 23, 59, 59) # ensures all output files have the same name

class TestTheScript(unittest.TestCase):
    def test_intended_success(self):
        script.INPUT_DATA = pd.DataFrame( ... mock data ... )
        script.main()
        intended = pd.Series( ... items I expect ...)
        # Rebuild the file name exactly as main() does
        report_path = paths.output_dir / f"report {script.NOW:%Y-%m-%d %I:%M}.xlsx"
        result = pd.read_excel(report_path)
        self.assertEqual(set(intended), set(result["Target Column Name"]))

        # there is only one column in the established dataset worth testing in this case, and the order of the items does not matter
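Another way to "capture" output without touching the disk is to replace the writer method with a mock for the duration of a test and then inspect what it was called with. This sketch patches DataFrame.to_csv using unittest.mock.patch.object with autospec=True, so the mock records the DataFrame instance (self) as its first argument. Because call_args_list keeps every call, the approach also covers scripts that write more than one file. The function write_two_reports is a hypothetical stand-in for the script's main(). For to_excel the same idea applies, but the script's `with pd.ExcelWriter(...)` block writes the workbook to disk itself when it exits, so that call would need to be patched or redirected as well.

```python
from unittest import mock

import pandas as pd


def write_two_reports(df: pd.DataFrame) -> None:
    """Hypothetical script step that writes two output files."""
    df.to_csv("report_all.csv", index=False)
    df[df["value"] > 1].to_csv("report_large.csv", index=False)


data = pd.DataFrame({"value": [1, 2, 3]})

# While patched, no file is created; the mock records every call instead.
with mock.patch.object(pd.DataFrame, "to_csv", autospec=True) as fake_to_csv:
    write_two_reports(data)

# With autospec=True the DataFrame instance (self) is the first positional arg,
# followed by the path that would have been written.
written = {recorded.args[1]: recorded.args[0] for recorded in fake_to_csv.call_args_list}
print(sorted(written))                             # ['report_all.csv', 'report_large.csv']
print(list(written["report_large.csv"]["value"]))  # [2, 3]
```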

As soon as I run two tests, I get a PermissionError because the output file is still "open" or is being copied by OneDrive. I think this is the cause, because when I pause OneDrive all the tests pass without the permission error. Since I'm not going to remember to pause OneDrive every time, is there a better way to capture these files in the testing environment?
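A common alternative is to have each test write its output into a throwaway folder that OneDrive does not sync. tempfile.TemporaryDirectory creates such a folder under the system temp location, which on Windows is normally under AppData\Local and outside the OneDrive folder. The registered cleanup deletes it afterwards. The sketch below assumes a hypothetical write_report function as a stand-in for main(). It writes CSV so that it runs without an Excel engine. It also uses '-' instead of ':' in the timestamp, because ':' is not allowed in Windows file names.

```python
import datetime
import pathlib
import tempfile
import unittest

import pandas as pd


def write_report(df: pd.DataFrame, output_dir: pathlib.Path, now: datetime.datetime) -> pathlib.Path:
    """Hypothetical stand-in for the script's main(): writes one report file."""
    # '-' instead of ':' because ':' is not a valid character in Windows file names.
    report_path = output_dir / f"report {now:%Y-%m-%d %H-%M}.csv"
    df.to_csv(report_path, index=False)
    return report_path


class TestWriteReport(unittest.TestCase):
    def setUp(self):
        # Fresh directory in the system temp location for every test; the
        # registered cleanup deletes it whether the test passes or fails.
        tmp = tempfile.TemporaryDirectory()
        self.addCleanup(tmp.cleanup)
        self.output_dir = pathlib.Path(tmp.name)

    def test_report_contents(self):
        now = datetime.datetime(1999, 12, 31, 23, 59, 59)
        df = pd.DataFrame({"Target Column Name": ["x", "y", "z"]})

        path = write_report(df, self.output_dir, now)

        self.assertEqual(path.name, "report 1999-12-31 23-59.csv")
        result = pd.read_csv(path)
        self.assertEqual(set(result["Target Column Name"]), {"x", "y", "z"})
```

Run it with `python -m unittest` as usual. The real script reads paths.output_dir when main() runs, so setUp could redirect that attribute for each test instead: `patcher = unittest.mock.patch.object(paths, "output_dir", self.output_dir)`, then `patcher.start()` and `self.addCleanup(patcher.stop)`.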

作业与我同在 2025-02-01 20:15:56

I could not get pyfakefs to work and no other solution seems immediately forthcoming, so I edited my script to look like this:

import datetime

import pandas as pd

import paths # contains common pathlib.Path objects for all scripts

NOW = datetime.datetime.now()

INPUT_DATA = pd.read_excel(paths.data_filepath, ...)

def munge():
    # ... do a bunch of stuff with INPUT_DATA to get MUNGED_DATA
    return MUNGED_DATA

def main():
    MUNGED_DATA = munge()
    report_path = paths.output_dir / f"report {NOW:%Y-%m-%d %I:%M}.xlsx"
    with pd.ExcelWriter(report_path) as fp:
        MUNGED_DATA.to_excel(fp)

I then updated the tests to call munge() instead:

class TestTheScript(unittest.TestCase):
    def test_intended_success(self):
        script.INPUT_DATA = pd.DataFrame( ... mock data ... )
        result = script.munge()  # returns the DataFrame; nothing is written to disk
        intended = pd.Series( ... items I expect ...)

        self.assertEqual(set(intended), set(result["Target Column Name"]))

Now my tests work without having to worry about the file system.
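The same idea can be taken one step further. The sketch below is not the author's code. In it, munge receives its input as an argument instead of reading the module-level INPUT_DATA, and main receives the input path, output folder and timestamp. Tests can then call munge directly without patching globals, and nothing is read from disk at import time. The drop_duplicates step is only placeholder logic for the elided processing, and the parameter names are hypothetical.

```python
import datetime
import pathlib
from typing import Optional

import pandas as pd


def munge(input_data: pd.DataFrame) -> pd.DataFrame:
    """Pure transformation: takes a DataFrame, returns a DataFrame, no file I/O."""
    # Placeholder for the real (elided) processing steps.
    return input_data.drop_duplicates().reset_index(drop=True)


def main(
    input_path: pathlib.Path,
    output_dir: pathlib.Path,
    now: Optional[datetime.datetime] = None,
) -> pathlib.Path:
    """Thin I/O shell: read, munge, write. Only this function touches the disk."""
    now = now or datetime.datetime.now()
    munged = munge(pd.read_excel(input_path))
    # '-' instead of ':' keeps the file name valid on Windows.
    report_path = output_dir / f"report {now:%Y-%m-%d %H-%M}.xlsx"
    with pd.ExcelWriter(report_path) as writer:
        munged.to_excel(writer, index=False)
    return report_path
```

The script's entry point would then call main(paths.data_filepath, paths.output_dir). A test can check munge(pd.DataFrame(...)) directly. When the file itself needs checking, it can call main with a temporary output_dir and a fixed now.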
