Testing Python scripts that generate files with pandas
I have a series of Python scripts for which I am finally building unittests. The scripts generally read a bunch of Excel files, do some processing in pandas, and then generate one or more output files.
The scripts generally look like this:
    import datetime

    import pandas as pd

    import paths  # contains common pathlib.Path objects for all scripts

    NOW = datetime.datetime.now()
    INPUT_DATA = pd.read_excel(paths.data_filepath, ...)


    def main():
        # ... do a bunch of stuff with INPUT_DATA to get MUNGED_DATA
        report_path = paths.output_dir / f"report {NOW:%Y-%m-%d %I:%M}.xlsx"
        # pd.ExcelWriter is the class for writing; pd.ExcelFile only reads
        with pd.ExcelWriter(report_path) as writer:
            MUNGED_DATA.to_excel(writer)
Sometimes I generate two files with one script.
In the testing script, I import the script as a module and force the data I want to test by overriding the imported module's global variables, but I don't know how to capture the output. It seems risky to generate the output files and delete them again. Is there any way to capture files generated through pandas.to_excel and pandas.to_csv for testing purposes?
    import datetime
    import pathlib
    import unittest

    import pandas as pd

    import paths  # my predetermined path library

    mock_dir = pathlib.Path(".").absolute() / "mocks"
    paths.data_filepath = mock_dir / "mockdata.xlsx"  # this is a stub file to speed up testing

    # imported after the path override so the script reads the stub file
    import data_processing_script as script

    script.NOW = datetime.datetime(1999, 12, 31, 23, 59, 59)  # ensures all output files have the same name


    class TestTheScript(unittest.TestCase):
        def test_intended_success(self):
            script.INPUT_DATA = pd.DataFrame( ... mock data ... )
            script.main()
            intended = pd.Series( ... items I expect ...)
            result = pd.read_excel('mocks/report 1999-12-31.xlsx')
            # there is only one column in the established dataset worth testing
            # in this case, and the order of the items does not matter
            self.assertEqual(set(intended), set(result["Target Column Name"]))
As soon as I run two tests, I get a Permission error because the output file is still "open" or is being copied by OneDrive (I think this is the case because when I pause OneDrive the tests all pass without giving me the permissions error). Since I'm not going to remember to stop OneDrive from messing with me, is there a better way to capture these files in the testing environment?
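One possible way around the locking problem is to write test output into a fresh temporary directory for each test. The sketch below assumes the output step can be pointed at a different directory; write_report is a hypothetical stand-in for the script's own output code, and CSV is used only so the example does not need an Excel engine installed. The system temp location is typically not inside a OneDrive-synced folder, but that depends on the machine's configuration.

```python
import pathlib
import tempfile
import unittest

import pandas as pd


def write_report(data: pd.DataFrame, output_dir: pathlib.Path) -> pathlib.Path:
    """Hypothetical stand-in for the script's output step."""
    report_path = output_dir / "report.csv"
    data.to_csv(report_path, index=False)
    return report_path


class TestReportOutput(unittest.TestCase):
    def setUp(self):
        # A fresh directory per test, created under the system temp location
        self._tmp = tempfile.TemporaryDirectory()
        # Removes the directory and its contents after each test
        self.addCleanup(self._tmp.cleanup)
        self.output_dir = pathlib.Path(self._tmp.name)

    def test_report_contains_expected_items(self):
        data = pd.DataFrame({"item": ["a", "b", "c"]})
        report_path = write_report(data, self.output_dir)
        result = pd.read_csv(report_path)
        # Order does not matter, so compare as sets
        self.assertEqual(set(result["item"]), {"a", "b", "c"})
```

With the script from the question, the equivalent move would presumably be to assign paths.output_dir to the temporary directory before calling script.main(), since the report path is built from it inside main.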
Comments (1)
I could not get pyfakefs to work, and no other solution seemed immediately forthcoming, so I edited my script so that the data processing lives in its own function, munge, and then updated the tests to call munge instead. Now my tests work without having to worry about the file system.
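A minimal sketch of what this kind of refactor might look like. The code from the original answer is not available here, so everything except the name munge is an assumption: the body of munge is placeholder logic, and the main signature and the CSV input/output are hypothetical. The idea is only that the processing step takes a DataFrame and returns a DataFrame, so a test never touches a file.

```python
import unittest

import pandas as pd


def munge(input_data: pd.DataFrame) -> pd.DataFrame:
    """Pure processing step with no file reads or writes (placeholder logic)."""
    return input_data.drop_duplicates().reset_index(drop=True)


def main(input_path, report_path):
    """Thin I/O wrapper around munge; the tests do not need to call this."""
    input_data = pd.read_csv(input_path)
    munge(input_data).to_csv(report_path, index=False)


class TestMunge(unittest.TestCase):
    def test_intended_success(self):
        # Mock data is passed in directly instead of overriding module globals
        mock_data = pd.DataFrame({"item": ["a", "b", "b", "c"]})
        result = munge(mock_data)
        # Order does not matter, so compare as sets
        self.assertEqual(set(result["item"]), {"a", "b", "c"})
        self.assertEqual(len(result), 3)
```

The trade-off is that main itself goes untested, but it is reduced to a read, a call, and a write, so there is little left in it that can go wrong.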