Python: How to write a generic file reader with format plugins

Posted on 2024-10-29 09:01:04

I'm trying to write a generic reader for all sorts of medical image formats that
we come across. I thought, let's learn from the pros, and set out to imitate how
PIL generically reads files ("Python Imaging Library", Formats).

As I understand it, PIL has an open function that loops through a list of possible
accept functions. When one works, it uses the related factory function to instantiate
the appropriate object.

So I went to do this and my (stripped-down) efforts are here:


pluginID = []     # list of all registered plugin IDs
OPEN = {}         # plugins have open and (maybe) accept functions as a tuple

_initialized = False

import os, sys

def moduleinit():
    '''Explicitly initializes the library.  This function 
    loads all available file format drivers.

    This routine has been lifted from PIL, the Python Imaging Library'''

    global _initialized
    global pluginID
    if _initialized:
        return 

    visited = {}

    directories = sys.path

    try:
        directories = directories + [os.path.dirname(__file__)]
    except NameError:
        pass

    # only check directories (including current, if present in the path)
    for directory in filter(isDirectory, directories):
        fullpath = os.path.abspath(directory)
        if visited.has_key(fullpath):
            continue
        for file in os.listdir(directory):
            if file[-19:] == "TestReaderPlugin.py":
                f, e = os.path.splitext(file)
                try:
                    sys.path.insert(0, directory)
                    try: # FIXME: this will not reload and hence pluginID 
                        # will be unpopulated leading to "cannot identify format"
                        __import__(f, globals(), locals(), [])
                    finally:
                        del sys.path[0]
                except ImportError:
                    print f, ":", sys.exc_value
        visited[fullpath] = None

    if OPEN:
        _initialized = True
        return 1

class Reader:
    '''Base class for image file format handlers.'''
    def __init__(self, fp=None, filename=None):

        self.fp = fp # handle handed over by open(), if any
        self.filename = filename

        # only open the file ourselves if no handle was passed in
        if self.fp is None and isStringType(filename):
            import __builtin__
            self.fp = __builtin__.open(filename) # attempt opening

        # this may fail if not implemented
        self._open() # unimplemented in base class but provided by plugins

    def _open(self):
        raise NotImplementedError(
            "Reader subclass must implement _open"
            )


# this is the generic open that tries to find the appropriate handler
def open(fp):
    '''Probe an image file

    Supposed to attempt all opening methods that are available. Each 
    of them is supposed to fail quickly if the filetype is invalid for its 
    respective format'''

    filename=fp

    moduleinit() # make sure we have access to all the plugins

    for i in pluginID:
        try:
            factory, accept = OPEN[i]
            if accept:
                # accept is expected to either return None (if unsuccessful)
                # or hand back a file handle to be used for opening.
                # A separate name is used so that a rejection does not
                # overwrite the filename the next plugin needs to see.
                handle = accept(filename)
                if handle:
                    handle.seek(0)
                    return factory(handle, filename=filename)
        except (SyntaxError, IndexError, TypeError):
            pass # I suppose that factory is allowed to have these
            # exceptions for problems that weren't caught with accept()
            # hence, they are simply ignored and we try the other plugins

    raise IOError("cannot identify format")

# --------------------------------------------------------------------
# Plugin registry

def register_open(id, factory, accept=None):
    pluginID.append(id)
    OPEN[id] = factory, accept

# --------------------------------------------------------------------
# Internal:

# type stuff
from types import  StringType

def isStringType(t):
    return isinstance(t, StringType)

def isDirectory(f):
    '''Checks if an object is a string, and that it points to a directory'''
    return isStringType(f) and os.path.isdir(f)

The important bit behind the scenes is the registration of all format plugins
the first time an attempt is made to open a file (moduleinit). Every eligible
plugin must be in an accessible path and named *TestReaderPlugin.py. It will
get (dynamically) imported. Each plugin module has to call register_open
to provide an ID, a factory that creates the reader object, and an accept
function to test candidate files.
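
For comparison, the discovery step described above might be sketched in Python 3 roughly as follows. This is only an illustration, not how PIL does it: the names load_plugins and PLUGIN_SUFFIX are made up for the example, and the demo plugin file is a stand-in that does not register anything. Because each matching module is executed explicitly, its top-level code (where a real plugin would call register_open) should run again on every call, even if the module had been imported before.

```python
import importlib.util
import os
import pkgutil
import sys
import tempfile

PLUGIN_SUFFIX = "TestReaderPlugin"  # naming convention taken from the post


def load_plugins(directories):
    """Execute every module whose name ends with PLUGIN_SUFFIX.

    `directories` must be a list of directory paths. Returns the names of
    the modules that were loaded. (Hypothetical helper for illustration.)
    """
    loaded = []
    for info in pkgutil.iter_modules(directories):
        if not info.name.endswith(PLUGIN_SUFFIX):
            continue
        # Build a fresh module object and run its code, so top-level
        # registration calls are executed on every load_plugins() call.
        spec = info.module_finder.find_spec(info.name)
        module = importlib.util.module_from_spec(spec)
        sys.modules[info.name] = module
        spec.loader.exec_module(module)
        loaded.append(info.name)
    return loaded


# Demo with a throwaway directory holding one plugin and one unrelated file.
with tempfile.TemporaryDirectory() as plugin_dir:
    with open(os.path.join(plugin_dir, "DemoTestReaderPlugin.py"), "w") as fh:
        fh.write("REGISTERED = True\n")
    with open(os.path.join(plugin_dir, "unrelated.py"), "w") as fh:
        fh.write("raise RuntimeError('should never be imported')\n")
    loaded = load_plugins([plugin_dir])
print(loaded)
```

Scanning only an explicit list of plugin directories, rather than all of sys.path, also avoids temporarily modifying sys.path during the import.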

An example plugin will look like this:


import TestReader

def _accept(filename):
    fp=open(filename,"r")
    # we made it here, so let's just accept this format
    return fp

class exampleTestReader(TestReader.Reader):
    format='example'

    def _open(self):
        self.data = self.fp.read()

TestReader.register_open('example', exampleTestReader, accept=_accept)

TestReader.open() is the function that a user will use:

import TestReader
a=TestReader.open(filename) # easy

So, where is the problem? Firstly, I'm still searching for the pythonic
way. Is this it? My reason to doubt it is that the magic in the moduleinit
stage looks messy. It is copied straight from PIL. The main problem: if you
reload(TestReader), it all stops working because pluginID gets reset to [],
but the plugins do not get reloaded.

Are there better ways to set up a generic reader that
1. allows a simple open(filename) call for all formats,
2. requires only nicely encapsulated plugins to be provided for whatever format you want, and
3. works on reloads?

Comments (1)

〆凄凉。 2024-11-05 09:01:04

Some guidelines:

  1. Use the concept of "peeking" into a buffer to test whether it holds data you can understand.
  2. Users should not have to know the name of the importer (what if you have 100 importers?);
    use a "facade" interface such as medicimage.open(filepath).
  3. To work on reload you have to implement a little bit of logic; there are examples out there on how to achieve that.