Python 中的 ungetc

发布于 2024-08-29 08:30:32 字数 310 浏览 6 评论 0原文

Python中的一些文件读取(readlines())函数
将文件内容复制到内存(作为列表)

我需要处理一个太大的文件
被复制到内存中,因此需要使用
文件指针(用于访问文件一个字节
一次)——如 C getc() 中那样。

我的额外要求是
我想将文件指针倒回到上一个
类似于 C ungetc() 中的字节。

有没有办法在 Python 中做到这一点?

另外,在Python中,我可以在
处读取一行 使用 readline() 的时间

有没有办法读取上一行
倒退?

Some file read (readlines()) functions in Python
copy the file contents to memory (as a list)

I need to process a file that's too large to
be copied in memory and as such need to use
a file pointer (to access the file one byte
at a time) -- as in C getc().

The additional requirement I have is that
I'd like to rewind the file pointer to previous
bytes like in C ungetc().

Is there a way to do this in Python?

Also, in Python, I can read one line at a
time with readline()

Is there a way to read the previous line
going backward?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(6

十秒萌定你 2024-09-05 08:30:32
  • 您不需要文件指针,Python 没有或不需要文件指针。

  • 要逐行浏览文件而不将整个文件读入内存,只需迭代文件对象本身,即

     with open(filename, "r") as f:
         对于 f 中的行:
             ...
    

    通常要避免使用readlines

  • 返回一行并不是一件非常容易的事情。如果您不需要返回超过一行,请查看 itertools 文档

  • You do not need file pointers, which Python does not have or want.

  • To go through a file line by line without reading the whole thing into memory, just iterate over the file object itself, i.e.

     with open(filename, "r") as f:
         for line in f:
             ...
    

    Using readlines is generally to be avoided.

  • Going back a line isn't something you can do super-easily. If you never need to go back more than one line, check out the pairwise recipe in the itertools documentation.

猫弦 2024-09-05 08:30:32

好的,这就是我的想法。感谢布伦达提出建立班级的想法。
感谢 Josh 提出使用类似 C 语言的函数eek() 和 read() 的想法

#!/bin/python

# Usage: BufRead.py inputfile

import sys, os, string
from inspect import currentframe

# Debug function usage
#
# if DEBUG:
#   debugLogMsg(currentframe().f_lineno,currentframe().f_code.co_filename)
#   print ...
def debugLogMsg(line,file,msg=""):
  print "%s:%s %s" % (file,line,msg)

# Set DEBUG off.
DEBUG = 0

class BufRead: 
  def __init__(self,filename): 
    self.__filename           = filename 
    self.__file               = open(self.__filename,'rb') 
    self.__fileposition       = self.__file.tell()
    self.__file.seek(0, os.SEEK_END)
    self.__filesize           = self.__file.tell() 
    self.__file.seek(self.__fileposition, os.SEEK_SET) 

  def close(self): 
    if self.__file is not None: 
      self.__file.close() 
      self.__file = None 

  def seekstart(self):
    if self.__file == None:
      self.__file.seek(0, os.SEEK_SET)
      self.__fileposition = self.__file.tell()

  def seekend(self):
    if self.__file == None:
      self.__file.seek(-1, os.SEEK_END)
      self.__fileposition = self.__file.tell()

  def getc(self):
    if self.__file == None:
      return None 
    self.__fileposition = self.__file.tell()
    if self.__fileposition < self.__filesize:
      byte = self.__file.read(1)
      self.__fileposition = self.__file.tell()
      return byte
    else:
      return None

  def ungetc(self):
    if self.__file == None:
      return None 
    self.__fileposition = self.__file.tell()
    if self.__fileposition > 0:
      self.__fileposition = self.__fileposition - 1
      self.__file.seek(self.__fileposition, os.SEEK_SET)
      byte = self.__file.read(1)
      self.__file.seek(self.__fileposition, os.SEEK_SET)
      return byte
    else:
      return None

  # uses getc() and ungetc()
  def getline(self):
    if self.__file == None:
      return None 
    self.__fileposition = self.__file.tell()

    if self.__fileposition < self.__filesize:
      startOfLine = False
      line = ""

      while True:
        if self.__fileposition == 0:
          startOfLine = True
          break
        else:
          c = self.ungetc()
          if c == '\n':
            c = self.getc()
            startOfLine = True
            break

      if startOfLine:
        c = self.getc()
        if c == '\n':
          return '\n'
        else:
          self.ungetc()

        while True:
          c = self.getc()
          if c == '\n':
            line += c
            c = self.getc()
            if c == None:
              return line
            if c == '\n':
              self.ungetc()
            return line
          elif c == None:
            return line
          else:
            line += c
    else:
      return None

  # uses getc() and ungetc()
  def ungetline(self):
    if self.__file == None:
      return None 
    self.__fileposition = self.__file.tell()

    if self.__fileposition > 0:
      endOfLine = False
      line = ""

      while True:
        if self.__fileposition == self.__filesize:
          endOfLine = True
          break
        else:
          c = self.getc()
          if c == '\n':
            c = self.ungetc()
            endOfLine = True
            break

      if endOfLine:
        c = self.ungetc()
        if c == '\n':
          return '\n'
        else:
          self.getc()

        while True:
          c = self.ungetc()
          if c == None:
            return line
          if c == '\n':
            line += c
            c = self.ungetc()
            if c == None:
              return line
            if c == '\n':
              self.getc()
            return line
          elif c == None:
            return line
          else:
            line = c + line
    else:
      return None

def main():
  if len(sys.argv) == 2:
    print sys.argv[1]
    b = BufRead(sys.argv[1])

    sys.stdout.write(
      '----------------------------------\n' \
      '- TESTING GETC                    \n' \
      '----------------------------------\n')

    while True:
      c = b.getc()
      if c == None: 
        sys.stdout.write('\n')
        break
      else:
        sys.stdout.write(c)

    sys.stdout.write(
      '----------------------------------\n' \
      '- TESTING UNGETC                  \n' \
      '----------------------------------\n')

    while True:
      c = b.ungetc()
      if c == None: 
        sys.stdout.write('\n')
        break
      else:
        sys.stdout.write(c)

    sys.stdout.write(
      '----------------------------------\n' \
      '- TESTING GETLINE                 \n' \
      '----------------------------------\n')

    b.seekstart()

    while True:
      line = b.getline()
      if line == None:
        sys.stdout.write('\n')
        break
      else:
        sys.stdout.write(line)

    sys.stdout.write(
      '----------------------------------\n' \
      '- TESTING UNGETLINE               \n' \
      '----------------------------------\n')

    b.seekend()

    while True:
      line = b.ungetline()
      if line == None:
        sys.stdout.write('\n')
        break
      else:
        sys.stdout.write(line)

    b.close()

if __name__=="__main__": main()

OK, here's what I came up with. Thanks Brenda for the idea of building a class.
Thanks Josh for the idea to use C like functions seek() and read()

#!/bin/python

# Usage: BufRead.py inputfile

import sys, os, string
from inspect import currentframe

# Debug function usage
#
# if DEBUG:
#   debugLogMsg(currentframe().f_lineno,currentframe().f_code.co_filename)
#   print ...
def debugLogMsg(line,file,msg=""):
  print "%s:%s %s" % (file,line,msg)

# Set DEBUG off.
DEBUG = 0

class BufRead: 
  def __init__(self,filename): 
    self.__filename           = filename 
    self.__file               = open(self.__filename,'rb') 
    self.__fileposition       = self.__file.tell()
    self.__file.seek(0, os.SEEK_END)
    self.__filesize           = self.__file.tell() 
    self.__file.seek(self.__fileposition, os.SEEK_SET) 

  def close(self): 
    if self.__file is not None: 
      self.__file.close() 
      self.__file = None 

  def seekstart(self):
    if self.__file == None:
      self.__file.seek(0, os.SEEK_SET)
      self.__fileposition = self.__file.tell()

  def seekend(self):
    if self.__file == None:
      self.__file.seek(-1, os.SEEK_END)
      self.__fileposition = self.__file.tell()

  def getc(self):
    if self.__file == None:
      return None 
    self.__fileposition = self.__file.tell()
    if self.__fileposition < self.__filesize:
      byte = self.__file.read(1)
      self.__fileposition = self.__file.tell()
      return byte
    else:
      return None

  def ungetc(self):
    if self.__file == None:
      return None 
    self.__fileposition = self.__file.tell()
    if self.__fileposition > 0:
      self.__fileposition = self.__fileposition - 1
      self.__file.seek(self.__fileposition, os.SEEK_SET)
      byte = self.__file.read(1)
      self.__file.seek(self.__fileposition, os.SEEK_SET)
      return byte
    else:
      return None

  # uses getc() and ungetc()
  def getline(self):
    if self.__file == None:
      return None 
    self.__fileposition = self.__file.tell()

    if self.__fileposition < self.__filesize:
      startOfLine = False
      line = ""

      while True:
        if self.__fileposition == 0:
          startOfLine = True
          break
        else:
          c = self.ungetc()
          if c == '\n':
            c = self.getc()
            startOfLine = True
            break

      if startOfLine:
        c = self.getc()
        if c == '\n':
          return '\n'
        else:
          self.ungetc()

        while True:
          c = self.getc()
          if c == '\n':
            line += c
            c = self.getc()
            if c == None:
              return line
            if c == '\n':
              self.ungetc()
            return line
          elif c == None:
            return line
          else:
            line += c
    else:
      return None

  # uses getc() and ungetc()
  def ungetline(self):
    if self.__file == None:
      return None 
    self.__fileposition = self.__file.tell()

    if self.__fileposition > 0:
      endOfLine = False
      line = ""

      while True:
        if self.__fileposition == self.__filesize:
          endOfLine = True
          break
        else:
          c = self.getc()
          if c == '\n':
            c = self.ungetc()
            endOfLine = True
            break

      if endOfLine:
        c = self.ungetc()
        if c == '\n':
          return '\n'
        else:
          self.getc()

        while True:
          c = self.ungetc()
          if c == None:
            return line
          if c == '\n':
            line += c
            c = self.ungetc()
            if c == None:
              return line
            if c == '\n':
              self.getc()
            return line
          elif c == None:
            return line
          else:
            line = c + line
    else:
      return None

def main():
  if len(sys.argv) == 2:
    print sys.argv[1]
    b = BufRead(sys.argv[1])

    sys.stdout.write(
      '----------------------------------\n' \
      '- TESTING GETC                    \n' \
      '----------------------------------\n')

    while True:
      c = b.getc()
      if c == None: 
        sys.stdout.write('\n')
        break
      else:
        sys.stdout.write(c)

    sys.stdout.write(
      '----------------------------------\n' \
      '- TESTING UNGETC                  \n' \
      '----------------------------------\n')

    while True:
      c = b.ungetc()
      if c == None: 
        sys.stdout.write('\n')
        break
      else:
        sys.stdout.write(c)

    sys.stdout.write(
      '----------------------------------\n' \
      '- TESTING GETLINE                 \n' \
      '----------------------------------\n')

    b.seekstart()

    while True:
      line = b.getline()
      if line == None:
        sys.stdout.write('\n')
        break
      else:
        sys.stdout.write(line)

    sys.stdout.write(
      '----------------------------------\n' \
      '- TESTING UNGETLINE               \n' \
      '----------------------------------\n')

    b.seekend()

    while True:
      line = b.ungetline()
      if line == None:
        sys.stdout.write('\n')
        break
      else:
        sys.stdout.write(line)

    b.close()

if __name__=="__main__": main()
黯然#的苍凉 2024-09-05 08:30:32

如果您确实想直接使用文件指针(不过我认为 Mike Graham 的建议更好),您可以使用文件对象的 seek() 方法,可让您设置内部指针,与 read() 方法,该方法支持指定要读取的字节数的选项参数。

If you do want to use a file pointer directly (I think Mike Graham's suggestion is better though), you can use the file object's seek() method which lets you set the internal pointer, combined with the read() method, which support an option argument specifying how many bytes you'd like to read.

小姐丶请自重 2024-09-05 08:30:32

为您编写一个读取和缓冲输入的类,并在其上实现 ungetc —— 可能是这样的(警告:未经测试,在编译时编写):

class BufRead:
    def __init__(self,filename):
        self.filename = filename
        self.fn = open(filename,'rb')
        self.buffer = []
        self.bufsize = 256
        self.ready = True
    def close(self):
        if self.fn is not None:
            self.fn.close()
            self.fn = None
        self.ready = False
    def read(self,size=1):
        l = len(self.buffer)
        if not self.ready: return None
        if l <= size:
            s = self.buffer[:size]
            self.buffer = self.buffer[size:]
            return s
        s = self.buffer
        size = size - l
        self.buffer = self.fn.read(min(self.bufsize,size))
        if self.buffer is None or len(self.buffer) == 0:
            self.ready = False
            return s
        return s + self.read(size)
    def ungetc(self,ch):
        if self.buffer is None:
            self.buffer = [ch]
        else:   
            self.buffer.append(ch)
        self.ready = True

Write a class the reads and buffers input for you, and implement ungetc on it -- something like this perhaps (warning: untested, written while compiling):

class BufRead:
    def __init__(self,filename):
        self.filename = filename
        self.fn = open(filename,'rb')
        self.buffer = []
        self.bufsize = 256
        self.ready = True
    def close(self):
        if self.fn is not None:
            self.fn.close()
            self.fn = None
        self.ready = False
    def read(self,size=1):
        l = len(self.buffer)
        if not self.ready: return None
        if l <= size:
            s = self.buffer[:size]
            self.buffer = self.buffer[size:]
            return s
        s = self.buffer
        size = size - l
        self.buffer = self.fn.read(min(self.bufsize,size))
        if self.buffer is None or len(self.buffer) == 0:
            self.ready = False
            return s
        return s + self.read(size)
    def ungetc(self,ch):
        if self.buffer is None:
            self.buffer = [ch]
        else:   
            self.buffer.append(ch)
        self.ready = True
塔塔猫 2024-09-05 08:30:32

我不想进行数十亿次无缓冲的单个字符文件读取,而且我想要一种方法
调试文件指针的位置。因此,我决定返回文件位置
除了字符或行之外,还使用 ​​mmap 将文件映射到内存。 (并让 mmap
处理分页)我认为如果文件真的很大,这会是一个问题。
(如大于物理内存量)此时 mmap 将开始进入
虚拟内存和事情可能会变得非常慢。目前,它处理一个 50 MB 的文件大约需要 4 分钟。

import sys, os, string, re, time
from mmap import mmap

class StreamReaderDb: 
  def __init__(self,stream):
    self.__stream             = mmap(stream.fileno(), os.path.getsize(stream.name))  
    self.__streamPosition     = self.__stream.tell()
    self.__stream.seek(0                    , os.SEEK_END)
    self.__streamSize         = self.__stream.tell() 
    self.__stream.seek(self.__streamPosition, os.SEEK_SET) 

  def setStreamPositionDb(self,streamPosition):
    if self.__stream == None:
      return None 
    self.__streamPosition = streamPosition
    self.__stream.seek(self.__streamPosition, os.SEEK_SET)

  def streamPositionDb(self):
    if self.__stream == None:
      return None 
    return self.__streamPosition

  def streamSize(self):
    if self.__stream == None:
      return None 
    return self.__streamSize

  def close(self): 
    if self.__stream is not None: 
      self.__stream.close() 
      self.__stream = None 

  def seekStart(self):
    if self.__stream == None:
      return None 
    self.setStreamPositionDb(0)

  def seekEnd(self):
    if self.__stream == None:
      return None 
    self.__stream.seek(-1, os.SEEK_END)
    self.setStreamPositionDb(self.__stream.tell())

  def getcDb(self):
    if self.__stream == None:
      return None,None 
    self.setStreamPositionDb(self.__stream.tell())
    if self.streamPositionDb() < self.streamSize():
      byte = self.__stream.read(1)
      self.setStreamPositionDb(self.__stream.tell())
      return byte,self.streamPositionDb()
    else:
      return None,self.streamPositionDb()

  def unGetcDb(self):
    if self.__stream == None:
      return None,None 
    self.setStreamPositionDb(self.__stream.tell())
    if self.streamPositionDb() > 0:
      self.setStreamPositionDb(self.streamPositionDb() - 1)
      byte = self.__stream.read(1)
      self.__stream.seek(self.streamPositionDb(), os.SEEK_SET)
      return byte,self.streamPositionDb()
    else:
      return None,self.streamPositionDb()

  def seekLineStartDb(self):
    if self.__stream == None:
      return None
    self.setStreamPositionDb(self.__stream.tell())

    if self.streamPositionDb() < self.streamSize():
      # Back up to the start of the line
      while True:
        if self.streamPositionDb() == 0:
          return self.streamPositionDb() 
        else:
          c,fp = self.unGetcDb()
          if c == '\n':
            c,fp = self.getcDb()
            return fp
    else:
      return None

  def seekPrevLineEndDb(self):
    if self.__stream == None:
      return None
    self.setStreamPositionDb(self.__stream.tell())

    if self.streamPositionDb() < self.streamSize():
      # Back up to the start of the line
      while True:
        if self.streamPositionDb() == 0:
          return self.streamPositionDb() 
        else:
          c,fp = self.unGetcDb()
          if c == '\n':
            return fp
    else:
      return None

  def seekPrevLineStartDb(self):
    if self.__stream == None:
      return None
    self.setStreamPositionDb(self.__stream.tell())

    if self.streamPositionDb() < self.streamSize():
      # Back up to the start of the line
      while True:
        if self.streamPositionDb() == 0:
          return self.streamPositionDb() 
        else:
          c,fp = self.unGetcDb()
          if c == '\n':
            return self.seekLineStartDb()
    else:
      return None

  def seekLineEndDb(self):
    if self.__stream == None:
      return None 
    self.setStreamPositionDb(self.__stream.tell())

    if self.streamPositionDb() > 0:
      while True:
        if self.streamPositionDb() == self.streamSize():
          return self.streamPositionDb()
        else:
          c,fp = self.getcDb()
          if c == '\n':
            c,fp = self.unGetcDb()
            return fp
    else:
      return None

  def seekNextLineEndDb(self):
    if self.__stream == None:
      return None 
    self.setStreamPositionDb(self.__stream.tell())

    if self.streamPositionDb() > 0:
      while True:
        if self.streamPositionDb() == self.streamSize():
          return self.streamPositionDb()
        else:
          c,fp = self.getcDb()
          if c == '\n':
            return fp
    else:
      return None

  def seekNextLineStartDb(self):
    if self.__stream == None:
      return None 
    self.setStreamPositionDb(self.__stream.tell())

    if self.streamPositionDb() > 0:
      while True:
        if self.streamPositionDb() == self.streamSize():
          return self.streamPositionDb()
        else:
          c,fp = self.getcDb()
          if c == '\n':
            return self.seekLineStartDb()
    else:
      return None

  # uses getc() and ungetc()
  def getLineDb(self):
    if self.__stream == None:
      return None,None 
    self.setStreamPositionDb(self.__stream.tell())

    line = ""

    if self.seekLineStartDb() != None:
      c,fp = self.getcDb()
      if c == '\n':
        return c,self.streamPositionDb()
      else:
        self.unGetcDb()

      while True:
        c,fp = self.getcDb()
        if c == '\n':
          line += c
          c,fp = self.getcDb()
          if c == None:
            return line,self.streamPositionDb()
          self.unGetcDb()
          return line,self.streamPositionDb()
        elif c == None:
          return line,self.streamPositionDb()
        else:
          line += c
    else:
      return None,self.streamPositionDb()

  # uses getc() and ungetc()
  def unGetLineDb(self):
    if self.__stream == None:
      return None,None 
    self.setStreamPositionDb(self.__stream.tell())

    line = ""

    if self.seekLineEndDb() != None:
      c,fp = self.unGetcDb()
      if c == '\n':
        return c,self.streamPositionDb()
      else:
        self.getcDb()

      while True:
        c,fp = self.unGetcDb()
        if c == None:
          return line,self.streamPositionDb()
        if c == '\n':
          line += c
          c,fp = self.unGetcDb()
          if c == None:
            return line,self.streamPositionDb()
          self.getcDb()
          return line,self.streamPositionDb()
        elif c == None:
          return line,self.streamPositionDb()
        else:
          line = c + line
    else:
      return None,self.streamPositionDb()

I don't want to do billions of unbuffered single char file reads plus I wanted a way
to debug the position of the file pointer. Hence, I resolved to return the file position
in addition to a char or line and to use mmap to map the file to memory. (and let mmap
handle paging) I think this will be a bit of a problem if the file is really, really big.
(as in larger than the amount of physical memory) That's when mmap would start going into
the virtual memory and things could get really slow. For now, it processes a 50 MB file in about 4 min.

import sys, os, string, re, time
from mmap import mmap

class StreamReaderDb: 
  def __init__(self,stream):
    self.__stream             = mmap(stream.fileno(), os.path.getsize(stream.name))  
    self.__streamPosition     = self.__stream.tell()
    self.__stream.seek(0                    , os.SEEK_END)
    self.__streamSize         = self.__stream.tell() 
    self.__stream.seek(self.__streamPosition, os.SEEK_SET) 

  def setStreamPositionDb(self,streamPosition):
    if self.__stream == None:
      return None 
    self.__streamPosition = streamPosition
    self.__stream.seek(self.__streamPosition, os.SEEK_SET)

  def streamPositionDb(self):
    if self.__stream == None:
      return None 
    return self.__streamPosition

  def streamSize(self):
    if self.__stream == None:
      return None 
    return self.__streamSize

  def close(self): 
    if self.__stream is not None: 
      self.__stream.close() 
      self.__stream = None 

  def seekStart(self):
    if self.__stream == None:
      return None 
    self.setStreamPositionDb(0)

  def seekEnd(self):
    if self.__stream == None:
      return None 
    self.__stream.seek(-1, os.SEEK_END)
    self.setStreamPositionDb(self.__stream.tell())

  def getcDb(self):
    if self.__stream == None:
      return None,None 
    self.setStreamPositionDb(self.__stream.tell())
    if self.streamPositionDb() < self.streamSize():
      byte = self.__stream.read(1)
      self.setStreamPositionDb(self.__stream.tell())
      return byte,self.streamPositionDb()
    else:
      return None,self.streamPositionDb()

  def unGetcDb(self):
    if self.__stream == None:
      return None,None 
    self.setStreamPositionDb(self.__stream.tell())
    if self.streamPositionDb() > 0:
      self.setStreamPositionDb(self.streamPositionDb() - 1)
      byte = self.__stream.read(1)
      self.__stream.seek(self.streamPositionDb(), os.SEEK_SET)
      return byte,self.streamPositionDb()
    else:
      return None,self.streamPositionDb()

  def seekLineStartDb(self):
    if self.__stream == None:
      return None
    self.setStreamPositionDb(self.__stream.tell())

    if self.streamPositionDb() < self.streamSize():
      # Back up to the start of the line
      while True:
        if self.streamPositionDb() == 0:
          return self.streamPositionDb() 
        else:
          c,fp = self.unGetcDb()
          if c == '\n':
            c,fp = self.getcDb()
            return fp
    else:
      return None

  def seekPrevLineEndDb(self):
    if self.__stream == None:
      return None
    self.setStreamPositionDb(self.__stream.tell())

    if self.streamPositionDb() < self.streamSize():
      # Back up to the start of the line
      while True:
        if self.streamPositionDb() == 0:
          return self.streamPositionDb() 
        else:
          c,fp = self.unGetcDb()
          if c == '\n':
            return fp
    else:
      return None

  def seekPrevLineStartDb(self):
    if self.__stream == None:
      return None
    self.setStreamPositionDb(self.__stream.tell())

    if self.streamPositionDb() < self.streamSize():
      # Back up to the start of the line
      while True:
        if self.streamPositionDb() == 0:
          return self.streamPositionDb() 
        else:
          c,fp = self.unGetcDb()
          if c == '\n':
            return self.seekLineStartDb()
    else:
      return None

  def seekLineEndDb(self):
    if self.__stream == None:
      return None 
    self.setStreamPositionDb(self.__stream.tell())

    if self.streamPositionDb() > 0:
      while True:
        if self.streamPositionDb() == self.streamSize():
          return self.streamPositionDb()
        else:
          c,fp = self.getcDb()
          if c == '\n':
            c,fp = self.unGetcDb()
            return fp
    else:
      return None

  def seekNextLineEndDb(self):
    if self.__stream == None:
      return None 
    self.setStreamPositionDb(self.__stream.tell())

    if self.streamPositionDb() > 0:
      while True:
        if self.streamPositionDb() == self.streamSize():
          return self.streamPositionDb()
        else:
          c,fp = self.getcDb()
          if c == '\n':
            return fp
    else:
      return None

  def seekNextLineStartDb(self):
    if self.__stream == None:
      return None 
    self.setStreamPositionDb(self.__stream.tell())

    if self.streamPositionDb() > 0:
      while True:
        if self.streamPositionDb() == self.streamSize():
          return self.streamPositionDb()
        else:
          c,fp = self.getcDb()
          if c == '\n':
            return self.seekLineStartDb()
    else:
      return None

  # uses getc() and ungetc()
  def getLineDb(self):
    if self.__stream == None:
      return None,None 
    self.setStreamPositionDb(self.__stream.tell())

    line = ""

    if self.seekLineStartDb() != None:
      c,fp = self.getcDb()
      if c == '\n':
        return c,self.streamPositionDb()
      else:
        self.unGetcDb()

      while True:
        c,fp = self.getcDb()
        if c == '\n':
          line += c
          c,fp = self.getcDb()
          if c == None:
            return line,self.streamPositionDb()
          self.unGetcDb()
          return line,self.streamPositionDb()
        elif c == None:
          return line,self.streamPositionDb()
        else:
          line += c
    else:
      return None,self.streamPositionDb()

  # uses getc() and ungetc()
  def unGetLineDb(self):
    if self.__stream == None:
      return None,None 
    self.setStreamPositionDb(self.__stream.tell())

    line = ""

    if self.seekLineEndDb() != None:
      c,fp = self.unGetcDb()
      if c == '\n':
        return c,self.streamPositionDb()
      else:
        self.getcDb()

      while True:
        c,fp = self.unGetcDb()
        if c == None:
          return line,self.streamPositionDb()
        if c == '\n':
          line += c
          c,fp = self.unGetcDb()
          if c == None:
            return line,self.streamPositionDb()
          self.getcDb()
          return line,self.streamPositionDb()
        elif c == None:
          return line,self.streamPositionDb()
        else:
          line = c + line
    else:
      return None,self.streamPositionDb()
烙印 2024-09-05 08:30:32

这个问题最初是由我需要构建一个词法分析器引起的。
getc() 和 ungetc() 一开始很有用(可以消除读取错误,并且
构建状态机)状态机完成后,
getc() 和 ungetc() 成为一种负担,因为它们读取时间太长
直接来自存储。

当状态机完成时(调试任何 IO 问题,
最终确定了状态),我优化了词法分析器。

将源文件以块(或页)形式读取到内存中并运行
每个页面上的状态机都会产生最佳时间结果。

我发现如果不使用 getc() 和 ungetc() 可以节省大量时间
直接从文件中读取。

The question was initially prompted by my need to build a lexical analyzer.
getc() and ungetc() are useful at first (to get the read bugs out the way and
to build the state machine) After the state machine is done,
getc() and ungetc() become a liability as they take too long to read
directly from storage.

When the state machine was complete (debugged any IO problems,
finalized the states), I optimized the lexical analyzer.

Reading the source file in chunks (or pages) into memory and running
the state machine on each page yields the best time result.

I found that considerable time is saved if getc() and ungetc() are not used
to read from the file directly.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文