Character being skipped over when looping through a string
I'm making a hobby programming language and there's an issue with my lexer when it's reading an integer.
Here is the code for when the current character is in the string of digits:
    integers = "1234567890"
    elif currentChar in integers:
        res = ""
        while pos < length and src[pos] in integers:
            print(src[pos])
            res += src[pos]
            pos += 1
            column += 1
        pos += 1
        column += 1
        tokens.append({"type": "INTEGER", "value": res})
If you need the entire main lexer function, here it is:
    def tokenize(self):
        tokens = []
        pos = 0
        line = 1
        column = 1
        src = self.src
        length = len(src)
        varChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"
        integers = "1234567890"
        KEYWORDS = ["print"]
        while pos < length:
            currentChar = src[pos]
            if currentChar == " ":
                pos += 1
                column += 1
                continue
            elif currentChar == "\n":
                line += 1
                column = 0
                pos += 1
                continue
            elif currentChar == '"':
                pos += 1
                column += 1
                res = ""
                while pos < length and src[pos] != '"':
                    res += src[pos]
                    pos += 1
                    column += 1
                try:
                    if src[pos] != '"':
                        return [], f"Unterminated string at line {line}, column {column}"
                except IndexError:
                    if src[pos - 1] != '"':
                        return [], f"Unterminated string at line {line}, column {column}"
                pos += 1
                column += 1
                tokens.append({"type": "STRING", "value": res})
            elif currentChar in varChars:
                pos += 1
                column += 1
                res = currentChar
                while pos < length and src[pos] in varChars:
                    res += src[pos]
                    pos += 1
                    column += 1
                if res not in KEYWORDS:
                    tokens.append({"type": "VARIABLE_NAME", "value": res})
                elif res in KEYWORDS:
                    tokens.append({"type": "KEYWORD", "value": res})
            elif currentChar == "=":
                pos += 1
                column += 1
                tokens.append({"type": "OPERATOR", "value": currentChar})
            elif currentChar in integers:
                res = ""
                while pos < length and src[pos] in integers:
                    print(src[pos])
                    res += src[pos]
                    pos += 1
                    column += 1
                tokens.append({"type": "INTEGER", "value": res})
            elif currentChar == "(":
                pos += 1
                column += 1
                tokens.append({"type": "OPEN_PAREN", "value": currentChar})
            elif currentChar == ")":
                pos += 1
                column += 1
                tokens.append({"type": "CLOSE_PAREN", "value": currentChar})
            elif currentChar == ";":
                res = ""
                pos += 1
                column += 1
                while pos < length and src[pos] != "\n":
                    res += src[pos]
                    pos += 1
                    column += 1
                pos += 1
                column += 1
                tokens.append({"type": "COMMENT", "value": res})
            else:
                return [], f"Unexpected character {currentChar} at line {line}, column {column}"
P.S.: pos is the current index in src, and src is the code.
When I eventually reach the end of my parser, it says that I'm missing a character, which is always the character right after the end of a number.
e.g.:
    print(10)
In this code, the closing parenthesis would be skipped over by the lexer.
Any help would be appreciated!
Comments (1)
You're accidentally incrementing your position again outside of your loop.
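As your lexer enters its last iteration of this loop, it reads the final digit and increments its position, which already leaves pos on the character after the number. The extra pos += 1 after the loop then increments it again, thus skipping that character. Removing the pos += 1 and column += 1 that follow the loop fixes it.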
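To make this concrete, here is a minimal sketch of the integer-reading step on its own, with the extra increment removed. The helper name `read_integer` and the constant `DIGITS` are hypothetical, pulled out of the question's `tokenize` purely for illustration; the rest of the lexer is assumed to stay as posted.

```python
DIGITS = "1234567890"


def read_integer(src, pos):
    """Read consecutive digits starting at pos.

    Returns (digits, new_pos), where new_pos is the index of the
    first character that is NOT part of the number.
    """
    res = ""
    while pos < len(src) and src[pos] in DIGITS:
        res += src[pos]
        pos += 1
    # No extra `pos += 1` here: when the loop exits, pos already
    # points at the character that follows the number.
    return res, pos


src = "print(10)"
value, pos = read_integer(src, 6)  # index 6 is the "1" in "10"
print(value, repr(src[pos]))       # the next character is still available
```

With the input from the question, the position returned after reading `10` points at the closing parenthesis, so the main loop can go on to tokenize it as `CLOSE_PAREN`.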