How to determine whether a value is a character or a number in MASM

Posted on 2025-01-16 15:23:34

I'm having trouble figuring out how to determine whether a value is a number or a letter in MASM assembly language. This program should go through an array, display the first number found in it, and print it along with the index where it was found. I'm using the Irvine32.inc library, which contains IsDigit, but for some reason it isn't working and I don't know why.

Here's the code:

TITLE Number Finder

INCLUDE Irvine32.inc

.data
AlphaNumeric SDWORD 'A', 'p', 'Q', 'M', 67d, -3d, 74d, 'G', 'W', 92d
Alphabetical DWORD 'A', 'B', 'C', 'D', 'E'
Numeric      DWORD  0, 1, 2, 3, 4, 5, 6
index        DWORD  ?
valueFound   BYTE "number found: ", 0
atIndex      BYTE "at index: ", 0
noValueFound BYTE "no numeric found", 0
spacing      BYTE ", ", 0

;DOESNT WORK CORRECTLY
;SKIPS the value 67

.code
main PROC
mov esi, OFFSET AlphaNumeric    ;point to start of array
mov ecx, LENGTHOF AlphaNumeric  ;set loop counter
mov index, 0

mov eax, 0                      ; clear eax

L1: mov al, [esi]
    call IsDigit                    ; ZF = 1 -> valid digit , ZF = 0 -> not a valid digit

;jmp if digit
jz NUMBER_FOUND 

;jmp if char
jnz CHARACTER

;this probably never gets reached
inc index
add esi, TYPE AlphaNumeric
loop L1

;if loop finishes without finding a number
jmp NUMBER_NOT_FOUND

;next iteration of loop if val is a char
CHARACTER:
add esi, TYPE AlphaNumeric
add index, 1
loop L1

NUMBER_FOUND:
mov edx, OFFSET valueFound
call WriteString                ; prints "number found"
mov eax, [esi]
call WriteInt                   ; prints the number found
mov edx, OFFSET spacing
call WriteString
mov edx, OFFSET atIndex
call WriteString                ; prints "at index: "
mov eax, index
call WriteDec                   ; prints the index value

;jmp to NEXT to skip NUMBER_NOT_FOUND block
jmp NEXT

NUMBER_NOT_FOUND:
mov edx, OFFSET noValueFound
call WriteString

NEXT:

exit
main ENDP
END main

When I debug it and it reaches the loop iteration that processes the value 67d, it loads 43 into al, which is its hex representation. Since 43h lines up with the ASCII value 'C', I'm assuming that call IsDigit processes this as a letter and not a number. It also skips all the numbers and prints "Number found: +65, at index: 10", which shouldn't even happen. Is there an operation I can use to convert the hex value to the decimal value so that the IsDigit call works correctly? If someone could please explain a way to evaluate whether a value in an array is a number or a letter, upper or lower case, that would be very much appreciated.

開玄 2025-01-23 15:23:34

This is an impossible task. The most you can do is check for numbers that aren't the ASCII code for an alphabetic character (https://asciitable.com/), which is what your code does. Index 5 is the first byte where that's the case.

67 (decimal) is the same byte value as 'C'. Once it's assembled into binary bytes in your .data section, they're the same single byte. Thus there's no way you can tell how it was written in the source; db 67, 'C' is the same pair of bytes as db 'C', 67. It's a number that's in the range of upper-case ASCII codes. Another equivalent way to write the same value in the source is 43h.
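
As a rough illustration of the same point in C (a sketch only, assuming an ASCII execution character set; the array and function names are made up for this example), the three spellings below compile to identical bytes, so nothing in the data can tell them apart:

```c
#include <string.h>

/* Hypothetical arrays: the same two bytes written in different ways. */
static const unsigned char written_as_mixed[2]   = { 67, 'C' };
static const unsigned char written_as_swapped[2] = { 'C', 67 };
static const unsigned char written_as_hex[2]     = { 0x43, 0x43 };

/* Returns 1 when all three arrays hold identical bytes (assumes ASCII). */
int same_bytes(void)
{
    return memcmp(written_as_mixed, written_as_swapped, 2) == 0
        && memcmp(written_as_mixed, written_as_hex, 2) == 0;
}
```

On an ASCII system, calling same_bytes() should report that the arrays match, which is the C counterpart of db 67, 'C' and db 'C', 67 assembling to the same pair of bytes.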

Bytes don't have types associated with them, just an 8-bit bit-pattern that represents a value. Different interpretations of the same bits can give different values, e.g. -3 (signed) and 253 (unsigned) are both represented by the bit-pattern 0b11111101, which is 0xfd. All of those are valid ways of writing the value that gets loaded into AL by your program. Numbers in a computer are binary; hex and decimal are just convenient formats for humans, so debuggers convert binary values into strings of ASCII digits for display. As a character value, it also represents a font glyph in some 8-bit character sets.
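
The signed/unsigned reading of one bit-pattern can be sketched in C like this. The helper names are hypothetical, and the conversion back to a signed type is implementation-defined in C; the sketch assumes the usual two's-complement wraparound that mainstream compilers provide.

```c
#include <stdint.h>

/* Read an 8-bit pattern as an unsigned value (0..255). */
uint8_t as_unsigned(int8_t v)
{
    return (uint8_t)v;
}

/* Read an 8-bit pattern as a signed value (-128..127).
   Assumption: the compiler wraps modulo 256, as GCC, Clang and MSVC do. */
int8_t as_signed(uint8_t v)
{
    return (int8_t)v;
}
```

Under that assumption, as_unsigned(-3) gives 253 and as_signed(0xFD) gives -3: one byte, two interpretations chosen by the code that reads it.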

If your program doesn't keep track of types separately, that info is not recoverable.

Normally you write a program knowing that a whole array holds 8-bit numbers, or that it holds ASCII codes. In C, for example, you have functions that take int8_t* or char*; even though both point at plain 8-bit bytes, they have different semantic meaning for human programmers. Another example would be int* vs. char*: you certainly could look at the bytes of an int array as character data (with many of the characters being '\0' or '\xff' for small positive / negative integer values), but you don't try to figure out the type by looking at the byte values. Higher-level languages like Python and Perl store a type along with each object, like a struct { enum type; union { stuff }; }, with many types, such as a string, including a pointer.
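
One way to keep track of types separately is to store a tag next to each value. The following is only a sketch of that idea in C, not something from the question's code: the enum, struct and function names are invented here, and the array mirrors the asker's AlphaNumeric data with the intended meaning of each element written down explicitly.

```c
#include <stddef.h>

/* Hypothetical tag recording how each value is meant to be read. */
enum kind { KIND_CHAR, KIND_NUMBER };

struct tagged {
    enum kind kind;
    int       value;
};

/* Returns the index of the first element tagged as a number, or -1 if none. */
int first_number(const struct tagged *items, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (items[i].kind == KIND_NUMBER)
            return (int)i;
    }
    return -1;
}

/* The asker's data, with the intent of each element stored beside it. */
static const struct tagged alpha_numeric[] = {
    { KIND_CHAR, 'A' }, { KIND_CHAR, 'p' }, { KIND_CHAR, 'Q' }, { KIND_CHAR, 'M' },
    { KIND_NUMBER, 67 }, { KIND_NUMBER, -3 }, { KIND_NUMBER, 74 },
    { KIND_CHAR, 'G' }, { KIND_CHAR, 'W' }, { KIND_NUMBER, 92 },
};
```

With the tag in place, first_number(alpha_numeric, 10) finds index 4 (the value 67) without ever inspecting the byte value itself. In MASM the same idea could be expressed as a parallel array of tags or a two-field record per element.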

Re: implementing an IsAlpha function: See What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa? - it only takes a few instructions.

;; input in DL, unmodified
IsAlpha:
    mov     eax, edx
    or      al, 20h   ; force to lower case if it wasn't already (MASM hex literal syntax)
    sub     al, 'a'
    cmp     al, 25    ; 'z'-'a' = index of the last letter in the alphabet
      ; setbe al      ; for a boolean 0/1 return value in AL
    ret
;; return in FLAGS: ja non_alpha    or   jbe  alphabetic
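
For readers who find C easier to follow, the same range check can be sketched as below. This is an illustrative equivalent under the assumption of ASCII input, and is_alpha_ascii is a made-up name, not a library function.

```c
/* Returns 1 for ASCII letters A-Z and a-z, 0 otherwise (assumes ASCII). */
int is_alpha_ascii(unsigned char c)
{
    /* Setting bit 5 maps 'A'..'Z' onto 'a'..'z'. */
    unsigned char folded = (unsigned char)(c | 0x20);
    /* Unsigned subtraction wraps to a large value for anything below 'a'. */
    unsigned char offset = (unsigned char)(folded - 'a');
    /* 'z' - 'a' == 25, so one unsigned compare covers both range bounds. */
    return offset <= 25;
}
```

Note that is_alpha_ascii(67) and is_alpha_ascii('C') are the same call: the check classifies byte values, it cannot tell how the value was written in the source.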
