This COBOL grammar can't handle the --9 picture

Posted 2024-12-06 18:40:21

I'm using the grammar on this site in my javacc. It works fine apart from some picture statements. For example ----,---,---.99 or --9.

http://mapage.noos.fr/~bpinon/cobol.jj

It doesn't seem to like more than one dash.

What do I need to change in this grammar to support my picture examples?

I've messed about with

void NumericConstant() :
{}
{
  (<PLUSCHAR>|<MINUSCHAR>)? IntegerConstant() [ <DOTCHAR> IntegerConstant() ]
} 

but nothing seems to be working. Any help is much appreciated.

EDIT:

<COBOL_WORD: ((["0"-"9"])+ (<MINUSCHAR>)*)*
    (["0"-"9"])* ["a"-"z"] ( ["a"-"z","0"-"9"] )*
    ( (<MINUSCHAR>)+ (["a"-"z","0"-"9"])+)*
>

Is this the regular expression for this whole line:

07 STRINGFIELD2 PIC AAAA. ??

If I want to accept 05 TEST3 REDEFINES TEST2 PIC X(10). would I change the regex to be:

<COBOL_WORD: ((["0"-"9"])+ (<MINUSCHAR>)*)*
(<REDEFINES> (["0"-"9"])* ["a"-"z"] ( ["a"-"z","0"-"9"] )*)?
    (["0"-"9"])* ["a"-"z"] ( ["a"-"z","0"-"9"] )*
    ( (<MINUSCHAR>)+ (["a"-"z","0"-"9"])+)*
>

Thanks a lot for the help so far

Comments (2)

顾北清歌寒 2024-12-13 18:40:21

Why are you messing around with NumericConstant() when you are trying to parse a COBOL PICTURE string?

According to the JavaCC source you have, a COBOL PICTURE should parse with:

void DataPictureClause() :
{}
{
  ( <PICTURE> | <PIC> ) [ <IS> ] PictureString()
}

The --9 bit is a Picture String and should parse with the PictureString() function:

void PictureString() :
{}
{
    [ PictureCurrency() ]
    ( ( PictureChars() )+ [ <LPARENCHAR> IntegerConstant() <RPARENCHAR> ] )+
    [ PicturePunctuation() ( ( PictureChars() )+ [ <LPARENCHAR> IntegerConstant() <RPARENCHAR> ] )+ ]
}

PictureCurrency() comes up empty, so move on to PictureChars():

void PictureChars() :
{}
{
    <INTEGER> | <COBOL_WORD>
}

But COBOL_WORD does not appear to support many "interesting" valid PICTURE clause definitions:

<COBOL_WORD: ((["0"-"9"])+ (<MINUSCHAR>)*)*
    (["0"-"9"])* ["a"-"z"] ( ["a"-"z","0"-"9"] )*
    ( (<MINUSCHAR>)+ (["a"-"z","0"-"9"])+)*
>
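
To see why `--9` falls through, here is a rough Java sketch that transcribes the COBOL_WORD token above into a `java.util.regex` pattern and tries it on a few strings. The class name `CobolWordCheck` is made up for illustration, and case-insensitive matching is an assumption (the token lists only lowercase letters, so the grammar presumably ignores case).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of cobol.jj: a hand transcription of the
// COBOL_WORD token into java.util.regex so it can be tried on sample strings.
final class CobolWordCheck {
    // Assumption: the grammar ignores case, hence CASE_INSENSITIVE.
    private static final Pattern COBOL_WORD = Pattern.compile(
        "(?:[0-9]+-*)*"            // ((["0"-"9"])+ (<MINUSCHAR>)*)*
        + "[0-9]*[a-z][a-z0-9]*"   // (["0"-"9"])* ["a"-"z"] (["a"-"z","0"-"9"])*
        + "(?:-+[a-z0-9]+)*",      // ((<MINUSCHAR>)+ (["a"-"z","0"-"9"])+)*
        Pattern.CASE_INSENSITIVE);

    // True only if the whole string is a single COBOL_WORD.
    static boolean isCobolWord(String s) {
        return COBOL_WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"AAAA", "STRINGFIELD2", "DISP-NBR-1", "--9", "---.99"};
        for (String s : samples) {
            System.out.println(s + " -> " + isCobolWord(s));
        }
    }
}
```

The token needs at least one letter and can never begin with a minus sign, so a picture string such as `--9` or `---.99` cannot be a COBOL_WORD no matter how the parser productions are rearranged.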

Parsing COBOL is not easy; in fact, it is probably one of the most difficult languages in existence to build a quality parser for. I can tell you right now that the JavaCC source you are working from is not going to cut it, except for some very simple and probably totally artificial COBOL program examples.

Answer to comment

COBOL Picture strings tend to mess up the best of parsers. The minus sign you are having trouble with is only the tip of the iceberg! Picture strings are difficult to parse because the period and comma may be part of a Picture string but serve as separators outside of it. This means that a parser cannot unambiguously classify a period or comma in a context-free manner. It needs to be "aware" of the context in which the character is encountered. This may sound trivial, but it isn't.

Technically, the separator period and comma must be followed by a space (or end of line). This little fact could make determining the period/comma role very simple, because a Picture string cannot contain a space. However, many commercial COBOL compilers are "smart" enough to correctly recognize separator periods/commas that are not followed by a space. Consequently, there are a lot of COBOL programmers who code illegal separator periods/commas, which means you will probably have to deal with them.

The bottom line is that no matter what you do, those little Picture strings are going to haunt you. They will take quite a bit of effort to deal with.

Just a hint of things to come: how would you parse the following?

01 DISP-NBR-1 PIC -99,999.
01 DISP-NBR-2 PIC -99,999..
01 DISP-NBR-3 PIC -99,999, .
01 DISP-NBR-4 PIC -99,999,. 

The period in the DISP-NBR-1 line terminates the Picture string; it is a separator period. In the DISP-NBR-2 line the first period is part of the string and the second period is the separator. The comma in the DISP-NBR-3 line is a separator, not part of the Picture string. However, the comma in the DISP-NBR-4 line is part of the Picture string because it is not followed by a space.
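
The four cases above can be reproduced with a small Java sketch that applies only the strict rule (a separator period or comma is followed by a space or end of line). `PictureSplitter` and `split` are hypothetical names, and the sketch deliberately ignores the lenient behaviour of commercial compilers mentioned earlier.

```java
// Hypothetical sketch: splits the text after PIC/PICTURE into the picture
// string and the separator that ends it, using only the strict rule that a
// separator period or comma is followed by whitespace or end of line.
final class PictureSplitter {
    /** Returns {pictureString, separator}; separator is "" if none was found. */
    static String[] split(String afterPic) {
        int start = 0;
        while (start < afterPic.length() && Character.isWhitespace(afterPic.charAt(start))) {
            start++;
        }
        // A picture string cannot contain a space, so take the whole run.
        int end = start;
        while (end < afterPic.length() && !Character.isWhitespace(afterPic.charAt(end))) {
            end++;
        }
        String run = afterPic.substring(start, end);
        if (!run.isEmpty()) {
            char last = run.charAt(run.length() - 1);
            // The last character of the run is, by construction, followed by
            // whitespace or end of line, so a '.' or ',' here is a separator.
            if (last == '.' || last == ',') {
                return new String[] {run.substring(0, run.length() - 1), String.valueOf(last)};
            }
        }
        return new String[] {run, ""};
    }

    public static void main(String[] args) {
        String[] samples = {"-99,999.", "-99,999..", "-99,999, .", "-99,999,. "};
        for (String s : samples) {
            String[] parts = split(s);
            System.out.println("[" + s + "] picture=" + parts[0] + " separator=" + parts[1]);
        }
    }
}
```

Only the last character of the run is ever treated as a separator; every earlier period or comma stays inside the picture string, which is exactly the distinction between DISP-NBR-3 and DISP-NBR-4.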

Welcome to COBOL!

旧情别恋 2024-12-13 18:40:21

I found that I had to switch the lexer into another mode when I got PICTURE. A COBOL PICTURE string has completely different 'lexics' from the rest of the language, and you must discourage the lexer from doing anything with periods, commas, etc., other than accumulating them into the picture string. See NealB's answer for some examples of knowing when to stop picture-scanning.
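
As a minimal sketch of that mode switch, the hand-written Java tokenizer below flips into a PICTURE mode after seeing PIC or PICTURE and emits the next run as one token. In JavaCC itself the same idea would normally be expressed with lexical states; everything here (`ModalLexer`, the `WORD:`/`PICSTRING:`/`SEP:` labels, splitting on whitespace) is an illustrative assumption, not code from cobol.jj.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical two-mode tokenizer for a single data description line.
final class ModalLexer {
    enum Mode { DEFAULT, PICTURE }

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Mode mode = Mode.DEFAULT;
        for (String run : line.trim().split("\\s+")) {
            if (run.isEmpty()) {
                continue; // blank input line
            }
            // A trailing '.' or ',' is followed by whitespace or end of line
            // here, so under the strict rule it is a separator.
            char last = run.charAt(run.length() - 1);
            boolean endsWithSeparator = last == '.' || last == ',';
            String body = endsWithSeparator ? run.substring(0, run.length() - 1) : run;
            if (!body.isEmpty()) {
                if (mode == Mode.PICTURE && !body.equalsIgnoreCase("IS")) {
                    // Picture mode: keep minus signs, commas and periods together.
                    tokens.add("PICSTRING:" + body);
                    mode = Mode.DEFAULT;
                } else {
                    tokens.add("WORD:" + body);
                    if (body.equalsIgnoreCase("PIC") || body.equalsIgnoreCase("PICTURE")) {
                        mode = Mode.PICTURE;
                    }
                }
            }
            if (endsWithSeparator) {
                tokens.add("SEP:" + last);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("05 TEST3 REDEFINES TEST2 PIC X(10)."));
        System.out.println(tokenize("01 AMOUNT PIC ----,---,---.99."));
    }
}
```

In this sketch REDEFINES comes out as an ordinary word token for the parser to handle, and the picture string never has to match COBOL_WORD at all.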

I have no idea why you want to incorporate the REDEFINES phrase into the word. Just parse it normally in the parser.
