XML, S-Expressions and overlapping scopes... what is it called?

Posted 2024-09-12 01:38:27

I was reading "XML is not S-Expressions". XML scoping is fairly strict, as are S-expressions. In every programming language I've seen, you can't write the following:

<b>BOLD <i>BOTH </b>ITALIC</i> == BOLD BOTH ITALIC

It's not even expressible with S-Expressions:

(bold "BOLD" (italic "BOTH" ) "ITALIC" ) == :(

Does any programming language support this kind of "overlapping" scoping? Could there be any practical use for it?

Comments (2)

城歌 2024-09-19 01:38:27

Overlapping markup structures have many practical uses. Consider, for example, applications of concurrent markup for text analysis in the humanities. The International Workshop on Markup of Overlapping Structures noted that:

Overlapping structures are ubiquitous, appearing in applications of textual markup as varied as aircraft maintenance manuals and ancient scriptural and liturgical works. The “overlap issue” raises its ugly head whenever text encoding looks beyond the snapshot view of a particular hierarchy to represent and process multiple concurrent aspects of a text, including features that reflect the text’s evolution across multiple versions and variants whether typographic or presentational, structural, annotational or referential, taxonomic or topical.

Overlap is a problem in texts as diverse as technical documents and product manuals (versioning), legal codes (effectivity), literary works (prosodic versus dramatic structure, rhetorical structures, annotation), sacred texts (chapter plus verse reference versus sentence structure and commentary), and language corpora (multiple layers of linguistic annotation).

The Text Encoding Initiative (TEI) publishes Guidelines to handle non-nesting information and provides an XML syntax for overlap. They stated in 2004 that:

[N]o solution has yet been suggested which combines all the desirable attributes of formal simplicity, capacity to represent all occurring or imaginable kinds of structures, suitability for formal or mechanical validation, and clear identity with the notations needed for simpler cases (i.e. cases where the textual features do nest properly).

Some options to handle overlapping structures include:

SGML has a CONCUR feature that can be used to support overlapping structures, although Goldfarb (the author of the standard) writes: "I therefore recommend that CONCUR not be used to create multiple logical views of a document".

GODDAG provides a data structure for representing documents with overlapping structures.

XCONCUR is an experimental markup language with the major goal to provide a convenient method to express concurrent hierarchies in an XML-like fashion.
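
To make the data-structure option a little more concrete, here is a minimal sketch in C of the general idea of letting a text leaf belong to more than one element, so that the result is a directed acyclic graph rather than a tree. This is not the GODDAG formalism itself; the `Node` type, `node_text` and `overlap_demo` are hypothetical names invented for this illustration, and fixed-size arrays are assumed to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8

/* A node is either a text leaf (text != NULL) or an element with
   ordered children. Nothing stops two elements from sharing a leaf. */
typedef struct Node {
    const char *name;                        /* element name, NULL for leaves */
    const char *text;                        /* leaf text, NULL for elements  */
    const struct Node *child[MAX_CHILDREN];
    int nchild;
} Node;

/* Append the text of every leaf below n to buf, in document order. */
void node_text(const Node *n, char *buf, size_t cap)
{
    if (n->text) {
        assert(strlen(buf) + strlen(n->text) < cap);
        strcat(buf, n->text);
        return;
    }
    for (int i = 0; i < n->nchild; i++)
        node_text(n->child[i], buf, cap);
}

/* Builds the BOLD / BOTH / ITALIC example from the question.
   The leaf "BOTH " has two parents: bold and italic. */
void overlap_demo(char *bold_out, char *italic_out, size_t cap)
{
    Node t1 = { NULL, "BOLD ",  { NULL }, 0 };
    Node t2 = { NULL, "BOTH ",  { NULL }, 0 };
    Node t3 = { NULL, "ITALIC", { NULL }, 0 };
    Node bold   = { "bold",   NULL, { &t1, &t2 }, 2 };
    Node italic = { "italic", NULL, { &t2, &t3 }, 2 };

    assert(cap > 0);
    bold_out[0] = '\0';
    italic_out[0] = '\0';
    node_text(&bold, bold_out, cap);
    node_text(&italic, italic_out, cap);
}
```

Calling `overlap_demo` with two buffers fills the first with the text covered by bold and the second with the text covered by italic; the shared leaf shows up in both, which is exactly what a single tree cannot express.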

行雁书 2024-09-19 01:38:27

There probably isn't any programming language that supports overlapping scopes in its formal definition. While technically possible, it would make the implementation more complex than it needs to be. It would also make the language ambiguous, since it would accept as valid what is very likely a mistake.

The only practical use I can think of right now is that it's less typing and reads more intuitively, just as writing attributes in markup feels more intuitive without unnecessary quotes, as in <foo id=45 /> instead of <foo id="45" />.

I think that enforcing nested structures makes for more efficient processing, too. By enforcing nested structures, the parser can push and pop nodes onto a single stack to keep track of the list of open nodes. With overlapped scopes, you'd need an ordered list of open scopes that you'd have to append to whenever you come across a begin-new-scope token, and then scan each time you come across an end-scope token to see which open scope is most likely to be the one it closes.
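
The difference between the two bookkeeping strategies can be sketched roughly as follows. This is only an illustration under my own assumptions: the `Token` type, the function names and the fixed `MAX_OPEN` limit are made up for the example, and "most likely to be the one it closes" is simplified to "the most recently opened scope with the same name".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_OPEN 64

/* One markup event: is_open is true for <name>, false for </name>. */
typedef struct { bool is_open; const char *name; } Token;

/* Strict nesting: a close token must match the top of the stack,
   so each token costs constant time. */
bool check_nested(const Token *toks, int n)
{
    const char *stack[MAX_OPEN];
    int top = 0;
    for (int i = 0; i < n; i++) {
        if (toks[i].is_open) {
            assert(top < MAX_OPEN);
            stack[top++] = toks[i].name;
        } else if (top == 0 || strcmp(stack[top - 1], toks[i].name) != 0) {
            return false;               /* close does not match innermost scope */
        } else {
            top--;
        }
    }
    return top == 0;
}

/* Overlap allowed: a close token may end any open scope, so the
   ordered list is scanned from the most recent entry backwards. */
bool check_overlapping(const Token *toks, int n)
{
    const char *open[MAX_OPEN];
    int count = 0;
    for (int i = 0; i < n; i++) {
        if (toks[i].is_open) {
            assert(count < MAX_OPEN);
            open[count++] = toks[i].name;
            continue;
        }
        int j = count - 1;
        while (j >= 0 && strcmp(open[j], toks[i].name) != 0)
            j--;
        if (j < 0)
            return false;               /* close with no matching open scope */
        /* Remove entry j while keeping the remaining scopes in order. */
        memmove(&open[j], &open[j + 1],
                (size_t)(count - j - 1) * sizeof open[0]);
        count--;
    }
    return count == 0;
}
```

For the token sequence open b, open i, close b, close i, `check_nested` rejects the input while `check_overlapping` accepts it, at the price of a linear scan and a list removal on every close token.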

Although no programming languages support overlapping scopes, there are HTML parsers that tolerate them as part of their error-recovery algorithms, including the ones in all major browsers.
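
As a rough idea of what such recovery can look like, the sketch below rewrites misnested tags into properly nested ones by closing the scopes that are in the way and reopening them afterwards. It is a deliberately simplified illustration, not the algorithm the HTML specification defines for browsers; the `Event` type, `renest` and `emit` are hypothetical names, and tags are assumed to carry no attributes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPEN 32

typedef enum { EV_OPEN, EV_CLOSE, EV_TEXT } Kind;
typedef struct { Kind kind; const char *s; } Event;

/* Append the three pieces to out, truncating if the buffer is full. */
static void emit(char *out, size_t cap,
                 const char *a, const char *b, const char *c)
{
    size_t used = strlen(out);
    snprintf(out + used, cap - used, "%s%s%s", a, b, c);
}

/* Writes a properly nested version of the events to out.
   Returns 0 on success, -1 on a stray close tag or an unclosed tag. */
int renest(const Event *ev, int n, char *out, size_t cap)
{
    const char *open[MAX_OPEN];
    int top = 0;

    assert(cap > 0);
    out[0] = '\0';
    for (int i = 0; i < n; i++) {
        if (ev[i].kind == EV_TEXT) {
            emit(out, cap, ev[i].s, "", "");
        } else if (ev[i].kind == EV_OPEN) {
            assert(top < MAX_OPEN);
            open[top++] = ev[i].s;
            emit(out, cap, "<", ev[i].s, ">");
        } else {
            /* Find the most recent open scope with this name. */
            int j = top - 1;
            while (j >= 0 && strcmp(open[j], ev[i].s) != 0)
                j--;
            if (j < 0)
                return -1;
            /* Close everything opened after it, innermost first, then it. */
            for (int k = top - 1; k >= j; k--)
                emit(out, cap, "</", open[k], ">");
            /* Drop entry j and reopen the scopes that were closed early. */
            for (int k = j; k < top - 1; k++) {
                open[k] = open[k + 1];
                emit(out, cap, "<", open[k], ">");
            }
            top--;
        }
    }
    return top == 0 ? 0 : -1;
}
```

Fed the events for <b>BOLD <i>BOTH </b>ITALIC</i>, this produces <b>BOLD <i>BOTH </i></b><i>ITALIC</i>: the italic scope is split in two so that every element nests properly.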

Also, the switch statement in C allows for constructs that look something like overlapping scopes, as in Duff's Device:

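/* Copy count shorts from `from` to the output register `to`.
   Assumes count > 0. */
void send(volatile short *to, const short *from, int count)
{
    int n = (count + 7) / 8;    /* passes through the unrolled loop */

    switch (count % 8)
      {
       case 0:  do{ *to = *from++;   /* `to` is not incremented: it is */
       case 7:      *to = *from++;   /* assumed to be a single         */
       case 6:      *to = *from++;   /* memory-mapped output register  */
       case 5:      *to = *from++;
       case 4:      *to = *from++;
       case 3:      *to = *from++;
       case 2:      *to = *from++;
       case 1:      *to = *from++;
                  } while (--n > 0);
      }
}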

So, in theory, a programming language could have similar semantics for scopes in general, allowing these kinds of tricks for optimization when needed, but readability would be very low.

The goto statement, along with labeled break and continue in some languages, also lets you structure programs so that they behave like overlapped scopes:

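BOLD: while (bold)
 { styles.add(bold);
   print "BOLD";

   while (italic)
    { styles.add(italic);
      print "BOTH";
      break BOLD;        // leaves the bold scope while italic is still in effect
    }
 }

// italic continues here, after the bold scope has ended
styles.remove(bold);
print "ITALIC";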