Can EDI files have ~ in the data?

Posted on 2025-01-26 16:17:33

I'm parsing an EDI file and splitting on ~ characters. Is it possible for EDI to have a ~ in the data itself, or is there a rule that forbids ~ in the data? This is for 810/850 transaction sets and the like.

Comments (1)

枯叶蝶 2025-02-02 16:17:33

The value defined in the 106th character of the ISA segment (or, alternatively – to be a bit less brittle to whitespace issues – the 1st character after the ISA16 element) is the segment delimiter (in official terms: the segment terminator). Most of the time people specify the ~ character, but other choices are certainly valid.

In this example, the 106th character is ~:

ISA*00*          *00*          *ZZ*AMAZONDS       *01*TESTID         *070808*1310*U*00401*000000043*1*T*+~

Instead of counting 106 characters (which, again, can be brittle to whitespace issues), you can count 16 elements – that is, 16 asterisks – to find the value for ISA16 (which is +), and then pick the next character (which is ~).
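The element-counting approach can be sketched in a few lines of Python. This is a minimal illustration, not our parser's actual code: the function name `detect_delimiters` and the returned dictionary keys are made up for the example, and it assumes the interchange starts with the ISA segment and that ISA16 is a single character.

```python
def detect_delimiters(interchange: str) -> dict:
    """Read the delimiters from the ISA header by counting element separators.

    Hypothetical helper for illustration; assumes the data starts with "ISA".
    """
    text = interchange.lstrip()
    if not text.startswith("ISA") or len(text) < 4:
        raise ValueError("expected the interchange to start with an ISA segment")

    element_sep = text[3]  # the character right after the "ISA" segment code
    pos = 3                # position of the 1st element separator

    # Advance to the 16th element separator; ISA16 starts right after it.
    for _ in range(15):
        pos = text.find(element_sep, pos + 1)
        if pos == -1:
            raise ValueError("ISA segment has fewer than 16 elements")

    if pos + 2 >= len(text):
        raise ValueError("ISA segment is truncated after ISA16")

    return {
        "element": element_sep,
        "component": text[pos + 1],  # ISA16, the component element separator
        "segment": text[pos + 2],    # the character right after ISA16
    }


sample = (
    "ISA*00*          *00*          *ZZ*AMAZONDS       *01*TESTID         "
    "*070808*1310*U*00401*000000043*1*T*+~"
)
print(detect_delimiters(sample))
# {'element': '*', 'component': '+', 'segment': '~'}
```

Because it counts separators rather than characters, the same function still works if the padding inside the ISA elements has been collapsed or altered.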

There are two relevant sections in the official X12 specification (bolded for emphasis):

12.5.4.3 Delimiter Specifications

The delimiters consist of three separators and a terminator. The delimiters are devised for inclusion within the data stream of the transfer. The delimiters are:

  • segment terminator [note: this is the one we're discussing]

  • data element separator

  • component element separator

  • repetition separator

The delimiters are assigned by the interchange sender. These characters are disjoint from those of the data elements; if a character is selected for the data element separator, the component element separator, the repetition separator or the segment terminator from those available for the data elements, that character is no longer available during this interchange for use in a data element. The instance of the terminator (<tr>) must be different from the instance of the data element separator (<gs>), the component element separator (<us>) and the repetition separator (<rs>). The data element separator, component element separator and repetition separator must not have the same character assignment.

So, according to this part of the spec, if the ~ is used as the segment terminator, then the use of the ~ is disallowed in a data element (that is, the textual body).
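If you are on the sending side, the rules quoted above translate into two simple checks. The sketch below is only an illustration under my own naming (`delimiter_problems` and `values_containing_delimiters` are hypothetical helpers, not part of any EDI library): the first checks that the chosen delimiters don't collide, the second finds element values that contain a delimiter and therefore can't be sent as-is in that interchange.

```python
from typing import Iterable, List, Optional


def delimiter_problems(
    terminator: str,
    element_sep: str,
    component_sep: str,
    repetition_sep: Optional[str] = None,
) -> List[str]:
    """Return human-readable violations of the delimiter rules (empty if none)."""
    separators = {
        "data element separator": element_sep,
        "component element separator": component_sep,
    }
    if repetition_sep is not None:
        separators["repetition separator"] = repetition_sep

    problems = []
    # The terminator must differ from every separator.
    for name, char in separators.items():
        if char == terminator:
            problems.append(f"segment terminator equals {name}")
    # The separators must not share a character assignment.
    if len(set(separators.values())) != len(separators):
        problems.append("separators are not pairwise distinct")
    return problems


def values_containing_delimiters(values: Iterable[str], delimiters: Iterable[str]) -> List[str]:
    """Return the element values that contain any delimiter character."""
    chars = list(delimiters)
    return [value for value in values if any(char in value for char in chars)]


print(delimiter_problems("~", "*", "+"))
print(values_containing_delimiters(["PO-00001", "approx ~5 units"], "~*+"))
```

What to do with an offending value (reject it, replace the character, or pick a different terminator for the interchange) is a decision for you and your trading partner; the spec only says the character is unavailable.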

Now, let's look at section 12.5.A.5 – Recommendations for the Delimiters:

Delimiter characters must be chosen with care, after consideration of data content, limitations of the transmission protocol(s) used, and applicable industry conventions. In the absence of other guidelines, the following recommendations are offered:

<tr> terminator: ~ | Note: the "~" was chosen for its infrequency of use in textual data.

This section is saying that ~ was chosen as the default because ~ is seldom found in textual data (it would have been a bad idea, for example, to use . as the default, since that's such a common inclusion).

That said, even though using the segment terminator character inside a data element is technically prohibited, it's still possible for an EDI transmission to inadvertently include ~ in the textual data – in other words, your trading partner may include it by accident. Further, the BIN and BSD (binary data) segments can certainly include ~ (though these may not apply to the transaction sets you're working with).
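To make the accidental case concrete, here is a small sketch of what a naive split does with a stray ~. The sample data is made up for the example, and the check is only a heuristic I'm assuming for illustration (a fragment whose first token doesn't look like a 2-3 character segment ID is probably the tail of a broken segment); it is not how our parser works.

```python
import re

# Heuristic assumption: segment IDs are 2-3 characters, a letter followed by
# uppercase letters or digits.
SEGMENT_ID = re.compile(r"[A-Z][A-Z0-9]{1,2}")


def naive_split(data: str, terminator: str = "~") -> list:
    """Split on the terminator and drop empty or whitespace-only pieces."""
    return [piece for piece in data.split(terminator) if piece.strip()]


def suspicious_fragments(fragments: list, element_sep: str = "*") -> list:
    """Return fragments whose first token does not look like a segment ID."""
    return [
        fragment
        for fragment in fragments
        if not SEGMENT_ID.fullmatch(fragment.strip().split(element_sep, 1)[0])
    ]


# Made-up data: the MSG text contains a stray "~".
data = "BEG*PO-1~MSG*SHIP ~2 DAYS~SE*3*0001~"
pieces = naive_split(data)
print(pieces)
# ['BEG*PO-1', 'MSG*SHIP ', '2 DAYS', 'SE*3*0001']
print(suspicious_fragments(pieces))
# ['2 DAYS']
```

A check like this can flag a damaged file, but it can't reliably repair one: text after a stray ~ may happen to start with something that looks like a valid segment ID.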

In our parsing API, we apply a specific set of patterns based on the type of segment we encounter. For us, it's not sufficient to split naively on the segment delimiter alone, because we may encounter binary segments (BIN, BSD), where the segment delimiter character can legitimately appear inside the data.

For a regular segment (i.e. not BIN or BSD), the logic is something like this:

  • Consume the segment code (i.e. the characters before the first element delimiter).
  • Consume each element of the segment based on the element delimiter.
  • Stop if the next character is a segment delimiter or a new line.

As an example, for segment BEG*PO-00001**20210901~, the process would look like:

  • Consume BEG. Since this is not a special segment (BIN or BSD), consume elements by splitting on *.
  • Consume PO-00001.
  • Consume ''.
  • Consume 20210901.
  • Stop since next char is ~.

(The pattern for binary segments is different from the pattern we use for regular segments.)
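The steps for a regular segment can be sketched roughly as follows. This is a simplified illustration rather than our actual implementation: `read_regular_segment` is a hypothetical name, the delimiters are passed in explicitly, and binary segments are deliberately left out.

```python
BINARY_SEGMENTS = {"BIN", "BSD"}


def read_regular_segment(data: str, start: int = 0,
                         element_sep: str = "*", terminator: str = "~"):
    """Consume one regular segment starting at `start`.

    Returns (segment_code, elements, next_position). Hypothetical sketch.
    """
    # Scan until the segment terminator or a new line.
    end = start
    while end < len(data) and data[end] not in (terminator, "\r", "\n"):
        end += 1

    fields = data[start:end].split(element_sep)
    code, elements = fields[0], fields[1:]
    if code in BINARY_SEGMENTS:
        # Binary segments need a different pattern and are out of scope here.
        raise NotImplementedError(f"{code} is a binary segment")

    # Skip the terminator (if present) and any trailing line breaks.
    next_pos = end
    if next_pos < len(data) and data[next_pos] == terminator:
        next_pos += 1
    while next_pos < len(data) and data[next_pos] in "\r\n":
        next_pos += 1
    return code, elements, next_pos


print(read_regular_segment("BEG*PO-00001**20210901~"))
# ('BEG', ['PO-00001', '', '20210901'], 23)
```

Calling it in a loop, feeding `next_position` back in as `start`, walks through a whole interchange of regular segments, including files where the terminator has been replaced by newlines.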

  • Here's an example of how our parser "fails" on a ~ in the textual data when the ISA16 segment delimiter is also ~; the JSON representation is particularly helpful for seeing the issue.
  • Here's an example of our parser succeeding on a ~ in the textual data when the ISA16 segment delimiter is ^.
  • Lastly, here's an example of our parsing succeeding where the ~ is specified in ISA16, but has been omitted altogether in favor of newlines – which we see occasionally.

Hope this helps.
