OpenJSON未检测到重复的值

发布于 2025-02-06 17:20:06 字数 3396 浏览 2 评论 0 原文

我需要将一些销售记录导入和解析到数据库中。
一些客户有多个产品,因此有几个订单具有不同的产品,但是 json array 中的运输地址相同。

当我尝试将地址保存在地址表:
中时会出现问题 显然,我只需要每个地址的一个记录,并且要实现此目标,我计算了地址字段的 hash ,我将其与表中已经存在的哈希进行了比较:

看来(嗯,我当然)查询仅第一次检查 hash 已经存在,如果不存在,它将添加与订单计数

的行一样多的行:我可以想象这是 openjson 功能,因为 JSON的解析有效负载似乎是由内部循环进行的,因此我认为我必须使用其他方法。...但是我没有想法使用什么。

在这里, JSON 有效负载

declare @json nvarchar(max)=N'[
   {
      "id": 21660,
      "currency": "USD",
      "total": "15.00",
      "shipping": {
         "first_name": "Charles",
         "last_name": "Leuschke",
         "address_1": "3121 W Olive Ave",
         "city": "Burbank",
         "state": "CA",
         "postcode": "91505",
         "country": "US"
      },
      "line_items": [
         {
            "id": 1052
         }
      ]
   },
   {
      "id": 21659,
      "currency": "USD",
      "total": "38.00",
      "shipping": {
         "first_name": "Charles",
         "last_name": "Leuschke",
         "address_1": "3121 W Olive Ave",
         "city": "Burbank",
         "state": "CA",
         "postcode": "91505",
         "country": "US"
      },
      "line_items": [
         {
            "id": 1050
         }
      ]
   },
   {
      "id": 21658,
      "currency": "USD",
      "total": "38.00",
      "shipping": {
         "first_name": "Charles",
         "last_name": "Leuschke",
         "address_1": "3121 W Olive Ave",
         "city": "Burbank",
         "state": "CA",
         "postcode": "91505",
         "country": "US"
      },
      "line_items": [
         {
            "id": 1048
         }
      ]
   }
]'

和(简化) QUERY

Insert Into @Addresses
(
    orderId,
    fullName,
    addressLine1,
    city,
    stateOrProvince,
    postalCode,
    countryCode,
    addressCode
)
SELECT
    o.orderId,
    concat(s.firstName,' ',s.lastName),
    s.addressLine1,
    s.city,
    s.stateOrProvince,
    s.postalCode,
    s.countryCode,
    convert(nvarchar(64),hashbytes('SHA1',concat(s.firstName, ' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2)
FROM OPENJSON(@json) 
    WITH  (
            orderId                     nvarchar(64)    '$.id',
            shipping                    nvarchar(max)   '$.shipping' AS JSON
          ) o  
    CROSS APPLY OPENJSON(shipping) 
    WITH  (
            firstName                   nvarchar(128)   '$.first_name',
            lastName                    nvarchar(128)   '$.last_name',
            addressLine1                nvarchar(128)   '$.address_1',
            city                        nvarchar(128)   '$.city',
            stateOrProvince             nvarchar(64)    '$.state',
            postalCode                  nvarchar(64)    '$.postcode',
            countryCode                 nvarchar(4)     '$.country'
         ) s

left join @Addresses a on a.addressCode=convert(nvarchar(64),hashbytes('SHA1',concat(s.firstName,' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2)
where a.addressCode is null

我还准备了一个SQLFIDDLE,您可以看到它可以返回3行,而目标只需获得一个

I need to import and parse some sales records into the database.
Some customers have more than one product, therefore there are several orders with different product but with the same shipping address in the JSON Array.

Problem arise when I try to save the address in Addresses Table:
Clearly I need just one record for each Address, and to achieve this I calculate an Hash of the address fields and I compare it with the Hashes already present in the table:

It seems (well, I'm sure) that the query checks only the first time if the Hash is already present, and if it is not present, it add as many rows as the orders count:

I can imagine this is the standard behavior of OPENJSON function since the parsing of the JSON payload seems to be made by an inner loop, therefore I think I have to use a different approach.... but I have no Idea what to use.

here the JSON payload

declare @json nvarchar(max)=N'[
   {
      "id": 21660,
      "currency": "USD",
      "total": "15.00",
      "shipping": {
         "first_name": "Charles",
         "last_name": "Leuschke",
         "address_1": "3121 W Olive Ave",
         "city": "Burbank",
         "state": "CA",
         "postcode": "91505",
         "country": "US"
      },
      "line_items": [
         {
            "id": 1052
         }
      ]
   },
   {
      "id": 21659,
      "currency": "USD",
      "total": "38.00",
      "shipping": {
         "first_name": "Charles",
         "last_name": "Leuschke",
         "address_1": "3121 W Olive Ave",
         "city": "Burbank",
         "state": "CA",
         "postcode": "91505",
         "country": "US"
      },
      "line_items": [
         {
            "id": 1050
         }
      ]
   },
   {
      "id": 21658,
      "currency": "USD",
      "total": "38.00",
      "shipping": {
         "first_name": "Charles",
         "last_name": "Leuschke",
         "address_1": "3121 W Olive Ave",
         "city": "Burbank",
         "state": "CA",
         "postcode": "91505",
         "country": "US"
      },
      "line_items": [
         {
            "id": 1048
         }
      ]
   }
]'

and the (simplified) query

Insert Into @Addresses
(
    orderId,
    fullName,
    addressLine1,
    city,
    stateOrProvince,
    postalCode,
    countryCode,
    addressCode
)
SELECT
    o.orderId,
    concat(s.firstName,' ',s.lastName),
    s.addressLine1,
    s.city,
    s.stateOrProvince,
    s.postalCode,
    s.countryCode,
    convert(nvarchar(64),hashbytes('SHA1',concat(s.firstName, ' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2)
FROM OPENJSON(@json) 
    WITH  (
            orderId                     nvarchar(64)    '$.id',
            shipping                    nvarchar(max)   '$.shipping' AS JSON
          ) o  
    CROSS APPLY OPENJSON(shipping) 
    WITH  (
            firstName                   nvarchar(128)   '$.first_name',
            lastName                    nvarchar(128)   '$.last_name',
            addressLine1                nvarchar(128)   '$.address_1',
            city                        nvarchar(128)   '$.city',
            stateOrProvince             nvarchar(64)    '$.state',
            postalCode                  nvarchar(64)    '$.postcode',
            countryCode                 nvarchar(4)     '$.country'
         ) s

left join @Addresses a on a.addressCode=convert(nvarchar(64),hashbytes('SHA1',concat(s.firstName,' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2)
where a.addressCode is null

I also prepared a sqlfiddle, where you can see that it returns 3 rows while target is to get just one

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=3031a99e3cd24f6bf383c0de29cf19a1

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

心病无药医 2025-02-13 17:20:06

只需使用标准 row_number()重复数据删除方法,例如


WITH cte AS (
    SELECT
    o.orderId,
    concat(s.firstName,' ',s.lastName) fullName,
    s.addressLine1,
    s.city,
    s.stateOrProvince,
    s.postalCode,
    s.countryCode,
    h.addressCode,
    ROW_NUMBER() OVER (PARTITION BY h.addressCode ORDER BY o.OrderId ASC) rn
    FROM OPENJSON(@json) 
    WITH  (
        orderId nvarchar(64) '$.id',
        shipping nvarchar(max) '$.shipping' AS JSON
    ) o  
    CROSS APPLY OPENJSON(shipping) 
    WITH  (
        firstName nvarchar(128) '$.first_name',
    lastName nvarchar(128) '$.last_name',
    addressLine1 nvarchar(128) '$.address_1',
    city nvarchar(128) '$.city',
    stateOrProvince nvarchar(64) '$.state',
    postalCode nvarchar(64) '$.postcode',
    countryCode nvarchar(4) '$.country'
    ) s
    CROSS APPLY (
      VALUES (CONVERT(nvarchar(64), HASHBYTES('SHA1', CONCAT(s.firstName ,' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2))
    ) h (addressCode)
    LEFT JOIN @Addresses a ON a.addressCode = h.AddressCode
    WHERE a.addressCode IS NULL
)
INSERT INTO @Addresses
(
  orderId,
  fullName,
  addressLine1,
  city,
  stateOrProvince,
  postalCode,
  countryCode,
  addressCode
)
     SELECT orderId,
         fullName,
         addressLine1,
         city,
         stateOrProvince,
         postalCode,
         countryCode,
         addressCode
    FROM cte
    WHERE rn = 1;

注意:如果使用交叉应用来计算哈希码,则避免多次计算它。

Just use the standard ROW_NUMBER() deduplication method e.g.


WITH cte AS (
    SELECT
    o.orderId,
    concat(s.firstName,' ',s.lastName) fullName,
    s.addressLine1,
    s.city,
    s.stateOrProvince,
    s.postalCode,
    s.countryCode,
    h.addressCode,
    ROW_NUMBER() OVER (PARTITION BY h.addressCode ORDER BY o.OrderId ASC) rn
    FROM OPENJSON(@json) 
    WITH  (
        orderId nvarchar(64) '$.id',
        shipping nvarchar(max) '$.shipping' AS JSON
    ) o  
    CROSS APPLY OPENJSON(shipping) 
    WITH  (
        firstName nvarchar(128) '$.first_name',
    lastName nvarchar(128) '$.last_name',
    addressLine1 nvarchar(128) '$.address_1',
    city nvarchar(128) '$.city',
    stateOrProvince nvarchar(64) '$.state',
    postalCode nvarchar(64) '$.postcode',
    countryCode nvarchar(4) '$.country'
    ) s
    CROSS APPLY (
      VALUES (CONVERT(nvarchar(64), HASHBYTES('SHA1', CONCAT(s.firstName ,' ', s.lastName, s.addressLine1, s.city, s.stateOrProvince, s.postalCode, s.countryCode)),2))
    ) h (addressCode)
    LEFT JOIN @Addresses a ON a.addressCode = h.AddressCode
    WHERE a.addressCode IS NULL
)
INSERT INTO @Addresses
(
  orderId,
  fullName,
  addressLine1,
  city,
  stateOrProvince,
  postalCode,
  countryCode,
  addressCode
)
     SELECT orderId,
         fullName,
         addressLine1,
         city,
         stateOrProvince,
         postalCode,
         countryCode,
         addressCode
    FROM cte
    WHERE rn = 1;

Note: If you use CROSS APPLY to calculate the hashcode one avoids calculating it multiple times.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文