Deduping a list of objects with similar data

Posted 2025-02-12 19:40:47

I am trying to dedupe an array of JSON objects in which the content and author are the same but the timestamps differ slightly (within 1 second). I'd like to record how many messages were removed as duplicates in a new field called duplicates. For example, in the following array, entries 2, 3 and 5 are the same message and should be collapsed into one:

myObject = [
{content: 'content1', date: '1980-08-01 12:12:40.000', author: 'Person1'}, 
{content: 'content2', date: '1980-08-01 12:12:40.900', author: 'Person2'},
{content: 'content2', date: '1980-08-01 12:12:41.100', author: 'Person2'},
{content: 'content3', date: '1980-08-01 12:12:41.000', author: 'Person1'},
{content: 'content2', date: '1980-08-01 12:12:41.400', author: 'Person2'},
{content: 'content4', date: '1980-08-01 12:12:45.100', author: 'Person2'},
]

should be transformed to:

deduped = [
{content: 'content1', date: '1980-08-01 12:12:40.000', author: 'Person1', duplicates: 0}, 
{content: 'content2', date: '1980-08-01 12:12:40.900', author: 'Person2', duplicates: 2},
{content: 'content3', date: '1980-08-01 12:12:41.000', author: 'Person1', duplicates: 0},
{content: 'content4', date: '1980-08-01 12:12:45.100', author: 'Person2', duplicates: 0},
]

The part I am having trouble with is the datetime. Sorting by datetime and then reducing breaks down when a non-duplicate message occurs between the duplicates (here, content3 at 12:12:41.000 falls between the content2 messages). Comparing the datetime strings is also error-prone, because two messages can be very close together yet appear 1 second apart depending on where they fall relative to a second boundary (e.g. 12:12:40.900 and 12:12:41.100 are only 200 ms apart).
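One way around the second-boundary problem is to convert each timestamp to milliseconds and compare the absolute gap instead of the strings. A minimal sketch, assuming every date has the fixed 'YYYY-MM-DD hh:mm:ss.sss' layout used in the example and is interpreted as UTC (toMs and withinMs are hypothetical helper names):

```javascript
// Hypothetical helper: parse 'YYYY-MM-DD hh:mm:ss.sss' into epoch milliseconds.
// The value is treated as UTC, so the result does not depend on the machine's time zone.
function toMs(date) {
  const [Y, M, D, h, m, s, ms] = date.split(/[-\s:.]/).map(Number);
  return Date.UTC(Y, M - 1, D, h, m, s, ms); // months are 0-based in Date.UTC
}

// Hypothetical helper: true when two timestamps are at most windowMs apart,
// regardless of which one comes first.
function withinMs(dateA, dateB, windowMs = 1000) {
  return Math.abs(toMs(dateA) - toMs(dateB)) <= windowMs;
}

const a = '1980-08-01 12:12:40.900';
const b = '1980-08-01 12:12:41.100';

// Truncated to whole seconds the two strings differ (':40' vs ':41')...
console.log(a.slice(0, 19) === b.slice(0, 19)); // false
// ...but the real gap is only 200 ms.
console.log(toMs(b) - toMs(a)); // 200
console.log(withinMs(a, b));    // true
```

Parsing as UTC also keeps daylight-saving shifts, which local-time parsing could introduce, out of the computed gap.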

Using lodash _.uniqWith, I can dedupe based on an actual time delta combined with identical content and author, but then I don't get the duplicates field...

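const _ = require('lodash');
// ' ' -> 'T' turns the date into the ISO format that every engine must parse
const toMs = d => new Date(d.replace(' ', 'T')).getTime();
const dedupedButNoCount = _.uniqWith(myObject, (item1, item2) =>
  item1.content === item2.content &&
  item1.author === item2.author &&
  Math.abs(toMs(item1.date) - toMs(item2.date)) <= 1000  // within 1 s, in either order
)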
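The duplicates count that _.uniqWith cannot provide can be kept by hand in a single pass over the original order, with no sorting, so an interleaved message such as content3 does no harm. A minimal sketch, assuming (as with _.uniqWith) that each message is compared with the messages already kept for its content/author pair, a 1-second window, and the fixed date format of the example; dedupeWithCount is a hypothetical name:

```javascript
// Parse 'YYYY-MM-DD hh:mm:ss.sss' as UTC milliseconds (fixed format assumed).
const toMs = date => {
  const [Y, M, D, h, m, s, ms] = date.split(/[-\s:.]/).map(Number);
  return Date.UTC(Y, M - 1, D, h, m, s, ms);
};

// Hypothetical helper: keeps the first message of each near-duplicate group,
// in original order, and counts how many later messages were folded into it.
function dedupeWithCount(messages, windowMs = 1000) {
  const kept = [];         // output objects, in order of first appearance
  const byKey = new Map(); // content + author -> kept objects for that pair

  for (const msg of messages) {
    const key = JSON.stringify([msg.content, msg.author]); // unambiguous composite key
    const time = toMs(msg.date);
    const candidates = byKey.get(key) ?? [];

    // Same content and author, and within the window of a kept message's time.
    const match = candidates.find(k => Math.abs(time - toMs(k.date)) <= windowMs);

    if (match) {
      match.duplicates++;
    } else {
      const entry = { ...msg, duplicates: 0 };
      kept.push(entry);
      candidates.push(entry);
      byKey.set(key, candidates);
    }
  }
  return kept;
}

const myObject = [
  {content: 'content1', date: '1980-08-01 12:12:40.000', author: 'Person1'},
  {content: 'content2', date: '1980-08-01 12:12:40.900', author: 'Person2'},
  {content: 'content2', date: '1980-08-01 12:12:41.100', author: 'Person2'},
  {content: 'content3', date: '1980-08-01 12:12:41.000', author: 'Person1'},
  {content: 'content2', date: '1980-08-01 12:12:41.400', author: 'Person2'},
  {content: 'content4', date: '1980-08-01 12:12:45.100', author: 'Person2'},
];

const deduped = dedupeWithCount(myObject);
console.log(deduped.map(m => `${m.content} ${m.duplicates}`));
// [ 'content1 0', 'content2 2', 'content3 0', 'content4 0' ]
```

The Map means each message is only checked against earlier kept messages with the same content and author, and myObject itself is left unmodified. Which message survives depends on array order: if the input is not chronological, the kept entry is the first one encountered, not necessarily the earliest.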

Any pointers on how to dedupe an array of objects with similar but not identical datetimes?


Answer by 明月松间行, 2025-02-19 19:40:47

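I've done it, but I use a sort... Sorting by content, then author, and only then by date puts candidate duplicates next to each other, so a different message can no longer fall between them.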

const
  getTimeMs = YMDhmsx =>     // parse 'YYYY-MM-DD hh:mm:ss.sss' as UTC (time zone = 0)
    {
    let [Y,M,D,h,m,s,x] = YMDhmsx.split(/[-.\s:]/).map(Number)
    return Date.UTC(Y, M - 1, D, h, m, s, x)   // time value in ms (Date.UTC months are 0-based)
    }
, myObject = [
    {content: 'content1', date: '1980-08-01 12:12:40.000', author: 'Person1'}, 
    {content: 'content2', date: '1980-08-01 12:12:40.900', author: 'Person2'},
    {content: 'content2', date: '1980-08-01 12:12:41.100', author: 'Person2'},
    {content: 'content3', date: '1980-08-01 12:12:41.000', author: 'Person1'},
    {content: 'content2', date: '1980-08-01 12:12:41.400', author: 'Person2'},
    {content: 'content4', date: '1980-08-01 12:12:45.100', author: 'Person2'},
    ]
    
let result = 
  [...myObject]              // sort a copy: Array.prototype.sort mutates in place
  .sort( (a,b) =>
    a.content.localeCompare(b.content) ||   // group by content,
    a.author.localeCompare(b.author) ||     // then by author,
    a.date.localeCompare(b.date)            // then by time (this date format sorts correctly as a string)
    )
  .reduce( (r,el,i,{[i-1]:prev}) =>         // prev = previous element of the sorted array
    {
    let msTime = getTimeMs(el.date)

    if (el.content === prev?.content 
     && el.author === prev?.author
     && (msTime - r.msTime) <= 1000 )  // at most 1 second after the previous message
      r.current.duplicates++;
    else
      {
      r.current = {...el, duplicates:0 }    // first message of a new group
      r.result.push( r.current )
      }
    r.msTime = msTime                       // the next message is compared with this one
    return r
    }
    , {msTime:0, current:null, result:[] })
  .result;
  
console.log( 'result:\n' + JSON.stringify( result ).replaceAll('},{','}\n,{') )
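The reduce above compares each message with the one immediately before it in the sorted array, so a run of messages each less than 1 second apart is merged even if the run as a whole spans several seconds. If the window should instead be measured from the first message of each group, the same sort-and-reduce idea can be parameterised. A sketch under those assumptions (dedupeSorted and its compareTo option are hypothetical names; the date format is assumed fixed as in the example):

```javascript
// Parse 'YYYY-MM-DD hh:mm:ss.sss' as UTC milliseconds (fixed format assumed).
const toMs = date => {
  const [Y, M, D, h, m, s, ms] = date.split(/[-\s:.]/).map(Number);
  return Date.UTC(Y, M - 1, D, h, m, s, ms);
};

// compareTo: 'previous' mirrors the chained comparison in the answer,
//            'first' measures the window from the first message of the group.
function dedupeSorted(messages, { windowMs = 1000, compareTo = 'first' } = {}) {
  const sorted = [...messages].sort((a, b) =>
    a.content.localeCompare(b.content) ||
    a.author.localeCompare(b.author) ||
    a.date.localeCompare(b.date));

  const result = [];
  let current = null; // group currently being extended
  let refTime = 0;    // time the next message is measured against

  for (const msg of sorted) {
    const time = toMs(msg.date);
    if (current &&
        msg.content === current.content &&
        msg.author === current.author &&
        time - refTime <= windowMs) {
      current.duplicates++;
      if (compareTo === 'previous') refTime = time; // slide the window forward
    } else {
      current = { ...msg, duplicates: 0 };
      result.push(current);
      refTime = time; // a new group starts its window here
    }
  }
  return result;
}

// Three identical messages 800 ms apart: each consecutive gap is under 1 s,
// but the first and last are 1.6 s apart.
const burst = ['40.000', '40.800', '41.600'].map(t =>
  ({ content: 'ping', author: 'Person1', date: `1980-08-01 12:12:${t}` }));

console.log(dedupeSorted(burst, { compareTo: 'previous' }).length); // 1
console.log(dedupeSorted(burst, { compareTo: 'first' }).length);    // 2
```

On the question's sample data both settings give the expected duplicates counts (0, 2, 0, 0), because all three content2 messages fall within 500 ms of the first one; they only differ on longer bursts like the one above.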