如何在一段时间内从队列中删除重复的项目?

发布于 2024-10-17 01:47:34 字数 1756 浏览 5 评论 0原文

我想以有效的方式从队列中删除重复的条目。 队列有一个带有 DateTime 和 FullPath 以及其他一些内容的自定义类。

private Queue<MyCustomClass> SharedQueue;

类中的 DateTime 是插入队列时的时间戳。我想使用的逻辑如下:如果完整路径在 4 秒窗口内相同(即,如果在重复完整路径的 4 秒内添加到队列),则从队列中删除重复项。我有一些想要观看的事件,但仍然会出现一些重复的事件,这没关系。

我正在使用 c# 2.0 和 FileSystemWatcher 类和工作队列。

有很多方法可以做到这一点: 每次添加项目时都会修剪队列,或者当我在队列上工作时跳过当前重复项目的处理。

或者我应该使用“全局私有”变量 Dictionary<字符串、日期时间> ?这样我就可以快速搜索到吗?或者队列的本地副本?也许最好将本地队列限制为 100 个项目,以防出现许多文件事件?虽然就我而言,它“应该”只有相对较少的文件要在文件夹中监视...但事情总是会发生变化...

感谢您的帮助。

:编辑:美国东部时间 2 月 10 日 8:54: 因此,据我所知,我决定实施一个很好的简单解决方案。 我不认为我持有 Dict 键太久...

:编辑:2 月 10 日 9:53 EST:更新,因为我的词典不能包含重复的值。

   public void QueueInput(HotSynchUnit.RcdFSWFile rcd)
// start the worker thread when program starts.
// call Terminate.Set() in the programs exit routine or close handler etc.
{
  // lock shared queue
  lock (SharedQueue)
  {
    if (!IsDuplicateQueueInput(rcd))  // only add unique values to queue
    {
      SharedQueue.Enqueue(rcd);
      SomethingToDo.Set();
    }
  }
} // public void QueueInput

private bool IsDuplicateQueueInput(HotSynchUnit.RcdFSWFile rcd)
/* Return true if the object is a duplicate object.
 * Pseudo Code:
 * 
 * isDuplicate = false
 * Lock Dictionary
 * -If lastTimeStamp > 4 seconds ago then       // Optimization: save lastTimeStamp
 *    if Dict.Count > 0 then clear Dictionary
 *    return isDuplicate
 * -If not Dict.TryGetValue(sPath, dtTimeStamp) then
 *    Dict.AddKey()
 * -Else
 *    Compare key timestamp to Currenttime
 *    if key timestamp is <= 4 seconds ago then
 *       IsDuplicate = True
 *
 *    Dict.RemoveKey()
 *    Dict.AddKey()
 * 
 * return isDuplicate
*/
{
  // put real code here
}

I would like to remove duplicate entries from a queue in an efficient way.
The queue has a custom class with DateTime and FullPath and a few other things

private Queue<MyCustomClass> SharedQueue;

The DateTime in the class is the timestamp when inserted into the queue. The logic I would like to use is as following: Remove duplicates from the queue if the FullPath is identical within a 4 second window (i.e. if added to queue within 4 seconds of a duplicate fullpath). I have the events that I want to watch but a few duplicates will still arrive and that is OK.

I am using c# 2.0 and the FileSystemWatcher class and a worker queue.

There are a bunch of ways to do this:
Trim the queue each time an item is added to it, or when I am working on the queue skip the processing of the current duplicate item.

Or should I use a 'global private' variable Dictionary< String, DateTime> ? So I can quickly search it? or a local copy of the queue ? Perhaps it is best to limit the local queue to 100 items in case of many file events? Though in my case it 'should be' only a relatively few files to monitor in a folder... but things always change...

Thanks for any help.

:Edit: Feb 10 8:54 EST:
So I decided to implement a good simple solution as far as I can tell.
I don't think I am holding on to the Dict keys too long...

:Edit: Feb 10 9:53 EST: Updated as my Dictionary cannot contain duplicate values.

   public void QueueInput(HotSynchUnit.RcdFSWFile rcd)
// start the worker thread when program starts.
// call Terminate.Set() in the programs exit routine or close handler etc.
{
  // lock shared queue
  lock (SharedQueue)
  {
    if (!IsDuplicateQueueInput(rcd))  // only add unique values to queue
    {
      SharedQueue.Enqueue(rcd);
      SomethingToDo.Set();
    }
  }
} // public void QueueInput

private bool IsDuplicateQueueInput(HotSynchUnit.RcdFSWFile rcd)
/* Return true if the object is a duplicate object.
 * Pseudo Code:
 * 
 * isDuplicate = false
 * Lock Dictionary
 * -If lastTimeStamp > 4 seconds ago then       // Optimization: save lastTimeStamp
 *    if Dict.Count > 0 then clear Dictionary
 *    return isDuplicate
 * -If not Dict.TryGetValue(sPath, dtTimeStamp) then
 *    Dict.AddKey()
 * -Else
 *    Compare key timestamp to Currenttime
 *    if key timestamp is <= 4 seconds ago then
 *       IsDuplicate = True
 *
 *    Dict.RemoveKey()
 *    Dict.AddKey()
 * 
 * return isDuplicate
*/
{
  // put real code here
}

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

眼趣 2024-10-24 01:47:34

我只是考虑使用类似于通用哈希表任何集合...类似这样的东西:

Dictionary<string, YourClass> dict = new Dictionary<string, YourClass>();

/// just let's assume you want to add/check for "c:\demo.txt"

if (!dict.ContainsKey(@"c:\demo.txt"))
{
   /// add items to dict by passing fullPath as key and your objects as value
   dict.add(@"c:\demo.txt", obj1);
} 
else if (dict[@"c:\demo.txt"].CheckForIntervall())
{
   /// replace current object in dictionary with new object - in case you want to..
   /// or just do what you want to 
}

编辑 - 您的自定义类可能具有这样的功能:

class YOURCUSTOMCLASS
{
    private DateTime creationTime;

    public DateTime CreationTime
    { get { return creationTime; } }

    public YOURCUSTOMCLASS(parametersGoesHere xyz)
    {
          creationTime = DateTime.Now;
    }

    /// in this case this method will return true
    /// if the timeSpan between this object and otherObject
    /// is greater than 4 seconds
    public bool CheckForInterval(YOURCUSTOMCLASS otherObject)
    {
         TimeSpan diff = otherObj.CreationTime.Subtract(creationTime);

         /// you may replace 4 through any other digit, or even better take
         /// a const/global var/static ...
         return diff.TotalSeconds > 4;
    }

    /// all the other stuff you need ...
}

当然您会< em>放松队列的功能 - 但如果您的队列包含许多元素,您将运行时间大幅增加

I just thought about using any collection similar to a generic hashtable... Something like this:

Dictionary<string, YourClass> dict = new Dictionary<string, YourClass>();

/// just let's assume you want to add/check for "c:\demo.txt"

if (!dict.ContainsKey(@"c:\demo.txt"))
{
   /// add items to dict by passing fullPath as key and your objects as value
   dict.add(@"c:\demo.txt", obj1);
} 
else if (dict[@"c:\demo.txt"].CheckForIntervall())
{
   /// replace current object in dictionary with new object - in case you want to..
   /// or just do what you want to 
}

edit - your custom class may have some functionality like this:

class YOURCUSTOMCLASS
{
    private DateTime creationTime;

    public DateTime CreationTime
    { get { return creationTime; } }

    public YOURCUSTOMCLASS(parametersGoesHere xyz)
    {
          creationTime = DateTime.Now;
    }

    /// in this case this method will return true
    /// if the timeSpan between this object and otherObject
    /// is greater than 4 seconds
    public bool CheckForInterval(YOURCUSTOMCLASS otherObject)
    {
         TimeSpan diff = otherObj.CreationTime.Subtract(creationTime);

         /// you may replace 4 through any other digit, or even better take
         /// a const/global var/static ...
         return diff.TotalSeconds > 4;
    }

    /// all the other stuff you need ...
}

Of course you will loose the functionality of a queue - but you will get an massive increase in runtime if your queue containts many elements.

hth

静赏你的温柔 2024-10-24 01:47:34

我会创建一个子类:

class MyDeduplicatedQueue : Queue<MyCustomObject> {
    /// etc
}

然后您可以将所有适当的过滤逻辑放入 Enqueue 方法中。

I would make a subclass:

class MyDeduplicatedQueue : Queue<MyCustomObject> {
    /// etc
}

Then you can put all the appropriate filtering logic into the Enqueue method.

谁的年少不轻狂 2024-10-24 01:47:34

我将创建一个包装类,而不是从队列扩展,因为基本类型 Queue 的用户期望不同的行为。 (当您这样做时,.NET 4.0 中的数据协定甚至可能会抱怨。)

在内部,您可以有一个实际的队列来将所需的调用重定向到其中。
每次 Queue() 调用时,您都可以将尚未包含的新元素添加到字典中。在此之前,您可以从该字典中清空所有早于 x 秒的元素,并将它们按顺序添加到内部队列中。

出队时,您必须检查内部队列是否包含元素,否则从字典中选择最早的元素。

这当然只是一种可能的实现。当许多不同的元素可能快速排队时,字典将很快填满,并且可能需要添加额外的逻辑来解决这个问题。

I would make a wrapper class and not extend from queue, as users of the base type Queue expect different behavior. (Data Contracts in .NET 4.0 might even complain when you do so.)

Internally you can have a actual queue to which to redirect the required calls.
Every Queue() call you could add the new element to a Dictionary when it is not contained already. Before doing so, you could empty all elements that are older than x seconds from this dictionary, and add them to the inner queue in order.

When dequeuing, you will have to check whether the inner queue contains elements, and otherwise pick the earliest element from the dictionary.

This ofcourse is just one possible implementation. When a lot of different elements might get queued quickly, the dictionary will fill up quickly and additional logic might have to be added to resolve that.

只怪假的太真实 2024-10-24 01:47:34

如果插入有重复的路径,为什么不直接拒绝插入呢?您所要做的就是从队列尾部开始线性搜索,并在找到重复项(并拒绝插入)或时间戳超过时间限制(并插入记录)时停止?看起来比保留另一个数据结构和所有相关逻辑要简单得多。

Why not just reject inserts if they have duplicate paths? All you have to do is a linear search starting from the tail of the queue and stop when you either find a duplicate (and reject the insert) or when the timestamp exceeds your time limit (and insert the record)? Seems a lot simpler than keeping another data structure around and all the associated logic.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文