What is the best way for me to check for new files added to a directory? I don't think FileSystemWatcher would be suitable, as this is not an always-on service but a method that runs when my program starts up.
There are over 20,000 files in the folder structure I am monitoring. At present I am checking each file individually to see if its path is in my database table, but this takes around ten minutes and I would like to speed it up if possible.
I can store the date the folder was last checked. Is there an easy way to get all files with a created date later than the last-checked date?
Anyone got any ideas?
Thanks
Mark
Your approach is the only feasible one (FileSystemWatcher lets you see changes as they happen; it cannot tell you on startup what changed while your program was not running).
Find out what takes so long. 20,000 checks should not take 10 minutes; 1 minute at most. Your program is just written inefficiently. How are you doing the check?
Hint: do not ask the database about each file. Get a list of all files on disk into memory, get a list of all files in the database, and compare them in memory. Sending 20,000 SQL statements to the database is too slow; this way you need only ONE to get the list.
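A minimal Python sketch of that hint, assuming (hypothetically) that the database is SQLite and the known paths live in a table `files` with a `path` column; substitute your own driver and schema. It walks the folder once, fetches every stored path with a single SELECT, and compares the two sets in memory.

```python
import os
import sqlite3


def files_on_disk(root):
    """Collect the full path of every file under root, recursively."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.join(dirpath, name))
    return found


def paths_in_db(conn):
    """Fetch every known path with ONE query (hypothetical table `files`)."""
    return {row[0] for row in conn.execute("SELECT path FROM files")}


def diff_folder_against_db(root, conn):
    """Return (new_files, missing_files) using in-memory set operations."""
    on_disk = files_on_disk(root)
    known = paths_in_db(conn)
    new_files = on_disk - known        # on disk, not yet recorded
    missing_files = known - on_disk    # recorded, but no longer on disk
    return new_files, missing_files
```

Set lookups are constant time on average, so comparing 20,000 paths this way is cheap; the main remaining cost is the directory walk. It only works if the database stores paths spelled exactly as `os.path.join` produces them (same separators and case).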
10 minutes seems awfully long for 20,000 files. How are you going about the comparison? Your suggestion doesn't account for deleted files either; if you want to remove those from the database, you will have to do a full comparison.
Perhaps the problem is the database round trips. You can retrieve the known file list from the database in large chunks (or all at once), sorted alphabetically. Sort the local file list as well, then walk the two lists side by side, processing missing or new entries as you go.
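The side-by-side walk can be sketched in Python as below. It assumes both inputs are plain lists of path strings spelled the same way (one from a directory walk, one from the database). Both are sorted here with Python's `sorted`, because a database `ORDER BY` collation may order strings differently and the walk relies on both sides using the same ordering.

```python
def merge_walk(disk_paths, db_paths):
    """Walk two sorted lists in one pass; return (new_files, missing_files)."""
    disk = sorted(disk_paths)
    db = sorted(db_paths)
    i = j = 0
    new_files, missing_files = [], []
    while i < len(disk) and j < len(db):
        if disk[i] == db[j]:
            # Present in both lists: nothing to do.
            i += 1
            j += 1
        elif disk[i] < db[j]:
            # On disk but not in the database: a new file.
            new_files.append(disk[i])
            i += 1
        else:
            # In the database but not on disk: a deleted file.
            missing_files.append(db[j])
            j += 1
    # Anything left over on one side has no partner on the other.
    new_files.extend(disk[i:])
    missing_files.extend(db[j:])
    return new_files, missing_files
```

For example, `merge_walk(["a", "c", "d"], ["b", "c"])` returns `(["a", "d"], ["b"])`. Each list is read once, front to back, which is why this approach also suits fetching the database list in chunks.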
FileSystemWatcher is not reliable (it can miss events, for example when its internal buffer overflows), so even if you could use a service, it would not necessarily work for you. The two options I can see are:
You can store somewhere the timestamp of the most recently created file and, on the next run, only pick up files created after it. It is simple and may work for you.
Can you write a service that runs on that machine? The service can then use FileSystemWatcher.
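The first option above, storing a timestamp, might look like this in Python. This is a sketch under assumptions: the cutoff is a value you saved yourself (for example `time.time()` at the end of the previous run), and the "created" time is taken from `st_birthtime` where the platform provides it, falling back to the modification time otherwise. It still stats every file, but it needs no database queries.

```python
import os


def created_or_modified(st):
    """Best available 'when did this file appear' timestamp.

    st_birthtime (creation time) exists on macOS/BSD and, from Python 3.12,
    on Windows; where it is missing, fall back to the modification time.
    """
    return getattr(st, "st_birthtime", st.st_mtime)


def files_newer_than(root, cutoff):
    """Return paths under root whose timestamp is later than cutoff (epoch seconds)."""
    newer = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if created_or_modified(os.stat(path)) > cutoff:
                newer.append(path)
    return newer
```

Two limits worth keeping in mind: a file moved into the folder may keep an older timestamp and be skipped, and deleted files are never detected, so a periodic full comparison may still be needed.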
Having a FileSystemWatcher service like Kevin Jones suggests is probably the most pragmatic answer, but there are some other options.
You can watch the directory with inotify if it lives on a Linux box that shares it out via Samba. That of course assumes you don't mind fragmenting your platform, but watching for changes is exactly what inotify is for.
More correctly, though with correspondingly less chance of getting the go-ahead: if you're monitoring a directory with 20K files in it, it is probably time to evolve your system architecture. Without knowing much more about your application, it sounds like a message queue might be worth looking at.
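To illustrate the shape of that design (not any particular product): assuming you control whatever drops files into the folder, it publishes one message per new file, and your program consumes those messages instead of scanning 20K files on startup. The sketch uses Python's in-process `queue.Queue` purely as a stand-in for a real, durable message queue; with a real broker the messages would wait while your program is not running, which is what removes the startup scan. The function names are hypothetical.

```python
import queue
import threading


def producer(q, new_paths):
    """Stand-in for the component that writes files: announce each one."""
    for path in new_paths:
        q.put(path)
    q.put(None)  # sentinel: no more messages


def consumer(q, recorded):
    """Record each announced file instead of rescanning the folder."""
    while True:
        path = q.get()
        if path is None:
            break
        recorded.add(path)  # stand-in for an INSERT into the database


def run_demo(paths):
    """Wire one producer and one consumer together; return what was recorded."""
    q = queue.Queue()
    recorded = set()
    worker = threading.Thread(target=consumer, args=(q, recorded))
    worker.start()
    producer(q, paths)
    worker.join()
    return recorded
```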