Is it possible to search the database for related files when a file is selected for upload?
I have an idea for a site that involves uploading files. What I'd like to know is whether, when a user clicks "Browse" and selects a file, the site can automatically scan its database for similar files before the file is uploaded. It would work somewhat like the automatic "Related Questions" list that appears when you ask a question on this site.
Comments (2)
Sure, that's possible. But you'll have to come up with your own definition of "similar", as well as an algorithm for finding similar files.
File Type differences
Different file types should be compared differently. For example, text files are well suited to a diff-based comparison, but finding similar images or videos is considerably more difficult.
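To make the text-file case concrete, here is a minimal sketch using Python's standard `difflib` module. The function names, the `stored` dictionary (file name mapped to text content), and the 0.6 threshold are assumptions chosen for illustration, not something the site would necessarily use.

```python
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """Return a ratio in [0, 1]; 1.0 means the two texts are identical."""
    return SequenceMatcher(None, a, b).ratio()


def most_similar(new_text: str, stored: dict[str, str],
                 threshold: float = 0.6) -> list[tuple[str, float]]:
    """Score the new text against every stored text (hypothetical storage)
    and return the names at or above the threshold, best match first."""
    scored = [(name, text_similarity(new_text, text))
              for name, text in stored.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    stored = {"a.txt": "hello world", "b.txt": "completely different"}
    print(most_similar("hello world!", stored))
```

Note that this still compares the new file against every stored file, so it only scales to small collections.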
Difficulty of comparisons
Also, comparing against a large number of files is very expensive, since it is typically done pairwise: the new file is compared against every stored file in turn. Indexing methods can make the search more efficient, but I don't see an easy way to do this quickly.
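One possible indexing approach, sketched here under assumptions: split each text into overlapping word sequences ("shingles") and keep an inverted index from shingle to file IDs. A lookup then touches only files that share at least one shingle with the new text, instead of every stored file. The class name, the shingle size of 3, and the in-memory dictionary are all illustrative choices; a real site would more likely keep this in its database or a search engine.

```python
from collections import defaultdict


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word sequences in the text (lowercased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


class ShingleIndex:
    """Hypothetical in-memory inverted index: shingle -> set of file IDs."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.postings: dict[tuple[str, ...], set[str]] = defaultdict(set)

    def add(self, file_id: str, text: str) -> None:
        for sh in shingles(text, self.k):
            self.postings[sh].add(file_id)

    def candidates(self, text: str, min_shared: int = 1) -> dict[str, int]:
        """Count shared shingles per stored file; unrelated files are never visited."""
        counts: dict[str, int] = defaultdict(int)
        for sh in shingles(text, self.k):
            for file_id in self.postings.get(sh, ()):
                counts[file_id] += 1
        return {f: c for f, c in counts.items() if c >= min_shared}


if __name__ == "__main__":
    index = ShingleIndex()
    index.add("a", "the quick brown fox jumps")
    index.add("b", "an entirely unrelated sentence here")
    print(index.candidates("a quick brown fox runs"))
```

The candidates returned this way can then be ranked with a more expensive comparison, which keeps the pairwise work limited to a short list.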
Crowdsourced Alternative
Another alternative is to have the site's users point out similarities, so that you simply display a list of the most popular files that were voted similar. Of course, this doesn't help while a new file is being uploaded, but it can give you insight into what users find similar.
Many sites compare content similarity by letting users tag items. If two items share many of the same tags, they're likely similar. This is probably the easiest approach.
This also has the benefit that any content type can be compared to any other content type. So text files that have the same tags as a video can be presented as similar.
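As a rough sketch of the tag idea, the overlap between two tag sets can be scored with the Jaccard index (shared tags divided by all distinct tags). The `catalog` dictionary and the function names below are assumptions made up for the example.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Shared tags divided by all distinct tags; 0.0 if both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_by_tags(tags: set[str], catalog: dict[str, set[str]],
                    limit: int = 5) -> list[tuple[str, float]]:
    """Rank catalog items (hypothetical name -> tags mapping) by tag overlap."""
    scored = [(item, jaccard(tags, item_tags))
              for item, item_tags in catalog.items()]
    scored = [(item, score) for item, score in scored if score > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    catalog = {
        "talk.mp4": {"python", "tutorial", "web"},
        "notes.txt": {"python", "tutorial"},
        "cat.jpg": {"cat"},
    }
    print(related_by_tags({"python", "tutorial"}, catalog))
```

Because only the tags are compared, the video and the text file in the example end up in the same result list, regardless of file type.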
The file name is available without uploading the file, so you can run the search based on the file name. The content reaches the server only after the upload.
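A minimal server-side sketch of the file-name search, assuming the page sends the selected file's name to the server in a small request before the upload starts. It uses `difflib.get_close_matches` from the standard library; the `normalize` and `similar_names` helpers, the list of existing names, and the 0.6 cutoff are illustrative assumptions.

```python
import difflib
import os


def normalize(filename: str) -> str:
    """Drop the directory and extension, lowercase, and unify separators."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    return stem.lower().replace("_", " ").replace("-", " ")


def similar_names(selected: str, existing: list[str],
                  limit: int = 5, cutoff: float = 0.6) -> list[str]:
    """Return stored file names whose normalized form is close to the selected one."""
    by_key: dict[str, list[str]] = {}
    for name in existing:
        by_key.setdefault(normalize(name), []).append(name)
    keys = difflib.get_close_matches(normalize(selected), list(by_key),
                                     n=limit, cutoff=cutoff)
    return [name for key in keys for name in by_key[key]]


if __name__ == "__main__":
    existing = ["Annual_Report_2020.pdf", "holiday-photo.jpg",
                "annual report 2021.docx"]
    print(similar_names("annual-report-2020.PDF", existing))
```

Matching on names alone is cheap but weak, since unrelated files can share a name and identical files can be named differently.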