How can I write a Perl script to filter out digital pictures that have been doctored?

Posted on 2024-08-08 17:21:04

Last night before going to bed, I browsed through the Scalar Data section of Learning Perl again and came across the following sentence:

the ability to have any character in a string means you can create, scan, and manipulate raw binary data as strings.

An idea immediately hit me that I could actually let Perl scan the pictures that I have stored on my hard disk to check if they contain the string Adobe. It seems by doing so, I can tell which of them have been photoshopped. So I tried to implement the idea and came up with the following code:

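#!perl
use autodie;
use strict;
use warnings;

{
    # Read in "paragraph" chunks; "Adobe" contains no newline, so a
    # chunk boundary can never split the signature.
    local $/ = "\n\n";
    my $dir = 'f:/TestPix';
    my @pix = grep { -f $_ } glob "$dir/*";   # skip subdirectories

    foreach my $file (@pix) {
        open my $pic, '<', $file;
        binmode $pic;                         # raw bytes, no text-mode translation

        while (<$pic>) {
            if (/Adobe/) {
                print "$file\n";
                last;                         # report each file only once
            }
        }
        close $pic;
    }
}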

Excitingly, the code really seems to work: it picks out the pictures that have been photoshopped. The problem is that many pictures are edited with other utilities, and I think I'm kind of stuck there. Do we have some simple but universal method to tell whether a digital picture has been edited or not, something like

if ($_ !~ /the original format/) {...}

Or do we simply have to add more conditions, like

if (/Adobe/ || /ACDSee/ || /some other picture editors/)

Any ideas on this? Or am I oversimplifying due to my miserably limited programming knowledge?

Thanks, as always, for any guidance.

Comments (10)

梦冥 2024-08-15 17:21:04

Your best bet in Perl is probably ExifTool. This gives you access to whatever non-image information is embedded into the image. However, as other people said, it's possible to strip this information out, of course.
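As a rough sketch of what that could look like, the following uses the Image::ExifTool module from CPAN (the library behind the exiftool command) to read two tags that editors often fill in. The choice of the Software and CreatorTool tags and the helper name software_tags are assumptions made for illustration, and an empty result proves nothing, since the metadata may have been stripped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Image::ExifTool is a CPAN module, not part of core Perl. Load it lazily
# so the script can still run (and say so) when it is not installed.
my $have_exiftool = eval { require Image::ExifTool; 1 };

# Tags to query; this selection is an assumption for illustration only.
my @TAGS = ('Software', 'CreatorTool');

# Return a hash ref of the requested tags that are present in the file.
# An empty hash means "nothing found", not "never edited".
sub software_tags {
    my ($file) = @_;
    return {} unless $have_exiftool;

    my $et   = Image::ExifTool->new;
    my $info = $et->ImageInfo($file, @TAGS);

    my %found;
    for my $tag (@TAGS) {
        $found{$tag} = $info->{$tag} if defined $info->{$tag};
    }
    return \%found;
}

warn "Image::ExifTool is not installed\n" if @ARGV && !$have_exiftool;

for my $file (@ARGV) {
    my $tags = software_tags($file);
    if (%$tags) {
        print "$file: $_ = $tags->{$_}\n" for sort keys %$tags;
    }
    else {
        print "$file: no software tag found\n";
    }
}
```

Run it as perl script.pl f:/TestPix/*.jpg (the script name is arbitrary) to list whatever software tags each file still carries.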

囚我心虐我身 2024-08-15 17:21:04

I'm not going to say there is absolutely no way to detect alterations in an image, but the problem is extremely difficult.

The only person I know of who claims to have an answer is Dr. Neal Krawetz, who claims that digitally altered parts of an image will have different compression error rates from the original portions. He claims that re-saving a JPEG at different quality levels will highlight these differences.

I have not found this to be the case, in my investigations, but perhaps you might have better results.

欢你一世 2024-08-15 17:21:04

No. There is no functional distinction between a perfectly edited image and one that was that way from the start. In the end it is all just a bag of pixels, and any metadata can be removed or forged at will.

素手挽清风 2024-08-15 17:21:04

The name of the graphics program used to edit the image is not part of the image data itself but of something called metadata. Metadata may be stored in the image file but, as others have noted, it is neither required (some programs may not store it, and some may give you the option of not storing it) nor reliable: if you forged an image, you might have forged the metadata as well.

So the answer to your question is "no, there's no way to universally tell whether the picture was edited or not", although some image editing software may write its signature into the image file, and a careless editor may leave it there.
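If you still want to catch the careless cases, a minimal sketch in core Perl could look for several signatures at once. The list of editor names below is a hypothetical starting point rather than a complete catalogue, and the helper names are made up for this example. A match only shows that the string is present in the file; no match says nothing about whether the picture was edited.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list of editor names to look for; extend as needed.
my @EDITORS   = ('Adobe', 'ACDSee', 'GIMP', 'Paint.NET');
my $EDITOR_RE = join '|', map { quotemeta($_) } @EDITORS;

# Return the distinct editor names found anywhere in the raw bytes.
sub editor_signatures {
    my ($bytes) = @_;
    my %seen;
    $seen{$1}++ while $bytes =~ /($EDITOR_RE)/g;
    my @names = sort keys %seen;
    return @names;
}

# Read a whole file as raw bytes.
sub slurp_binary {
    my ($file) = @_;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    local $/;                       # slurp mode
    my $data = <$fh>;
    close $fh;
    return $data;
}

for my $file (@ARGV) {
    my @found = editor_signatures(slurp_binary($file));
    print "$file: ", (@found ? join(', ', @found) : 'no known signature'), "\n";
}
```

Building one alternation from a list keeps the pattern in a single place, so adding another editor is a one-word change.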

我三岁 2024-08-15 17:21:04

If you're inclined to learn more about image processing in Perl, you could take a look at some of the excellent modules CPAN has to offer:

  • Image::Magick - read, manipulate, and write a large number of image file formats
  • GD - create colour drawings using a large number of graphics primitives, and emit the drawings in various formats.
  • GD::Graph - create charts
  • GD::Graph3d - create 3D Graphs with GD and GD::Graph

However, there are other utilities available for identifying various image formats. It's more of a question for Super User, but on various Unix distributions you can use the file command to identify many different types of files, and on Mac OS X, Graphic Converter has never let me down. (It was even able to open the bizarre multi-file X-ray of my cat's shattered pelvis that I got on a disc from the vet.)
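For a feel of how such format identification works, here is a small core-Perl sketch that checks the leading "magic" bytes of a file, which is one of the things the file command does. It covers only a handful of formats, and the helper name image_type is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Leading "magic" bytes of a few common image formats.
my @MAGIC = (
    [ JPEG => qr/\A\xFF\xD8\xFF/ ],
    [ PNG  => qr/\A\x89PNG\r\n\x1A\n/ ],
    [ GIF  => qr/\AGIF8[79]a/ ],
    [ TIFF => qr/\A(?:II\x2A\x00|MM\x00\x2A)/ ],
    [ BMP  => qr/\ABM/ ],
);

# Guess the image format from the first bytes of a file.
sub image_type {
    my ($header) = @_;
    for my $entry (@MAGIC) {
        my ($name, $re) = @$entry;
        return $name if $header =~ $re;
    }
    return 'unknown';
}

for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    read $fh, my $header, 16;       # the first 16 bytes are enough here
    close $fh;
    print "$file: ", image_type(defined $header ? $header : ''), "\n";
}
```

This identifies the container format only; it tells you nothing about whether the picture inside was edited.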

眼藏柔 2024-08-15 17:21:04

How would you know what the original format was? I'm pretty sure there's no guaranteed way to tell if an image has been modified.

I can just open the file (with my favourite programming language and filesystem API) and just write whatever I want into that file willy-nilly. As long as I don't screw something up with the file format, you'd never know it happened.

Heck, I could print the image out and then scan it back in; how would you tell it from an original?

一念一轮回 2024-08-15 17:21:04

As others have stated, there is no way to know if the image was doctored. I'm guessing what you basically want to know is the difference between a realistic photograph and one that has been enhanced or modified.

There's always the option of running some extremely complex image recognition algorithm that would analyze every pixel in your image and do some very complicated stuff to determine if the image was doctored or not. This solution would probably involve AI which would examine millions of photos that are both doctored and those that are not and learn from them. However, this is more of a theoretical solution and isn't very practical... you would probably only see it in movies. It would be extremely complex to develop and probably take years. And even if you did get something like this to work, it probably still wouldn't be 100% correct all the time. I'm guessing AI technology still isn't at that level and could take a while until it is.

在梵高的星空下 2024-08-15 17:21:04

A not-commonly-known feature of exiftool allows you to recognize the originating software through an analysis of the JPEG quantization tables (not relying on image metadata). It recognizes tables written by many applications. Note that some cameras may use the same quantization tables as some applications, so this isn't a 100% solution, but it is worth looking into. Here is an example of exiftool run on two images, the first of which was edited in Photoshop.

> exiftool -jpegdigest a.jpg b.jpg
======== a.jpg
JPEG Digest                     : Adobe Photoshop, Quality 10
======== b.jpg
JPEG Digest                     : Canon EOS 30D/40D/50D/300D, Normal
    2 image files read

This will work even if the metadata has been removed.
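To illustrate the underlying idea (this is not exiftool's actual algorithm or its digest format), the core-Perl sketch below walks the JPEG header segments, collects the raw quantization tables from the DQT (0xFFDB) segments, and hashes them into a fingerprint. The function names are made up for this example, and the fingerprints are only useful for comparison against files whose origin you already know. The parser is deliberately simple: it stops at the start of the scan data and does not handle fill bytes or other unusual layouts.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Walk the JPEG header segments and collect raw quantization tables,
# keyed by table id (0-3). Returns an empty list for non-JPEG data.
sub quantization_tables {
    my ($jpeg) = @_;
    return unless substr($jpeg, 0, 2) eq "\xFF\xD8";    # no SOI marker
    my %tables;
    my $pos = 2;
    while ($pos + 4 <= length $jpeg) {
        my ($ff, $marker, $len) = unpack 'C C n', substr($jpeg, $pos, 4);
        # Stop on a malformed segment, start of scan (SOS) or end of image (EOI).
        last if $ff != 0xFF || $marker == 0xDA || $marker == 0xD9 || $len < 2;
        if ($marker == 0xDB) {                          # DQT segment
            my $payload = substr($jpeg, $pos + 4, $len - 2);
            while (length $payload) {
                my $pq_tq = ord(substr($payload, 0, 1, ''));
                my $size  = ($pq_tq >> 4) ? 128 : 64;   # 16-bit or 8-bit entries
                last if length($payload) < $size;
                $tables{ $pq_tq & 0x0F } = substr($payload, 0, $size, '');
            }
        }
        $pos += 2 + $len;                               # marker bytes + segment
    }
    return %tables;
}

# Hash the tables (in id order) into one hex string, or undef if none found.
sub dqt_fingerprint {
    my ($jpeg) = @_;
    my %t = quantization_tables($jpeg);
    return undef unless %t;
    return md5_hex(join '', map { chr($_) . $t{$_} } sort { $a <=> $b } keys %t);
}

for my $file (@ARGV) {
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    my $fp = dqt_fingerprint(defined $data ? $data : '');
    print "$file: ", (defined $fp ? $fp : 'no quantization tables found'), "\n";
}
```

Two files saved by the same software at the same quality setting will usually share a fingerprint, which is the property exiftool's digest lookup relies on.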

滥情哥ㄟ 2024-08-15 17:21:04

There is existing software out there which uses various techniques (compression artifacting, comparison to signature profiles in a database of cameras, etc.) to analyze the actual image data for evidence of alteration. If you have access to such software and the software available to you provides an API for external access to these analysis functions, then there's a decent chance that a Perl module exists which will interface with that API and, if no such module exists, it could probably be created rather quickly.

In theory, it would also be possible to implement the image analysis code directly in native Perl, but I'm not aware of anyone having done so and I expect that you'd be better off writing something that low-level and processor-intensive in a fully-compiled language (e.g., C/C++) rather than in Perl.

囍孤女 2024-08-15 17:21:04

http://www.impulseadventure.com/photo/jpeg-snoop.html
is a tool that does the job fairly well.

If there has been any cloning, there is a variation in pixel density (or concentration) that sometimes shows up on manual inspection. A Photoshop-cloned area will have an even pixel density (by which I mean the variation of pixels relative to a scanned image).
