How do I convert the coordinates of YOLO labels when cropping an image?

Posted 2025-01-30 04:25:45


I've created over 1200 images with labels for YOLO detection. The problem is that every image is 800x600 and all the labeled objects sit in the middle of the image, so I want to crop the rest away.
The images would then be about 400x300 (cropped equally on the left, right, top, and bottom), with the objects still in the middle. But how do I convert or change the coordinates without labeling everything all over again?

# (used labelimg for yolo)
0 0.545000 0.722500 0.042500 0.091667
1 0.518750 0.762500 0.097500 0.271667

Here's one of my label .txt files. Sorry for my bad English!


Answered by 遮云壑 on 2025-02-06 04:25:45


I was just working this out myself, so here is a complete explanation of why the formula at the bottom is correct.

Let's go over how these annotations are formatted.

         x
 0--------------->1    
 |       .
 |   _________   
 |   |   .   | ^
 |   |   .   | |
y|...|...*   | h
 |   |       | |
 |   |_______| v
 |   <---w--->
 V 
 1     

Each line is 5 numbers separated by a space: n x y w h with

  • n the number of your class, e.g. 0: "tree", 1: "car", etc.
  • x the normalized x coordinate of the center of your marked area
  • y the normalized y coordinate of the center of your marked area
  • w the normalized width of the marked area
  • h the normalized height of the marked area

W and H mean the width and height of the original image.
A normalized value is relative to the width or height of the image, not in pixels or any other unit. It is a proportion. For example, the x value is normalized like this: x[px] / W[px] = normalized x.
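To make "normalized" concrete, here is a small sketch converting a pixel-space box to this format (the pixel numbers are invented for illustration):

```python
# Convert a pixel-space bounding box to YOLO's normalized (x, y, w, h).
def to_yolo(x_min, y_min, x_max, y_max, img_w, img_h):
    x = (x_min + x_max) / 2 / img_w  # normalized center x
    y = (y_min + y_max) / 2 / img_h  # normalized center y
    w = (x_max - x_min) / img_w      # normalized width
    h = (y_max - y_min) / img_h      # normalized height
    return x, y, w, h

# a 40x70 px box centered at (400, 425) px in an 800x600 image
print(to_yolo(380, 390, 420, 460, 800, 600))
```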

a few advantages of this:

  • all values are in the range of 0 to 1. It is easy to tell if a value is out of frame <0 or >1.
  • it does not matter whether you upscale or downscale the image
  • the unit of measurement is irrelevant

The y axis goes from top to bottom. Everything else is like your standard coordinate system.

Now to cropping. Let's take this picture of a tree:

      W
   0------>1
   |⠀⢀⣴⣶⣤⣄⠀| 
   |⢠⣿⣿⣿⣿⣿⡆|
H  |⠈⠿⠿⣯⠿⠿⠁|
   | ⠀⠀⣿⠀  |⠀⠀
   v  ⠐⠛⠃⠀ |⠀
   1--------

scaling

We will now crop to the top left quarter of the tree image.

 _____
 | ⣴⣶|  
 |⢠⣿⣿|
 -----

Our new image width W' is now only half of the original W, and likewise H' = 0.5·H. The center of the old image is now the bottom right corner of the new one: we know the center of the image p is at (0.5, 0.5), and the bottom right corner of the crop is at p' = (1, 1). If we instead cropped so that (0.3, 0.3) in the old image became the new bottom right, that point's new coordinate would also be (1, 1). 0.5 is also ½. To get from 0.5 to 1 we multiply by 2; for ⅓ we multiply by 3, for ¼ by 4. We see that if we reduce the width or height to a/b of the original, we need to multiply the coordinates by b/a.

translation

But we also want to move the top left of the image, our coordinate origin O.
Let's crop to the tree trunk:

   O'---
H' |⠀⣿⠀|⠀⠀
   |⠐⠛⠃|
   ----q'
     W'

W is 7 characters and the new width W' is 3; H = 5 and H' = 2. In the new system the origin O' is of course (0, 0); in the original system it sits at (2, 3) in characters, which normalized to the original image is (2/7, 3/5), or about (0.285, 0.6).
O' is at (0.285, 0.6) but should become (0, 0), so we subtract 0.285 from x and 0.6 from y before scaling the new value. This is not very interesting on its own, because 0 times anything is 0.

Let's do another example: the bottom right corner of our new cropped image of the tree trunk. Let's call this point q. We know that q in the coordinate system of the cropped image must be q' = (1, 1); it is the bottom right corner, after all.

We already measured:
W = 7, W' = 3, H = 5, H' = 2.
By how much do we have to scale?

The scale factor is the ratio of old size to new size: s_w = W/W' = 7/3 for the width and s_h = H/H' = 5/2 for the height. (In the quarter crop above, W' = W/2, which gives exactly the factor 2 we found there.)

q sits at (5, 5) in characters in the original system: the origin shift (2, 3) plus the crop size (3, 2). Let's put our formula to the test.
We moved our origin by 2 characters in x and 3 in y; normalized to the original image, that is Δw = 2/7 and Δh = 3/5.

For q'_x we subtract Δw = 2/7 from q_x = 5/7 and get 3/7. Scaling by s_w = 7/3 gives (3/7)·(7/3), which is indeed 1. The same works for y: q'_y = (5/5 − 3/5)·(5/2) = (2/5)·(5/2) = 1. Both land on 1, exactly where the bottom right corner must be. Now that we have checked our logic, we can write an algorithm.

The algorithm

We already have normalized values so the matter is simpler.

For a point p in the original we can calculate p' in the new image like this:

p' = (x', y') = ((x − Δw) · s_w, (y − Δh) · s_h)
with: Δw and Δh the normalized offsets of the crop's top left corner,
s_w = W/W', s_h = H/H', and the box size scaling as w' = w · s_w, h' = h · s_h
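A quick numeric check of this transformation with the tree-trunk measurements (note that the scale factors are old size over new size, s_w = W/W' and s_h = H/H'):

```python
# Tree-trunk example: original image is 7x5 characters, the crop is 3x2
# with its top left corner at (2, 3) in the original.
W, H = 7.0, 5.0      # original width and height
Wc, Hc = 3.0, 2.0    # cropped width and height
ox, oy = 2.0, 3.0    # top left corner of the crop in the original

s_w, s_h = W / Wc, H / Hc    # scale factors 7/3 and 5/2
d_w, d_h = ox / W, oy / H    # normalized origin shift 2/7 and 3/5

# q is the bottom right corner of the crop, in original normalized coords
q_x, q_y = (ox + Wc) / W, (oy + Hc) / H

q_px = (q_x - d_w) * s_w
q_py = (q_y - d_h) * s_h
print(q_px, q_py)  # both land on 1.0, up to floating-point rounding
```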

in Python:

    def transpose_annot(x_c, y_c, w_c, h_c, annotations):
        # (x_c, y_c, w_c, h_c): center and size of the cropped area,
        # normalized to the original image
        s_w = 1 / w_c  # scale factor for the width
        s_h = 1 / h_c  # scale factor for the height
        new_annots = []

        for annot in annotations:
            n = None
            try:
                n, x, y, w, h = annot  # the label n may or may not be given
            except ValueError:
                x, y, w, h = annot
            w_ = w * s_w
            h_ = h * s_h
            # the center of the cropping area becomes the new image center;
            # we shift relative to it and scale accordingly
            delta_x = x - x_c
            delta_y = y - y_c
            x_ = 0.5 + delta_x * s_w
            y_ = 0.5 + delta_y * s_h
            if n is None:
                new_annots.append((x_, y_, w_, h_))
            else:
                new_annots.append((n, x_, y_, w_, h_))
        return new_annots
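For the question's exact case (800x600 cropped centrally to 400x300), the crop center coincides with the image center, so the transformation collapses to a shift by 0.25 and a scale by 2 on both axes. A self-contained sketch using the first label line from the question (the function name is my own):

```python
# Center crop of an 800x600 image down to 400x300: in normalized terms the
# crop is (x_c, y_c, w_c, h_c) = (0.5, 0.5, 0.5, 0.5).
def center_crop_half(x, y, w, h):
    s = 2.0   # scale factor: 800/400 == 600/300 == 2
    d = 0.25  # normalized offset of the crop's top left corner
    return ((x - d) * s, (y - d) * s, w * s, h * s)

# first label line from the question
print(center_crop_half(0.545000, 0.722500, 0.042500, 0.091667))
# approximately (0.59, 0.945, 0.085, 0.183334)
```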

correcting annotations

Cropping can remove an annotation from the frame entirely (then it has to be dropped) or cut it off partially (then it has to be adjusted).

As mentioned before all values must be in the interval [0,1].

A completely cropped-out annotation ends up entirely outside the new frame: x' + w'/2 ≤ 0 or x' − w'/2 ≥ 1, and likewise for y' with h'.

partially cropped

If you want to keep annotations with, say, at least 1/4 of their area still visible and drop those below that threshold, it gets more complicated: you need the visible fraction of each box.

         x
     _________   
     |   .   |
     |   .   |
 y...|.0-*---|-------->1
     | |     | h
     |_______|
       |  w
       V 
       1     

intersection area in cropped image

We can view this problem as calculating the intersection area between two rectangles. For convenience, the function also returns the fraction of the annotation's area that remains in frame.

    def new_annotation_area(x, y, w, h):
        # ________
        # |  a   |
        # |   ___|______
        # |   |c |     |
        # |___|__|  b  |
        #     |________|
        # a is the frame of the cropped image (the unit square)
        # b is the annotation in that coordinate system
        # c is the intersection area
        a_x, a_y = 0.5, 0.5
        a_w, a_h = 1.0, 1.0

        # start from the one-dimensional case:
        # how much do two line segments overlap?
        #  a_min_x----------a_max_x
        #        b_min_x----------b_max_x
        #        c_min_x----c_max_x
        a_min_x, a_max_x = a_x - a_w / 2, a_x + a_w / 2
        b_min_x, b_max_x = x - w / 2, x + w / 2
        c_min_x = max(a_min_x, b_min_x)
        c_max_x = min(a_max_x, b_max_x)
        c_len_x = max(0.0, c_max_x - c_min_x)  # 0 if there is no overlap

        a_min_y, a_max_y = a_y - a_h / 2, a_y + a_h / 2
        b_min_y, b_max_y = y - h / 2, y + h / 2
        c_min_y = max(a_min_y, b_min_y)
        c_max_y = min(a_max_y, b_max_y)
        c_len_y = max(0.0, c_max_y - c_min_y)

        area = c_len_x * c_len_y
        c_w, c_h = c_len_x, c_len_y
        c_x = c_min_x + 0.5 * c_w
        c_y = c_min_y + 0.5 * c_h
        return area / (w * h), (c_x, c_y, c_w, c_h)
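Putting the pieces together, a transformed box can be clipped to the frame and kept or dropped by its visible fraction. A self-contained sketch (the function name clip_box and the 0.25 threshold are my own choices, following the discussion above):

```python
def clip_box(x, y, w, h, min_visible=0.25):
    """Clip a normalized YOLO box to the [0,1]x[0,1] frame.

    Returns the clipped (x, y, w, h), or None when the visible part of
    the box is smaller than min_visible of its original area.
    """
    x_min, x_max = max(0.0, x - w / 2), min(1.0, x + w / 2)
    y_min, y_max = max(0.0, y - h / 2), min(1.0, y + h / 2)
    vis_w, vis_h = x_max - x_min, y_max - y_min
    if vis_w <= 0 or vis_h <= 0:
        return None                      # fully outside the frame
    if vis_w * vis_h < min_visible * w * h:
        return None                      # mostly cropped away: drop it
    return ((x_min + x_max) / 2, (y_min + y_max) / 2, vis_w, vis_h)

print(clip_box(0.5, 0.5, 0.2, 0.2))    # fully inside: unchanged (up to rounding)
print(clip_box(1.08, 0.5, 0.2, 0.2))   # only a sliver visible: dropped (None)
print(clip_box(0.95, 0.5, 0.2, 0.2))   # half out of frame: clipped and re-centered
```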