Saving RGBD as a single image

Posted on 2025-01-22 16:32:34

I used this code https://www.programmersought.com/article/8773686326/ to create an RGBD tensor by combining an RGB image and a depth image.
Now I wonder whether that RGBD data can be saved as a single image (JPEG, PNG, ...).
I tried imageio.imwrite(), plt.imsave() and cv2.imwrite(), without success, likely because of the shape [4, 64, 1216]. Is there a way to make it work?

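import numpy as np
import torch
from PIL import Image
from torchvision import transforms

scale = (64, 1216)  # (height, width) for transforms.Resize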
 
resize_img = transforms.Resize(scale, Image.BILINEAR)
resize_depth = transforms.Resize(scale, Image.NEAREST)
to_tensor = transforms.ToTensor()
 
img_id = 0
 
# load image and resize
img = Image.open('RGB_image.jpg')
img = resize_img(img)
img = np.array(img)
 
# load depth and resize
depth = Image.open('depth_image.png')
depth = resize_depth(depth)
depth = np.array(depth)
depth = depth[:, :, np.newaxis]
 
# tensor shape and value, normalization
img = Image.fromarray(img).convert('RGB')
img = to_tensor(img).float()
 
depth = depth / 65535
depth = to_tensor(depth).float()

rgbd = torch.cat((img, depth), 0)
print("\n\nRGBD shape")
print(rgbd.shape)

Comments (1)

海之角 2025-01-29 16:32:34

We may save the depth as the alpha channel of an image in RGBA pixel format.

The alpha channel normally carries transparency, but we may use it as a fourth channel so that RGB and depth are stored together.

Since the depth may require high precision (possibly float32), I suggest using the OpenEXR image format.
For compatibility with the OpenEXR format, we may convert all channels to float32 in the range [0, 1].

Note:

  • I realized that Open3D supports RGBD images, but it looks like it doesn't support reading and writing the RGB and depth to a single file.

The following code sample uses OpenCV instead of Pillow.
I thought OpenCV supported the EXR file format, but my OpenCV Python build does not include EXR support, so I used the ImageIO package instead.


Stages for converting and writing RGB and depth to an EXR file:

  • Load RGB image, resize it and convert to float:

     img = cv2.imread('RGB_image.jpg')  # The channels order is BGR due to OpenCV conventions.
     img = cv2.resize(img, scale, interpolation=cv2.INTER_LINEAR)
     img = img.astype(np.float32) / 255  # Convert to float in range [0, 1]
    
  • Load depth image, resize and convert to float:

     depth = cv2.imread('depth_image.png', cv2.IMREAD_UNCHANGED)  # Assume depth_image.png is 16 bits grayscale.
     depth = cv2.resize(depth, scale, interpolation=cv2.INTER_NEAREST)
     depth = depth.astype(np.float32) / 65535  # Convert to float in range [0, 1]
    
  • Merge img (3 channels) and depth (1 channel) into 4 channels:
    The shape is going to be (1216, 64, 4), with the channels in OpenCV's BGRA order (depth takes the alpha position).

     bgrd = np.dstack((img, depth))
    
  • Write bgrd to an EXR file:
    If OpenCV is built with OpenEXR support, we may use: cv2.imwrite('rgbd.exr', bgrd).
    If we use ImageIO, it is better to convert from BGRA to RGBA before saving:

     rgbd = cv2.cvtColor(bgrd, cv2.COLOR_BGRA2RGBA)
     imageio.imwrite('rgbd.exr', rgbd)
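
The BGRA to RGBA conversion only swaps the first and third channels, so it can also be written with plain NumPy indexing. A minimal sketch, assuming a 4-channel array in height x width x channel layout; the helper name swap_red_blue and the tiny test array are illustrative and not part of the original answer:

```python
import numpy as np

def swap_red_blue(pixels):
    """Reorder BGRA <-> RGBA by swapping channels 0 and 2; channel 3 (depth) stays in place."""
    return pixels[:, :, [2, 1, 0, 3]]

# Tiny 2x2 stand-in for bgrd: B=0.1, G=0.2, R=0.3, depth=0.9 in every pixel.
bgrd = np.empty((2, 2, 4), dtype=np.float32)
bgrd[:, :] = [0.1, 0.2, 0.3, 0.9]

rgbd = swap_red_blue(bgrd)
print(rgbd[0, 0])  # channel order is now R, G, B, depth
```

Applying the same function twice returns the original order, so it serves for both writing and reading.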
    

Code sample (convert RGB and depth to an RGBA EXR file, then read it back and convert):

import numpy as np
import cv2
import imageio

scale = (64, 1216)
 
# load image and resize
img = cv2.imread('RGB_image.jpg')  # The channels order is BGR due to OpenCV conventions.
img = cv2.resize(img, scale, interpolation=cv2.INTER_LINEAR)
img = img.astype(np.float32) / 255  # Convert to float in range [0, 1]
 
# load depth and resize
depth = cv2.imread('depth_image.png', cv2.IMREAD_UNCHANGED)  # Assume depth_image.png is 16 bits grayscale.
depth = cv2.resize(depth, scale, interpolation=cv2.INTER_NEAREST)

if depth.ndim == 3:
    depth = depth[:, :, 0]  # If the depth image was decoded with several channels, keep only the first.
 
depth = depth.astype(np.float32) / 65535  # Convert to float in range [0, 1]

# Use the depth channel as alpha channel (the channel order is BGRA - applies OpenCV conventions).
bgrd = np.dstack((img, depth))

print("\n\nRGBD shape")
print(bgrd.shape)

# Save the data to exr file (the color format of the exr file is RGBA).
# Error: cv::initOpenEXR imgcodecs: OpenEXR codec is disabled.
#cv2.imwrite('rgbd.exr', bgrd)

# https://stackoverflow.com/questions/45482307/save-float-array-to-image-with-exr-format
rgbd = cv2.cvtColor(bgrd, cv2.COLOR_BGRA2RGBA)
imageio.imwrite('rgbd.exr', rgbd)

################################################################################
# Reading the data:  

#bgrd = cv2.imread('rgbd.exr')  # Error: cv::initOpenEXR imgcodecs: OpenEXR codec is disabled.
rgbd = imageio.imread('rgbd.exr')

img = rgbd[:, :, 0:3]  # First 3 channels are the image (RGB order, as written above).
depth = rgbd[:, :, 3]  # Last channel is the depth.

img = np.round(img * 255).astype(np.uint8)  # Convert back to uint8.
#depth = np.round(depth * 65535).astype(np.uint16)  # Convert back to uint16 (if required).

# Show images for testing:
cv2.imshow('img', cv2.cvtColor(img, cv2.COLOR_RGB2BGR))  # cv2.imshow expects BGR order.
cv2.imshow('depth', depth)
cv2.waitKey()
cv2.destroyAllWindows()

Note:

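  • You may have to make a few modifications. I was not sure about the dimensions (64x1216 or 1216x64): cv2.resize takes the target size as (width, height), so scale = (64, 1216) gives 1216 rows and 64 columns, and (1216, 64) would be needed to match the [4, 64, 1216] tensor in the question. I was also not sure about the line depth = depth[:, :, np.newaxis].
    I may be wrong about the format of depth_image.png.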
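
The tensor in the question is channels-first, [4, 64, 1216], while image writers expect channels-last arrays, (64, 1216, 4). A minimal NumPy sketch of that layout conversion, using a zero-filled stand-in for the tensor (an assumption for illustration; a real torch tensor would first have to be converted to a NumPy array, for example with .numpy()):

```python
import numpy as np

# Stand-in for the question's rgbd tensor: 4 channels (R, G, B, depth), 64 rows, 1216 columns.
rgbd_chw = np.zeros((4, 64, 1216), dtype=np.float32)
rgbd_chw[3] = 0.5  # mark the depth channel so it can be tracked

# Channels-first (C, H, W) -> channels-last (H, W, C), the layout image files use.
rgbd_hwc = np.transpose(rgbd_chw, (1, 2, 0))
print(rgbd_hwc.shape)  # (64, 1216, 4)

# And back again, e.g. after reading the file.
restored = np.transpose(rgbd_hwc, (2, 0, 1))
print(restored.shape)  # (4, 64, 1216)
```

This is probably why the writers in the question rejected the data: a leading axis of size 4 is read as the image height, not as the channel count.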

Update:

Saving 16-bit RGBA to a PNG file:

Instead of using an EXR file and float32 pixel format, we may use a PNG file and uint16 pixel format.

The pixel format of the PNG file is going to be RGBA (RGB and alpha, the transparency channel).
Each color channel is going to be 16 bits (2 bytes).
The alpha channel stores the depth map (in uint16 format).
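
If the depth is held as a float in [0, 1], as in the EXR variant, storing it in a uint16 channel means quantizing it to 65536 levels. A minimal sketch of that round trip, assuming depth values already normalized to [0, 1]; the function names and the synthetic depth ramp are illustrative:

```python
import numpy as np

def depth_to_uint16(depth01):
    """Quantize float depth in [0, 1] to uint16 in [0, 65535]."""
    return np.round(np.clip(depth01, 0.0, 1.0) * 65535).astype(np.uint16)

def depth_from_uint16(depth16):
    """Map uint16 depth back to float32 in [0, 1]."""
    return depth16.astype(np.float32) / 65535

depth01 = np.linspace(0.0, 1.0, 1000, dtype=np.float32)  # synthetic depth ramp
depth16 = depth_to_uint16(depth01)
restored = depth_from_uint16(depth16)

# The round-trip error stays around half a quantization step (0.5 / 65535).
print(np.abs(restored - depth01).max())
```

When the depth comes straight from a 16-bit PNG, as assumed in this answer, it is already uint16 and can be stored without this step and without any loss.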

  • Convert img to uint16 (we may choose not to scale by 256):

     img = img.astype(np.uint16)*256
    
  • Merge img (3 channels) and depth (1 channel) to 4 channels:

     bgrd = np.dstack((img, depth))
    
  • Save the merged image to PNG file:

     cv2.imwrite('rgbd.png', bgrd)
    

Code sample (the second part reads the file back and displays it for testing):

import numpy as np
import cv2

scale = (64, 1216)

# load image and resize
img = cv2.imread('RGB_image.jpg')  # The channels order is BGR due to OpenCV conventions.
img = cv2.resize(img, scale, interpolation=cv2.INTER_LINEAR)

# Convert the image from 8 bits per color channel to 16 bits per color channel.
# Notes:
# 1. We may choose not to scale by 256; the scaling only matters for viewers that expect the [0, 65535] range.
# 2. Most image viewers interpret the alpha channel as transparency, so the image is going to look strange.
img = img.astype(np.uint16)*256

# load depth and resize
depth = cv2.imread('depth_image.png', cv2.IMREAD_UNCHANGED)  # Assume depth_image.png is 16 bits grayscale.
depth = cv2.resize(depth, scale, interpolation=cv2.INTER_NEAREST)

if depth.ndim == 3:
    depth = depth[:, :, 0]  # If the depth image was decoded with several channels, keep only the first.

if depth.dtype != np.uint16:
    depth = depth.astype(np.uint16)  # The depth is supposed to be uint16, so the code should not reach here.

# Use the depth channel as alpha channel (the channel order is BGRA - applies OpenCV conventions).
bgrd = np.dstack((img, depth))

print("\n\nRGBD shape")
print(bgrd.shape)  # (1216, 64, 4)

# Save the data to PNG file (the pixel format of the PNG file is 16 bits RGBA).
cv2.imwrite('rgbd.png', bgrd)


# Testing:
################################################################################
# Reading the data:
bgrd = cv2.imread('rgbd.png', cv2.IMREAD_UNCHANGED)

img = bgrd[:, :, 0:3]  # First 3 channels are the image.
depth = bgrd[:, :, 3]  # Last channel is the depth

#img = (img // 256).astype(np.uint8)  # Convert back to uint8

# Show images for testing:
cv2.imshow('img', img)
cv2.imshow('depth', depth)
cv2.waitKey()
cv2.destroyAllWindows()