Getting basic statistics from a for loop in Python

Posted 2025-02-13 17:08:47


I don't have much Python experience and this is fairly complicated for me, so please excuse the messy code. I have a few arrays that were read with rasterio from raster layers (TIF), and ultimately I want to compute some basic statistics for each raster layer and add them to a data frame.
I'm trying to automate this as much as possible because I have a lot of layers to go through. Another obstacle was getting the column name to change according to each raster.
I managed to work out almost everything. The problem is that when I put it into a for loop, instead of the statistics values I get this: <built-in method values of dict object at 0x00..
I would appreciate help solving this.

import rasterio
from osgeo import gdal
import numpy as np
import pandas as pd

# Open all files (I have a lot of folders like this one to open)
# Griffin data read
Gr_1A_hh_path = r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-hh-h.tif"
Gr_1A_hh = rasterio.open(Gr_1A_hh_path)

Gr_1A_vv_path = r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-vv-h.tif"
Gr_1A_vv = rasterio.open(Gr_1A_vv_path)

Gr_1A_vh_path = r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-vh-h.tif"
Gr_1A_vh = rasterio.open(Gr_1A_vh_path)

Gr_1A_hv_path = r"E:\SAOCOM\1A1B\Images\Griffin\130122\Source\Data\gtc-acqId0000705076-a-sm9-2201150146-hv-h.tif"
Gr_1A_hv = rasterio.open(Gr_1A_hv_path)

#reading all the rasters as arrays
array_1A_hh= Gr_1A_hh.read()
array_1A_vv= Gr_1A_vv.read()
array_1A_vh= Gr_1A_vh.read()
array_1A_hv= Gr_1A_hv.read()

#creating a dictionary so that each array would have a name that would be used as column name
A2 = {
   "HH":array_1A_hh,
   "VV":array_1A_vv,
   "VH":array_1A_vh,
   "HV":array_1A_hv}

df= pd.DataFrame(index=["min","max","mean","medien"])
for name, pol in A2.items():
   for band in pol:
       stats = {
       "min":band.min(),
       "max":band.max(),
       "mean":band.mean(),
       "median":np.median(band)}
       df[f"{name}"]=stats.values

OUTPUT:
df
                                                      HH  ...                                                 HV
min     <built-in method values of dict object at 0x00...  ...  <built-in method values of dict object at 0x00...
max     <built-in method values of dict object at 0x00...  ...  <built-in method values of dict object at 0x00...
mean    <built-in method values of dict object at 0x00...  ...  <built-in method values of dict object at 0x00...
medien  <built-in method values of dict object at 0x00...  ...  <built-in method values of dict object at 0x00...


千紇 2025-02-20 17:08:47

Suppose you have a dict of images (random arrays stand in for the rasters here):

import numpy as np
import pandas as pd

vmin, vmax = 0, 255
C, H, W = 2, 64, 64

images_names = ["HH", "VV", "VH", "HV"]
images = {
    im_name: np.random.randint(vmin, vmax, size=(C, H, W))
    for im_name in images_names
}

And a set of functions that each compute one statistic for a single band:

stats_functions = {
    "min": lambda band: band.min(),
    "max": lambda band: band.max(),
    "mean": lambda band: band.mean(),
    "median": lambda band: np.median(band),
}
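
To see how this table works on its own, here is a minimal sketch that applies every entry to a single band. The tiny `band` array and the `band_stats` name are made up for illustration and are not part of the answer.

```python
import numpy as np

stats_functions = {
    "min": lambda band: band.min(),
    "max": lambda band: band.max(),
    "mean": lambda band: band.mean(),
    "median": lambda band: np.median(band),
}

# Hypothetical 2x2 band, small enough to check by hand.
band = np.array([[1, 2], [3, 4]])

# Call each function on the band and store the result under the statistic's name.
band_stats = {name: func(band) for name, func in stats_functions.items()}
print(band_stats)
```

Adding another statistic then only takes one more entry in `stats_functions`, for example `"std": lambda band: band.std()`.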

You can first build a nested dict of statistics, keyed by image name, then band index, then statistic name:

images_stats = {
    im_name: {
        band_idx: {
            stat_name: stat_func(band)
            for stat_name, stat_func in stats_functions.items()
        }
        for band_idx, band in enumerate(im)
    }
    for im_name, im in images.items()
}

And then convert it to a pandas DataFrame:

images_stats_df = pd.concat(
    {
        im_name: pd.DataFrame(im_stats)
        for im_name, im_stats in images_stats.items()
    },
    axis="columns",
)

Which gives:

>>> images_stats_df
                HH                      VV                      VH                     HV
                 0           1           0           1           0          1           0           1
min       0.000000    0.000000    0.000000    0.000000    0.000000    0.00000    0.000000    0.000000
max     254.000000  254.000000  254.000000  254.000000  254.000000  254.00000  254.000000  254.000000
mean    127.070557  126.082764  126.483643  127.737061  127.270996  128.89502  128.814209  124.610352
median  129.000000  127.000000  126.000000  127.000000  127.000000  130.00000  129.000000  122.000000
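
Because `pd.concat` was given a dict, the columns form a two-level index of (image name, band index). Below is a short sketch of a few ways to work with that layout, using a smaller made-up table with only two images and two statistics. The names `stats_df`, `hh_only`, `band0` and `flat`, and the `_b` suffix, are my own choices rather than anything from the answer.

```python
import numpy as np
import pandas as pd

# Two hypothetical 2-band images filled with random values.
rng = np.random.default_rng(0)
images = {name: rng.integers(0, 255, size=(2, 16, 16)) for name in ["HH", "VV"]}

# Same construction as above, reduced to min and max for brevity.
stats_df = pd.concat(
    {
        name: pd.DataFrame(
            {i: {"min": band.min(), "max": band.max()} for i, band in enumerate(im)}
        )
        for name, im in images.items()
    },
    axis="columns",
)

# All bands of one image: index by the outer column level.
hh_only = stats_df["HH"]

# Band 0 of every image: take a cross-section on the inner level.
band0 = stats_df.xs(0, axis="columns", level=1)

# One flat column name per (image, band) pair, e.g. "HH_b0".
flat = stats_df.copy()
flat.columns = [f"{name}_b{band}" for name, band in flat.columns]
print(flat)
```

If every raster has a single band, the `band0` form gives one plain column per polarization, which is the layout the question was aiming for.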

Edit: in your particular case, constructing the images dict might look like this:

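import rasterio

images_paths = {
    "HH": "path/to/image_HH.tif",
    "VV": "path/to/image_VV.tif",
    "VH": "path/to/image_VH.tif",
    "HV": "path/to/image_HV.tif",
}

# .read() with no arguments returns all bands as a (bands, rows, cols) array.
# Iterate over .items() to get (name, path) pairs; iterating the dict itself yields only the keys.
images = {
    im_name: rasterio.open(im_path).read()
    for im_name, im_path in images_paths.items()
}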
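
As for the output shown in the question: `stats.values` without parentheses is the method object itself, not its result, and pandas copies that single object into every row of the column. Calling the method fixes it. Here is a minimal sketch of the original loop with that change, assuming single-band rasters and using random arrays in place of the TIF files (the `arrays` name and the seed are illustrative):

```python
import numpy as np
import pandas as pd

# Random stand-ins for the arrays returned by rasterio: shape (bands, rows, cols).
rng = np.random.default_rng(0)
arrays = {
    name: rng.integers(0, 255, size=(1, 8, 8))
    for name in ["HH", "VV", "VH", "HV"]
}

df = pd.DataFrame(index=["min", "max", "mean", "median"])
for name, pol in arrays.items():
    band = pol[0]  # assumption: only the first band is of interest
    stats = {
        "min": band.min(),
        "max": band.max(),
        "mean": band.mean(),
        "median": np.median(band),
    }
    # stats.values() is called here; the list order matches the index order above.
    df[name] = list(stats.values())

print(df)
```

Note that the inner `for band in pol` loop in the question overwrites the same column on every pass, so for a multi-band raster only the last band would survive. The nested-dict approach above keeps every band.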
