Creating a dictionary of large-image filenames in the test dataloader and assigning the predictions of all their 512x512 patches as a list of values

Posted on 2025-01-13 15:26:05


I am not sure why building the dictionary as follows does not create the desired output. Instead of ending up with a dictionary with 88 large filenames, I ended up with a dictionary with only 2 large filenames.

A quick intro to my test set: I have large images and have tiled them into 512x512 patches. Below you can see the number of large images and 512x512 patches for each of the positive and negative labels:

--test
---pos_label: 14 large images, 11051 patches
---neg_label: 74 large images, 45230 patches
sample_fnames_labels = dataloaders_dict['test'].dataset.samples

test_large_images = {}
test_loss = 0.0
test_acc = 0

with torch.no_grad():
    test_running_loss = 0.0
    test_running_corrects = 0
    print(len(dataloaders_dict['test']))

    for i, (inputs, labels) in enumerate(dataloaders_dict['test']):
        test_inputs = inputs.to(device)
        test_labels = labels.to(device)

        test_outputs = saved_model_ft(test_inputs)
        _, test_preds = torch.max(test_outputs, 1)

        max_bs = len(test_preds)

        for j in range(max_bs):
            # look up the filename for this prediction and recover
            # the parent large-image name from the patch name
            sample_file_name = sample_fnames_labels[i+j][0]
            patch_name = sample_file_name.split('/')[-1]
            large_image_name = patch_name.split('_')[0]

            # group patch predictions under their parent large image
            if large_image_name not in test_large_images:
                test_large_images[large_image_name] = []
            test_large_images[large_image_name].append(test_preds[j].item())

        #test_running_loss += test_loss.item() * test_inputs.size(0)
        test_running_corrects += torch.sum(test_preds == test_labels.data)

    #test_loss = test_running_loss / len(dataloaders_dict['test'].dataset)
    test_acc = test_running_corrects / len(dataloaders_dict['test'].dataset)



Here, the test_large_images dictionary has only two large images as keys instead of the 88 large test images. Thanks for taking a look.

Essentially, I want to collect all the predicted labels of each large image's 512x512 patches into a list in a dictionary keyed by large_image_filename, so that I can do majority voting later on.
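For that later step, here is a minimal sketch (not from the original post), assuming test_large_images has been filled correctly; note that Counter.most_common lists equal counts in first-encountered order, so an explicit tie rule may be wanted for even patch counts:

from collections import Counter

# test_large_images: {large_image_name: [patch predictions]}
majority_votes = {
    name: Counter(preds).most_common(1)[0][0]
    for name, preds in test_large_images.items()
}
print(majority_votes)  # e.g. {'large_image_1': 0, 'large_image_2': 1, ...}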

Here's the PyTorch dataloader being used; the batch size is 512.

# Create training, validation, and test datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val', 'test']}
# Create training, validation, and test dataloaders
print('batch size: ', batch_size)
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=4) for x in ['train', 'val', 'test']}

Ultimately, I am hoping to get something like:

{large_image_1: [0, 1, 1, 0], large_image_2: [1, 1, 1, 0, 0, 0, 0, 0, 0], large_image_3: [0, 0], ...}

Please note that my large images differ in size, i.e. in how many 512x512 patches they produce.

I do actually see 87 unique large-image filenames below; I am not sure why only two of them get updated in the dictionary:

fnames = set()
for i in range(len(sample_fnames_labels)):
    # the first 23 characters of the patch filename identify the large image
    fname = sample_fnames_labels[i][0].split('/')[-1][:23]
    fnames.add(fname)

print(len(fnames))

87

1 Comment

爱殇璃 2025-01-20 15:26:05


Fixed the problem by setting the batch size to 1 in the test dataloader (with shuffle turned off, so that dataset.samples[i] lines up with batch i):

# Create the test dataset
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['test']}
# Create the test dataloader; shuffle is off so that
# dataset.samples[i] corresponds to the i-th (single-sample) batch
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=1, shuffle=False, num_workers=4) for x in ['test']}

test_large_images = {}
test_acc = 0

with torch.no_grad():
    test_running_corrects = 0
    print(len(dataloaders_dict['test']))

    for i, (inputs, labels) in enumerate(dataloaders_dict['test']):
        test_input = inputs.to(device)
        test_label = labels.to(device)
        test_output = saved_model_ft(test_input)
        _, test_pred = torch.max(test_output, 1)

        # with batch_size=1 and shuffle=False, sample i is exactly batch i
        sample_fname, label = dataloaders_dict['test'].dataset.samples[i]
        patch_name = sample_fname.split('/')[-1]
        large_image_name = patch_name.split('_')[0]

        # group patch predictions under their parent large image
        if large_image_name not in test_large_images:
            test_large_images[large_image_name] = []
        test_large_images[large_image_name].append(test_pred.item())

        test_running_corrects += torch.sum(test_pred == test_label.data)

    test_acc = test_running_corrects / len(dataloaders_dict['test'].dataset)

print(test_acc)
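If iterating one patch at a time is too slow, a minimal sketch (not from the original post) that keeps the 512 batch size is below, under the same assumptions as above (image_datasets['test'], saved_model_ft, and device already defined). The key point is that with shuffle=False, batch i covers dataset indices starting at i * batch_size; the original i+j indexing only ever reached the first few hundred samples, which is likely why only two keys appeared.

import os
import torch

batch_size = 512
test_loader = torch.utils.data.DataLoader(
    image_datasets['test'], batch_size=batch_size,
    shuffle=False, num_workers=4)  # order must match dataset.samples
samples = test_loader.dataset.samples

test_large_images = {}
with torch.no_grad():
    for i, (inputs, labels) in enumerate(test_loader):
        outputs = saved_model_ft(inputs.to(device))
        _, preds = torch.max(outputs, 1)
        for j in range(len(preds)):
            # global index of this patch in the unshuffled dataset
            fname = samples[i * batch_size + j][0]
            large_image_name = os.path.basename(fname).split('_')[0]
            test_large_images.setdefault(large_image_name, []).append(preds[j].item())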