TensorFlow | Create an image dataset labeled by filename
I'm trying to create a TensorFlow dataset to train my model. I have a folder full of labeled photos; the label is part of each file name.
Is there a reasonable way to load the dataset for training without splitting it into separate directories?
Example, for the files:
- ./dataset/path/img0_cat.bmp
- ./dataset/path/img1_dog.bmp
- ./dataset/path/img2_horse.bmp
- ./dataset/path/img3_cat.bmp
- ./dataset/path/img4_dog.bmp
- ./dataset/path/img5_horse.bmp
- ./dataset/path/img6_dog.bmp
- ./dataset/path/img7_cat.bmp
- ./dataset/path/img8_horse.bmp
- ./dataset/path/img9_cat.bmp
- ./dataset/path/img10_dog.bmp
Expected output: a tf.data.Dataset with one-hot labels for (cat, dog, horse)
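To make the expected output concrete, here is a small plain-Python sketch of the mapping from file name to one-hot label. The helper names `label_from_path` and `one_hot`, and the alphabetical class order, are assumptions made for illustration only.

```python
from pathlib import PurePosixPath

# Class order is an assumption; any fixed order works as long as it is reused.
CLASS_NAMES = ["cat", "dog", "horse"]


def label_from_path(path: str) -> str:
    """Return the label embedded in a name like 'img0_cat.bmp'."""
    stem = PurePosixPath(path).stem      # 'img0_cat'
    return stem.rsplit("_", 1)[-1]       # 'cat'


def one_hot(label: str) -> list[int]:
    """Encode a label as a one-hot list over CLASS_NAMES."""
    return [int(label == name) for name in CLASS_NAMES]


paths = [
    "./dataset/path/img0_cat.bmp",
    "./dataset/path/img1_dog.bmp",
    "./dataset/path/img2_horse.bmp",
]
for p in paths:
    print(p, one_hot(label_from_path(p)))
```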
Comments (3)
Thanks, all, for your responses.
The way I solved it is as follows:
At the end of this process you get a trainable TensorFlow dataset (setting the batch size requires more configuration, see the reference), with the labels as one-hot vectors.
When running:
you get:
Reference:
https://www.tensorflow.org/tutorials/load_data/images
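A minimal sketch of a pipeline of this kind, loosely following the referenced tutorial rather than reproducing this answer's code, might look like the following. The class order, the image size, and the names `class_index`, `build_dataset`, and `process_path` are assumptions for illustration; the label rule assumes names of the form `img3_cat.bmp`.

```python
import os

CLASS_NAMES = ["cat", "dog", "horse"]   # assumed fixed class order
IMG_SIZE = (128, 128)                   # assumed target size


def class_index(path: str) -> int:
    """Plain-Python reference for the label rule: 'img3_cat.bmp' -> 0."""
    stem = os.path.splitext(os.path.basename(path))[0]
    return CLASS_NAMES.index(stem.rsplit("_", 1)[-1])


def build_dataset(pattern: str = "./dataset/path/*.bmp"):
    """Return a tf.data.Dataset of (image, one_hot_label) pairs."""
    import tensorflow as tf  # imported here so class_index works without TF

    def process_path(file_path):
        # 'img3_cat.bmp' -> 'img3_cat' -> 'cat'
        file_name = tf.strings.split(file_path, os.sep)[-1]
        stem = tf.strings.split(file_name, ".")[0]
        label = tf.strings.split(stem, "_")[-1]
        # Comparing the label with the class list yields a one-hot vector.
        one_hot = tf.cast(tf.equal(label, CLASS_NAMES), tf.float32)

        image = tf.io.decode_bmp(tf.io.read_file(file_path), channels=3)
        image = tf.image.resize(image, IMG_SIZE) / 255.0
        return image, one_hot

    files = tf.data.Dataset.list_files(pattern, shuffle=True)
    return files.map(process_path, num_parallel_calls=tf.data.AUTOTUNE)
```

Batching is left to the caller, for example `build_dataset().batch(32).prefetch(tf.data.AUTOTUNE)`, where 32 is an arbitrary choice.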
You can try assigning an ID to each path and gathering paths based on whatever IDs you're using in your training set.
If you're using TensorFlow, the tf.data.Dataset documentation describes several methods for loading data. Specifically,
dataset_dog = tf.data.Dataset.list_files("./dataset/path/*dog.bmp")
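Extending that one-liner to every class could be sketched as below: one `list_files` call per class, each zipped with a constant integer ID, then concatenated. The class list, the `*_<label>.bmp` pattern, and the helper names `class_pattern` and `build_labeled_paths` are assumptions, not part of the answer above.

```python
CLASS_NAMES = ["cat", "dog", "horse"]  # assumed class list


def class_pattern(directory: str, label: str, ext: str = "bmp") -> str:
    """Glob pattern matching every file of one class, e.g. '*_dog.bmp'."""
    return f"{directory.rstrip('/')}/*_{label}.{ext}"


def build_labeled_paths(directory: str = "./dataset/path"):
    """Return a tf.data.Dataset of (file_path, class_id) pairs."""
    import tensorflow as tf  # local import: class_pattern needs no TF

    combined = None
    for class_id, label in enumerate(CLASS_NAMES):
        files = tf.data.Dataset.list_files(
            class_pattern(directory, label), shuffle=False
        )
        # Pair every path of this class with the same integer ID;
        # zip stops when the finite file dataset is exhausted.
        ids = tf.data.Dataset.from_tensors(class_id).repeat()
        labeled = tf.data.Dataset.zip((files, ids))
        combined = labeled if combined is None else combined.concatenate(labeled)
    return combined
```

Concatenation leaves the classes in contiguous blocks, so the result would need a `shuffle` with a buffer at least as large as the dataset before training, and the integer IDs can be turned into one-hot vectors with `tf.one_hot` in a later `map`.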
Here is the whole enchilada. I prefer to use pandas DataFrames along with ImageDataGenerator.flow_from_dataframe because it is flexible. I created a single directory with 10 images of cranes and 10 images of albatrosses. Filenames are of the form
0_crane.jpg, 1_crane.jpg, etc. ... 10_albatross, 11_albatross, ... The code below processes this directory. It creates a dataframe df, then splits it into train_df, valid_df, and test_df. Then three image data generators are created: train_gen, test_gen, and valid_gen. I used a standard model I usually use and trained it, then evaluated it on the test set with 100% accuracy. The code is below.
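The following is a minimal sketch of that workflow rather than the original code. The column names `filepaths` and `labels`, the 80/10/10 split, the image size, the batch size, and the helper names `filename_records` and `make_generators` are all assumptions chosen for illustration; the model and training step are omitted.

```python
import os


def filename_records(directory: str) -> list[dict]:
    """List image files as {'filepaths', 'labels'} records, label from the name."""
    records = []
    for name in sorted(os.listdir(directory)):
        stem, ext = os.path.splitext(name)
        if ext.lower() not in {".jpg", ".jpeg", ".png", ".bmp"} or "_" not in stem:
            continue  # skip anything that does not look like '<n>_<label>.<ext>'
        records.append({
            "filepaths": os.path.join(directory, name),
            "labels": stem.rsplit("_", 1)[-1],
        })
    return records


def make_generators(directory: str, batch_size: int = 4, seed: int = 123):
    """Split into train/valid/test frames and wrap each in a Keras generator."""
    import pandas as pd
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    df = pd.DataFrame(filename_records(directory)).sample(frac=1.0, random_state=seed)
    classes = sorted(df["labels"].unique())  # same class order for every split
    n_train, n_valid = int(0.8 * len(df)), int(0.1 * len(df))
    frames = {
        "train": df.iloc[:n_train],
        "valid": df.iloc[n_train:n_train + n_valid],
        "test": df.iloc[n_train + n_valid:],
    }
    datagen = ImageDataGenerator(rescale=1.0 / 255)
    return {
        split: datagen.flow_from_dataframe(
            frame,
            x_col="filepaths",
            y_col="labels",
            classes=classes,
            target_size=(224, 224),
            class_mode="categorical",   # yields one-hot labels
            batch_size=batch_size,
            shuffle=(split == "train"),
        )
        for split, frame in frames.items()
    }
```

Note that `ImageDataGenerator` is marked as deprecated in recent TensorFlow documentation, so for new code the same dataframe could instead feed a `tf.data` pipeline.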