YOLO-World Source Code Analysis (Part 4)

2024-03-09 08:50:06

Preparing Data for YOLO-World

Overview

For pre-training YOLO-World, we adopt several datasets, as listed in the table below:

| Data | Samples | Type | Boxes |
| :-- | :-- | :-- | :-- |
| Objects365v1 | 609k | detection | 9,621k |
| GQA | 621k | grounding | 3,681k |
| Flickr | 149k | grounding | 641k |
| CC3M-Lite | 245k | image-text | 821k |

Dataset Directory

We put all data under the data directory, organized as follows:

```
├── coco
│   ├── annotations
│   ├── lvis
│   ├── train2017
│   ├── val2017
├── flickr
│   ├── annotations
│   └── images
├── mixed_grounding
│   ├── annotations
│   ├── images
├── objects365v1
│   ├── annotations
│   ├── train
│   ├── val
```

NOTE: We strongly suggest that you check the directories or paths in the dataset part of the config file, especially for the values ann_file, data_root, and data_prefix.
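As a sketch of what to check, a dataset entry in an MMDetection/MMYOLO-style config ties these three values together. The exact type names and the text-JSON path below are assumptions for illustration; compare them against the actual YOLO-World config files.

```python
# Hypothetical dataset config fragment in the MMDetection/MMYOLO dict style.
# The keys to verify against your directory layout are data_root, ann_file,
# and data_prefix; the class names and paths here are illustrative only.
obj365_dataset = dict(
    type='MultiModalDataset',
    dataset=dict(
        type='YOLOv5Objects365V1Dataset',
        data_root='data/objects365v1/',             # top-level dataset dir
        ann_file='annotations/objects365_train.json',  # relative to data_root
        data_prefix=dict(img='train/'),             # image subdirectory
    ),
    # Category texts attached by MultiModalDataset (path is an assumption).
    class_text_path='data/texts/obj365v1_class_texts.json',
)
```

The note above boils down to: ann_file and data_prefix are resolved relative to data_root, so a mismatch in any one of the three silently breaks data loading.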

We provide the annotations of the pre-training data in the table below:

| Data | Images | Annotation File |
| :-- | :-- | :-- |
| Objects365v1 | Objects365 train | objects365_train.json |
| MixedGrounding | GQA | final_mixed_train_no_coco.json |
| Flickr30k | Flickr30k | final_flickr_separateGT_train.json |
| LVIS-minival | COCO val2017 | lvis_v1_minival_inserted_image_name.json |

Acknowledgement: We sincerely thank GLIP and mdetr for providing the annotation files for pre-training.

Dataset Class

For training YOLO-World, we mainly adopt two kinds of dataset classes:

1. MultiModalDataset

MultiModalDataset is a simple wrapper around a pre-defined dataset class, such as Objects365 or COCO, which adds the texts (category texts) to the dataset instance for formatting the input texts.

Text JSON

The json file is formatted as follows:

```json
[
    ["A_1", "A_2"],
    ["B"],
    ["C_1", "C_2", "C_3"],
    ...
]
```

Each inner list holds the name and synonyms for one category. We have provided the text JSON files for LVIS, COCO, and Objects365.
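As a minimal sketch, the text JSON in the format above can be loaded and turned into one prompt string per category. The joining scheme shown here is an illustrative choice, not necessarily the one YOLO-World uses internally:

```python
import json

# In practice you would load the provided text JSON, e.g.:
#     with open('path/to/class_texts.json') as f:
#         class_texts = json.load(f)
# Here we inline a sample that mirrors the format shown above:
# one list of synonyms per category.
class_texts = json.loads(
    '[["person", "human"], ["bicycle"], ["car", "automobile", "auto"]]'
)

# Build a single prompt per category by joining its synonyms.
prompts = ["/".join(names) for names in class_texts]
print(prompts)  # ['person/human', 'bicycle', 'car/automobile/auto']
```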

2. YOLOv5MixedGroundingDataset

The YOLOv5MixedGroundingDataset extends the COCO dataset class by supporting loading texts/captions from the JSON file. It is designed for MixedGrounding or Flickr30K, where each object carries its own text tokens.
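To make "text tokens for each object" concrete, here is a sketch of how per-object texts can be recovered from a grounding annotation in the mdetr/GLIP style, where each image stores a caption and each box stores character spans into it. The field names `caption` and `tokens_positive` are assumptions based on that format; verify them against the actual annotation file before relying on them.

```python
# Hypothetical mdetr/GLIP-style grounding record: the image carries the
# full caption, and each annotation points at the character spans of the
# caption that name its object.
image = {"id": 1, "caption": "a man riding a red bicycle"}
annotation = {
    "image_id": 1,
    "bbox": [10, 20, 50, 80],           # x, y, w, h in COCO convention
    "tokens_positive": [[2, 5]],        # character span covering "man"
}

# Slice the caption by each span to get the texts for this object.
texts = [
    image["caption"][start:end]
    for start, end in annotation["tokens_positive"]
]
print(texts)  # ['man']
```

This is why the class needs to extend plain COCO loading: the positive text spans live alongside the boxes, so texts must be extracted per object rather than from a fixed category list.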
