Supported import formats¶

Glossary

Imports are data folders imported into a project and displayed in the Label center section of the project details page. In the Filter panel of the label center, imports can be used to filter samples.

Imports are added to a project on the project details page. You can import data of the following formats from your computer:

Loose and zipped 2D images (JPG, JPEG, PNG, 8-bit TIF and TIFF, BMP, and GIF).
Directory-per-class (image folders in ZIP).
COCO format (JSON and data in ZIP).
Pascal VOC (XML and data in ZIP).
KRF and data in ZIP.
Robovision AI native format (records).

The table below contains a breakdown of the file formats that Robovision AI supports and test data. These datasets are presented to test the data import and might not be sufficient for testing, training, or other Robovision AI functionality.

Data type	Data format	Supported algorithms
Images in ZIP	2D images	All
Directory-per-class	2D image folders in ZIP	All
COCO	JSON and data in ZIP	All
Pascal VOC	XML and data in ZIP	All

Loose and zipped images¶

You can upload single image files or a ZIP archive of images from local storage.

Specification¶

The imported data must contain images of the supported formats—JPG, JPEG, PNG, 8-bit TIF and TIFF, BMP, or GIF.
Grayscale image support: Robovision AI supports the import of higher bit-depth grayscale images, which are automatically converted to 8-bit format during the import process:
- 16-bit grayscale (full range)
- 14-bit grayscale with 2 null bits (MSB-aligned)
All higher bit-depth images are converted to 8-bit using an 8-bit right shift, which keeps the 8 most significant bits of each pixel value. This conversion preserves the most significant visual information while ensuring compatibility with Robovision AI's processing pipeline.

Limitation for 14-bit grayscale images

The conversion only works correctly for MSB-aligned images, where the significant data bits are aligned to the Most Significant Bit side (with null/padding bits at the LSB end). Images with different bit alignment will not convert correctly and may result in poor image quality.
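
The effect of bit alignment can be illustrated with a minimal sketch. It assumes the conversion is a plain per-pixel 8-bit right shift, as described above; the function name to_8bit is hypothetical and is not part of Robovision AI.

```python
def to_8bit(pixel_16: int) -> int:
    """Convert one 16-bit container value to 8 bits with an 8-bit right shift."""
    return (pixel_16 >> 8) & 0xFF


# Full-range 16-bit white.
full_range_white = 0xFFFF
# 14-bit white, MSB-aligned: data in the top 14 bits, 2 null bits at the LSB end.
msb_aligned_white = 0x3FFF << 2  # 0xFFFC
# 14-bit white, LSB-aligned: data in the bottom 14 bits (the unsupported layout).
lsb_aligned_white = 0x3FFF

print(to_8bit(full_range_white))   # 255
print(to_8bit(msb_aligned_white))  # 255
print(to_8bit(lsb_aligned_white))  # 63, so the converted image is far too dark
```

In this sketch, the LSB-aligned value loses most of its dynamic range, which is consistent with the poor image quality mentioned above.
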
All supported image files in the root of the directory are considered for import. All other files and folders are ignored.

The directory structure of your ZIP file has to match the following:

.
├── 01.jpg
├── 02.jpg
├── 03.jpg
├── 04.jpg
├── 05.png
├── 06.png
├── ...

Directory-per-class format¶

Glossary

Image directory-per-class is a data format where the imported samples are organized into one folder per class, so that each folder contains images of an object or scene belonging to the same class. Each image receives a label annotation named after the folder that contains it.

Specification¶

Supported annotations: labels.
Each folder in the root of the directory that contains supported image files is treated as a label; all supported image files in these folders are considered for import.

All other files and folders in the root are ignored. Inside a class folder, unsupported files and any nested folders are also ignored.

The directory structure should follow this format:

importname.zip
├── cat
│   ├── img1.jpg
│   └── img2.jpg
└── dog
    ├── img3.jpg
    └── img4.jpg

The following structure also works, provided that the ZIP file and the top-level folder have the same name.

importname.zip
└── importname
    ├── cat
    │   ├── img1.jpg
    │   └── img2.jpg
    └── dog
        ├── img3.jpg
        └── img4.jpg
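
Before uploading, it can help to preview which labels a ZIP file would produce. The following is a rough sketch of the rules above, not the platform's actual import logic; labels_from_zip, build_demo_zip, and the SUPPORTED set are hypothetical names introduced for illustration.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Extensions taken from the list of supported image formats.
SUPPORTED = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp", ".gif"}


def labels_from_zip(zip_bytes: bytes, zip_name: str) -> dict[str, str]:
    """Map each image path in the archive to the label implied by its folder."""
    stem = PurePosixPath(zip_name).stem
    labels = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            parts = PurePosixPath(name).parts
            # Allow an optional top folder named like the ZIP file.
            if parts and parts[0] == stem:
                parts = parts[1:]
            # Keep only <class>/<image>; loose files and deeper nesting are skipped.
            if len(parts) != 2:
                continue
            folder, filename = parts
            if PurePosixPath(filename).suffix.lower() in SUPPORTED:
                labels[name] = folder
    return labels


def build_demo_zip() -> bytes:
    """Create a small in-memory archive with empty placeholder files."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for path in ("cat/img1.jpg", "cat/img2.jpg", "dog/img3.jpg", "notes.txt"):
            archive.writestr(path, b"")
    return buffer.getvalue()


demo_labels = labels_from_zip(build_demo_zip(), "importname.zip")
print(demo_labels)
```

The sketch does not handle the corner case of a class folder that has the same name as the ZIP file.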

COCO format¶

Glossary

COCO imports consist of images and their associated annotations saved in a specific JSON structure. This format defines how classifications, bounding boxes, masks, and metadata (such as image height and width) are stored.

Specification¶

Supported annotations: labels, bounding boxes, and masks.
The import should be packaged in a single ZIP file containing one JSON file with a valid COCO format description.
Two types of image paths are supported in the COCO JSON file:
- foldername/image.jpg paths.
- image.jpg paths.

If image file paths include folder names, for example, images/img1.jpg or ..images/img1.jpg, the directory structure should be:

importname.zip
├── annotations.json
└── images
    ├── img1.jpg
    ├── img2.jpg

If image file paths are just the image file names, for example, img1.jpg, the directory structure should be:

importname.zip
├── annotations.json
├── img1.jpg
├── img2.jpg

Required fields¶

The following fields are required in the JSON file:

categories: A list of classes with unique id and name.
images: Information about each image with unique id and file_name. Include height and width if using bounding boxes or masks.
annotations: Details of each annotation with unique id, image_id, category_id, and annotation data (bbox for bounding boxes, segmentation for masks).
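
A quick pre-upload check of these required fields could look like the sketch below. It is only an illustration of the rules listed above: check_coco is a hypothetical helper, it does not verify that ids are unique, and it does not reproduce the validation that Robovision AI performs during import.

```python
def check_coco(doc: dict) -> list[str]:
    """Return a list of problems found; an empty list means these basic checks passed."""
    problems = []
    for key in ("categories", "images", "annotations"):
        if key not in doc:
            problems.append(f"missing top-level field: {key}")

    category_ids = {c.get("id") for c in doc.get("categories", [])}
    images = {i.get("id"): i for i in doc.get("images", [])}

    for image_id, image in images.items():
        if "file_name" not in image:
            problems.append(f"image {image_id} has no file_name")

    for ann in doc.get("annotations", []):
        ann_id = ann.get("id")
        if ann.get("category_id") not in category_ids:
            problems.append(f"annotation {ann_id} has unknown category_id")
        image = images.get(ann.get("image_id"))
        if image is None:
            problems.append(f"annotation {ann_id} has unknown image_id")
        elif ("bbox" in ann or "segmentation" in ann) and not (
            "height" in image and "width" in image
        ):
            # Bounding boxes and masks need the image dimensions.
            problems.append(f"image {ann.get('image_id')} needs height and width")
    return problems


labels_only = {
    "categories": [{"id": 1, "name": "cat"}],
    "images": [{"id": 1, "file_name": "img1.jpg"}],
    "annotations": [{"id": 1, "image_id": 1, "category_id": 1}],
}
print(check_coco(labels_only))  # []

with_bbox = {
    **labels_only,
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [100, 150, 200, 250]}
    ],
}
print(check_coco(with_bbox))  # ['image 1 needs height and width']
```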

Fields like supercategory, licenses, info, flickr_url, coco_url, and date_captured can be omitted. When present, supercategory is prepended to the class name during import into the platform, so removing it from the JSON results in shorter class names.

COCO JSON examples¶

Below are examples for different types of annotations using two classes (cat and dog) and two images (img1.jpg and img2.jpg).

Classification labels

{
  "categories": [
    {"id": 1, "name": "cat"},
    {"id": 2, "name": "dog"}
  ],
  "images": [
    {"id": 1, "file_name": "img1.jpg"},
    {"id": 2, "file_name": "img2.jpg"}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1},
    {"id": 2, "image_id": 2, "category_id": 2}
  ]
}

Bounding boxes for object detection projects

{
  "categories": [
    {"id": 1, "name": "cat"},
    {"id": 2, "name": "dog"}
  ],
  "images": [
    {"id": 1, "file_name": "img1.jpg", "height": 480, "width": 640},
    {"id": 2, "file_name": "img2.jpg", "height": 600, "width": 800}
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "bbox": [100, 150, 200, 250],
      "area": 50000,
      "iscrowd": 0
    },
    {
      "id": 2,
      "image_id": 2,
      "category_id": 2,
      "bbox": [50, 75, 300, 400],
      "area": 120000,
      "iscrowd": 0
    }
  ]
}

bbox format: [x, y, width, height].
area: Calculated as width * height.
iscrowd: Set to 0 for individual objects.
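
As a small worked example of the bbox and area conventions above, the following sketch recomputes the area values used in the JSON example and converts a box to corner coordinates. The helper names are hypothetical.

```python
def bbox_area(bbox):
    """Area of a COCO [x, y, width, height] box."""
    _, _, width, height = bbox
    return width * height


def bbox_corners(bbox):
    """Convert [x, y, width, height] to (xmin, ymin, xmax, ymax)."""
    x, y, width, height = bbox
    return (x, y, x + width, y + height)


print(bbox_area([100, 150, 200, 250]))     # 50000
print(bbox_area([50, 75, 300, 400]))       # 120000
print(bbox_corners([100, 150, 200, 250]))  # (100, 150, 300, 400)
```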

Masks for segmentation projects

{
  "categories": [
    {"id": 1, "name": "cat"},
    {"id": 2, "name": "dog"}
  ],
  "images": [
    {"id": 1, "file_name": "img1.jpg", "height": 480, "width": 640},
    {"id": 2, "file_name": "img2.jpg", "height": 600, "width": 800}
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "segmentation": [
        [150, 200, 180, 180, 210, 190, 220, 220, 200, 250, 170, 240]
      ],
      "area": 3150,
      "iscrowd": 0
    },
    {
      "id": 2,
      "image_id": 2,
      "category_id": 2,
      "segmentation": [
        [300, 100, 320, 130, 310, 160, 280, 170, 260, 140, 270, 110]
      ],
      "area": 2800,
      "iscrowd": 0
    }
  ]
}

segmentation: An array of polygons, where each polygon is a flat array of coordinates [x1, y1, x2, y2, ..., xn, yn]. Segmentation masks can also be provided as Run-Length Encoding (RLE) bitmap masks.
area: The area covered by the mask.
iscrowd: Set to 0 for individual objects.
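
For polygon masks, the area can be computed from the flat coordinate list with the shoelace formula. The sketch below is one way to do this (polygon_area is a hypothetical helper); applied to the two polygons in the example above, it reproduces their area values.

```python
def polygon_area(flat_points):
    """Shoelace formula for a flat [x1, y1, ..., xn, yn] polygon."""
    xs = flat_points[0::2]
    ys = flat_points[1::2]
    n = len(xs)
    twice_area = 0
    for i in range(n):
        j = (i + 1) % n  # wrap around to close the polygon
        twice_area += xs[i] * ys[j] - xs[j] * ys[i]
    return abs(twice_area) / 2


cat_polygon = [150, 200, 180, 180, 210, 190, 220, 220, 200, 250, 170, 240]
dog_polygon = [300, 100, 320, 130, 310, 160, 280, 170, 260, 140, 270, 110]

print(polygon_area(cat_polygon))  # 3150.0
print(polygon_area(dog_polygon))  # 2800.0
```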

For more information about the COCO format, see the COCO format specification.

Pascal VOC format¶

Glossary

Pascal VOC stores annotations in XML files, whereas COCO uses JSON. A Pascal VOC dataset contains one XML file for each image, while a COCO dataset uses a single JSON file for the entire dataset (typically one file each for training, testing, and validation).

Specification¶

Supported annotations: labels, bounding boxes, and masks.
Bounding box annotations should be contained in the XML files themselves.
For mask annotations, the class information should be found in <folder>/SegmentationClass/<filename>.* and the instance information in <folder>/SegmentationObject/<filename>.*.

To be imported into Robovision AI, the Pascal VOC dataset structure has to contain at least:

importname.zip
├── Annotations
│   ├── 000068.xml
│   ├── 000069.xml
│   ├── ...
├── JPEGImages
│   ├── 000068.jpg
│   ├── 000069.jpg
│   ├── ...

The complete structure of the Pascal VOC dataset, including mask annotations, is as follows:

importname.zip
├── Annotations
│   ├── 000068.xml
│   ├── 000069.xml
│   ├── ...
├── JPEGImages
│   ├── 000068.jpg
│   ├── 000069.jpg
│   ├── ...
├── SegmentationClass
│   ├── 000068.png
│   ├── 000069.png
│   ├── ...
├── SegmentationObject
│   ├── 000068.png
│   ├── 000069.png
│   ├── ...
└── labelmap.txt

The directory should contain one or more XML files. These files can be anywhere in the directory (not necessarily in the root). Each XML file is considered as a sample for import.
The labelmap.txt file should be present in the root, containing a mapping between class names and RGB color values (to fetch label information from an image). The information should be presented as one class per line in the following format: <label>:<color_rgb>:<parts>:<action>. The last two values are optional. For example, background:0,0,0:: indicates that all black pixels are background.
Some of the key tags for Pascal VOC are:
- folder: the folder that contains the images. The folder value is optional and can be skipped.
- filename: the name of the physical file that exists in the folder.
- size: the size of the image in terms of width, height, and depth. For grayscale images, the depth is 1. For color images, the depth is 3.
- object: the object details. If you have multiple annotations, then the object tag with its contents is repeated. The components of the object tags are:
  - name: the name of the object that we are trying to identify.
  - truncated: indicates that the bounding box specified for the object does not correspond to the full extent of the object. For example, if an object is only partially visible in the image, set truncated to 1. If the object is fully visible, set truncated to 0.
  - difficult: indicates that the object is considered difficult to recognize. Set difficult to 1 for such objects and to 0 otherwise.
  - bndbox: axis-aligned rectangle specifying the extent of the object visible in the image.
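
The labelmap.txt line format described above can be read with a few lines of code. This is a minimal sketch: parse_labelmap_line and LabelmapEntry are hypothetical names, the optional parts and action values are kept as raw strings because their internal format is not specified here, and a color value is assumed to be present on every line.

```python
from typing import NamedTuple


class LabelmapEntry(NamedTuple):
    label: str
    color_rgb: tuple
    parts: str   # raw, possibly empty
    action: str  # raw, possibly empty


def parse_labelmap_line(line: str) -> LabelmapEntry:
    """Parse one <label>:<color_rgb>:<parts>:<action> line."""
    fields = line.strip().split(":")
    # The last two values are optional, so pad to four fields.
    fields += [""] * (4 - len(fields))
    label, color, parts, action = fields[:4]
    rgb = tuple(int(channel) for channel in color.split(","))
    return LabelmapEntry(label, rgb, parts, action)


print(parse_labelmap_line("background:0,0,0::"))
```

A class line without the trailing values, for example a hypothetical cat:64,0,0, is parsed the same way with empty parts and action.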

Pascal VOC XML example¶

<annotation>
    <filename>000068.jpg</filename>
    <source>
        <database>The VOC2007 Database</database>
        <annotation>PASCAL VOC2007</annotation>
        <image>flickr</image>
        <flickrid>338904984</flickrid>
    </source>
    <owner>
        <flickrid>Manufacturer</flickrid>
        <name>?</name>
    </owner>
    <size>
        <width>500</width>
        <height>375</height>
        <depth>3</depth>
    </size>
    <segmented>1</segmented>
    <object>
        <name>bird</name>
        <pose>Right</pose>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox>
            <xmin>27</xmin>
            <ymin>45</ymin>
            <xmax>266</xmax>
            <ymax>375</ymax>
        </bndbox>
    </object>
</annotation>
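
To inspect such a file before import, the key tags can be read with Python's standard xml.etree.ElementTree module. The sketch below uses a shortened copy of the example above; read_voc is a hypothetical helper and not part of Robovision AI.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example annotation, keeping only the key tags.
VOC_XML = """<annotation>
    <filename>000068.jpg</filename>
    <size><width>500</width><height>375</height><depth>3</depth></size>
    <object>
        <name>bird</name>
        <truncated>1</truncated>
        <difficult>0</difficult>
        <bndbox><xmin>27</xmin><ymin>45</ymin><xmax>266</xmax><ymax>375</ymax></bndbox>
    </object>
</annotation>"""


def read_voc(xml_text: str) -> dict:
    """Extract the file name, image size, and objects from one annotation."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):  # the object tag repeats per annotation
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            "truncated": obj.findtext("truncated") == "1",
            "difficult": obj.findtext("difficult") == "1",
            "bndbox": tuple(
                int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
            ),
        })
    return {
        "filename": root.findtext("filename"),
        "width": int(root.findtext("size/width")),
        "height": int(root.findtext("size/height")),
        "objects": objects,
    }


sample = read_voc(VOC_XML)
print(sample["filename"], sample["objects"][0]["bndbox"])  # 000068.jpg (27, 45, 266, 375)
```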

For more information about the Pascal VOC format, see the Pascal VOC format specification.

KRF and data in ZIP¶

Glossary

KRF or KLARF (KLA Tencor Result File) is a file format used to transfer data about defects in the semiconductor industry. KRF is a text file that can contain information about the wafer, references to external image files, data about defect locations, and the list of classes. Robovision AI supports versions 1.2 and 1.8 of KRF files.

Specification¶

Supported project types: image classification with wafer processing tools (AI-ADC EfficientNet).
If the imported ZIP file contains more than one KRF file, the first KRF file in alphabetical order will be used.
Defect information is taken from the CLASSNUMBER column of the defect list in the KRF file. Annotations in the CLASSNUMBER column must be integers.
The user who performs the import becomes the author of the imported annotations.
If a class from the KRF file doesn't yet exist on the platform, it will be imported.
You will be notified of any mismatch between the DEFECTID values (sample names) in the KRF file and the image file names in your ZIP. Note that image file names in your ZIP may be prepended with zeros, so, for example, DEFECTID 31 in the KRF file may correspond to the 00000031.tif file in the ZIP.
If the same image file is imported for the second time (for example, with new annotations), it won't be duplicated, but its annotation will be updated.
Any image file that isn't listed in the KRF file will be ignored and not imported.
KRF import automatically extracts wafer configuration metadata (wafer diameter, center coordinates, die pitch) and sample positioning metadata (chip indices, relative positions, normalized coordinates). For information about using these metadata keys in API inference, see Configure custom metadata for API inference.
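
The zero-padding rule for DEFECTID values can be checked locally before upload. The following sketch assumes that an image matches a DEFECTID when the numeric value of its file name (without extension) equals the DEFECTID, as in the 00000031.tif example above; the platform's exact matching rules may differ, and match_defects is a hypothetical helper.

```python
from pathlib import PurePosixPath


def match_defects(defect_ids, image_names):
    """Pair each DEFECTID with an image whose numeric file name equals it."""
    by_number = {}
    for name in image_names:
        stem = PurePosixPath(name).stem
        if stem.isdigit():
            by_number[int(stem)] = name  # int() drops the leading zeros

    matched = {d: by_number[d] for d in defect_ids if d in by_number}
    missing = [d for d in defect_ids if d not in by_number]
    # Images that no DEFECTID refers to would be ignored by the import.
    unlisted = sorted(set(image_names) - set(matched.values()))
    return matched, missing, unlisted


matched, missing, unlisted = match_defects(
    [31, 32], ["00000031.tif", "00000033.tif"]
)
print(matched)   # {31: '00000031.tif'}
print(missing)   # [32]
print(unlisted)  # ['00000033.tif']
```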

Robovision AI native format¶

In Robovision AI, you can export your data to create backups or to exchange data between Robovision AI instances. The samples with their annotations and tags (if any) are exported as a compilation of files. The exported folder contains the records and a manifest file.