TensorFlow Training Data Pipeline

15 min read

Create efficient tf.data pipelines from your TigerLabel annotation exports for high-performance TensorFlow model training.

TFRecord Pipeline for Training

TFRecords provide the best data loading performance for TensorFlow. Export from TigerLabel in TFRecord format:

import tensorflow as tf

def parse_tfrecord(example):
    feature_description = {
        'image/encoded': tf.io.FixedLenFeature([], tf.string),
        'image/height': tf.io.FixedLenFeature([], tf.int64),
        'image/width': tf.io.FixedLenFeature([], tf.int64),
        'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
        'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
        'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
        'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
        'image/object/class/label': tf.io.VarLenFeature(tf.int64),
    }
    return tf.io.parse_single_example(example, feature_description)

# Create optimized training pipeline
dataset = tf.data.TFRecordDataset('annotations.tfrecord')
dataset = dataset.map(parse_tfrecord, num_parallel_calls=tf.data.AUTOTUNE)
dataset = dataset.shuffle(1000).batch(32).prefetch(tf.data.AUTOTUNE)