Model Implementation Guide

Building tagger containers

The common-ml library (https://github.com/eluv-io/common-ml) provides utilities for implementing tagger containers. This guide shows how to use it to build your own tagger containers that plug into the Eluvio Tagger runtime.

Minimal starting guide

This series of steps shows the bare minimum needed to build a tagger container.

Step 0: install common-ml

pip install git+https://github.com/eluv-io/common-ml.git

Step 1: Implement a model interface

There are a few interfaces we can implement depending on the use case. Pick whichever one is relevant.

Option 1: Frame-based model

Frame-based models run on a single input image or frame. Examples include image captioning models such as LLaVA or GIT.

To implement, define a class which extends the FrameModel interface. The class needs to implement a single function tag_frame(self, img: np.ndarray) -> List[FrameTag]. This function tags a single input frame (represented as a numpy array). For more details see the example below.

Example

from typing import List

import numpy as np
from common_ml.tagging.models.frame_based import FrameModel
from common_ml.tagging.models.tag_types import FrameTag

class MyFrameModel(FrameModel):
    def tag_frame(self, img: np.ndarray) -> List[FrameTag]:
        """
        Parameters
        ----------
        img : np.ndarray, shape (H, W, 3), dtype uint8
            RGB image.
        """

        return [
            FrameTag(tag="replace with your model's output", box={"x1": 0.05, "y1": 0.05, "x2": 0.95, "y2": 0.95}),
            FrameTag(tag="another tag", box={"x1": 0.05, "y1": 0.05, "x2": 0.95, "y2": 0.95})
        ]

Notes

The FrameTag response struct contains only two fields: tag is the textual tag output by the model, and box is an optional dictionary representing the salient region of the input image.
The values in the box dictionary are between 0 and 1. The boxes in the above example mark a region that covers most of the input image, leaving a small margin.
You may output multiple frame tags per input image.
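
Detectors often report boxes in pixel coordinates, while the notes above say that box values lie between 0 and 1. A minimal sketch of that conversion follows. The helper name to_normalized_box is hypothetical and not part of common-ml, and the sketch assumes that (x1, y1) is the top-left corner and (x2, y2) the bottom-right corner.

```python
from typing import Dict


def to_normalized_box(x1: float, y1: float, x2: float, y2: float,
                      width: int, height: int) -> Dict[str, float]:
    """Convert a pixel-space box to the 0..1 range used by the box dictionary.

    Hypothetical helper (not part of common-ml). Assumes (x1, y1) is the
    top-left corner and (x2, y2) the bottom-right corner, in pixels.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    def clamp(v: float) -> float:
        # Keep values inside [0, 1] even if the detector overshoots the frame.
        return min(1.0, max(0.0, v))

    return {
        "x1": clamp(x1 / width),
        "y1": clamp(y1 / height),
        "x2": clamp(x2 / width),
        "y2": clamp(y2 / height),
    }


# For an (H, W, 3) frame, height and width come from img.shape[:2].
box = to_normalized_box(96, 54, 1824, 1026, width=1920, height=1080)
```

The resulting dictionary has the same keys as the box argument in the example above, so it could be passed as FrameTag(tag=..., box=box).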

Option 2: Segment or time-range based model

For segment-based models, define a class which extends AVModel. The class needs to implement a single function tag(self, fpath: str) -> List[Tag]. The input file can be audio or video, whichever is relevant to your model.

Example

from typing import List

from common_ml.tagging.models.av import AVModel
from common_ml.tagging.models.tag_types import Tag

class MyVideoModel(AVModel):

    def tag(self, fpath: str) -> List[Tag]:
        return [
            Tag(
                tag="A dog",
                source_media=fpath,
                start_time=1500,
                end_time=8000,
            )
        ]
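
Segment-based models frequently score short, fixed-length windows and then need to report longer time ranges. The sketch below merges back-to-back windows with the same label into single (label, start, end) ranges. merge_windows is a hypothetical helper, not part of common-ml, and the numbers are in whatever time unit start_time and end_time use in the example above.

```python
from typing import List, Tuple

Window = Tuple[str, int, int]  # (label, start_time, end_time)


def merge_windows(windows: List[Window], max_gap: int = 0) -> List[Window]:
    """Merge consecutive windows that share a label into longer ranges.

    Hypothetical helper (not part of common-ml). Two windows are merged when
    they have the same label and the second starts no later than max_gap
    after the current merged range ends.
    """
    merged: List[Window] = []
    # Sort by label, then start time, so candidates for merging are adjacent.
    for label, start, end in sorted(windows):
        if merged and merged[-1][0] == label and start - merged[-1][2] <= max_gap:
            prev_label, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_label, prev_start, max(prev_end, end))
        else:
            merged.append((label, start, end))
    # Return ranges in chronological order.
    return sorted(merged, key=lambda w: (w[1], w[2], w[0]))


ranges = merge_windows([
    ("A dog", 1500, 3000),
    ("A dog", 3000, 8000),
    ("A cat", 9000, 9500),
])
```

Each resulting range maps onto one Tag(tag=label, source_media=fpath, start_time=start, end_time=end) inside your tag method.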

Step 2: Start the listener

To start the listening loop, just call run_default on your frame or segment-based model from above.

from common_ml.tagging.run_helpers import run_default

# import your own model
from src.my_model import MyFrameModel

if __name__ == "__main__":
    model = MyFrameModel()
    run_default(model)

The above main.py will run as a daemon, accepting input files via stdin and writing the resulting tags to an output file which the Eluvio Tagger runtime can read.

Step 3: Containerize

Now just wrap the above as an OCI image and set main.py as the entrypoint.

Barebones `Containerfile`/`Dockerfile`

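FROM continuumio/miniconda3:latest
RUN conda create -n mlpod python=3.10 -y
# Install the package and its dependencies into the mlpod environment
COPY setup.py .
RUN mkdir src
RUN conda run -n mlpod /opt/conda/envs/mlpod/bin/pip install .
# Copy the model code that main.py imports (src.my_model in Step 2)
COPY src src
COPY main.py main.py

ENTRYPOINT ["/opt/conda/envs/mlpod/bin/python", "-u", "main.py"]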

Step 4: Deploy

Set up Google Cloud; see https://github.com/qluvio/elv-ml/blob/main/docs/ops/PODMAN_REGISTRY_SETUP.md
Clone https://github.com/qluvio/buildscripts and run its Makefile; check the README for detailed instructions.

More features

Output tags to multiple separate tracks

To write tags to separate tracks, set the track field on each Tag you output. If you leave the track field empty, your tags will be written to a default track.

class MyVideoModel(AVModel):

    def tag(self, fpath: str) -> List[Tag]:
        return [
            Tag(
                tag="A dog",
                source_media=fpath,
                start_time=1500,
                end_time=8000,
                track="dog_detection"
            ),
            Tag(
                tag="Not a hotdog",
                source_media=fpath,
                start_time=1500,
                end_time=8000,
                track="hotdog_detection"
            )
        ]
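
To preview how a list of tags would be split across tracks before handing it to the runtime, something like the sketch below can help. SketchTag is a stand-in for Tag with only the fields needed here, group_by_track is a hypothetical helper, and "<default>" is only a placeholder label for tags with no track, not the real name of the default track.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SketchTag:
    """Stand-in for Tag with only the fields this sketch needs."""
    tag: str
    start_time: int
    end_time: int
    track: str = ""  # assumption: an empty string means "no track set"


def group_by_track(tags: List[SketchTag],
                   default_label: str = "<default>") -> Dict[str, List[str]]:
    """Group tag texts by track; tags without a track go under default_label."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for t in tags:
        grouped[t.track or default_label].append(t.tag)
    return dict(grouped)


preview = group_by_track([
    SketchTag("A dog", 1500, 8000, track="dog_detection"),
    SketchTag("Not a hotdog", 1500, 8000, track="hotdog_detection"),
    SketchTag("Outdoor scene", 0, 8000),
])
```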

Attach extra data to a tag

Tags support unstructured data under the additional_info field.

from common_ml.tagging.models.tag_types import Tag, FrameInfo

Tag(
    tag="BMX Bike",
    source_media="input.mp4",
    track="vertical_video",
    frame_info={"frame_idx": 45},  # the tagger converts this to be relative to the content
    additional_info={
        "x-coordinates": [0.31, 0.32, 0.31, ...]
    }
)

Support user arguments

Many of our models can be configured by the user at runtime. Settings such as confidence thresholds, prompts, and tagging frequency can be set by the caller and are injected by the tagger runtime into the container.

Use the get_params helper function. It returns a dictionary containing whatever the user chose to pass in when initiating tagging. It is your responsibility to validate that the arguments match your schema and raise a helpful error message if they don’t.

from common_ml.tagging.run_helpers import get_params
params = get_params()
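
Because validation is left to the model author, it can help to see what a hand-written check might look like. The sketch below validates a plain dictionary against a small hypothetical schema: the parameter names confidence_threshold and prompt, their defaults, and the helper validate_params are illustrative assumptions, not part of common-ml.

```python
from typing import Any, Dict


def validate_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Check user-supplied params against a small hand-written schema.

    Hypothetical schema: "confidence_threshold" (number in [0, 1],
    default 0.5) and "prompt" (string, default "").
    """
    defaults: Dict[str, Any] = {"confidence_threshold": 0.5, "prompt": ""}

    # Reject misspelled or unsupported keys with a message that names them.
    unknown = sorted(set(params) - set(defaults))
    if unknown:
        raise ValueError(
            f"unknown parameter(s) {unknown}; expected one of {sorted(defaults)}"
        )

    merged = {**defaults, **params}

    threshold = merged["confidence_threshold"]
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise ValueError("confidence_threshold must be a number")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"confidence_threshold must be in [0, 1], got {threshold}")

    if not isinstance(merged["prompt"], str):
        raise ValueError("prompt must be a string")

    merged["confidence_threshold"] = float(threshold)
    return merged


validated = validate_params({"confidence_threshold": 1})
```

In a real container the input would be the dictionary returned by get_params(), for example validate_params(get_params()).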

Returning errors

The run_default helper function already handles inference exceptions for you by passing them to the tagger runtime and exiting safely. But suppose an error happens during your initial setup, for instance because the user submits invalid input params.

Example:

from dataclasses import dataclass
from dacite import from_dict, Config
from common_ml.tagging.run_helpers import run_default, get_params

@dataclass
class MyInputSchema:
    confidence_threshold: float = 0.5

params = get_params() # user submits a typo: {"confdence_threshold": 1}

# raises exception!
validated_params = from_dict(data=params, data_class=MyInputSchema, config=Config(strict=True))

my_model = MyModel(validated_params)

run_default(my_model)

If the tagger runs the example above, the best it can do is return a vague error to the user, such as “container exited with nonzero code 1”.

The fix is simple: call catch_errors() at the top of your script. This propagates any uncaught exceptions to the user, so you don’t have to handle them yourself.

from dataclasses import dataclass
from dacite import from_dict, Config
from common_ml.tagging.run_helpers import run_default, get_params, catch_errors

@dataclass
class MyInputSchema:
    confidence_threshold: float = 0.5

catch_errors()

params = get_params() # user submits a typo: {"confdence_threshold": 1}

# raises exception! ("dacite.exceptions.UnexpectedDataError: can not match "confdence_threshold" to any data class field")
validated_params = from_dict(data=params, data_class=MyInputSchema, config=Config(strict=True))

my_model = MyModel(validated_params)

run_default(my_model)

Now the tagger will be able to report the error message dacite.exceptions.UnexpectedDataError: can not match "confdence_threshold" to any data class field to the user so they can fix their mistake.
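
For intuition about what a helper of this kind has to do, the sketch below uses Python's standard sys.excepthook to hand any uncaught exception to a reporting callback before the process exits. This is not the common-ml implementation, which may work quite differently; install_error_reporter and the report callback are hypothetical names.

```python
import sys
import traceback
from typing import Callable


def install_error_reporter(report: Callable[[str], None]) -> None:
    """Send a short summary of any uncaught exception to `report`."""

    def hook(exc_type, exc_value, exc_tb):
        # Format only the exception type and message, without the stack.
        summary = "".join(
            traceback.format_exception_only(exc_type, exc_value)
        ).strip()
        report(summary)
        # Still print the full traceback to stderr for the container logs.
        sys.__excepthook__(exc_type, exc_value, exc_tb)

    sys.excepthook = hook


# Here the "reporter" just collects messages in a list; a real one would
# forward them to whatever channel the runtime reads.
messages = []
install_error_reporter(messages.append)
```

The hook has to be installed before any code that might fail, which matches the advice above to call catch_errors() at the top of the script.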
