Model Implementation Guide
Building tagger containers
The common-ml library (https://github.com/eluv-io/common-ml) contains utilities for implementing tagger containers. This guide shows how to use the library to implement your own tagger containers that plug into the Eluvio Tagger runtime.
Minimal starting guide
This series of steps shows the bare minimum needed to build a tagger container.
Step 0: install common-ml
pip install git+https://github.com/eluv-io/common-ml.git
Step 1: Implement a model interface
There are a few interfaces we can implement depending on the use case. Pick whichever one is relevant.
Option 1: Frame-based model
Frame-based models run on a single input image or frame. This includes image captioning models like LLaVA or GIT.
To implement, define a class which extends the FrameModel interface. The class needs to implement a single function tag_frame(self, img: np.ndarray) -> List[FrameTag]. This function tags a single input frame (represented as a numpy array). For more details see the example below.
Example
from typing import List

import numpy as np
from common_ml.tagging.models.frame_based import FrameModel
from common_ml.tagging.models.tag_types import FrameTag

class MyFrameModel(FrameModel):
    def tag_frame(self, img: np.ndarray) -> List[FrameTag]:
        """
        Parameters
        ----------
        img : np.ndarray, shape (H, W, 3), dtype uint8
            RGB image.
        """
        # Return one FrameTag per detection; box coordinates are fractions of the image size.
        return [
            FrameTag(tag="replace with your model's output", box={"x1": 0.05, "y1": 0.05, "x2": 0.95, "y2": 0.95}),
            FrameTag(tag="another tag", box={"x1": 0.05, "y1": 0.05, "x2": 0.95, "y2": 0.95}),
        ]
Notes
- The FrameTag response struct contains only two fields: tag is the textual tag output by the model, and box is an optional dictionary representing the salient region of the input image.
- The values in the box dictionary are between 0 and 1. The boxes in the above example draw a region that covers most of the input image with a small margin.
- You may output multiple frame tags per input image.
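Detectors usually report boxes in pixel coordinates, so they need rescaling before they go into box. The following is a minimal sketch, assuming the x1/y1/x2/y2 keys from the example above and a top-left pixel origin. to_normalized_box is a hypothetical helper written for this guide, not part of common-ml.

```python
from typing import Dict


def to_normalized_box(x1: float, y1: float, x2: float, y2: float,
                      width: int, height: int) -> Dict[str, float]:
    """Scale a pixel-space box to the 0-1 range, clamping to the image bounds.

    Hypothetical helper for illustration; not a common-ml function.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")

    def clamp(value: float) -> float:
        # Keep coordinates inside [0, 1] even if the detector overshoots the frame.
        return min(max(value, 0.0), 1.0)

    return {
        "x1": clamp(x1 / width),
        "y1": clamp(y1 / height),
        "x2": clamp(x2 / width),
        "y2": clamp(y2 / height),
    }


# For a 1280x720 frame; inside tag_frame you could take height, width = img.shape[:2].
box = to_normalized_box(64, 36, 1216, 684, width=1280, height=720)
```

The resulting dictionary can then be passed as the box argument of a FrameTag.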
Option 2: Segment or time-range based model
For segment-based models, define a class which extends AVModel. The class needs to implement a single function tag(self, fpath: str) -> List[Tag]. The input file can be audio or video, whichever is relevant.
Example
from typing import List

from common_ml.tagging.models.av import AVModel
from common_ml.tagging.models.tag_types import Tag

class MyVideoModel(AVModel):
    def tag(self, fpath: str) -> List[Tag]:
        # Return one Tag per detected time range in the input file.
        return [
            Tag(
                tag="A dog",
                source_media=fpath,
                start_time=1500,
                end_time=8000,
            )
        ]
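Segment models often slide a fixed window over the file and emit one Tag per window. The following is a minimal sketch of computing those windows. It assumes start_time and end_time are integer milliseconds, which this guide does not state, so check the unit against the runtime before relying on it. fixed_windows is a hypothetical helper, not part of common-ml.

```python
from typing import List, Tuple


def fixed_windows(duration_ms: int, window_ms: int) -> List[Tuple[int, int]]:
    """Split [0, duration_ms) into consecutive windows of at most window_ms.

    Hypothetical helper; the millisecond unit is an assumption.
    """
    if duration_ms < 0 or window_ms <= 0:
        raise ValueError("duration_ms must be >= 0 and window_ms must be > 0")

    windows = []
    start = 0
    while start < duration_ms:
        # The last window is shortened so it never runs past the end of the file.
        end = min(start + window_ms, duration_ms)
        windows.append((start, end))
        start = end
    return windows


windows = fixed_windows(duration_ms=10_000, window_ms=4_000)
```

Each (start, end) pair could then be used as the start_time and end_time of a Tag returned from tag.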
Step 2: Start the listener
To start the listening loop, just call run_default on your frame or segment-based model from above.
from common_ml.tagging.run_helpers import run_default

# import your own model
from src.my_model import MyFrameModel

if __name__ == "__main__":
    model = MyFrameModel()
    run_default(model)
The above main.py will run as a daemon, accepting input files via stdin, and outputting the resultant tags to an output file which the Eluvio Tagger runtime can read.
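To make the daemon behaviour concrete, here is a purely conceptual sketch of a listening loop. It is not how run_default is implemented: the one-path-per-line input, the JSON-lines output, and the listen function itself are assumptions made for illustration. The real protocol is defined by common-ml and the Eluvio Tagger runtime.

```python
import io
import json
from typing import Callable, Dict, List, TextIO


def listen(tag_fn: Callable[[str], List[Dict]], inp: TextIO, out: TextIO) -> int:
    """Read file paths from inp, tag each one, and write one JSON line per tag.

    Illustrative only; the input and output formats are assumptions.
    Returns the number of files processed.
    """
    processed = 0
    for line in inp:
        fpath = line.strip()
        if not fpath:
            continue  # ignore blank lines
        for tag in tag_fn(fpath):
            out.write(json.dumps(tag) + "\n")
        out.flush()  # make results visible to the reader promptly
        processed += 1
    return processed


# In-memory streams stand in for stdin and the output file.
fake_stdin = io.StringIO("a.mp4\n\nb.mp4\n")
fake_output = io.StringIO()
count = listen(lambda p: [{"tag": "A dog", "source_media": p}], fake_stdin, fake_output)
```

The loop returns once the input stream is closed, which is why a process built this way keeps running for as long as the caller holds stdin open.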
Step 3: Containerize
Now just wrap the above as an OCI image and set main.py as the entrypoint.
Barebones Containerfile/Dockerfile
FROM continuumio/miniconda3:latest
RUN conda create -n mlpod python=3.10 -y
COPY setup.py .
RUN mkdir src
RUN conda run -n mlpod /opt/conda/envs/mlpod/bin/pip install .
# Copy the model source that main.py imports (src/, as in the Step 2 example)
COPY src/ src/
COPY main.py main.py
ENTRYPOINT ["/opt/conda/envs/mlpod/bin/python", "-u", "main.py"]
Step 4: Deploy
- Set up Google Cloud; see https://github.com/qluvio/elv-ml/blob/main/docs/ops/PODMAN_REGISTRY_SETUP.md
- Clone https://github.com/qluvio/buildscripts and run its Makefile; check the README for detailed instructions
More features
Output tags to multiple separate tracks
Add the track field to your Tag output. If you leave the track field empty, your tags are written to a default track.
class MyVideoModel(AVModel):
    def tag(self, fpath: str) -> List[Tag]:
        return [
            Tag(
                tag="A dog",
                source_media=fpath,
                start_time=1500,
                end_time=8000,
                track="dog_detection"
            ),
            Tag(
                tag="Not a hotdog",
                source_media=fpath,
                start_time=1500,
                end_time=8000,
                track="hotdog_detection"
            )
        ]
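The effect of the track field can be pictured as grouping tags by track name, with empty tracks falling back to a default. The following is a minimal sketch of that idea using a stand-in dataclass rather than the real Tag type. SimpleTag, group_by_track, and the "default" track name are all hypothetical; the runtime's actual default track name is not given in this guide.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SimpleTag:
    """Stand-in for common-ml's Tag, reduced to the fields used here."""
    tag: str
    start_time: int
    end_time: int
    track: str = ""


DEFAULT_TRACK = "default"  # hypothetical name, chosen for illustration


def group_by_track(tags: List[SimpleTag]) -> Dict[str, List[SimpleTag]]:
    """Group tags by track, sending tags with an empty track to DEFAULT_TRACK."""
    grouped: Dict[str, List[SimpleTag]] = defaultdict(list)
    for t in tags:
        grouped[t.track or DEFAULT_TRACK].append(t)
    return dict(grouped)


tags = [
    SimpleTag("A dog", 1500, 8000, track="dog_detection"),
    SimpleTag("Not a hotdog", 1500, 8000, track="hotdog_detection"),
    SimpleTag("Outdoor scene", 0, 8000),  # no track set
]
tracks = group_by_track(tags)
```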
Attach extra data to a tag
Tags support unstructured data under the additional_info field.
from common_ml.tagging.models.tag_types import Tag, FrameInfo

Tag(
    tag="BMX Bike",
    source_media="input.mp4",
    track="vertical_video",
    frame_info={"frame_idx": 45},  # tagger converts relative to content
    additional_info={
        "x-coordinates": [0.31, 0.32, 0.31, ...]
    }
)
Support user arguments
Many of our models can be configured by the user at runtime. Things like confidence thresholds, prompts, and tagging frequency can be set by the caller and injected by the tagger runtime into the container.
Use the get_params helper function. It returns a dictionary containing whatever the user chose to pass in when initiating tagging. It is your responsibility to validate that the arguments match your schema and raise a helpful error message if they don’t.
from common_ml.tagging.run_helpers import get_params
params = get_params()
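Since validation is left to you, one option is a small hand-written check over the returned dictionary. The following is a minimal sketch, assuming a single confidence_threshold parameter that must lie between 0 and 1. DEFAULTS and validate_params are hypothetical names for illustration, and in a real container the input dictionary would come from get_params().

```python
import difflib
from typing import Any, Dict

# Hypothetical schema: parameter names and their default values.
DEFAULTS: Dict[str, Any] = {"confidence_threshold": 0.5}


def validate_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Merge user params over DEFAULTS, rejecting unknown keys and bad values."""
    validated = dict(DEFAULTS)
    for key, value in params.items():
        if key not in DEFAULTS:
            # Suggest the closest known parameter name to help with typos.
            hint = difflib.get_close_matches(key, list(DEFAULTS), n=1)
            suggestion = f" Did you mean '{hint[0]}'?" if hint else ""
            raise ValueError(f"Unknown parameter '{key}'.{suggestion}")
        validated[key] = value

    threshold = validated["confidence_threshold"]
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        raise ValueError("confidence_threshold must be a number")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(
            f"confidence_threshold must be between 0 and 1, got {threshold}"
        )
    validated["confidence_threshold"] = float(threshold)
    return validated


validated = validate_params({"confidence_threshold": 1})
```

A library such as dacite can replace the hand-written checks once the schema grows beyond a few fields.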
Returning errors
The run_default helper function already handles inference exceptions for you by passing them to the tagger runtime and exiting safely. But suppose an error happens during your initial setup, before run_default is called. For instance, the user submits bad input params that fail the validation described above.
Example:
from dataclasses import dataclass
from dacite import from_dict, Config
from common_ml.tagging.run_helpers import run_default, get_params
@dataclass
class MyInputSchema:
    confidence_threshold: float = 0.5

params = get_params() # user submits a typo: {"confdence_threshold": 1}
# raises exception!
validated_params = from_dict(data=params, data_class=MyInputSchema, config=Config(strict=True))
my_model = MyModel(validated_params)
run_default(my_model)
If the tagger runs the example above, the best it can do is return a vague error to the user like “container exited with nonzero code 1”.
The fix is simple: add catch_errors() to the top of your script. This propagates any uncaught exceptions to the user, so you don’t have to handle setup errors yourself.
from dataclasses import dataclass
from dacite import from_dict, Config
from common_ml.tagging.run_helpers import run_default, get_params, catch_errors
@dataclass
class MyInputSchema:
    confidence_threshold: float = 0.5

catch_errors()
params = get_params() # user submits a typo: {"confdence_threshold": 1}
# raises exception! ("dacite.exceptions.UnexpectedDataError: can not match "confdence_threshold" to any data class field")
validated_params = from_dict(data=params, data_class=MyInputSchema, config=Config(strict=True))
my_model = MyModel(validated_params)
run_default(my_model)
Now the tagger will be able to report the error message dacite.exceptions.UnexpectedDataError: can not match "confdence_threshold" to any data class field to the user so they can fix their mistake.