Search Configuration

We can update the configuration for our search space in order to:

  • customize search settings
  • add or remove tag tracks from indexing
  • add or remove content-level fabric metadata from indexing

This document shows how to update the config and gives an overview of all the configurable settings.

Setting the config

When creating a new space (for decisive folks)

POST /spaces/<qid> HTTP/1.1
Host: https://ai.contentfabric.io/vectorstore
Authorization: Bearer <token>
Content-Type: application/json

{
	"collection_id": "<collection_id>",
	"name": "<name>",
	"type": "clip-search",
    "config": {...config stuff here...}
}

Later, when you’ve realized you screwed it up the first time

PATCH /spaces/<qid> HTTP/1.1
Host: https://ai.contentfabric.io/vectorstore
Authorization: Bearer <token>
Content-Type: application/json

{
	"config": {...config stuff here...}
}
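
For scripting, the two requests above can be sketched in Python as follows. This is a minimal sketch, not an official client: the helper names are hypothetical, and the base URL simply joins the Host value shown above with the /spaces/<qid> path. The functions only build the request objects; nothing is sent.

```python
import json
import urllib.request

# Assumption: base URL taken from the Host value shown in the requests above.
BASE_URL = "https://ai.contentfabric.io/vectorstore"


def build_space_request(qid, token, body, method):
    """Build (but do not send) a JSON request against /spaces/<qid>."""
    return urllib.request.Request(
        url=f"{BASE_URL}/spaces/{qid}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method=method,
    )


def create_space_request(qid, token, collection_id, name, config):
    """POST: create a clip-search space and set its config in one go."""
    body = {
        "collection_id": collection_id,
        "name": name,
        "type": "clip-search",
        "config": config,
    }
    return build_space_request(qid, token, body, "POST")


def update_config_request(qid, token, config):
    """PATCH: send a new config for an existing space."""
    return build_space_request(qid, token, {"config": config}, "PATCH")
```

To actually send a request, pass the returned object to urllib.request.urlopen.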

Configuration Settings

Schema

The config schema is documented in OpenAPI format: “link will go here when it exists”

Explanation

The config has two top level blocks:

Block    Purpose
indexer  Controls what gets indexed and how documents are built.
search   Controls the default behavior when searching the space.

Here is a fully filled-out example config:

{
    "indexer": {
        "document": {
            "aggregation": {
                "track": "shot_detection"
            }
        },
        "fabric": {
            "fields": {
                "title": {
                    "paths": [
                        "public.asset_metadata.display_title",
                        "public.asset_metadata.title"
                    ],
                    "options": {}
                },
                "genre": {
                    "paths": [
                        "public.asset_metadata.mpaa_genre"
                    ],
                    "options": {}
                }
            }
        },
        "tags": {
            "fields": {
                "scene_description": {
                    "tracks": [
                        "llava_caption",
                        "scene_description"
                    ],
                    "options": {
                        "chunk_strategy": "sentence"
                    }
                },
                "dialogue": {
                    "tracks": [
                        "auto_captions",
                        "transcription"
                    ],
                    "options": {
                        "chunk_strategy": "none"
                    }
                }
            },
            "ignore_tracks": ["speech_to_text"]
        }
    },
    "search": {
        "clip_search": {
            "defaults": {
                "rerank_level": "document",
                "rerank_user_query": true,
                "clips_min_duration": 15,
                "clips_max_duration": 45
            }
        }
    }
}

indexer

Document Aggregation

"document": {
    "aggregation": {
        "track": "shot_detection"
    }
}

When a track is set, the indexer automatically creates an aggregated document for every tag in this track:

  • All indexed tags (configured in the tags block) whose time ranges overlap the aggregation tag’s time range are merged into a single textual document.
  • Fabric-level field data (configured in the fabric block) is aggregated into every document that shares a matching content id.
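
The aggregation rule above can be illustrated with a small Python sketch. It is an illustration under stated assumptions, not the indexer's actual implementation: the Tag type, the "name: value" text layout, and the use of seconds are hypothetical, and all tags are assumed to belong to a single content object.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    track: str
    start: float  # seconds (hypothetical unit)
    end: float
    text: str


def overlaps(a, b):
    """True when the two tags' time ranges share any span of time."""
    return a.start < b.end and b.start < a.end


def build_documents(tags, aggregation_track, fabric_fields):
    """Create one merged text document per tag on the aggregation track."""
    anchors = [t for t in tags if t.track == aggregation_track]
    others = [t for t in tags if t.track != aggregation_track]
    documents = []
    for anchor in anchors:
        # Fabric-level fields are repeated in every document of the content.
        parts = [f"{name}: {value}" for name, value in fabric_fields.items()]
        # Tags from other tracks are merged in when they overlap the anchor.
        parts += [f"{t.track}: {t.text}" for t in others if overlaps(t, anchor)]
        documents.append(
            {"start": anchor.start, "end": anchor.end, "text": "\n".join(parts)}
        )
    return documents
```

Note that a tag spanning a shot boundary ends up in the documents on both sides of it, since it overlaps both shots.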

What’s the point?

Vector search by itself is surprisingly limited. Aggregating gives us rich contextual information for a scene, which can be used to answer complex queries that span different fields. For example, a query like “Jennifer Lawrence talks with a mechanic in No Hard Feelings” could match three separate fields: cast/celebrity, visual scene description, and film title.

For more information see our docs on search design (doesn’t exist at the time of writing this but when it does it will go right here)

Field  Type    Description
track  string  The track used to aggregate overlapping tags into documents. shot_detection is recommended so that documents align to shot boundaries.

Fabric Fields

"fabric": {
    "fields": {
        "title": {
            "paths": [
                "public.asset_metadata.display_title",
                "public.asset_metadata.title"
            ],
            "options": {}
        }
    }
}

Here we define the fields to index and their associated paths in the content fabric metadata. For every content object in the space’s collection, the indexer crawls the metadata at each path and indexes the value it finds.

Each entry under fields is a field you name (e.g. title, genre):

Field    Type      Description
paths    string[]  One or more dot-separated paths into the fabric metadata. The values at each of these paths will be indexed under the provided field name.
options  object    Optional per-field settings (see Field Options).
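
To make the path lookup concrete, here is a minimal Python sketch of resolving dot-separated paths against a nested metadata dict. The function names are hypothetical, and skipping missing paths silently is an assumption about crawler behavior, not something the indexer is documented to do.

```python
def resolve_path(metadata, path):
    """Follow a dot-separated path into nested dicts; None if a key is missing."""
    node = metadata
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def collect_field_values(metadata, fields):
    """Return {field_name: [values found at that field's configured paths]}."""
    result = {}
    for name, spec in fields.items():
        values = [resolve_path(metadata, p) for p in spec["paths"]]
        # Assumption: paths that do not exist in the metadata are skipped.
        result[name] = [v for v in values if v is not None]
    return result
```

With the title field from the example above, a content object that only has public.asset_metadata.title set would still be indexed under title, because every listed path is tried.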

What’s the point?

Content objects in the fabric often store important metadata that applies to the whole content rather than just a small slice, e.g. “title”, “genre”, “synopsis”, “release date”. We want to be able to filter our search on these values as well.

Triggering fabric metadata crawling

Content object metadata is not crawled automatically; a crawl must be triggered manually with a valid token via the indexer API. If you add new content to the collection, you must recrawl.

Start crawl job (gives handle id): POST /spaces/{space_qid}/crawl
Check crawl status: GET /spaces/{space_qid}/crawl/{handle_id}

Indexer API docs: https://ai.contentfabric.io/indexer/openapi.json
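
The two crawl endpoints can be wrapped as below. This is a hedged sketch: the base URL is inferred from the openapi.json location, the Bearer scheme is assumed to match the space requests earlier in this document, and the helper names are hypothetical. Response bodies are not parsed, since their shape is defined by the indexer API docs rather than here.

```python
import urllib.request

# Assumption: base URL inferred from the openapi.json location above.
INDEXER_URL = "https://ai.contentfabric.io/indexer"


def _auth_headers(token):
    # Assumption: same Bearer scheme as the space requests.
    return {"Authorization": f"Bearer {token}"}


def start_crawl_request(space_qid, token):
    """Build (but do not send) the request that starts a crawl job."""
    return urllib.request.Request(
        url=f"{INDEXER_URL}/spaces/{space_qid}/crawl",
        headers=_auth_headers(token),
        method="POST",
    )


def crawl_status_request(space_qid, handle_id, token):
    """Build (but do not send) the request that checks a crawl job's status."""
    return urllib.request.Request(
        url=f"{INDEXER_URL}/spaces/{space_qid}/crawl/{handle_id}",
        headers=_auth_headers(token),
        method="GET",
    )
```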

Tag Fields

The tags block controls how tags from the tagstore are indexed. It supports three modes:

Default (nothing specified)

If the tags block is omitted or left empty, every track is indexed as its own field using default settings. This is the simplest setup and a good starting point.

"tags": {}

ignore_tracks (default settings, minus some tracks)

ignore_tracks is a convenience feature: index every track as its own field with default settings, but skip the listed tracks.

"tags": {
    "ignore_tracks": ["speech_to_text"]
}

Field          Type      Description
ignore_tracks  string[]  Tracks to exclude from indexing. All other tracks are indexed with default settings.

fields (full manual control)

When you need to group multiple tracks into a single field or customize per-field options, define each field explicitly. Each entry under fields is a field you name (e.g. scene_description, dialogue):

"tags": {
    "fields": {
        "scene_description": {
            "tracks": [
                "llava_caption",
                "scene_description"
            ],
            "options": {
                "chunk_strategy": "sentence"
            }
        }
    }
}

Field    Type      Description
tracks   string[]  The tagstore track names to pull tags from for this field.
options  object    Optional per-field settings (see Field Options).
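
The three modes can be summarized in a short Python sketch that turns a tags block into a field-to-tracks mapping. It is illustrative only: the function name is hypothetical, and how fields and ignore_tracks combine when both are present is not specified in this document, so the sketch assumes that only the listed fields are indexed and that ignored tracks are dropped from them.

```python
# Default per-field options, taken from the Field Options table.
DEFAULT_OPTIONS = {"chunk_strategy": "sentence"}


def resolve_tag_fields(tags_config, available_tracks):
    """Map each indexed field name to its source tracks and effective options."""
    tags_config = tags_config or {}
    ignored = set(tags_config.get("ignore_tracks", []))
    explicit = tags_config.get("fields")

    if explicit:
        # Manual mode. Assumption: only the listed fields are indexed.
        return {
            name: {
                "tracks": [t for t in spec["tracks"] if t not in ignored],
                "options": {**DEFAULT_OPTIONS, **spec.get("options", {})},
            }
            for name, spec in explicit.items()
        }

    # Default and ignore_tracks modes: one field per track that is not ignored.
    return {
        track: {"tracks": [track], "options": dict(DEFAULT_OPTIONS)}
        for track in available_tracks
        if track not in ignored
    }
```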

Field Options

These options can be applied to any field, in either the tags or fabric block.

Option          Type    Default     Description
chunk_strategy  string  "sentence"  How the value is chunked into vectors. Set to "none" for no chunking.
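
As a rough illustration of the two chunk_strategy values, the sketch below splits a value into the pieces that would each become a vector. The regex-based sentence splitter is a naive stand-in chosen for the example; the indexer's real sentence segmentation is not described in this document.

```python
import re


def chunk_value(text, chunk_strategy="sentence"):
    """Split a field value into the chunks that would each be embedded."""
    if chunk_strategy == "none":
        # No chunking: the whole value is embedded as one vector.
        return [text]
    if chunk_strategy == "sentence":
        # Naive assumption: a sentence ends at ., ! or ? followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", text.strip())
        return [p for p in parts if p]
    raise ValueError(f"unknown chunk_strategy: {chunk_strategy!r}")
```

Short values such as a title are usually a single sentence anyway, so the choice matters mostly for long descriptions and dialogue.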

Clip Search Defaults

"search": {
    "clip_search": {
        "defaults": {
            "rerank_level": "document",
            "rerank_user_query": true,
            "clips_min_duration": 15,
            "clips_max_duration": 45
        }
    }
}

The search config defines defaults for searching the space. These defaults allow you to override four of the clip_search API arguments so callers don’t have to specify them on every request.
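
One way to picture how these defaults interact with a request is the following Python sketch. The function name is hypothetical, and the precedence rule (arguments the caller passes explicitly win over the space defaults) is an assumption, not documented behavior.

```python
def effective_search_args(space_config, request_args):
    """Combine a space's clip_search defaults with one request's arguments."""
    defaults = (
        space_config.get("search", {})
        .get("clip_search", {})
        .get("defaults", {})
    )
    # Assumption: arguments the caller leaves as None fall back to the defaults.
    explicit = {k: v for k, v in request_args.items() if v is not None}
    return {**defaults, **explicit}
```

With the defaults shown above, a request that only sets clips_max_duration to 60 would still use rerank_level "document" and clips_min_duration 15.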