Search Configuration
We can update the configuration for our search space in order to
- customize search settings
- add or remove tag tracks from indexing
- add or remove content-level fabric metadata from indexing
This document will show you how to update the config as well as provide an overview of all the configurable settings.
Setting the config
When creating a new space (for decisive folks)
```http
POST /vectorstore/spaces/<qid> HTTP/1.1
Host: ai.contentfabric.io
Authorization: Bearer <token>
Content-Type: application/json

{
  "collection_id": "<collection_id>",
  "name": "<name>",
  "type": "clip-search",
  "config": {...config stuff here...}
}
```
Later, when you’ve realized you screwed it up the first time
```http
PATCH /vectorstore/spaces/<qid> HTTP/1.1
Host: ai.contentfabric.io
Authorization: Bearer <token>
Content-Type: application/json

{
  "config": {...config stuff here...}
}
```
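If you're scripting this rather than hand-writing HTTP, a minimal Python sketch using only the standard library might look like the following. It only builds the request and never sends it. build_space_request is a hypothetical helper (not a published client), the base URL is the vectorstore address used in the requests above, and the config body is a placeholder.

```python
import json
import urllib.request

# Vectorstore service address from the requests above.
BASE_URL = "https://ai.contentfabric.io/vectorstore"


def build_space_request(method: str, qid: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a create (POST) or update (PATCH) request for a space.

    Hypothetical helper for illustration only.
    """
    if method not in ("POST", "PATCH"):
        raise ValueError("expected POST (create) or PATCH (update)")
    return urllib.request.Request(
        url=f"{BASE_URL}/spaces/{qid}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method=method,
    )


# Update only the config of an existing space (placeholder values).
req = build_space_request(
    "PATCH",
    "<qid>",
    "<token>",
    {"config": {"search": {"clip_search": {"defaults": {"clips_min_duration": 15}}}}},
)
# urllib.request.urlopen(req) would actually send it.
```

PATCH sends only the config key, matching the update request above; POST would additionally carry collection_id, name and type.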
Configuration Settings
Schema
The config schema is documented in OpenAPI format: “link will go here when it exists”
Explanation
The config has two top-level blocks:
| Block | Purpose |
|---|---|
| indexer | Controls what gets indexed and how documents are built. |
| search | Controls the default behavior when searching the space. |
```json
{
  "indexer": {
    "document": {
      "aggregation": {
        "track": "shot_detection"
      }
    },
    "fabric": {
      "fields": {
        "title": {
          "paths": [
            "public.asset_metadata.display_title",
            "public.asset_metadata.title"
          ],
          "options": {}
        },
        "genre": {
          "paths": [
            "public.asset_metadata.mpaa_genre"
          ],
          "options": {}
        }
      }
    },
    "tags": {
      "fields": {
        "scene_description": {
          "tracks": [
            "llava_caption",
            "scene_description"
          ],
          "options": {
            "chunk_strategy": "sentence"
          }
        },
        "dialogue": {
          "tracks": [
            "auto_captions",
            "transcription"
          ],
          "options": {
            "chunk_strategy": "none"
          }
        }
      }
    }
  },
  "search": {
    "clip_search": {
      "defaults": {
        "rerank_level": "document",
        "rerank_user_query": true,
        "clips_min_duration": 15,
        "clips_max_duration": 45
      }
    }
  }
}
```
indexer
Document Aggregation
```json
"document": {
  "aggregation": {
    "track": "shot_detection"
  }
}
```
When a track is set, the indexer automatically creates one aggregated document for every tag in that track:
- All indexed tags (configured in the tags block) whose time ranges overlap the aggregation tag's time range are merged into a single textual document.
- Fabric-level field data (configured in the fabric block) is added to every document that shares a matching content id.
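To make the merge rule concrete, here is a minimal Python sketch for a single content object. The Tag shape (a millisecond time range plus text), the half-open overlap test and the "name: value" document layout are all assumptions for illustration; the real indexer's document format isn't specified here, and a real document would presumably label lines by configured field name rather than by track.

```python
from dataclasses import dataclass


@dataclass
class Tag:
    # Hypothetical tag shape: a time range in milliseconds plus its text.
    track: str
    start_ms: int
    end_ms: int
    text: str


def overlaps(a: Tag, b: Tag) -> bool:
    # Half-open ranges [start, end): tags that merely touch do not overlap.
    return a.start_ms < b.end_ms and b.start_ms < a.end_ms


def aggregate(aggregation_tags: list, indexed_tags: list, fabric_fields: dict) -> list:
    """Build one document per aggregation tag (e.g. per shot) for one content object."""
    documents = []
    for shot in aggregation_tags:
        # Content-level fabric fields are copied into every document.
        lines = [f"{name}: {value}" for name, value in fabric_fields.items()]
        # Every indexed tag overlapping this shot is merged into the same text.
        lines += [f"{tag.track}: {tag.text}" for tag in indexed_tags if overlaps(shot, tag)]
        documents.append({"start_ms": shot.start_ms, "end_ms": shot.end_ms, "text": "\n".join(lines)})
    return documents


# Illustrative data; celebrity_detection is a made-up track name.
shots = [Tag("shot_detection", 0, 5000, ""), Tag("shot_detection", 5000, 9000, "")]
tags = [
    Tag("llava_caption", 1000, 3000, "A woman talks with a mechanic."),
    Tag("celebrity_detection", 4000, 7000, "Jennifer Lawrence"),
]
docs = aggregate(shots, tags, {"title": "No Hard Feelings"})
```

Note that the celebrity tag spans the shot boundary, so in this sketch it lands in both documents, while the title appears in every document.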
What’s the point?
Vector search over isolated tags is surprisingly limited. Aggregating gives us rich contextual information for a scene that can be used to answer complex queries spanning several fields. For example, a query like “Jennifer Lawrence talks with a mechanic in No Hard Feelings” could match three separate fields: cast/celebrity, visual scene description, and film title.
For more information, see our docs on search design (these don’t exist at the time of writing, but when they do, the link will go right here).
| Field | Type | Description |
|---|---|---|
| track | string | The track used to aggregate overlapping tags into documents. shot_detection is recommended so that documents align to shot boundaries. |
Fabric Fields
```json
"fabric": {
  "fields": {
    "title": {
      "paths": [
        "public.asset_metadata.display_title",
        "public.asset_metadata.title"
      ],
      "options": {}
    }
  }
}
```
Here we define the fields to index and their associated paths in the content fabric metadata. For every content object in the space’s collection, the indexer crawls the metadata at each path and indexes the value it finds.
Each entry under fields is a field you name (e.g. title, genre):
| Field | Type | Description |
|---|---|---|
| paths | string[] | One or more dot-separated paths into the fabric metadata. The values at each of these paths will be indexed under the provided field name. |
| options | object | Optional per-field settings (see Field Options). |
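A minimal sketch of how a path like public.asset_metadata.title could be resolved against one content object's metadata. It assumes the metadata is nested JSON objects, that a path which doesn't resolve is simply skipped, and that every path which does resolve contributes a value. resolve_path and collect_field are hypothetical names, not the indexer's code.

```python
def resolve_path(metadata: dict, path: str):
    """Follow a dot-separated path through nested dicts; None if any key is missing."""
    node = metadata
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def collect_field(metadata: dict, paths: list) -> list:
    """Gather the value found at each configured path for one field."""
    values = []
    for path in paths:
        value = resolve_path(metadata, path)
        if value is not None:  # assumption: unresolved paths are skipped silently
            values.append(value)
    return values


# Example metadata for one content object (illustrative values only).
meta = {"public": {"asset_metadata": {"title": "No Hard Feelings", "mpaa_genre": "Comedy"}}}
titles = collect_field(meta, ["public.asset_metadata.display_title", "public.asset_metadata.title"])
```

Listing several paths for one field, as the title example does, lets the same config work across content objects whose metadata uses different keys.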
What’s the point?
Content objects in the fabric often store important metadata that applies to the whole content rather than just a small slice: e.g. “title”, “genre”, “synopsis”, “release date”, etc. We want to be able to filter our search on these values as well.
Triggering fabric metadata crawling
Content object metadata is not crawled automatically; crawling must be triggered manually via the indexer API, using a valid token. If you add new content objects to the collection, you must recrawl.
- Start a crawl job (returns a handle id): POST /spaces/{space_qid}/crawl
- Check crawl status: GET /spaces/{space_qid}/crawl/{handle_id}
- Indexer API docs: https://ai.contentfabric.io/indexer/openapi.json
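Because crawling is a start-then-check workflow (start a job, then query its status by handle id), a client will typically poll. Below is a minimal sketch. The indexer base URL, the status strings, and how a status is extracted from the GET response are all assumptions, so check the indexer OpenAPI docs for the real shapes. get_status is any callable you supply that performs the GET and returns the status string.

```python
import time
from typing import Callable

# Assumed base URL: the indexer's OpenAPI docs live under /indexer on this host.
INDEXER_URL = "https://ai.contentfabric.io/indexer"


def start_crawl_url(space_qid: str) -> str:
    # POST here to start a crawl job; the response carries a handle id.
    return f"{INDEXER_URL}/spaces/{space_qid}/crawl"


def crawl_status_url(space_qid: str, handle_id: str) -> str:
    # GET here to check on a running crawl.
    return f"{INDEXER_URL}/spaces/{space_qid}/crawl/{handle_id}"


def wait_for_crawl(get_status: Callable[[], str],
                   terminal=frozenset({"completed", "failed"}),
                   interval_s: float = 5.0,
                   timeout_s: float = 600.0) -> str:
    """Poll get_status() until it returns a terminal state or the timeout expires.

    The status strings in `terminal` are hypothetical placeholders.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"crawl still {status!r} after {timeout_s}s")
        time.sleep(interval_s)
```

Injecting get_status keeps the polling logic independent of the HTTP client and easy to test with a fake.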
Tag Fields
The tags block controls how tags from the tagstore are indexed. It supports three modes:
Default (nothing specified)
If the tags block is omitted entirely (or left empty, as below), every track is indexed as its own field using default settings. This is the simplest setup and a good starting point.
```json
"tags": {}
```
ignore_tracks (default settings, minus some tracks)
ignore_tracks is a convenience feature: index every track as its own field with default settings, but skip the listed tracks.
```json
"tags": {
  "ignore_tracks": ["speech_to_text"]
}
```
| Field | Type | Description |
|---|---|---|
| ignore_tracks | string[] | Tracks to exclude from indexing. All other tracks are indexed with default settings. |
ignore_tracks is mutually exclusive with fields — provide one or the other, not both.
fields (full manual control)
When you need to group multiple tracks into a single field or customize per-field options, define each field explicitly. Each entry under fields is a field you name (e.g. scene_description, dialogue):
```json
"tags": {
  "fields": {
    "scene_description": {
      "tracks": [
        "llava_caption",
        "scene_description"
      ],
      "options": {
        "chunk_strategy": "sentence"
      }
    }
  }
}
```
| Field | Type | Description |
|---|---|---|
| tracks | string[] | The tagstore track names to pull tags from for this field. |
| options | object | Optional per-field settings (see Field Options). |
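The three modes can be summarized as a small function from the tags block to a field-to-tracks mapping. This is a sketch of the rules as described above, not the indexer's implementation. In particular, it assumes that in the default and ignore_tracks modes each field is named after its track, and that in fields mode any track not listed is not indexed; resolve_tag_fields is a hypothetical name.

```python
def resolve_tag_fields(tags_cfg: dict, available_tracks: list) -> dict:
    """Map field name -> list of tracks, following the three tags modes."""
    has_fields = "fields" in tags_cfg
    has_ignore = "ignore_tracks" in tags_cfg
    if has_fields and has_ignore:
        # The two options are mutually exclusive.
        raise ValueError("ignore_tracks and fields are mutually exclusive")
    if has_fields:
        # Full manual control: only the explicitly defined fields are indexed.
        return {name: list(spec["tracks"]) for name, spec in tags_cfg["fields"].items()}
    # Default mode (no ignore list) or ignore_tracks mode:
    # every remaining track becomes its own field (assumed to share the track's name).
    ignored = set(tags_cfg.get("ignore_tracks", []))
    return {track: [track] for track in available_tracks if track not in ignored}


tracks = ["llava_caption", "scene_description", "speech_to_text"]
default_fields = resolve_tag_fields({}, tracks)
filtered_fields = resolve_tag_fields({"ignore_tracks": ["speech_to_text"]}, tracks)
manual_fields = resolve_tag_fields(
    {"fields": {"scene_description": {"tracks": ["llava_caption", "scene_description"]}}},
    tracks,
)
```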
Field Options
These options can be applied to any field, in either the tags or fabric block.
| Option | Type | Default | Description |
|---|---|---|---|
| chunk_strategy | string | "sentence" | How the value is chunked into vectors. Set to "none" for no chunking. |
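To illustrate the difference between the two strategies, here is a sketch in which each returned chunk would become its own vector. The regex sentence splitter is a naive stand-in; the indexer's actual sentence segmentation is not documented here, and chunk is a hypothetical name.

```python
import re


def chunk(value: str, strategy: str = "sentence") -> list:
    """Split a field value into the pieces that would each be embedded separately."""
    if strategy == "none":
        # No chunking: the whole value becomes a single vector.
        return [value]
    if strategy == "sentence":
        # Naive splitter: break after ., ! or ? when followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", value.strip())
        return [p for p in parts if p]
    raise ValueError(f"unknown chunk_strategy: {strategy!r}")


caption = "A man opens the hood. A woman watches! Is it broken?"
sentence_chunks = chunk(caption)
whole = chunk(caption, "none")
```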
search
Clip Search Defaults
```json
"search": {
  "clip_search": {
    "defaults": {
      "rerank_level": "document",
      "rerank_user_query": true,
      "clips_min_duration": 15,
      "clips_max_duration": 45
    }
  }
}
```
The search config defines defaults for searching the space. These defaults set values for four of the clip_search API arguments (rerank_level, rerank_user_query, clips_min_duration and clips_max_duration) so callers don’t have to specify them on every request.
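A minimal sketch of how these space-level defaults might combine with the arguments passed on a single clip_search call. It assumes, without confirmation from the API docs, that an explicitly passed argument wins and that an unset (None) argument falls back to the space default. effective_args is a hypothetical helper.

```python
def effective_args(space_defaults: dict, request_args: dict) -> dict:
    """Combine space-level defaults with the arguments given on one request.

    Assumption: explicit request values override the defaults; None means "unset".
    """
    merged = dict(space_defaults)
    merged.update({k: v for k, v in request_args.items() if v is not None})
    return merged


# The defaults block from the config above.
defaults = {
    "rerank_level": "document",
    "rerank_user_query": True,
    "clips_min_duration": 15,
    "clips_max_duration": 45,
}
# A caller that only wants shorter clips for this one request.
args = effective_args(defaults, {"clips_max_duration": 30, "rerank_level": None})
```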