Consider improvements to video metadata #12

samcunliffe · 2022-11-03T17:16:18Z

Investigate standard video metadata formats (can we make use of the "comments" field?)
Investigate JSON files (which would live in the same directory as the video)
Anything else? ...

niksirbi · 2022-11-21T19:51:48Z

Here is my research so far on this topic:

Metadata schemas/standards

As one would expect, there is no single all-encompassing standard. There are several existing metadata schemas, as found on Stack Overflow and in this blog post.

These are all very general-purpose, but we could adopt a small subset of useful fields for our purposes.

My current favorite: Schema

schema.org is used by Google (incl. YouTube), Bing, and Yahoo. Its main application seems to be Search Engine Optimisation. It covers a lot of things, including video objects.

Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata and JSON-LD. These vocabularies cover entities, relationships between entities and actions, and can easily be extended through a well-documented extension model. Over 10 million sites use Schema.org to markup their web pages and email messages. Many applications from Google, Microsoft, Pinterest, Yandex and others already use these vocabularies to power rich, extensible experiences.

I lean toward following schema.org, for several reasons.

The community is very active on GitHub and we can interact with them there.
It's cool to be compatible with YouTube, and I expect there are tons of relevant tutorials/resources (e.g. how to scrape schema.org metadata using Python).
The standard was developed with webpages and HTML in mind, which fits our goals with the dashboard/web app.
It has support for events/timestamps, which we can use for issue Tagging events in videos #19.
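To make the schema.org option more concrete, here is a minimal sketch of what a per-video record could look like as JSON-LD, built with the standard library only. The property names (name, abstract, contentUrl, hasPart, and Clip with startOffset/endOffset) come from the schema.org vocabulary; the helper video_jsonld, the file name and the event label are made up for illustration and are not part of the project.

```python
import json


def video_jsonld(name, abstract, content_url, events=()):
    """Build a schema.org VideoObject as a JSON-LD-style dict.

    `events` is an iterable of (label, start_s, end_s) tuples; each one
    becomes a Clip listed under `hasPart`.
    """
    doc = {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "abstract": abstract,
        "contentUrl": content_url,
    }
    clips = [
        {"@type": "Clip", "name": label, "startOffset": start, "endOffset": end}
        for label, start, end in events
    ]
    if clips:
        doc["hasPart"] = clips
    return doc


# Hypothetical values, only to show the shape of the record.
example = video_jsonld(
    name="jewel_wasp_001",
    abstract="Short experimenter note.",
    content_url="jewel_wasp_001.mp4",
    events=[("sting", 12.5, 14.0)],
)
print(json.dumps(example, indent=2))
```

Whether Clip is the right type for tagged events would need checking against what #19 ends up requiring; this only shows that timestamps can be expressed.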

Extracting embedded video metadata

Using ffmpeg(-python)

Embedded video metadata can be extracted with ffprobe, the command-line tool that ships with ffmpeg.
The ffmpeg-python library can do the same using:

import ffmpeg

# probe() returns a dict; "streams" holds a list with one dict per stream
stream_metadata = ffmpeg.probe(video_file)["streams"]
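If we want to avoid the extra dependency, the same information can be obtained by calling ffprobe directly and parsing its JSON report. The sketch below assumes ffprobe is on the PATH; run_ffprobe and container_tags are hypothetical helper names. Container-level tags such as a "comment" field normally sit under the "format" entry, not under "streams", but which tags are present depends on the container and on how the file was written. The sample_probe dict is hand-written so the snippet runs without a video file.

```python
import json
import subprocess


def run_ffprobe(video_file):
    """Run ffprobe and return its JSON report as a dict (needs ffmpeg installed)."""
    cmd = [
        "ffprobe", "-v", "quiet",
        "-print_format", "json",
        "-show_format", "-show_streams",
        str(video_file),
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def container_tags(probe):
    """Return container-level tags (e.g. title, comment) with lower-cased keys."""
    tags = probe.get("format", {}).get("tags", {})
    return {key.lower(): value for key, value in tags.items()}


# Hand-written stand-in for an ffprobe report (not real output).
sample_probe = {
    "streams": [{"codec_type": "video", "width": 1920, "height": 1080}],
    "format": {"tags": {"COMMENT": "Ampulex_compressa, quality A"}},
}
print(container_tags(sample_probe).get("comment"))
```

With a real file, container_tags(run_ffprobe(video_file)) would be the equivalent call.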

Using mutagen

https://mutagen.readthedocs.io/en/latest/
See tutorial here

samcunliffe · 2022-11-22T09:27:21Z

Schema LGTM. Seems to be Apache 2.0.

Did anyone check with Sanna what metadata, precisely, is recorded in the manual metadata step? [I deliberately don't tag her because perhaps you and @sfmig already discussed this.]

The 'abstract' field is probably useful for brief summary-type notes. How detailed are the experimenters' notes?

sfmig · 2022-11-23T18:28:05Z

Yes, @niksirbi and I had a meeting with Sanna and Lewis last week and they gave us an overview of the current pipeline, including the part where the video metadata is recorded manually. That currently lives in master-list.xlsx on the zoo server. The notes are not very extensive, so they may fit well in the 'abstract' field.

sfmig · 2022-11-28T15:40:46Z

As mentioned by @samcunliffe, maybe the electronic research notebook is worth a look (RSpace under the hood).

@niksirbi also mentioned having a look at Bonsai's options for adding metadata (Bonsai is already used in the project and widely in neuroscience).

niksirbi · 2022-12-08T22:02:08Z

Here is a completely different (and much simpler) idea for handling metadata of all kinds, in case we decide that fiddling with and adapting schema.org is too cumbersome.
The following solution is heavily inspired by my favorite data standard - Brain Imaging Data Structure - BIDS.

At the top level of the project (e.g. in the folder /ceph/zoo/raw/LondonZoo/Videos), we define a file named metadata_fields.yaml, with contents similar to below:

SpeciesName:
  Type: string
  LongName: The name of the species
  Description: Latin species name in a syntax of Genus_species (e.g. Ampulex_compressa)
  TermURL: https://en.wikipedia.org/wiki/Binomial_nomenclature

BodyWeight:
  Type: numerical
  LongName: Bodyweight of the animal
  Unit: kg

VideoQuality:
  Type: categorical
  Description: Subjective video quality from A to C
  Levels:
    A: No problems with video quality
    B: Some problems but still usable
    C: Unusable video quality

Field name	Definition	Required?
Type	Allowed Python type	Yes
LongName	Long (unabbreviated) name of the variable	No
Description	A description of the variable	Yes
Levels	For categorical variables: a dictionary of possible values (keys) and their descriptions (values).	Only for categorical variables
Unit	Measurement unit or None	Only for numerical variables
TermURL	URL pointing to a formal definition of this type of data in an ontology available on the web.	No

This handles strings, numerical, and categorical variables and ensures that we always know what each variable means. This solution is also easily extensible if we decide to add more metadata fields in the future, by simply defining more fields in the metadata_fields.yaml file.

If for a particular species we decide to change something (say we think that kg is not a suitable unit to measure the bodyweight of wasps), we can define a second metadata_fields.yaml file in the species-specific /ceph/zoo/raw/LondonZoo/Videos/jewel-wasp_Ampulex-compressa subfolder. This file will simply contain what we want to change compared to the higher-level file, e.g.:

BodyWeight:
  Unit: mg

The rule will be to start reading from the high-level directory, but update with new values if a yaml file with the same name exists in a lower-level directory. This is inspired by the BIDS inheritance principle.
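The inheritance rule above can be sketched roughly as follows. This is only an illustration under a few assumptions: the YAML files have already been parsed into plain dicts (e.g. with a YAML library, not shown), overrides are merged per field rather than replacing the whole field definition, and the helper names candidate_files and merge_fields are made up here.

```python
from pathlib import PurePosixPath


def candidate_files(root, leaf, filename="metadata_fields.yaml"):
    """List the paths where `filename` may live, from `root` down to `leaf`."""
    root, leaf = PurePosixPath(root), PurePosixPath(leaf)
    relative = leaf.relative_to(root)  # raises ValueError if leaf is outside root
    paths = [root / filename]
    current = root
    for part in relative.parts:
        current = current / part
        paths.append(current / filename)
    return paths


def merge_fields(base, override):
    """Return a copy of `base` updated with `override`, one field at a time."""
    merged = {name: dict(spec) for name, spec in base.items()}
    for name, spec in override.items():
        merged.setdefault(name, {}).update(spec)
    return merged


# Parsed contents of the two example files from this comment.
top_level = {
    "BodyWeight": {
        "Type": "numerical",
        "LongName": "Bodyweight of the animal",
        "Unit": "kg",
    }
}
species_level = {"BodyWeight": {"Unit": "mg"}}

fields = merge_fields(top_level, species_level)
print(fields["BodyWeight"]["Unit"])  # the species-level value wins

for path in candidate_files(
    "/ceph/zoo/raw/LondonZoo/Videos",
    "/ceph/zoo/raw/LondonZoo/Videos/jewel-wasp_Ampulex-compressa",
):
    print(path)
```

In practice one would load each existing file from candidate_files in order and fold them together with merge_fields, so that deeper directories take precedence.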

To define the variable values for each video, we define one yaml file per video, named as <video_filename>_metadata.yaml:

SpeciesName: Ampulex_compressa
BodyWeight: 38
VideoQuality: A
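One benefit of having the field definitions in a separate file is that each per-video file can be checked against them. Below is a rough sketch of such a check, again working on already-parsed dicts. The function validate_metadata and its error messages are hypothetical, and the mapping from the Type values used above (string, numerical, categorical) to Python checks is my assumption.

```python
def validate_metadata(metadata, fields):
    """Return a list of problems found in one video's metadata (empty if none)."""
    problems = []
    for name, value in metadata.items():
        spec = fields.get(name)
        if spec is None:
            problems.append(f"{name}: not defined in metadata_fields.yaml")
            continue
        kind = spec.get("Type")
        levels = list(spec.get("Levels", {}))
        if kind == "string" and not isinstance(value, str):
            problems.append(f"{name}: expected a string, got {value!r}")
        elif kind == "numerical" and (
            isinstance(value, bool) or not isinstance(value, (int, float))
        ):
            problems.append(f"{name}: expected a number, got {value!r}")
        elif kind == "categorical" and value not in levels:
            problems.append(f"{name}: {value!r} is not one of {levels}")
    return problems


# Field definitions and per-video values taken from the examples in this comment.
fields = {
    "SpeciesName": {"Type": "string"},
    "BodyWeight": {"Type": "numerical", "Unit": "kg"},
    "VideoQuality": {
        "Type": "categorical",
        "Levels": {"A": "No problems", "B": "Some problems", "C": "Unusable"},
    },
}
video_metadata = {
    "SpeciesName": "Ampulex_compressa",
    "BodyWeight": 38,
    "VideoQuality": "A",
}
print(validate_metadata(video_metadata, fields))
```

A check like this could run whenever metadata is edited, so that typos in categorical values are caught early.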

Having both the high-level metadata_fields.yaml and the video-level <video_filename>_metadata.yaml would allow us to quickly and easily construct a table (xlsx/csv) showing metadata for all videos or for any given subset of them.
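As a sketch of how the table could be built, the snippet below turns a mapping of video names to their parsed metadata into CSV text using only the standard library. The function metadata_table, the "Video" column name and the video file names are invented for illustration; missing values are simply left empty.

```python
import csv
import io


def metadata_table(per_video, field_names):
    """Render {video_name: metadata_dict} as CSV text, one row per video."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["Video", *field_names],
        restval="",             # leave missing fields empty
        extrasaction="ignore",  # skip keys that are not in field_names
        lineterminator="\n",
    )
    writer.writeheader()
    for video, metadata in sorted(per_video.items()):
        writer.writerow({"Video": video, **metadata})
    return buffer.getvalue()


# Hypothetical per-video metadata, as it might look after parsing the YAML files.
per_video = {
    "video_02.mp4": {"SpeciesName": "Ampulex_compressa", "VideoQuality": "B"},
    "video_01.mp4": {
        "SpeciesName": "Ampulex_compressa",
        "BodyWeight": 38,
        "VideoQuality": "A",
    },
}
table = metadata_table(per_video, ["SpeciesName", "BodyWeight", "VideoQuality"])
print(table)
```

Selecting a subset of videos then amounts to filtering the per_video mapping before calling the function.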


Let me know what you think of it, @samcunliffe and @sfmig.

sfmig · 2022-12-09T10:18:36Z

@niksirbi I really like this idea! It's nice that we adhere to an existing standard in neuroscience, and I think it fits very nicely with my current (very preliminary) work on using Dash/Plotly to visualise and edit the metadata (see this branch). I can give more details in the standup later.

sfmig · 2023-03-15T13:24:43Z

Closing this, as we now have a more or less solid structure for the metadata.

samcunliffe added this to the Minimum Viable Product: v0 milestone Nov 3, 2022

samcunliffe added the enhancement Optional feature label Nov 3, 2022

sfmig mentioned this issue Nov 9, 2022

Tagging events in videos #19

Open

1 task

sfmig added core feature Core functionality and removed enhancement Optional feature labels Nov 17, 2022

samcunliffe linked a pull request Nov 23, 2022 that will close this issue

#12 exploring options to handle video metadata with json files #23

Closed

niksirbi mentioned this issue Dec 8, 2022

#12 exploring options to handle video metadata with json files #23

Closed

sfmig mentioned this issue Dec 12, 2022

Dash/plotly webapp skeleton + metadata work #26

Merged

sfmig closed this as completed Mar 15, 2023
