Consider improvements to video metadata #12

Closed
samcunliffe opened this issue Nov 3, 2022 · 7 comments
Labels
core feature Core functionality

Comments

@samcunliffe
Member

  • Investigate standard video metadata formats (can we make use of the "comments" field?)
  • Investigate JSON (which would live in the same directory)
  • Anything else? ...
@samcunliffe samcunliffe added this to the Minimum Viable Product: v0 milestone Nov 3, 2022
@samcunliffe samcunliffe added the enhancement Optional feature label Nov 3, 2022
@sfmig sfmig mentioned this issue Nov 9, 2022
1 task
@sfmig sfmig added core feature Core functionality and removed enhancement Optional feature labels Nov 17, 2022
@niksirbi
Member

Here is my research so far on this topic:

Metadata schemas/standards

As one would expect, there is no single all-encompassing standard. There are several existing metadata schemas, as found on Stack Overflow and in this blog post.

These are all very general-purpose, but we could adopt a small subset of useful fields for our purposes.

My current favorite: schema.org

schema.org is used by Google (incl. YouTube), Bing, and Yahoo. Its main application seems to be Search Engine Optimisation. It covers a lot of things, including video objects.

Schema.org vocabulary can be used with many different encodings, including RDFa, Microdata and JSON-LD. These vocabularies cover entities, relationships between entities and actions, and can easily be extended through a well-documented extension model. Over 10 million sites use Schema.org to markup their web pages and email messages. Many applications from Google, Microsoft, Pinterest, Yandex and others already use these vocabularies to power rich, extensible experiences.

I lean toward following schema.org, for several reasons.
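
To make this more concrete, here is a minimal sketch of what a schema.org description of one of our videos could look like when encoded as JSON-LD. The function name, the file path and the note text are hypothetical. The properties used (name, contentUrl, abstract, duration) are ones I believe exist on schema.org's VideoObject type, but which subset we would actually adopt is still open.

```python
import json


def video_object(name, content_url, abstract, duration_seconds):
    """Build a schema.org VideoObject as a plain dict (JSON-LD encoding).

    Hypothetical helper: only a handful of VideoObject properties are used.
    """
    return {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "contentUrl": content_url,
        # Brief experimenter notes could go here.
        "abstract": abstract,
        # schema.org expects durations in ISO 8601 format, e.g. PT90S.
        "duration": f"PT{int(duration_seconds)}S",
    }


# Hypothetical example values, for illustration only.
example = video_object(
    name="example_video",
    content_url="example_video.mp4",
    abstract="Short free-text note from the experimenter.",
    duration_seconds=90,
)
print(json.dumps(example, indent=2))
```

A file like this could live next to the video as a JSON sidecar, which would also cover the "JSON in the same directory" option from the original list.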

Extracting embedded video metadata

Using ffmpeg(-python)

Embedded video metadata can be extracted with the ffprobe command of ffmpeg.
The ffmpeg-python library can do the same using:

import ffmpeg

# video_file is the path to the video; "streams" holds a list with one dict per stream
stream_metadata = ffmpeg.probe(video_file)["streams"]

Read more here
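
For reference, a rough sketch of the same idea without the ffmpeg-python dependency: call the ffprobe CLI directly and pick out the container-level tags (which is where a "comment" field would appear, if present). The function names are hypothetical, and it assumes ffprobe is installed and on the PATH. As far as I can tell, the dict returned by ffmpeg.probe has the same "streams"/"format" layout, so the summarising step should work on either.

```python
import json
import subprocess


def run_ffprobe(video_path):
    """Run ffprobe and return its JSON report as a dict.

    Assumes the ffprobe executable is available on the PATH.
    """
    result = subprocess.run(
        [
            "ffprobe", "-v", "quiet",
            "-print_format", "json",
            "-show_format", "-show_streams",
            str(video_path),
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


def summarise_probe(probe):
    """Extract container tags and basic facts about the first video stream."""
    video_streams = [
        s for s in probe.get("streams", []) if s.get("codec_type") == "video"
    ]
    first = video_streams[0] if video_streams else {}
    fmt = probe.get("format", {})
    return {
        # Container-level tags, e.g. "comment" or "title", if any were written.
        "tags": fmt.get("tags", {}),
        # ffprobe reports the duration as a string of seconds.
        "duration_s": float(fmt["duration"]) if "duration" in fmt else None,
        "width": first.get("width"),
        "height": first.get("height"),
        # Frame rate is reported as a fraction string such as "25/1".
        "frame_rate": first.get("r_frame_rate"),
    }
```

Usage would be something like summarise_probe(run_ffprobe(video_file)). Whether the "comments" field survives our recording and transcoding steps is something we would still need to check.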

Using mutagen

https://mutagen.readthedocs.io/en/latest/
See tutorial here

@samcunliffe
Member Author

samcunliffe commented Nov 22, 2022

Schema LGTM. Seems to be Apache 2.0.

Did anyone check with Sanna what metadata, precisely, is recorded in the manual metadata step? [I deliberately don't tag her because perhaps you and @sfmig already discussed this.]

The 'abstract' field is probably useful for brief summary-type notes. How detailed are the experimenters' notes?

@sfmig
Collaborator

sfmig commented Nov 23, 2022

Yes, @niksirbi and I had a meeting with Sanna and Lewis last week, and they gave us an overview of the current pipeline, including the part where the video metadata is recorded manually. That currently lives in master-list.xlsx on the zoo server. The notes are not very extensive, so they may fit well in the 'abstract' field.

@sfmig
Collaborator

sfmig commented Nov 28, 2022

As mentioned by @samcunliffe, maybe the electronic research notebook is worth a look (RSpace under the hood).

@niksirbi also mentioned having a look at Bonsai's options for adding metadata (Bonsai is already used in the project and widely in neuroscience).

@niksirbi
Member

niksirbi commented Dec 8, 2022

Here is a completely different (and much simpler) idea for handling metadata of all kinds, in case we decide that fiddling with and adapting schema.org is too cumbersome.
The following solution is heavily inspired by my favorite data standard, the Brain Imaging Data Structure (BIDS).

At the top level of the project (e.g. in the folder /ceph/zoo/raw/LondonZoo/Videos), we define a file named metadata_fields.yaml, with contents similar to below:

SpeciesName:
  Type: string
  LongName: The name of the species
  Description: Latin species name in a syntax of Genus_species (e.g. Ampulex_compressa)
  TermURL: https://en.wikipedia.org/wiki/Binomial_nomenclature

BodyWeight:
  Type: numerical
  LongName: Bodyweight of the animal
  Unit: kg

VideoQuality:
  Type: categorical
  Description: Subjective video quality from A to C
  Levels:
    A: No problems with video quality
    B: Some problems but still usable
    C: Unusable video quality

| Field name | Definition | Required? |
| --- | --- | --- |
| Type | Allowed Python type | Yes |
| LongName | Long (unabbreviated) name of the variable | No |
| Description | A description of the variable | Yes |
| Levels | For categorical variables: a dictionary of possible values (keys) and their descriptions (values). | Only for categorical variables |
| Unit | Measurement unit or None | Only for numerical variables |
| TermURL | URL pointing to a formal definition of this type of data in an ontology available on the web. | No |

This handles string, numerical, and categorical variables and ensures that we always know what each variable means. This solution is also easily extensible if we decide to add more metadata fields in the future, by simply defining more fields in the metadata_fields.yaml file.

If for a particular species we decide to change something (say we think that kg is not a suitable unit to measure the bodyweight of wasps), we can define a second metadata_fields.yaml file in the species-specific /ceph/zoo/raw/LondonZoo/Videos/jewel-wasp_Ampulex-compressa subfolder. This file will simply contain what we want to change compared to the higher-level file, e.g.:

BodyWeight:
  Unit: mg

The rule will be to start reading from the high-level directory, but update with new values if a yaml file with the same name exists in a lower-level directory. This is inspired by the BIDS inheritance principle.
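
A minimal sketch of how this rule could work, assuming each metadata_fields.yaml has already been loaded into a plain dict (e.g. with a YAML parser; the loading step is left out here). The function names are hypothetical, and this is only one possible reading of the rule: properties are merged per field, so a lower-level file only needs to list what changes.

```python
def merge_fields(base, override):
    """Return a copy of `base` updated, field by field, with `override`.

    Each argument maps a field name (e.g. "BodyWeight") to a dict of
    properties (e.g. {"Type": "numerical", "Unit": "kg"}).
    """
    merged = {name: dict(props) for name, props in base.items()}
    for name, props in override.items():
        # Fields missing at the higher level are added; existing ones are
        # updated property by property, so unchanged properties are kept.
        merged.setdefault(name, {}).update(props)
    return merged


def resolve_fields(definitions_by_depth):
    """Merge definitions ordered from the top-level directory downwards."""
    resolved = {}
    for definitions in definitions_by_depth:
        resolved = merge_fields(resolved, definitions)
    return resolved


# Contents of the project-level file and the species-level file from above.
top_level = {
    "BodyWeight": {
        "Type": "numerical",
        "LongName": "Bodyweight of the animal",
        "Unit": "kg",
    },
}
species_level = {"BodyWeight": {"Unit": "mg"}}

wasp_fields = resolve_fields([top_level, species_level])
print(wasp_fields["BodyWeight"])
```

Note that in this sketch a nested property such as Levels would be replaced as a whole rather than merged key by key. Whether that is the behaviour we want is a design choice we would need to make.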

To define the variable values for each video, we define one yaml file per video, named as <video_filename>_metadata.yaml:

SpeciesName: Ampulex_compressa
BodyWeight: 38
VideoQuality: A
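
Having the field definitions also means the per-video values can be checked automatically. A rough sketch, assuming both files are already loaded into dicts and that the three Type values from the example (string, numerical, categorical) are the only ones in use. The function name and the wording of the messages are hypothetical.

```python
from numbers import Real


def validate_metadata(values, fields):
    """Check per-video values against the field definitions.

    Returns a list of human-readable problems; an empty list means that
    no problems were found.
    """
    problems = []
    for name, value in values.items():
        spec = fields.get(name)
        if spec is None:
            problems.append(f"{name}: not defined in metadata_fields.yaml")
            continue
        kind = spec.get("Type")
        if kind == "string" and not isinstance(value, str):
            problems.append(f"{name}: expected a string, got {value!r}")
        elif kind == "numerical" and (
            isinstance(value, bool) or not isinstance(value, Real)
        ):
            # bool is excluded explicitly because it is a subclass of int.
            problems.append(f"{name}: expected a number, got {value!r}")
        elif kind == "categorical" and value not in spec.get("Levels", {}):
            problems.append(f"{name}: {value!r} is not one of the defined levels")
    return problems


# Field definitions and per-video values mirroring the examples above.
fields = {
    "SpeciesName": {"Type": "string"},
    "BodyWeight": {"Type": "numerical", "Unit": "kg"},
    "VideoQuality": {
        "Type": "categorical",
        "Levels": {"A": "No problems", "B": "Some problems", "C": "Unusable"},
    },
}
video_values = {
    "SpeciesName": "Ampulex_compressa",
    "BodyWeight": 38,
    "VideoQuality": "A",
}
print(validate_metadata(video_values, fields))
```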

Having both the high-level metadata_fields.yaml and the video-level <video_filename>_metadata.yaml would allow us to quickly and easily construct a table (xlsx/csv) showing metadata for all videos or for any given subset of them.
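
As an illustration of the table-building step, here is a small sketch that turns already-loaded per-video metadata into CSV text using only the standard library. The function name, the "Video" column and the video file names are hypothetical. The column list would come from the keys of metadata_fields.yaml.

```python
import csv
import io


def metadata_table(per_video, columns):
    """Return CSV text with one row per video and one column per field.

    `per_video` maps a video file name to its metadata dict; fields that
    are missing for a video are left empty.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Video", *columns])
    writer.writeheader()
    for video, values in sorted(per_video.items()):
        row = {"Video": video}
        row.update({column: values.get(column, "") for column in columns})
        writer.writerow(row)
    return buffer.getvalue()


# Hypothetical video names, with values in the style of the example above.
per_video = {
    "video_002.mp4": {"SpeciesName": "Ampulex_compressa", "VideoQuality": "B"},
    "video_001.mp4": {
        "SpeciesName": "Ampulex_compressa",
        "BodyWeight": 38,
        "VideoQuality": "A",
    },
}
print(metadata_table(per_video, ["SpeciesName", "BodyWeight", "VideoQuality"]))
```

Selecting a subset of videos is then just a matter of filtering per_video before building the table.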

Let me know what you think of it @samcunliffe , @sfmig

@sfmig
Collaborator

sfmig commented Dec 9, 2022

@niksirbi I really like this idea! It's nice that we adhere to an existing standard in neuroscience, and I think it fits very nicely with my current (very preliminary) work on using Dash/Plotly to visualise and edit the metadata (see this branch). I can give more details in the standup later.

@sfmig
Collaborator

sfmig commented Mar 15, 2023

Closing this, as we now have a more or less solid structure for the metadata.

@sfmig sfmig closed this as completed Mar 15, 2023
3 participants