This project transcribes audio from video files and generates descriptions from the transcriptions. It uses OpenAI's Whisper API for transcription and the ChatGPT-3.5-turbo model for generating descriptions. The user can opt to use a local model for both tasks if desired. The tool also supports extracting a specific duration from the start of the video and controlling the bitrate of the audio.
- 🧑💻 How to Run
- 🔑 Setting up your OpenAI API credentials
- 📦 Installing Dependencies
- 🐳 Running with Docker
- 🌐 Community
- 📄 License
You can run the main Python script from the terminal with several optional flags:
python main.py /path/to/video/folder --local --local-transcribe --local-describe --time <seconds> --bitrate <bitrate> --overwrite
- The
--model
flag sets the OpenAI model to use for description generation. The default is gpt-3.5-turbo. - The
--local
flag uses local versions of both the transcribe and describe functions. - The
--local-transcribe
flag uses the local version of the transcribe function. - The
--local-describe
flag uses the local version of the describe function. - The
--time
flag sets the duration in seconds of the video to transcribe. - The
--bitrate
flag sets the bitrate for the extracted audio. - The
--overwrite
flag enables overwriting of existing audio, transcript, or description files. - The
--translate
flag is used for translating the transcription. It should be in the format <orig_language>:<translation_language>. - The
--keep-transcripts
flag, if set, allows you to keep the generated transcript files after the program has run. - The
--keep-audio
flag, if set, allows you to keep the extracted audio files after the program has run.
If you just want to process the video files in a directory without using any flags, you can do so:
python main.py /path/to/video/folder
- Sign up for an OpenAI account if you don't already have one.
- Navigate to the API section of the OpenAI Dashboard.
- Generate a new API key by clicking the "Create API Key" button.
- Securely store your API Key.
- Set your API Key as an environment variable in your system:
export OPENAI_API_KEY="your-api-key"
.
This project uses the ffmpeg
library from your operating system, the git
utility is also required to install SubsAI for local transcription. You can install them with your package manager, for example on Ubuntu/Debian:
sudo apt-get install ffmpeg git
This project uses the openai
, moviepy
, and pydub
Python libraries. You can install them using pip:
pip install openai moviepy pydub
For local transcription, the SubsAI Python project is used, you can install it from GitHub:
pip install git+https://github.com/abdeladim-s/subsai.git
For local description, any app or library that provides an OpenAI-compatible API can be used, such as LM Studio.
To get started, you first need to pull the Docker image from the GitHub Container Registry. You can do this by running the following command in your terminal:
docker pull ghcr.io/rahb-realtors-association/transcriber-describer:latest
Run with the following command:
docker run -it ghcr.io/rahb-realtors-association/transcriber-describer:latest <flags>
Contributions of any kind are very welcome, and would be much appreciated. For Code of Conduct, see Contributor Convent.
To get started, fork the repo, make your changes, add, commit and push the code, then come back here to open a pull request. If you're new to GitHub or open source, this guide or the git docs may help you get started, but feel free to reach out if you need any support.
If you've found something that doesn't work as it should, or would like to suggest a new feature, then go ahead and raise an issue on GitHub. For bugs, please outline the steps needed to reproduce, and include relevant info like system info and resulting logs.
This project is open sourced under the MIT license. See the LICENSE file for more info. 📜