gajim-plugins/stt_voice_messages/README.md

# About

This plugin transcribes your voice messages to text using a _general-purpose speech recognition model_.

To use this plugin, you need at least one of the following models installed:

#### OpenAI Whisper
- Website: https://github.com/openai/whisper
- Installable by: `pip install -U openai-whisper`

#### Faster Whisper
- Website: https://github.com/SYSTRAN/faster-whisper
- Installable by: `pip install -U faster-whisper`
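
To check which of the two packages above is importable in the Python environment Gajim runs in, something like the following minimal sketch may help. The `BACKENDS` mapping and the `available_backends` helper are hypothetical names used for illustration only and are not part of the plugin. The import names `whisper` and `faster_whisper` are the modules that the two pip packages provide.

```python
from importlib.util import find_spec

# pip package name -> importable module name.
# This mapping is an illustrative helper, not part of the plugin.
BACKENDS = {
    "openai-whisper": "whisper",
    "faster-whisper": "faster_whisper",
}


def available_backends() -> list[str]:
    """Return the pip names of the backends that can be imported."""
    return [
        pip_name
        for pip_name, module in BACKENDS.items()
        if find_spec(module) is not None
    ]


if __name__ == "__main__":
    found = available_backends()
    if found:
        print("Installed backends:", ", ".join(found))
    else:
        print("No backend found; install one of:", ", ".join(BACKENDS))
```

Run it with the same interpreter that starts Gajim, since a package installed into a different environment will not be visible to the plugin.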

Additionally, you have to check out the following Gajim branch:
https://dev.gajim.org/mesonium/gajim/-/tree/stt_voice_messages

# Hint

_**The plugin is very much a proof of concept at this stage!**_

Currently, a chosen model is downloaded in the background on first use, during which
Gajim's UI may not respond.
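
For illustration, one common way to keep a UI responsive during a slow model load is to run the load in a worker thread. The sketch below is a generic pattern under that assumption, not the plugin's actual code. `load_in_background`, `loader` and `on_done` are hypothetical names.

```python
import threading
from typing import Any, Callable


def load_in_background(
    loader: Callable[[], Any],
    on_done: Callable[[Any], None],
) -> threading.Thread:
    """Run `loader` in a daemon thread and pass its result to `on_done`."""

    def worker() -> None:
        # Note: on_done is called from the worker thread, not the UI thread.
        on_done(loader())

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


if __name__ == "__main__":
    results = []
    # A stand-in loader; a real one would download and load a model.
    thread = load_in_background(lambda: "model", results.append)
    thread.join()
    print(results)
```

In a GTK application such as Gajim, the callback would typically be handed back to the main loop (for example via `GLib.idle_add`) instead of touching widgets from the worker thread.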

Typical model sizes for OpenAI Whisper are:

| Multilingual Model  | Download Size |
|---------------------|---------------|
| Tiny                | 70 MB         |
| Base                | 140 MB        |
| Small               | 460 MB        |
| Medium              | 1.4 GB        |
| Large               | 2.9 GB        |

# TODO

- [x] Offer multiple models
- [ ] Add various model settings
- [ ] Model retrieval
  - [ ] Indicate model download state
  - [ ] Allow changing the model download location
  - [ ] Allow using local models
- [ ] Database Handling
  - [ ] Store transcribed messages in a DB
  - [ ] Option to delete DB
- [ ] Update UI
  - [ ] Make it prettier
  - [ ] Show progress bar
  - [ ] Highlight words on playback