gajim-plugins/stt_voice_messages/README.md
2026-05-05 05:23:23 -03:00


# About
This plugin, in conjunction with a _general-purpose speech recognition model_, lets you transcribe your voice messages to text.
To use this plugin, you need at least one of the following models installed:
#### OpenAI Whisper
- Website: https://github.com/openai/whisper
- Install with: `pip install -U openai-whisper`
#### Faster Whisper
- Website: https://github.com/SYSTRAN/faster-whisper
- Install with: `pip install -U faster-whisper`
Additionally, you have to check out the following Gajim branch:
https://dev.gajim.org/mesonium/gajim/-/tree/stt_voice_messages
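As a rough illustration of how the two packages above could be detected and used from Python, here is a minimal sketch. It is not the plugin's actual code: `BACKENDS`, `available_backends` and `transcribe` are hypothetical names introduced for this example, and `"base"` is just an assumed default model size. The only library calls used are `whisper.load_model(...)`/`model.transcribe(...)` from OpenAI Whisper and `WhisperModel(...)`/`model.transcribe(...)` from Faster Whisper.

```python
import importlib.util

# Map the pip package name to the module name it installs.
# (Hypothetical table for this sketch, not taken from the plugin.)
BACKENDS = {
    "openai-whisper": "whisper",
    "faster-whisper": "faster_whisper",
}


def available_backends(candidates=BACKENDS):
    """Return the names of the candidate backends whose module is importable."""
    return [
        name
        for name, module in candidates.items()
        if importlib.util.find_spec(module) is not None
    ]


def transcribe(audio_path, backend, model_size="base"):
    """Transcribe one audio file with the chosen backend (hypothetical helper)."""
    if backend == "openai-whisper":
        import whisper  # imported lazily so this file loads without it

        model = whisper.load_model(model_size)
        return model.transcribe(audio_path)["text"].strip()
    if backend == "faster-whisper":
        from faster_whisper import WhisperModel

        model = WhisperModel(model_size)
        # Faster Whisper yields segments lazily; join their text.
        segments, _info = model.transcribe(audio_path)
        return "".join(segment.text for segment in segments).strip()
    raise ValueError(f"unknown backend: {backend}")
```

Calling `available_backends()` first and passing one of the returned names to `transcribe()` avoids an `ImportError` when only one of the two packages is installed.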
# Hint
_**The plugin is very much a proof of concept (POC) at this stage!**_
Currently, a chosen model is downloaded in the background the first time it is used, during which
Gajim's UI may not respond.
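One common way to keep a UI responsive during a slow first-time load is to run the load in a worker thread and hand the result back through a callback. The following is only a sketch of that general pattern, not how the plugin currently works; `load_in_background`, `loader` and `on_done` are hypothetical names.

```python
import threading


def load_in_background(loader, on_done):
    """Run `loader()` in a worker thread and pass its outcome to `on_done`.

    `on_done` receives `(result, error)`; exactly one of them is None unless
    the loader itself returned None. Returns the started thread.
    """

    def worker():
        try:
            result, error = loader(), None
        except Exception as exc:  # report failures instead of losing them
            result, error = None, exc
        # Assumption: in a GTK application this callback would normally be
        # scheduled on the main loop (e.g. with GLib.idle_add) before it
        # touches any widgets. Here it runs directly in the worker thread.
        on_done(result, error)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

A caller would pass something like `lambda: whisper.load_model("base")` as `loader`, and use `on_done` to enable the transcription button or show an error.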
Typical model sizes for OpenAI Whisper are:
| Multilingual Model  | Download Size |
|---------------------|---------------|
| Tiny | 70 MB |
| Base | 140 MB |
| Small | 460 MB |
| Medium | 1.4 GB |
| Large | 2.9 GB |
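The sizes in the table can be used to choose a model that fits a given download or disk budget. The helper below is a hypothetical sketch: the dictionary simply restates the approximate table values in MB (assuming 1 GB = 1000 MB), and `largest_model_within` is a name made up for this example.

```python
# Approximate download sizes in MB, restating the table above
# (assumption: 1 GB is counted as 1000 MB).
MODEL_SIZES_MB = {
    "tiny": 70,
    "base": 140,
    "small": 460,
    "medium": 1400,
    "large": 2900,
}


def largest_model_within(budget_mb, sizes=MODEL_SIZES_MB):
    """Return the largest model whose download fits the budget, or None."""
    fitting = [name for name, size in sizes.items() if size <= budget_mb]
    return max(fitting, key=sizes.get, default=None)
```

The budget could come from, for example, `shutil.disk_usage(path).free // 10**6` for the directory the models are stored in.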
# TODO
- [x] Offer multiple models
- [ ] Add various model settings
- [ ] Model retrieval
  - [ ] Show model download state
  - [ ] Allow changing the model download location
  - [ ] Allow using local models
- [ ] Database handling
  - [ ] Store transcribed messages in a DB
  - [ ] Option to delete the DB
- [ ] Update UI
  - [ ] Make it prettier
  - [ ] Show progress bar
  - [ ] Highlight words on playback