HyperKitty Search‑and‑Replace Management Command
This Django management command performs search and replace operations on HyperKitty email bodies and text‑based attachments. It is designed for cases where sensitive information must be scrubbed from archived mailing‑list data.
The command supports simulation mode, selective processing, and deduplication of both emails and attachments.
Features
- Replace sensitive strings in:
  - Email bodies
  - Text-based attachments (`text/plain`, `text/html`, `application/xhtml+xml`)
- Simulation mode (`--simulate`) to preview changes without saving
- Deduplication of:
  - Emails (by `message_id`)
  - Attachments (by SHA-1 content hash)
- Flexible processing: `--only-emails` or `--only-attachments`
- Reads replacements from a simple text file using `shlex` parsing
Installation
Place the command file in `/usr/lib/python3.10/site-packages/hyperkitty/management/commands/` (or the equivalent path for your Python/Django installation).

Django derives the command name from the filename, so `sanitize_hyperkitty.py` provides the `sanitize_hyperkitty` command. Django detects modules in the `management/commands/` directory of an installed app automatically.
Usage
Run the command from your Django project directory:
```bash
./manage.py sanitize_hyperkitty \
  --list mylist@example.com \
  --replacements-file replacements.txt
```
Common Options
| Option | Description |
|---|---|
| `--list` | Required. Mailing list name (e.g. `team@lists.example.org`) |
| `--replacements-file` | Required. Path to a file containing replacement rules |
| `--simulate` | Show changes without saving them |
| `--only-emails` | Process only email bodies |
| `--only-attachments` | Process only attachments |
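The options above map onto `argparse`, which Django management commands use for their arguments. The sketch below is a hypothetical stand-alone parser. The function name `build_parser` is illustrative and not taken from the command. In a Django command, the same `add_argument` calls would live in `BaseCommand.add_arguments(self, parser)`. The sketch also encodes the documented rule that `--only-emails` and `--only-attachments` are mutually exclusive.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-alone parser mirroring the documented options.

    In a Django command, the add_argument() calls below would live in
    BaseCommand.add_arguments(self, parser); plain argparse is used here
    so the sketch runs without Django.
    """
    parser = argparse.ArgumentParser(prog="sanitize_hyperkitty")
    parser.add_argument("--list", required=True,
                        help="Mailing list name, e.g. team@lists.example.org")
    parser.add_argument("--replacements-file", required=True,
                        help="Path to a file containing replacement rules")
    parser.add_argument("--simulate", action="store_true",
                        help="Show changes without saving them")
    # The two --only-* flags are documented as mutually exclusive;
    # argparse rejects a command line that passes both.
    scope = parser.add_mutually_exclusive_group()
    scope.add_argument("--only-emails", action="store_true",
                       help="Process only email bodies")
    scope.add_argument("--only-attachments", action="store_true",
                       help="Process only attachments")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([
        "--list", "devteam@lists.example.org",
        "--replacements-file", "scrub.txt",
        "--simulate",
    ])
    # Dashes in option names become underscores in attribute names.
    print(args.list, args.replacements_file, args.simulate)
```

With neither `--only-*` flag set, both email bodies and attachments would be processed.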
Replacements File Format
The replacements file uses `shlex` parsing, so values containing spaces can be quoted.
Each line must contain exactly two values:

```
old_value new_value
```

Examples

```
password "********"
"secret token" "[REDACTED]"
john@example.com jane@example.com
```

Lines beginning with `#` are ignored.
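To see how the example lines are tokenized, you can pass them to the standard library's `shlex.split()`. The snippet only demonstrates the tokenization step and is not the command's own code.

```python
import shlex

# The example rule lines shown above.
rules = [
    'password "********"',
    '"secret token" "[REDACTED]"',
    "john@example.com jane@example.com",
]

for rule in rules:
    # shlex.split() applies shell-like quoting rules: quotes are removed
    # and a quoted phrase containing spaces stays a single token.
    print(shlex.split(rule))
```

This prints `['password', '********']`, `['secret token', '[REDACTED]']` and `['john@example.com', 'jane@example.com']`. Without the quotes, `secret token [REDACTED]` would split into three tokens and no longer be a valid two-value line. `shlex.split()` works in POSIX mode by default, so a backslash outside quotes is also treated as an escape character.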
How It Works
1. Load Replacements
The command reads the replacements file and builds a dictionary of old → new pairs. Malformed lines are skipped with a warning.
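Below is a minimal sketch of this loading step, assuming the rules arrive as an iterable of text lines. The function name `load_replacements` and the exact warning text are hypothetical. Skipping blank lines silently and rejecting an empty old value are assumptions, not documented behaviour.

```python
import shlex
import sys
from typing import Dict, Iterable


def load_replacements(lines: Iterable[str]) -> Dict[str, str]:
    """Hypothetical loader: build an old -> new dictionary from rule lines.

    Comment lines ('#') are ignored. Blank lines are also skipped silently
    (an assumption). Any other line that cannot be tokenized, does not
    yield exactly two values, or has an empty old value (assumed guard)
    is skipped with a warning on stderr.
    """
    replacements: Dict[str, str] = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        try:
            tokens = shlex.split(line)
        except ValueError as exc:  # e.g. "No closing quotation"
            print(f"warning: line {lineno}: {exc}; skipped", file=sys.stderr)
            continue
        if len(tokens) != 2 or not tokens[0]:
            print(f"warning: line {lineno}: expected two values with a "
                  f"non-empty old value; skipped", file=sys.stderr)
            continue
        old, new = tokens
        replacements[old] = new  # a repeated old value overrides the earlier rule
    return replacements
```

A text file object yields its lines when iterated, so you can pass an open replacements file (opened with an explicit `encoding`) to this function directly.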
2. Fetch and Deduplicate Emails
Emails are filtered by mailing-list name and deduplicated by `message_id`.
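The deduplication step can be sketched as an order-preserving filter that keeps the first email seen for each `message_id`. In the command itself the input would presumably be a Django queryset already filtered on the list name. Here a hypothetical `Email` dataclass stands in for it, and its field names are assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Email:
    # Hypothetical stand-in for an archived email; only the fields this
    # sketch needs. Field names are assumptions.
    message_id: str
    content: str


def dedupe_by_message_id(emails: Iterable[Email]) -> List[Email]:
    """Keep the first email seen for each message_id, preserving order."""
    seen = set()
    unique: List[Email] = []
    for email in emails:
        if email.message_id in seen:
            continue  # a later copy of an already-seen message
        seen.add(email.message_id)
        unique.append(email)
    return unique
```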
3. Process Email Bodies
Unless `--only-attachments` is given, each email body is scanned and the replacements are applied to it.
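Here is a minimal sketch of applying the rules to one body, assuming plain, case-sensitive substring replacement in file order. The README does not say whether the command uses substrings or regular expressions. The helper name `apply_replacements` is hypothetical.

```python
from typing import Dict, Tuple


def apply_replacements(text: str, replacements: Dict[str, str]) -> Tuple[str, bool]:
    """Apply each old -> new rule in order and report whether text changed.

    Rules are applied one after another with plain substring replacement,
    so a later rule also sees the output of earlier ones.
    """
    result = text
    for old, new in replacements.items():
        if not old:
            continue  # an empty pattern would match between every character
        result = result.replace(old, new)
    return result, result != text
```

Returning a "changed" flag lets the caller write back (or, in simulation mode, report) only the bodies that actually differ.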
4. Process Attachments
Attachments are:
- Deduplicated by SHA‑1 hash
- Checked for text‑based MIME types
- Decoded using the attachment’s encoding
- Updated and saved if modified
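The four attachment steps above can be sketched as follows. The `Attachment` dataclass and its field names are hypothetical stand-ins for the stored records. Falling back to UTF-8 when no encoding is recorded is an assumption. Following the listed order, a later attachment with byte-identical content is skipped rather than updated; the README does not say how the real command treats such duplicates.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

TEXT_MIME_TYPES = {"text/plain", "text/html", "application/xhtml+xml"}


@dataclass
class Attachment:
    # Hypothetical stand-in for a stored attachment; field names are assumed.
    content: bytes
    content_type: Optional[str]
    encoding: Optional[str]


def process_attachments(attachments: Iterable[Attachment],
                        replacements: Dict[str, str]) -> List[Attachment]:
    """Apply replacements to unique text attachments; return those modified."""
    seen_hashes = set()
    modified: List[Attachment] = []
    for att in attachments:
        # 1. Deduplicate by SHA-1 of the raw bytes.
        digest = hashlib.sha1(att.content).hexdigest()
        if digest in seen_hashes:
            continue
        seen_hashes.add(digest)
        # 2. Skip non-text MIME types ("; charset=..." parameters ignored).
        mime = (att.content_type or "").split(";", 1)[0].strip().lower()
        if mime not in TEXT_MIME_TYPES:
            continue
        # 3. Decode with the attachment's encoding (UTF-8 assumed if unset);
        #    content that cannot be decoded is left untouched.
        encoding = att.encoding or "utf-8"
        try:
            text = att.content.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            continue
        # 4. Replace, then re-encode only if something changed.
        new_text = text
        for old, new in replacements.items():
            if old:
                new_text = new_text.replace(old, new)
        if new_text == text:
            continue
        try:
            att.content = new_text.encode(encoding)
        except UnicodeEncodeError:
            continue  # replacement not representable in this encoding
        modified.append(att)  # the real command would save the record here
    return modified
```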
5. Simulation Mode
If --simulate is used:
- Changes are printed to stdout
- No data is saved
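One way to present pending changes without saving them is a unified diff from the standard library's `difflib`. This is an assumed presentation choice; the command's actual stdout format may differ. `preview_change`, `handle_change` and the `save` callback are hypothetical names.

```python
import difflib
from typing import Callable


def preview_change(label: str, old: str, new: str) -> str:
    """Render a pending change as a unified diff (display only)."""
    return "\n".join(difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=f"{label} (current)",
        tofile=f"{label} (after replacement)",
        lineterm="",
    ))


def handle_change(label: str, old: str, new: str, simulate: bool,
                  save: Callable[[str], None]) -> None:
    """Print the change in simulation mode; otherwise persist it via save()."""
    if old == new:
        return
    if simulate:
        print(preview_change(label, old, new))  # stdout only, nothing saved
    else:
        save(new)
```

Keeping the simulate check in one place means the scanning and replacement code is identical in both modes; only the final write is swapped for a printout.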
6. Rebuild Search Index
After a non-simulated run that modified data, rebuild the HyperKitty search index so that search results no longer contain the replaced strings:

```bash
./manage.py rebuild_index
```
Example
```bash
./manage.py sanitize_hyperkitty \
  --list devteam@lists.example.org \
  --replacements-file scrub.txt \
  --simulate
```
This will scan all messages, show what would change, and leave the database untouched.
Notes
- `--only-emails` and `--only-attachments` cannot be used together.
- Attachments without a MIME type fall back to detection based on the filename.
- Non-text attachments are skipped automatically.
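Filename-based fallback detection could look like the sketch below. It uses the standard library's `mimetypes.guess_type()`. The README does not say which mechanism the command actually uses, so treat that choice and the helper names `effective_mime_type` and `is_text_attachment` as assumptions.

```python
import mimetypes
from typing import Optional

TEXT_MIME_TYPES = {"text/plain", "text/html", "application/xhtml+xml"}


def effective_mime_type(content_type: Optional[str],
                        filename: Optional[str]) -> Optional[str]:
    """Use the stored MIME type if present, else guess from the filename."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" and normalise case.
        return content_type.split(";", 1)[0].strip().lower()
    if filename:
        guessed, _encoding = mimetypes.guess_type(filename)
        return guessed  # None if the extension is unknown
    return None


def is_text_attachment(content_type: Optional[str],
                       filename: Optional[str]) -> bool:
    """True only for the text-based types the command processes."""
    return effective_mime_type(content_type, filename) in TEXT_MIME_TYPES
```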
License
This script is intended for administrative use within Django/HyperKitty environments and is licensed under the GNU General Public License v3.0.