# HyperKitty Search‑and‑Replace Management Command

This Django management command performs **search and replace operations** on HyperKitty email bodies and text‑based attachments. It is designed for cases where sensitive information must be scrubbed from archived mailing‑list data. The command supports simulation mode, selective processing, and deduplication of both emails and attachments.

---

## Features

- Replace sensitive strings in:
  - Email bodies
  - Text‑based attachments (`text/plain`, `text/html`, `application/xhtml+xml`)
- Simulation mode (`--simulate`) to preview changes without saving
- Deduplication of:
  - Emails (by `message_id`)
  - Attachments (by SHA‑1 content hash)
- Flexible processing:
  - `--only-emails`
  - `--only-attachments`
- Reads replacements from a simple text file using `shlex` parsing

---

## Installation

Place the command file in:

```
/usr/lib/python3.10/site-packages/hyperkitty/management/commands/
```

(or the equivalent path for your Python/Django installation)

The filename should match the command name, for example:

```
sanitize_hyperkitty.py
```

Django will automatically detect it as a management command.

---

## Usage

Run the command from your Django project directory:

```bash
./manage.py sanitize_hyperkitty \
    --list mylist@example.com \
    --replacements-file replacements.txt
```

### Common Options

| Option | Description |
|--------|-------------|
| `--list` | **Required.** Mailing list name (e.g. `team@lists.example.org`) |
| `--replacements-file` | **Required.** Path to a file containing replacement rules |
| `--simulate` | Show changes without saving them |
| `--only-emails` | Process only email bodies |
| `--only-attachments` | Process only attachments |

---

## Replacements File Format

The replacements file uses **shlex parsing**, allowing quoted strings.
Each line must contain **exactly two values**:

```
old_value new_value
```

### Examples

```
password "********"
"secret token" "[REDACTED]"
john@example.com jane@example.com
```

Lines beginning with `#` are ignored.

---

## How It Works

### 1. Load Replacements

The command reads the replacements file and builds a dictionary of `old → new` pairs. Malformed lines are skipped with warnings.

### 2. Fetch and Deduplicate Emails

Emails are filtered by mailing list name and deduplicated by `message_id`.

### 3. Process Email Bodies

If enabled, each email body is scanned and replacements are applied.

### 4. Process Attachments

Attachments are:

- Deduplicated by SHA‑1 hash
- Checked for text‑based MIME types
- Decoded using the attachment’s encoding
- Updated and saved if modified

### 5. Simulation Mode

If `--simulate` is used:

- Changes are printed to stdout
- No data is saved

### 6. Rebuild Search Index

After real modifications, rebuild the HyperKitty search index:

```bash
./manage.py rebuild_index
```

---

## Example

```bash
./manage.py sanitize_hyperkitty \
    --list devteam@lists.example.org \
    --replacements-file scrub.txt \
    --simulate
```

This will scan all messages, show what would change, and leave the database untouched.

---

## Notes

- `--only-emails` and `--only-attachments` cannot be used together.
- For attachments without a MIME type, the command falls back to detection based on the filename.
- Non‑text attachments are skipped automatically.

---

## License

This script is intended for administrative use within Django/HyperKitty environments and is provided under the GNU General Public License v3.0.
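
---

## Appendix: Illustrative Sketch

The following is a minimal sketch of the steps described under "How It Works": parsing the replacements file with `shlex`, applying the rules to a piece of text, and deduplicating attachment content by SHA‑1. It is not the command's actual source. The function names (`load_replacements`, `apply_replacements`, `dedupe_by_sha1`) are hypothetical, and skipping blank lines is an assumption not stated above.

```python
import hashlib
import shlex


def load_replacements(lines):
    """Build an old -> new dict from replacement-rule lines.

    Hypothetical helper: returns (rules, warnings) instead of printing,
    so the behaviour is easy to inspect.
    """
    rules = {}
    warnings = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Comment lines are ignored; skipping blank lines is an assumption.
        if not line or line.startswith("#"):
            continue
        try:
            parts = shlex.split(line)
        except ValueError as exc:  # e.g. an unbalanced quote
            warnings.append(f"line {lineno}: {exc}")
            continue
        if len(parts) != 2:
            warnings.append(f"line {lineno}: expected 2 values, got {len(parts)}")
            continue
        old, new = parts
        rules[old] = new
    return rules, warnings


def apply_replacements(text, rules):
    """Apply each old -> new rule to the text as a plain substring replace."""
    for old, new in rules.items():
        text = text.replace(old, new)
    return text


def dedupe_by_sha1(blobs):
    """Keep the first occurrence of each distinct byte string, keyed by SHA-1."""
    seen = set()
    unique = []
    for blob in blobs:
        digest = hashlib.sha1(blob).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(blob)
    return unique


if __name__ == "__main__":
    sample = [
        "# scrub credentials",
        'password "********"',
        '"secret token" "[REDACTED]"',
    ]
    rules, warnings = load_replacements(sample)
    print(apply_replacements("the password is a secret token", rules))
```

Rules are applied in file order, so an earlier rule can alter text that a later rule would otherwise match. Running with `--simulate` first is a reasonable way to check for such interactions.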