# HyperKitty Search‑and‑Replace Management Command

This Django management command performs **search and replace operations** on
HyperKitty email bodies and text‑based attachments. It is designed for cases
where sensitive information must be scrubbed from archived mailing‑list data.

The command supports simulation mode, selective processing, and deduplication of
both emails and attachments.

---

## Features

- Replace sensitive strings in:
  - Email bodies
  - Text‑based attachments (`text/plain`, `text/html`, `application/xhtml+xml`)
- Simulation mode (`--simulate`) to preview changes without saving
- Deduplication of:
  - Emails (by `message_id`)
  - Attachments (by SHA‑1 content hash)
- Flexible processing:
  - `--only-emails`
  - `--only-attachments`
- Reads replacements from a simple text file using `shlex` parsing

---

## Installation

Place the command file in:

```
/usr/lib/python3.10/site-packages/hyperkitty/management/commands/
```

(or the equivalent path for your Python/Django installation)

The filename should match the command name, for example:

```
sanitize_hyperkitty.py
```

Django will automatically detect it as a management command.

---

## Usage

Run the command from your Django project directory:

```bash
./manage.py sanitize_hyperkitty \
    --list mylist@example.com \
    --replacements-file replacements.txt
```

### Common Options

| Option | Description |
|--------|-------------|
| `--list` | **Required.** Mailing list name (e.g. `team@lists.example.org`) |
| `--replacements-file` | **Required.** Path to a file containing replacement rules |
| `--simulate` | Show changes without saving them |
| `--only-emails` | Process only email bodies |
| `--only-attachments` | Process only attachments |

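Django hands a command's `add_arguments` method an `argparse`-style parser, so the options in the table can be sketched with plain `argparse`. The snippet below is a minimal illustration, not the command's actual source: the function name `build_parser` and the `dest` name `list_name` are assumptions, and the mutually exclusive group is only one possible way to keep `--only-emails` and `--only-attachments` from being combined.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the documented options on a plain argparse parser."""
    parser = argparse.ArgumentParser(prog="sanitize_hyperkitty")
    # dest="list_name" is a hypothetical name chosen for this sketch.
    parser.add_argument("--list", required=True, dest="list_name",
                        help="Mailing list name, e.g. team@lists.example.org")
    parser.add_argument("--replacements-file", required=True,
                        help="Path to a file containing replacement rules")
    parser.add_argument("--simulate", action="store_true",
                        help="Show changes without saving them")
    # The two --only-* flags are documented as not usable together.
    scope = parser.add_mutually_exclusive_group()
    scope.add_argument("--only-emails", action="store_true",
                       help="Process only email bodies")
    scope.add_argument("--only-attachments", action="store_true",
                       help="Process only attachments")
    return parser


args = build_parser().parse_args([
    "--list", "mylist@example.com",
    "--replacements-file", "replacements.txt",
    "--simulate",
])
```

With this layout, passing both `--only-emails` and `--only-attachments` makes `argparse` exit with a usage error before any data is touched.
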
---

## Replacements File Format

The replacements file uses **shlex parsing**, allowing quoted strings.

Each line must contain **exactly two values**:

```
old_value new_value
```

### Examples

```
password "********"
"secret token" "[REDACTED]"
john@example.com jane@example.com
```

Lines beginning with `#` are ignored.

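As an illustration of the format, the following sketch parses rule lines with `shlex.split` and collects them into a dictionary. It is not the command's actual code: the function name `load_replacements` is hypothetical, sending warnings to stderr is an assumption, and silently skipping blank lines is an assumption as well.

```python
import shlex
import sys


def load_replacements(lines):
    """Build an {old: new} dict from an iterable of rule lines."""
    replacements = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines (assumed) and comment lines carry no rule
        try:
            parts = shlex.split(line)
        except ValueError as exc:  # e.g. an unterminated quote
            print(f"warning: line {lineno}: {exc}", file=sys.stderr)
            continue
        if len(parts) != 2:
            print(f"warning: line {lineno}: expected 2 values, "
                  f"got {len(parts)}", file=sys.stderr)
            continue
        old, new = parts
        replacements[old] = new
    return replacements


rules = load_replacements([
    "# scrub credentials",
    'password "********"',
    '"secret token" "[REDACTED]"',
    "john@example.com jane@example.com",
    "only-one-value",  # malformed: skipped with a warning
])
```

Because `shlex` consumes the quotes, `"secret token"` becomes the single search string `secret token`, spaces included.
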
---

## How It Works

### 1. Load Replacements
The command reads the replacements file and builds a dictionary of `old → new`
pairs. Malformed lines are skipped with warnings.

### 2. Fetch and Deduplicate Emails
Emails are filtered by mailing list name and deduplicated by `message_id`.

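The deduplication step can be pictured as a first-seen filter keyed on `message_id`. This is a sketch under assumptions: the `Email` dataclass is a hypothetical stand-in for a database row, and keeping the first occurrence (rather than the last) is an assumption about behavior the text does not specify.

```python
from dataclasses import dataclass


@dataclass
class Email:
    """Hypothetical stand-in for an archived email row."""
    message_id: str
    content: str


def dedupe_by_message_id(emails):
    """Yield the first email seen for each message_id, in input order."""
    seen = set()
    for email in emails:
        if email.message_id in seen:
            continue  # a message with this ID was already yielded
        seen.add(email.message_id)
        yield email


emails = [
    Email("a@example.org", "one"),
    Email("b@example.org", "two"),
    Email("a@example.org", "one again"),
]
unique = list(dedupe_by_message_id(emails))
```
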
### 3. Process Email Bodies
If enabled, each email body is scanned and replacements are applied.

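A minimal way to picture this step is plain substring replacement over the body text. The sketch below assumes literal, case-sensitive matching (not regular expressions) and applies rules in dictionary order; the helper name `apply_replacements` is hypothetical.

```python
def apply_replacements(text, replacements):
    """Return (new_text, hit_count) after applying every old -> new pair."""
    hits = 0
    for old, new in replacements.items():
        count = text.count(old)
        if count:
            text = text.replace(old, new)
            hits += count
    return text, hits


body = "Login with password hunter2 and the secret token abc."
rules = {"hunter2": "********", "secret token": "[REDACTED]"}
new_body, hits = apply_replacements(body, rules)
```

Since the rules run one after another, text produced by an earlier rule can be matched again by a later one, so the order of lines in the replacements file may matter under this approach.
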
### 4. Process Attachments
Attachments are:

- Deduplicated by SHA‑1 hash
- Checked for text‑based MIME types
- Decoded using the attachment’s encoding
- Updated and saved if modified

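The four attachment steps above can be sketched as a single loop. This is an illustration under stated assumptions: attachments are modeled as `(content_type, encoding, content_bytes)` tuples rather than database rows, UTF-8 is assumed when no encoding is recorded, undecodable content is simply skipped, and the function name `scrub_attachments` is hypothetical.

```python
import hashlib

TEXT_MIME_TYPES = {"text/plain", "text/html", "application/xhtml+xml"}


def scrub_attachments(attachments, replacements):
    """Return {sha1_hex: new_bytes} for each unique text attachment that changed."""
    seen = set()
    changed = {}
    for content_type, encoding, content in attachments:
        digest = hashlib.sha1(content).hexdigest()
        if digest in seen:
            continue  # identical bytes were already handled
        seen.add(digest)
        if content_type not in TEXT_MIME_TYPES:
            continue  # non-text attachments are skipped
        charset = encoding or "utf-8"  # assumed default
        try:
            text = content.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # left untouched in this sketch
        new_text = text
        for old, new in replacements.items():
            new_text = new_text.replace(old, new)
        if new_text != text:
            changed[digest] = new_text.encode(charset)
    return changed


attachments = [
    ("text/plain", "utf-8", b"token=abc123\n"),
    ("text/plain", "utf-8", b"token=abc123\n"),  # duplicate bytes
    ("image/png", None, b"\x89PNG abc123"),      # not a text type
    ("text/html", None, b"<p>nothing here</p>"),  # text, but no match
]
changed = scrub_attachments(attachments, {"abc123": "[REDACTED]"})
```

A real command would write the new bytes back to the stored attachment instead of returning them; returning a dictionary keeps the sketch free of database access.
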
### 5. Simulation Mode
If `--simulate` is used:

- Changes are printed to stdout
- No data is saved

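The exact preview format printed by the command is not documented here. One plausible way to show a pending change without saving it is a unified diff from the standard library's `difflib`; the helper name `preview_change` and the label texts are assumptions made for this sketch.

```python
import difflib


def preview_change(label, before, after):
    """Return a unified diff of one body as a string; nothing is saved."""
    diff = difflib.unified_diff(
        before.splitlines(),
        after.splitlines(),
        fromfile=f"{label} (current)",
        tofile=f"{label} (simulated)",
        lineterm="",
    )
    return "\n".join(diff)


before = "user: john@example.com\npassword: hunter2\n"
after = before.replace("hunter2", "********")
report = preview_change("<msg-1@example.org>", before, after)
print(report)
```

When the text is unchanged, the helper returns an empty string, which makes it easy to print only the messages that would actually be modified.
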
### 6. Rebuild Search Index

After a real (non-simulated) run that modified data, rebuild the HyperKitty search index:

```bash
./manage.py rebuild_index
```

---

## Example

```bash
./manage.py sanitize_hyperkitty \
    --list devteam@lists.example.org \
    --replacements-file scrub.txt \
    --simulate
```

This will scan all messages, show what would change, and leave the database untouched.

---

## Notes

- `--only-emails` and `--only-attachments` cannot be used together.
- For attachments without a MIME type, the command falls back to detecting the type from the filename.
- Non‑text attachments are skipped automatically.

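The filename fallback mentioned in the notes could look like the following, using the standard library's `mimetypes` module. This is a sketch only: the source does not say how the command guesses the type, and both helper names are hypothetical.

```python
import mimetypes

TEXT_MIME_TYPES = {"text/plain", "text/html", "application/xhtml+xml"}


def effective_mime_type(content_type, filename):
    """Use the stored MIME type, else guess one from the filename."""
    if content_type:
        return content_type
    guessed, _encoding = mimetypes.guess_type(filename or "")
    return guessed  # None when the extension is unknown or missing


def is_text_attachment(content_type, filename):
    """True when the attachment should be scanned for replacements."""
    return effective_mime_type(content_type, filename) in TEXT_MIME_TYPES


checks = [
    is_text_attachment(None, "notes.txt"),   # guessed from the extension
    is_text_attachment(None, "photo.png"),   # guessed, but not a text type
    is_text_attachment("text/html", None),   # stored type wins
    is_text_attachment(None, None),          # nothing to go on: skipped
]
```
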
---

## License

This script is intended for administrative use within Django/HyperKitty
environments and is licensed under the GNU General Public License v3.0.