2026-01-17 13:46:20 -05:00

# HyperKitty Search-and-Replace Management Command
This Django management command performs **search and replace operations** on
HyperKitty email bodies and text-based attachments. It is designed for cases
where sensitive information must be scrubbed from archived mailing-list data.
The command supports simulation mode, selective processing, and deduplication of
both emails and attachments.

---
## Features
- Replace sensitive strings in:
  - Email bodies
  - Text-based attachments (`text/plain`, `text/html`, `application/xhtml+xml`)
- Simulation mode (`--simulate`) to preview changes without saving
- Deduplication of:
  - Emails (by `message_id`)
  - Attachments (by SHA-1 content hash)
- Flexible processing:
  - `--only-emails`
  - `--only-attachments`
- Reads replacements from a simple text file using `shlex` parsing
---
## Installation
Place the command file in:
```
/usr/lib/python3.10/site-packages/hyperkitty/management/commands/
```
(or the equivalent path for your Python/Django installation).

The filename should match the command name, for example:
```
sanitize_hyperkitty.py
```
Django will automatically detect it as a management command.

---
## Usage
Run the command from your Django project directory:
```bash
./manage.py sanitize_hyperkitty \
  --list mylist@example.com \
  --replacements-file replacements.txt
```
### Common Options
| Option | Description |
|--------|-------------|
| `--list` | **Required.** Mailing list name (e.g. `team@lists.example.org`) |
| `--replacements-file` | **Required.** Path to a file containing replacement rules |
| `--simulate` | Show changes without saving them |
| `--only-emails` | Process only email bodies |
| `--only-attachments` | Process only attachments |
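
The options above could be declared roughly as follows. This is a minimal sketch, not the command's actual source: Django passes an `argparse`-compatible parser to a command's `add_arguments()` method, so a standalone `argparse.ArgumentParser` is used here to keep the example runnable without Django. The `build_parser` name and the use of a mutually exclusive group are assumptions.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented options."""
    parser = argparse.ArgumentParser(prog="sanitize_hyperkitty")
    parser.add_argument("--list", required=True,
                        help="Mailing list name, e.g. team@lists.example.org")
    parser.add_argument("--replacements-file", required=True,
                        help="Path to a file containing replacement rules")
    parser.add_argument("--simulate", action="store_true",
                        help="Show changes without saving them")
    # Assumption: the two --only-* flags are rejected when combined.
    only = parser.add_mutually_exclusive_group()
    only.add_argument("--only-emails", action="store_true",
                      help="Process only email bodies")
    only.add_argument("--only-attachments", action="store_true",
                      help="Process only attachments")
    return parser
```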
---
## Replacements File Format
The replacements file uses **shlex parsing**, allowing quoted strings.
Each line must contain **exactly two values**:
```
old_value new_value
```
### Examples
```
password "********"
"secret token" "[REDACTED]"
john@example.com jane@example.com
```
Lines beginning with `#` are ignored.

---
## How It Works
### 1. Load Replacements
The command reads the replacements file and builds a dictionary of `old → new`
pairs. Malformed lines are skipped with warnings.
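
The loading step might look something like the sketch below. The function name `parse_replacements` and the choice to return warnings as a list (rather than print them) are assumptions made for illustration; only the use of `shlex`, the two-values-per-line rule, and the `#` comment rule come from this document.

```python
import shlex


def parse_replacements(lines):
    """Build an old -> new dict from replacement-file lines.

    Returns (replacements, warnings); malformed lines are skipped.
    """
    replacements = {}
    warnings = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        try:
            parts = shlex.split(line)
        except ValueError as exc:  # e.g. an unclosed quote
            warnings.append(f"line {lineno}: {exc}")
            continue
        if len(parts) != 2:
            warnings.append(f"line {lineno}: expected 2 values, got {len(parts)}")
            continue
        old, new = parts
        replacements[old] = new
    return replacements, warnings
```

With a real file, this could be called as `parse_replacements(open(path, encoding="utf-8"))`, since iterating a text file yields its lines.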
### 2. Fetch and Deduplicate Emails
Emails are filtered by mailing list name and deduplicated by `message_id`.
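
As a rough illustration of the deduplication, the sketch below keeps the first record seen for each `message_id`. Plain dicts stand in for HyperKitty's email records here, and `dedupe_by_message_id` is a hypothetical helper name; the real command works on database objects.

```python
def dedupe_by_message_id(emails):
    """Keep the first email for each message_id, preserving input order."""
    seen = set()
    unique = []
    for email in emails:
        message_id = email["message_id"]
        if message_id in seen:
            continue  # duplicate of an email already queued for processing
        seen.add(message_id)
        unique.append(email)
    return unique
```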
### 3. Process Email Bodies
If enabled, each email body is scanned and replacements are applied.
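
A minimal sketch of the replacement pass, assuming plain substring matching (not regular expressions) and rules applied in the order they appear in the dictionary. The helper name `apply_replacements` is hypothetical.

```python
def apply_replacements(text, replacements):
    """Apply every old -> new rule to text.

    Returns (new_text, number_of_occurrences_replaced).
    """
    total = 0
    for old, new in replacements.items():
        count = text.count(old)
        if count:
            text = text.replace(old, new)
            total += count
    return text, total
```

A returned count of zero means the body is unchanged and does not need to be saved.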
### 4. Process Attachments
Attachments are:
- Deduplicated by SHA-1 hash
- Checked for text-based MIME types
- Decoded using the attachment's encoding
- Updated and saved if modified
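
The attachment steps above can be sketched as a single function. This is an illustration under assumptions, not the command's code: `scrub_attachment` is a hypothetical name, attachment content is assumed to be available as `bytes`, UTF-8 is assumed when no encoding is recorded, and undecodable content is simply skipped.

```python
import hashlib

TEXT_MIME_TYPES = {"text/plain", "text/html", "application/xhtml+xml"}


def scrub_attachment(content, mime_type, encoding, replacements, seen_hashes):
    """Return the updated bytes, or None when nothing should be saved."""
    digest = hashlib.sha1(content).hexdigest()
    if digest in seen_hashes:
        return None  # identical content was already handled
    seen_hashes.add(digest)
    if mime_type not in TEXT_MIME_TYPES:
        return None  # non-text attachment
    codec = encoding or "utf-8"  # assumption: default to UTF-8
    try:
        text = content.decode(codec)
    except (UnicodeDecodeError, LookupError):
        return None  # undecodable bytes or unknown codec name
    updated = text
    for old, new in replacements.items():
        updated = updated.replace(old, new)
    if updated == text:
        return None  # no rule matched
    # Assumes the replacement text is representable in the same codec.
    return updated.encode(codec)
```

Passing the same `seen_hashes` set across calls is what makes the deduplication work for a whole run.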
### 5. Simulation Mode
If `--simulate` is used:
- Changes are printed to stdout
- No data is saved
### 6. Rebuild Search Index
After a real (non-simulated) run that modified data, rebuild the HyperKitty search index so that search results reflect the scrubbed content:
```bash
./manage.py rebuild_index
```
---
## Example
```bash
./manage.py sanitize_hyperkitty \
  --list devteam@lists.example.org \
  --replacements-file scrub.txt \
  --simulate
```
This scans all messages on the list, shows what would change, and leaves the database untouched.

---
## Notes
- `--only-emails` and `--only-attachments` cannot be used together.
- For attachments without a MIME type, the command falls back to detecting the type from the filename.
- Non-text attachments are skipped automatically.
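
The filename fallback mentioned above could be done with Python's standard `mimetypes` module, as in the sketch below. Whether the command actually uses `mimetypes` is an assumption, and `effective_mime_type` is a hypothetical helper name.

```python
import mimetypes


def effective_mime_type(declared, filename):
    """Prefer the declared MIME type; otherwise guess from the filename."""
    if declared:
        return declared.lower()
    if filename:
        guessed, _encoding = mimetypes.guess_type(filename)
        return guessed  # None when the extension is not recognised
    return None
```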
---
## License
This script is intended for administrative use within Django/HyperKitty
environments and is provided under the GNU General Public License v3.0.