SanitizeMailMan3/README.md
2026-01-17 13:46:20 -05:00


HyperKitty Search-and-Replace Management Command

This Django management command performs search-and-replace operations on HyperKitty email bodies and text-based attachments. It is designed for cases where sensitive information must be scrubbed from archived mailing list data.

The command supports simulation mode, selective processing, and deduplication of both emails and attachments.


Features

  • Replace sensitive strings in:
    • Email bodies
    • Text-based attachments (text/plain, text/html, application/xhtml+xml)
  • Simulation mode (--simulate) to preview changes without saving
  • Deduplication of:
    • Emails (by message_id)
    • Attachments (by SHA-1 content hash)
  • Flexible processing:
    • --only-emails
    • --only-attachments
  • Reads replacements from a simple text file using shlex parsing

Installation

Place the command file in:

/usr/lib/python3.10/site-packages/hyperkitty/management/commands/

(or the equivalent path for your Python/Django installation)

The filename should match the command name, for example:

sanitize_hyperkitty.py

Django will automatically detect it as a management command.


Usage

Run the command from your Django project directory:

./manage.py sanitize_hyperkitty \
    --list mylist@example.com \
    --replacements-file replacements.txt

Common Options

Option                 Description
--list                 Required. Mailing list name (e.g. team@lists.example.org)
--replacements-file    Required. Path to a file containing replacement rules
--simulate             Show changes without saving them
--only-emails          Process only email bodies
--only-attachments     Process only attachments
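
For reference, the options above could be declared roughly as follows. This is a minimal sketch using plain argparse so it runs without Django; in a real management command the same add_argument calls would typically live in the command's add_arguments method. The build_parser name and the list_name destination are illustrative assumptions, not taken from the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="sanitize_hyperkitty")
    parser.add_argument("--list", dest="list_name", required=True,
                        help="Mailing list name, e.g. team@lists.example.org")
    parser.add_argument("--replacements-file", required=True,
                        help="Path to a file containing replacement rules")
    parser.add_argument("--simulate", action="store_true",
                        help="Show changes without saving them")
    # The two --only-* flags are documented as mutually exclusive,
    # so argparse is asked to reject any command line that sets both.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--only-emails", action="store_true",
                       help="Process only email bodies")
    group.add_argument("--only-attachments", action="store_true",
                       help="Process only attachments")
    return parser


args = build_parser().parse_args([
    "--list", "team@lists.example.org",
    "--replacements-file", "replacements.txt",
    "--simulate",
])
print(args.list_name, args.replacements_file, args.simulate)
```

Using a mutually exclusive group lets the parser report the conflict before any data is touched.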

Replacements File Format

The replacements file uses shlex parsing, allowing quoted strings.

Each line must contain exactly two values:

old_value new_value

Examples

password "********"
"secret token" "[REDACTED]"
john@example.com jane@example.com

Lines beginning with # are ignored.
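
To see how quoting affects the rules, the short sketch below runs the three example lines through Python's standard shlex.split. It only illustrates the tokenization; the variable names are assumptions made for the example.

```python
import shlex

# The three example rules from this section, one per line.
example_lines = [
    'password "********"',
    '"secret token" "[REDACTED]"',
    "john@example.com jane@example.com",
]

for line in example_lines:
    # Quotes group words into a single value and are removed from the result.
    old_value, new_value = shlex.split(line)
    print(f"{old_value!r} -> {new_value!r}")
```

Without the quotes, "secret token" would split into two values and the line would no longer contain exactly two fields.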


How It Works

1. Load Replacements

The command reads the replacements file and builds a dictionary of old → new pairs. Malformed lines are skipped with warnings.
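
A loader along these lines might look like the sketch below. The function name load_replacements and the warning wording are assumptions; the actual script may differ in both.

```python
import shlex
import sys


def load_replacements(path: str) -> dict[str, str]:
    """Read old/new pairs from a rules file, skipping malformed lines."""
    replacements: dict[str, str] = {}
    with open(path, encoding="utf-8") as handle:
        for lineno, raw in enumerate(handle, start=1):
            line = raw.strip()
            # Blank lines and lines beginning with # carry no rule.
            if not line or line.startswith("#"):
                continue
            try:
                parts = shlex.split(line)
            except ValueError as exc:  # for example, an unbalanced quote
                print(f"warning: line {lineno}: {exc}", file=sys.stderr)
                continue
            if len(parts) != 2:
                print(f"warning: line {lineno}: expected 2 values, "
                      f"got {len(parts)}", file=sys.stderr)
                continue
            old_value, new_value = parts
            replacements[old_value] = new_value
    return replacements
```

Because the result is a dictionary, a later rule with the same old value overwrites an earlier one.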

2. Fetch and Deduplicate Emails

Emails are filtered by mailing list name and deduplicated by message_id.
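
The deduplication idea can be sketched without a database. The Email class here is a hypothetical stand-in for an archived message record, not HyperKitty's model, and dedupe_by_message_id is an illustrative name.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Email:
    """Hypothetical stand-in for an archived email record."""
    message_id: str
    content: str


def dedupe_by_message_id(emails: Iterable[Email]) -> Iterator[Email]:
    """Yield only the first email seen for each message_id."""
    seen: set[str] = set()
    for email in emails:
        if email.message_id in seen:
            continue
        seen.add(email.message_id)
        yield email


emails = [
    Email("<a@example.com>", "first copy"),
    Email("<b@example.com>", "other message"),
    Email("<a@example.com>", "second copy"),
]
unique = list(dedupe_by_message_id(emails))
print([e.message_id for e in unique])
```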

3. Process Email Bodies

If enabled, each email body is scanned and replacements are applied.
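
A plain substring replacement pass is one way to picture this step. The sketch below assumes literal (non-regex) matching applied rule by rule; apply_replacements is an illustrative name, and the real command may match differently.

```python
def apply_replacements(text: str, replacements: dict[str, str]) -> tuple[str, bool]:
    """Apply each old -> new rule in order and report whether text changed."""
    updated = text
    for old_value, new_value in replacements.items():
        updated = updated.replace(old_value, new_value)
    return updated, updated != text


rules = {"password": "********", "john@example.com": "jane@example.com"}
body = "login: john@example.com, password"
new_body, changed = apply_replacements(body, rules)
print(new_body, changed)
```

Since the rules run sequentially, text produced by an earlier rule can be matched by a later one, so rule order matters when values overlap.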

4. Process Attachments

Attachments are:

  • Deduplicated by SHA-1 hash
  • Checked for text-based MIME types
  • Decoded using the attachment's encoding
  • Updated and saved if modified
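
The attachment steps above can be sketched as a single function over raw bytes. This is an assumption-laden illustration: sanitize_attachment, the UTF-8 fallback, and the choice to skip undecodable content are not taken from the script.

```python
import hashlib
from typing import Optional

TEXT_MIME_TYPES = {"text/plain", "text/html", "application/xhtml+xml"}


def sanitize_attachment(content: bytes, content_type: str,
                        encoding: Optional[str],
                        replacements: dict[str, str],
                        seen_hashes: set[str]) -> Optional[bytes]:
    """Return updated bytes, or None if the attachment is skipped or unchanged."""
    # Deduplicate by SHA-1 of the raw content.
    digest = hashlib.sha1(content).hexdigest()
    if digest in seen_hashes:
        return None
    seen_hashes.add(digest)
    # Only text-based MIME types are processed.
    if content_type not in TEXT_MIME_TYPES:
        return None
    charset = encoding or "utf-8"  # assumed fallback when no encoding is recorded
    try:
        text = content.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return None  # undecodable content or unknown charset: leave untouched
    updated = text
    for old_value, new_value in replacements.items():
        updated = updated.replace(old_value, new_value)
    if updated == text:
        return None
    return updated.encode(charset)


seen: set[str] = set()
result = sanitize_attachment(b"my password", "text/plain", "utf-8",
                             {"password": "********"}, seen)
print(result)
```

Decoding strictly, rather than substituting placeholder characters, avoids writing back content that was altered by the decoding itself.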

5. Simulation Mode

If --simulate is used:

  • Changes are printed to stdout
  • No data is saved
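
One way to structure the simulate-versus-save decision is shown below. The unified diff output is an assumption about what "printed to stdout" could look like, and report_or_save and its save callback are hypothetical names.

```python
import difflib
from typing import Callable


def report_or_save(label: str, original: str, updated: str,
                   simulate: bool, save: Callable[[str], None]) -> bool:
    """Print a diff in simulation mode, otherwise save. Return True if saved."""
    if original == updated:
        return False
    if simulate:
        diff = difflib.unified_diff(
            original.splitlines(), updated.splitlines(),
            fromfile=f"{label} (current)", tofile=f"{label} (sanitized)",
            lineterm="",
        )
        print("\n".join(diff))
        return False  # nothing is written in simulation mode
    save(updated)
    return True


saved: list[str] = []
report_or_save("message 1", "token: abc123", "token: [REDACTED]",
               simulate=True, save=saved.append)
print(saved)
```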

6. Rebuild Search Index

After a run that saves changes (that is, one without --simulate), rebuild the HyperKitty search index so that search results reflect the sanitized content:

./manage.py rebuild_index

Example

./manage.py sanitize_hyperkitty \
    --list devteam@lists.example.org \
    --replacements-file scrub.txt \
    --simulate

This will scan all messages, show what would change, and leave the database untouched.


Notes

  • --only-emails and --only-attachments cannot be used together.
  • For attachments without a MIME type, the command falls back to detection based on the filename.
  • Non-text attachments are skipped automatically.
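
The filename fallback mentioned in the notes could be approximated with Python's standard mimetypes module. This is a sketch under that assumption; the script's actual detection logic is not shown in this README, and both function names are illustrative.

```python
import mimetypes
from typing import Optional

TEXT_MIME_TYPES = {"text/plain", "text/html", "application/xhtml+xml"}


def effective_mime_type(content_type: Optional[str],
                        filename: Optional[str]) -> Optional[str]:
    """Prefer the stored MIME type; otherwise guess from the filename."""
    if content_type:
        # Drop parameters such as "; charset=utf-8" before comparing.
        return content_type.split(";")[0].strip().lower()
    guessed, _encoding = mimetypes.guess_type(filename or "")
    return guessed


def is_text_attachment(content_type: Optional[str],
                       filename: Optional[str]) -> bool:
    """True when the attachment should be treated as text-based."""
    return effective_mime_type(content_type, filename) in TEXT_MIME_TYPES


print(is_text_attachment(None, "notes.txt"))
print(is_text_attachment("image/png", "notes.txt"))
```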

License

This script is intended for administrative use within Django/HyperKitty environments and is licensed under the GNU General Public License v3.0.