Initial commit: Split Macha autonomous system into separate flake

Macha is now a standalone NixOS flake that can be imported into other
systems. This provides:

- Independent versioning
- Easier reusability
- Cleaner separation of concerns
- Better development workflow

Includes:
- Complete autonomous system code
- NixOS module with full configuration options
- Queue-based architecture with priority system
- Chunked map-reduce for large outputs
- ChromaDB knowledge base
- Tool calling system
- Multi-host SSH management
- Gotify notification integration

All capabilities from DESIGN.md are preserved.
Author: Lily Miller
Date: 2025-10-06 14:32:37 -06:00
Commit: 22ba493d9e
30 changed files with 10306 additions and 0 deletions

.gitignore (vendored, new file, 23 lines)

@@ -0,0 +1,23 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
*.egg-info/
dist/
build/
# IDE
.vscode/
.idea/
*.swp
*.swo
# Nix
result
result-*
# Test data
test_*.db
*.log

DESIGN.md (new file, 269 lines)

@@ -0,0 +1,269 @@
# Macha Autonomous System - Design Document
> **⚠️ IMPORTANT - READ THIS FIRST**
> **FOR AI ASSISTANT**: This document is YOUR reference guide when modifying Macha's code.
> - **ALWAYS consult this BEFORE refactoring** to ensure you don't remove existing capabilities
> - **CHECK this when adding features** to avoid conflicts
> - **UPDATE this document** when new capabilities are added
> - **DO NOT DELETE ANYTHING FROM THIS DOCUMENT**
> - During major refactors, you MUST verify each capability listed here is preserved
## Overview
Macha is an AI-powered autonomous system administrator capable of monitoring, maintaining, and managing multiple NixOS hosts in the infrastructure.
## Core Capabilities
### 1. Local System Management
- Monitor system health (CPU, memory, disk, services)
- Read and analyze logs via `journalctl`
- Check service status and restart failed services
- Execute system commands (with safety restrictions)
- Monitor and repair Nix store corruption
- Hardware awareness (CPU, GPU, network, storage)
### 2. Multi-Host Management via SSH
**Macha CAN and SHOULD use SSH to manage other hosts.**
#### SSH Access
- Runs as `macha` user (UID 2501)
- Has `NOPASSWD` sudo access for administrative commands
- Shares SSH keys with other hosts in the infrastructure
- Can SSH to: `rhiannon`, `alexander`, `UCAR-Kinston`, and others in the flake
#### SSH Usage Patterns
1. **Direct diagnostic commands:**
```bash
ssh rhiannon systemctl status ollama
ssh alexander df -h
```
- Commands automatically prefixed with `sudo` by the tools layer
- Full command: `ssh macha@rhiannon sudo systemctl status ollama`
2. **Status checks:**
- Check service health on remote hosts
- Gather system metrics
- Review logs
- Monitor resource usage
3. **File operations:**
- Use `scp` to copy files between hosts
- Read configuration files on remote systems
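A minimal sketch of how the tools layer might assemble these remote commands (the `run_remote` helper and its defaults are illustrative; `tools.py` holds the real implementation):
```python
import subprocess

def run_remote(host: str, command: str, user: str = "macha", timeout: int = 60) -> str:
    """Run a diagnostic command on a remote host, prefixing sudo automatically."""
    # run_remote("rhiannon", "systemctl status ollama") becomes:
    #   ssh macha@rhiannon sudo systemctl status ollama
    result = subprocess.run(
        ["ssh", f"{user}@{host}", f"sudo {command}"],
        capture_output=True, text=True, timeout=timeout,
    )
    result.check_returncode()  # surface remote failures to the caller
    return result.stdout
```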
#### When to use SSH vs nh
- **SSH**: For diagnostics, status checks, log review, quick commands
- **nh remote deployment**: For applying NixOS configuration changes
- `nh os switch -u --target-host=rhiannon --hostname=rhiannon`
- Builds locally, deploys to remote host
- Use for permanent configuration changes
### 3. NixOS Configuration Management
#### Local Changes
- Can propose changes to NixOS configuration
- Requires human approval before applying
- Uses `nh os switch` for local updates
#### Remote Deployment
- Can deploy to other hosts using `nh` with `--target-host`
- Builds configuration locally (on Macha)
- Pushes to remote system
- Can take up to 1 hour for complex builds
- **IMPORTANT**: Be patient with long-running builds, don't retry prematurely
### 4. Hardware Awareness
#### Local Hardware Detection
- CPU: `lscpu` via `nix-shell -p util-linux`
- GPU: `lspci` via `nix-shell -p pciutils`
- Network: `ip addr`, `ss`
- Storage: `df -h`, `lsblk`
- USB devices: `lsusb`
#### GPU Metrics
- AMD GPUs: Try `rocm-smi`, sysfs (`/sys/class/drm/card*/device/`)
- NVIDIA GPUs: Try `nvidia-smi`
- Fallback: `sensors` for temperature data
- Queries: temperature, utilization, clock speeds, power usage
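This fallback chain reduces to a few subprocess calls; a sketch (the exact flags shown are illustrative, not the agent's actual tool definitions):
```python
import subprocess

def gpu_temperature() -> str:
    """Try vendor tools in order, falling back to generic sensors output."""
    candidates = [
        ["rocm-smi", "--showtemp"],                                              # AMD
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"],  # NVIDIA
        ["sensors"],                                                             # fallback
    ]
    for cmd in candidates:
        try:
            out = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
        except FileNotFoundError:
            continue  # tool not installed; try the next source
        if out.returncode == 0 and out.stdout.strip():
            return out.stdout.strip()
    return "no GPU temperature source available"
```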
### 5. Ollama Queue System
#### Architecture
- **File-based queue**: `/var/lib/macha/queues/ollama/`
- **Queue worker**: `ollama-queue-worker.service` (runs as `macha` user)
- **Purpose**: Serialize all LLM requests to prevent resource contention
#### Request Flow
1. Any user (including regular users) → Write request to `pending/`
2. Queue worker → Process requests serially (FIFO with priority)
3. Queue worker → Write response to `completed/`
4. Original requester → Read response from `completed/`
#### Priority Levels
- `INTERACTIVE` (0): User requests via `macha-chat`, `macha-ask`
- `AUTONOMOUS` (1): Background maintenance checks
- `BATCH` (2): Low-priority bulk operations
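A sketch of the client side of this flow (file names and request fields are assumptions; the worker's actual schema lives in the queue module):
```python
import json
import time
import uuid
from pathlib import Path

QUEUE = Path("/var/lib/macha/queues/ollama")

def enqueue(prompt: str, priority: int = 1) -> str:
    """Drop a request into pending/ (0=INTERACTIVE, 1=AUTONOMOUS, 2=BATCH)."""
    request_id = uuid.uuid4().hex
    request = {"id": request_id, "priority": priority,
               "created": time.time(), "prompt": prompt}
    (QUEUE / "pending" / f"{request_id}.json").write_text(json.dumps(request))
    return request_id

def wait_for_response(request_id: str, poll: float = 1.0) -> dict:
    """Block until the worker writes the matching file into completed/."""
    path = QUEUE / "completed" / f"{request_id}.json"
    while not path.exists():
        time.sleep(poll)
    return json.loads(path.read_text())
```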
#### Large Output Handling
- Outputs >8KB: Split into chunks for hierarchical processing
- Each chunk ~8KB (~2000 tokens)
- Process chunks serially with progress feedback
- Generate chunk summaries → meta-summary
- Full outputs cached in `/var/lib/macha/tool_cache/`
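A sketch of the map-reduce step, with the queued LLM call abstracted as a `summarize` callable (prompts and chunk math are illustrative):
```python
def summarize_large_output(output: str, summarize, chunk_size: int = 8192) -> str:
    """Map-reduce: summarize ~8KB chunks serially, then summarize the summaries."""
    if len(output) <= chunk_size:
        return output  # small enough to pass through directly
    chunks = [output[i:i + chunk_size] for i in range(0, len(output), chunk_size)]
    partials = []
    for n, chunk in enumerate(chunks, 1):
        print(f"summarizing chunk {n}/{len(chunks)}")  # progress feedback
        partials.append(summarize(f"Summarize this log fragment:\n{chunk}"))
    # Reduce: combine the chunk summaries into one meta-summary
    return summarize("Combine these partial summaries:\n" + "\n".join(partials))
```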
### 6. Knowledge Base & Learning
#### ChromaDB Collections
1. **System Context**: Infrastructure topology, service relationships
2. **Issues**: Historical problems and resolutions
3. **Knowledge**: Operational wisdom learned from experience
#### Automatic Learning
- After successful operations, Macha reflects and extracts key learnings
- Stores: topic, knowledge content, category
- Retrieved automatically when relevant to current tasks
- Use `macha-knowledge` CLI to view/manage
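A sketch of the storage/retrieval pattern using the chromadb client (collection and document contents are illustrative):
```python
import chromadb

client = chromadb.PersistentClient(path="/var/lib/macha/system_context.db")
knowledge = client.get_or_create_collection("knowledge")

# Store a learning extracted after a successful operation
knowledge.add(
    ids=["ollama-restart-2025-10-01"],
    documents=["Restarting ollama.service safely recovers failed inference."],
    metadatas=[{"topic": "ollama", "category": "services"}],
)

# Retrieve the most relevant learnings for the current task
hits = knowledge.query(query_texts=["ollama service failed"], n_results=3)
print(hits["documents"])
```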
### 7. Notifications
#### Gotify Integration
- Can send notifications via `macha-notify` command
- Tool: `send_notification(title, message, priority)`
#### Priority Levels
- `2` (Low/Info): Routine status updates, completed tasks
- `5` (Medium/Attention): Important events, configuration changes
- `8` (High/Critical): Service failures, critical errors, security issues
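Under the hood this maps onto Gotify's standard message endpoint; a sketch (URL and token are placeholders):
```python
import requests

def send_notification(title: str, message: str, priority: int = 5,
                      url: str = "http://rhiannon:8181", token: str = "TOKEN") -> None:
    """Post a message to Gotify; priority follows the levels above (2/5/8)."""
    requests.post(
        f"{url}/message",
        params={"token": token},
        json={"title": title, "message": message, "priority": priority},
        timeout=10,
    )
```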
#### When to Notify
- Critical service failures
- Successful completion of major operations
- Configuration changes that may affect users
- Security-related events
- When explicitly requested by user
### 8. Safety & Constraints
#### Command Restrictions
**Allowed Commands** (see `tools.py` for full list):
- System management: `systemctl`, `journalctl`, `nh`, `nixos-rebuild`
- Monitoring: `free`, `df`, `uptime`, `ps`, `top`, `ip`, `ss`
- Hardware: `lscpu`, `lspci`, `lsblk`, `lshw`, `dmidecode`
- Remote: `ssh`, `scp`
- Power: `reboot`, `shutdown`, `poweroff` (use cautiously!)
- File ops: `cat`, `ls`, `grep`
- Network: `ping`, `dig`, `nslookup`, `curl`, `wget`
- Logging: `logger`
**NOT Allowed**:
- Direct package modifications (`nix-env`, `nix profile`)
- Destructive file operations (`rm -rf`, `dd`)
- User management outside of NixOS config
- Direct editing of system files (use NixOS config instead)
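A sketch of how such an allowlist check can work (the set below is abbreviated; `tools.py` holds the authoritative list):
```python
import shlex

ALLOWED = {"systemctl", "journalctl", "free", "df", "uptime", "ps",
           "ip", "ss", "lscpu", "lspci", "lsblk", "ssh", "scp",
           "cat", "ls", "grep", "ping", "dig", "curl", "logger"}

def is_allowed(command: str) -> bool:
    """Accept a command only if its first word is on the allowlist."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes etc.
    return bool(argv) and argv[0] in ALLOWED
```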
#### Critical Services
**Never disable or stop:**
- SSH (network access)
- Networking (connectivity)
- systemd (system management)
- Boot-related services
#### Approval Required
- Reboots or system power changes
- Major configuration changes
- Disabling any service
- Changes to multiple hosts
### 9. Nix Store Maintenance
#### Verification & Repair
- Command: `nix-store --verify --check-contents --repair`
- **WARNING**: Can take 30+ minutes to several hours
- Only use when corruption is suspected
- Not for routine maintenance
- Verifies all store paths, repairs corrupted files
#### Garbage Collection
- Automatic via system configuration
- Can be triggered manually with approval
- Frees disk space by removing unused derivations
### 10. Conversational Behavior
#### Distinguish Requests from Acknowledgments
- "Thanks" / "Thank you" → Acknowledgment (don't re-execute)
- "Can you..." / "Please..." → Request (execute)
- "What is..." / "How do..." → Question (answer)
#### Tool Calling
- Don't repeat tool calls unnecessarily
- If a tool succeeds, don't run it again unless asked
- Use cached results when available (`retrieve_cached_output`)
#### Context Management
- Be aware of token limits
- Use hierarchical processing for large outputs
- Prune conversation history intelligently
- Cache and summarize when needed
## Infrastructure Topology
### Hosts in Flake
- **macha**: Main autonomous system (self), GPU server
- **rhiannon**: Production server
- **alexander**: Production server
- **UCAR-Kinston**: Work laptop
- **test-vm**: Testing environment
### Shared Configuration
- All hosts share root SSH keys (for `nh` remote deployment)
- `macha` user (UID 2501) exists on all hosts
- Common NixOS configuration via flake
## Service Ecosystem
### Core Services on Macha
- `ollama.service`: LLM inference engine
- `ollama-queue-worker.service`: Request serialization
- `macha-autonomous.service`: Autonomous monitoring loop
- Servarr stack: Sonarr, Radarr, Prowlarr, Lidarr, Readarr, Whisparr
- Media: Transmission, SABnzbd, Calibre
### State Directories
- `/var/lib/macha/`: Main state directory (0755, macha:macha)
- `/var/lib/macha/queues/`: Queue directories (0777 for multi-user)
- `/var/lib/macha/tool_cache/`: Cached tool outputs (0777)
- `/var/lib/macha/system_context.db`: ChromaDB database
## CLI Tools
- `macha-chat`: Interactive chat with tool calling
- `macha-ask`: Single-question interface
- `macha-check`: Trigger immediate health check
- `macha-approve`: Approve pending actions
- `macha-logs`: View autonomous service logs
- `macha-issues`: Query issue database
- `macha-knowledge`: Query knowledge base
- `macha-systems`: List managed systems
- `macha-notify`: Send Gotify notification
## Philosophy & Principles
1. **KISS (Keep It Simple, Stupid)**: Use existing NixOS options, avoid custom wrappers
2. **Verify first**: Check source code/documentation before acting
3. **Safety first**: Never break critical services, always require approval for risky changes
4. **Learn continuously**: Extract and store operational knowledge
5. **Multi-host awareness**: Macha manages the entire infrastructure, not just herself
6. **User-friendly**: Clear communication, appropriate notifications
7. **Patience**: Long-running operations (builds, repairs) can take an hour - don't panic
8. **Tool reuse**: Use existing, verified tools instead of writing custom scripts
## Future Capabilities (Not Yet Implemented)
- [ ] Automatic security updates across all hosts
- [ ] Predictive failure detection
- [ ] Resource optimization recommendations
- [ ] Integration with other communication platforms
- [ ] Multi-agent coordination between hosts
- [ ] Automated testing before deployment

EXAMPLES.md (new file, 275 lines)

@@ -0,0 +1,275 @@
# Macha Autonomous System - Configuration Examples
## Basic Configurations
### Conservative (Recommended for Start)
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "suggest"; # Require approval for all actions
  checkInterval = 300;       # Check every 5 minutes
  model = "llama3.1:70b";    # Most capable model
};
```
### Moderate Autonomy
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-safe"; # Auto-fix safe issues
  checkInterval = 180;         # Check every 3 minutes
  model = "llama3.1:70b";
};
```
### High Autonomy (Experimental)
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-full"; # Full autonomy
  checkInterval = 300;
  model = "llama3.1:70b";
};
```
### Monitoring Only
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "observe"; # No actions, just watch
  checkInterval = 60;        # Check every minute
  model = "qwen3:8b-fp16";   # Lighter model is fine for observation
};
```
## Advanced Scenarios
### Using a Smaller Model (Faster, Less Capable)
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-safe";
  checkInterval = 120;
  model = "qwen3:8b-fp16"; # Faster inference, less reasoning depth
  # or
  # model = "llama3.1:8b"; # Also good for simple tasks
};
```
### High-Frequency Monitoring
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-safe";
  checkInterval = 60; # Check every minute
  model = "qwen3:4b-instruct-2507-fp16"; # Lightweight model
};
```
### Remote Ollama (if running Ollama elsewhere)
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "suggest";
  checkInterval = 300;
  ollamaHost = "http://192.168.1.100:11434"; # Remote Ollama instance
  model = "llama3.1:70b";
};
```
## Manual Testing Workflow
1. **Test with a one-shot run:**
```bash
# Run once in observe mode
macha-check
# Review what it detected
cat /var/lib/macha-autonomous/decisions.jsonl | tail -1 | jq .
```
2. **Enable in suggest mode:**
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "suggest";
  checkInterval = 300;
  model = "llama3.1:70b";
};
```
3. **Rebuild and start:**
```bash
sudo nixos-rebuild switch --flake .#macha
sudo systemctl status macha-autonomous
```
4. **Monitor for a while:**
```bash
# Watch the logs
journalctl -u macha-autonomous -f
# Or use the helper
macha-logs service
```
5. **Review proposed actions:**
```bash
macha-approve list
```
6. **Graduate to auto-safe when comfortable:**
```nix
services.macha-autonomous.autonomyLevel = "auto-safe";
```
## Scenario-Based Examples
### Media Server (Let it auto-restart services)
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-safe"; # Auto-restart failed arr apps
  checkInterval = 180;
  model = "llama3.1:70b";
};
```
### Development Machine (Observe only, you want control)
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "observe";
  checkInterval = 600;   # Check less frequently
  model = "llama3.1:8b"; # Lighter model
};
```
### Critical Production (Suggest only, manual approval)
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "suggest";
  checkInterval = 120;    # More frequent monitoring
  model = "llama3.1:70b"; # Best reasoning
};
```
### Experimental/Learning (Full autonomy)
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-full";
  checkInterval = 300;
  model = "llama3.1:70b";
};
```
## Customizing Behavior
### The config file lives at:
`/etc/macha-autonomous/config.json` (auto-generated from NixOS config)
### To modify the AI prompts:
Edit the Python files in `systems/macha-configs/autonomous/`:
- `agent.py` - AI analysis and decision prompts
- `monitor.py` - What data to collect
- `executor.py` - Safety rules and action execution
- `orchestrator.py` - Main control flow
After editing, rebuild:
```bash
sudo nixos-rebuild switch --flake .#macha
sudo systemctl restart macha-autonomous
```
## Integration with Other Services
### Example: Auto-restart specific services
The system will automatically detect and propose restarting failed services.
### Example: Disk cleanup when space is low
Monitor will detect low disk space, AI will propose cleanup, executor will run `nix-collect-garbage`.
### Example: Log analysis
AI analyzes recent error logs and can propose fixes based on error patterns.
## Debugging
### See what the monitor sees:
```bash
sudo -u macha-autonomous python3 /nix/store/.../monitor.py
```
### Test the AI agent:
```bash
sudo -u macha-autonomous python3 /nix/store/.../agent.py test
```
### View all snapshots:
```bash
ls -lh /var/lib/macha-autonomous/snapshot_*.json
cat "$(ls -t /var/lib/macha-autonomous/snapshot_*.json | head -1)" | jq .
```
### Check approval queue:
```bash
cat /var/lib/macha-autonomous/approval_queue.json | jq .
```
## Performance Tuning
### Model Choice Impact:
| Model | Speed | Capability | RAM Usage | Best For |
|-------|-------|------------|-----------|----------|
| llama3.1:70b | Slow (~30s) | Excellent | ~40GB | Complex reasoning |
| llama3.1:8b | Fast (~3s) | Good | ~5GB | General use |
| qwen3:8b-fp16 | Fast (~2s) | Good | ~16GB | General use |
| qwen3:4b | Very Fast (~1s) | Moderate | ~8GB | Simple tasks |
### Check Interval Impact:
- 60s: High responsiveness, more resource usage
- 300s (default): Good balance
- 600s: Low overhead, slower detection
### Memory Usage:
- Monitor: ~50MB
- Agent (per query): Depends on model (see above)
- Executor: ~30MB
- Orchestrator: ~20MB
Total continuous overhead: ~100MB + model inference when running
## Security Considerations
### The autonomous user has sudo access to:
- `systemctl restart/status` - Restart services
- `journalctl` - Read logs
- `nix-collect-garbage` - Clean up Nix store
### It CANNOT:
- Modify arbitrary files
- Access user home directories (ProtectHome=true)
- Disable protected services (SSH, networking)
- Make changes without logging
### Audit trail:
All actions are logged in `/var/lib/macha-autonomous/actions.jsonl`
### To revoke access:
Set `enable = false` and rebuild, or stop the service.
## Future: MCP Integration
You already have MCP servers installed:
- `mcp-nixos` - NixOS-specific tools
- `gitea-mcp-server` - Git integration
- `emcee` - General MCP orchestration
Future versions could integrate these for:
- Better NixOS config manipulation
- Git-based config versioning
- More sophisticated tooling
Stay tuned!

LOGGING_EXAMPLE.md (new file, 217 lines)

@@ -0,0 +1,217 @@
# Enhanced Logging Example
This shows what the improved journalctl output will look like for Macha's autonomous system.
## Example Output
### Maintenance Cycle Start
```
[2025-10-01T14:30:00] === Starting maintenance cycle ===
[2025-10-01T14:30:00] Collecting system health data...
[2025-10-01T14:30:02] ============================================================
[2025-10-01T14:30:02] SYSTEM HEALTH SUMMARY
[2025-10-01T14:30:02] ============================================================
[2025-10-01T14:30:02] Resources: CPU 25.3%, Memory 45.2%, Load 1.24
[2025-10-01T14:30:02] Disk: 35.6% used (/ partition)
[2025-10-01T14:30:02] Services: 1 failed
[2025-10-01T14:30:02] - ollama.service (failed)
[2025-10-01T14:30:02] Network: Internet reachable
[2025-10-01T14:30:02] Recent logs: 3 errors in last hour
[2025-10-01T14:30:02] ============================================================
[2025-10-01T14:30:02] KEY METRICS:
[2025-10-01T14:30:02] CPU Usage: 25.3%
[2025-10-01T14:30:02] Memory Usage: 45.2%
[2025-10-01T14:30:02] Load Average: 1.24
[2025-10-01T14:30:02] Failed Services: 1
[2025-10-01T14:30:02] Errors (1h): 3
[2025-10-01T14:30:02] Disk /: 35.6% used
[2025-10-01T14:30:02] Disk /home: 62.1% used
[2025-10-01T14:30:02] Disk /var: 28.9% used
[2025-10-01T14:30:02] Internet: ✅ Connected
```
### AI Analysis Section
```
[2025-10-01T14:30:02] Analyzing system state with AI...
[2025-10-01T14:30:35] ============================================================
[2025-10-01T14:30:35] AI ANALYSIS RESULTS
[2025-10-01T14:30:35] ============================================================
[2025-10-01T14:30:35] Overall Status: ATTENTION_NEEDED
[2025-10-01T14:30:35] Assessment: System has one failed service that should be restarted
[2025-10-01T14:30:35] Detected 1 issue(s):
[2025-10-01T14:30:35] Issue #1:
[2025-10-01T14:30:35] Severity: WARNING
[2025-10-01T14:30:35] Category: services
[2025-10-01T14:30:35] Description: ollama.service has failed and needs to be restarted
[2025-10-01T14:30:35] ⚠️ ACTION REQUIRED
[2025-10-01T14:30:35] Recommended Actions (1):
[2025-10-01T14:30:35] - Restart ollama.service to restore LLM functionality
[2025-10-01T14:30:35] ============================================================
```
### Action Handling Section
```
[2025-10-01T14:30:35] Found 1 issues requiring action
[2025-10-01T14:30:35] ────────────────────────────────────────────────────────────
[2025-10-01T14:30:35] Addressing issue: ollama.service has failed and needs to be restarted
[2025-10-01T14:30:35] Requesting AI fix proposal...
[2025-10-01T14:30:45] AI FIX PROPOSAL:
[2025-10-01T14:30:45] Diagnosis: ollama.service crashed or failed to start properly
[2025-10-01T14:30:45] Proposed Action: Restart ollama.service using systemctl
[2025-10-01T14:30:45] Action Type: systemd_restart
[2025-10-01T14:30:45] Risk Level: LOW
[2025-10-01T14:30:45] Commands to execute:
[2025-10-01T14:30:45] - systemctl restart ollama.service
[2025-10-01T14:30:45] Reasoning: Restarting the service is a safe, standard troubleshooting step
[2025-10-01T14:30:45] Rollback Plan: Service will return to failed state if restart doesn't work
[2025-10-01T14:30:45] Executing action...
[2025-10-01T14:30:47] EXECUTION RESULT:
[2025-10-01T14:30:47] Status: QUEUED_FOR_APPROVAL
[2025-10-01T14:30:47] Executed: No
[2025-10-01T14:30:47] Reason: Autonomy level requires manual approval
```
### Cycle Complete Summary
```
[2025-10-01T14:30:47] No issues requiring immediate action
[2025-10-01T14:30:47] ============================================================
[2025-10-01T14:30:47] MAINTENANCE CYCLE COMPLETE
[2025-10-01T14:30:47] ============================================================
[2025-10-01T14:30:47] Status: ATTENTION_NEEDED
[2025-10-01T14:30:47] Issues Found: 1
[2025-10-01T14:30:47] Actions Taken: 1
[2025-10-01T14:30:47] - Executed: 0
[2025-10-01T14:30:47] - Queued for approval: 1
[2025-10-01T14:30:47] Next check in: 300 seconds
[2025-10-01T14:30:47] ============================================================
```
## When System is Healthy
```
[2025-10-01T14:35:00] === Starting maintenance cycle ===
[2025-10-01T14:35:00] Collecting system health data...
[2025-10-01T14:35:02] ============================================================
[2025-10-01T14:35:02] SYSTEM HEALTH SUMMARY
[2025-10-01T14:35:02] ============================================================
[2025-10-01T14:35:02] Resources: CPU 12.5%, Memory 38.1%, Load 0.65
[2025-10-01T14:35:02] Disk: 35.6% used (/ partition)
[2025-10-01T14:35:02] Services: All running
[2025-10-01T14:35:02] Network: Internet reachable
[2025-10-01T14:35:02] Recent logs: 0 errors in last hour
[2025-10-01T14:35:02] ============================================================
[2025-10-01T14:35:02] KEY METRICS:
[2025-10-01T14:35:02] CPU Usage: 12.5%
[2025-10-01T14:35:02] Memory Usage: 38.1%
[2025-10-01T14:35:02] Load Average: 0.65
[2025-10-01T14:35:02] Failed Services: 0
[2025-10-01T14:35:02] Errors (1h): 0
[2025-10-01T14:35:02] Disk /: 35.6% used
[2025-10-01T14:35:02] Internet: ✅ Connected
[2025-10-01T14:35:02] Analyzing system state with AI...
[2025-10-01T14:35:28] ============================================================
[2025-10-01T14:35:28] AI ANALYSIS RESULTS
[2025-10-01T14:35:28] ============================================================
[2025-10-01T14:35:28] Overall Status: HEALTHY
[2025-10-01T14:35:28] Assessment: System is operating normally with no issues detected
[2025-10-01T14:35:28] ✅ No issues detected
[2025-10-01T14:35:28] ============================================================
[2025-10-01T14:35:28] No issues requiring immediate action
[2025-10-01T14:35:28] ============================================================
[2025-10-01T14:35:28] MAINTENANCE CYCLE COMPLETE
[2025-10-01T14:35:28] ============================================================
[2025-10-01T14:35:28] Status: HEALTHY
[2025-10-01T14:35:28] Issues Found: 0
[2025-10-01T14:35:28] Actions Taken: 0
[2025-10-01T14:35:28] Next check in: 300 seconds
[2025-10-01T14:35:28] ============================================================
```
## Viewing Logs
### Follow live logs
```bash
journalctl -u macha-autonomous.service -f
```
### See only AI decisions
```bash
journalctl -u macha-autonomous.service | grep "AI ANALYSIS"
```
### See only execution results
```bash
journalctl -u macha-autonomous.service | grep "EXECUTION RESULT"
```
### See key metrics
```bash
journalctl -u macha-autonomous.service | grep "KEY METRICS" -A 10
```
### Filter by status level
```bash
# Only show intervention required
journalctl -u macha-autonomous.service | grep "INTERVENTION_REQUIRED"
# Only show critical issues
journalctl -u macha-autonomous.service | grep "CRITICAL"
# Only show action required
journalctl -u macha-autonomous.service | grep "ACTION REQUIRED"
```
### Summary of last cycle
```bash
journalctl -u macha-autonomous.service | grep "MAINTENANCE CYCLE COMPLETE" -B 5 | tail -6
```
## Benefits of Enhanced Logging
### 1. **Easy to Scan**
Clear section headers with separators make it easy to find what you need
### 2. **Structured Data**
Key metrics are labeled consistently for easy parsing/grepping
### 3. **Complete Context**
Each cycle shows:
- What the system saw
- What the AI thought
- What action was proposed
- What actually happened
### 4. **AI Transparency**
You can see:
- The AI's reasoning for each decision
- Risk assessment for each action
- Rollback plans if something goes wrong
### 5. **Audit Trail**
Everything is logged to journalctl for long-term storage and analysis
### 6. **Troubleshooting**
If something goes wrong, you have complete context:
- System state before the issue
- AI's diagnosis
- Action attempted
- Result of action

NOTIFICATIONS.md (new file, 224 lines)

@@ -0,0 +1,224 @@
# Gotify Notifications Setup
Macha's autonomous system can now send notifications to Gotify on Rhiannon for critical events.
## What Gets Notified
### High Priority (🚨 Priority 8)
- **Critical issues detected** - System problems requiring immediate attention
- **Service failures** - When critical services fail
- **Failed actions** - When an action execution fails
- **Intervention required** - When system status is critical
### Medium Priority (📋 Priority 5)
- **Actions queued for approval** - When medium/high-risk actions need manual review
- **System attention needed** - When system status needs attention
### Low Priority (✅ Priority 2)
- **Successful actions** - When safe actions execute successfully
- **System healthy** - Periodic health check confirmations (if enabled)
## Setup Instructions
### Step 1: Create Gotify Application on Rhiannon
1. Open Gotify web interface on Rhiannon:
```bash
# URL: http://rhiannon:8181 (or use external access)
```
2. Log in to Gotify
3. Go to **"Apps"** tab
4. Click **"Create Application"**
5. Name it: `Macha Autonomous System`
6. Copy the generated **Application Token**
### Step 2: Configure Macha
Edit `/home/lily/Documents/gitrepos/nixos-servers/systems/macha.nix`:
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "suggest";
  checkInterval = 300;
  model = "llama3.1:70b";

  # Gotify notifications
  gotifyUrl = "http://rhiannon:8181";
  gotifyToken = "YOUR_TOKEN_HERE"; # Paste the token from Step 1
};
```
### Step 3: Rebuild and Deploy
```bash
cd /home/lily/Documents/gitrepos/nixos-servers
sudo nixos-rebuild switch --flake .#macha
```
### Step 4: Test Notifications
Send a test notification:
```bash
macha-notify "Test" "Macha notifications are working!" 5
```
You should see this notification appear in Gotify on Rhiannon.
## CLI Tools
### Send Test Notification
```bash
macha-notify <title> <message> [priority]
# Examples:
macha-notify "Test" "This is a test" 5
macha-notify "Critical" "This is urgent" 8
macha-notify "Info" "Just FYI" 2
```
Priorities:
- `2` - Low (✅ green)
- `5` - Medium (📋 blue)
- `8` - High (🚨 red)
### Check if Notifications are Enabled
```bash
# View the service environment
systemctl show macha-autonomous.service | grep GOTIFY
```
## Notification Examples
### Critical Issue
```
🚨 Macha: Critical Issue
⚠️ Critical Issue Detected
High disk usage on /var partition (95% full)
Details:
Category: disk
```
### Action Queued for Approval
```
📋 Macha: Action Needs Approval
Action Queued for Approval
Action: Restart failed service: ollama.service
Risk Level: low
Use 'macha-approve list' to review
```
### Action Executed Successfully
```
✅ Macha: Action Success
✅ Action Success
Restart failed service: ollama.service
Output:
Service restarted successfully
```
### Action Failed
```
❌ Macha: Action Failed
❌ Action Failed
Clean up disk space with nix-collect-garbage
Output:
Error: Insufficient permissions
```
## Security Notes
1. **Token Storage**: The Gotify token is stored in the NixOS configuration. Consider using a secrets management solution for production.
2. **Network Access**: Macha needs network access to Rhiannon. Ensure your firewall allows HTTP traffic between them.
3. **Token Scope**: The Gotify token only allows sending messages, not reading or managing Gotify.
## Troubleshooting
### Notifications Not Appearing
1. **Check Gotify is running on Rhiannon:**
```bash
ssh rhiannon systemctl status gotify
```
2. **Test connectivity from Macha:**
```bash
curl http://rhiannon:8181/health
```
3. **Verify token is set:**
```bash
macha-notify "Test" "Testing" 5
```
4. **Check service logs:**
```bash
macha-logs service | grep -i gotify
```
### Notification Spam
If you're getting too many notifications, you can:
1. **Disable notifications temporarily:**
```nix
services.macha-autonomous.gotifyUrl = ""; # Empty string disables
```
2. **Adjust autonomy level:**
```nix
services.macha-autonomous.autonomyLevel = "auto-safe"; # Fewer approval notifications
```
3. **Increase check interval:**
```nix
services.macha-autonomous.checkInterval = 900; # Check every 15 minutes instead of 5
```
## Implementation Details
### Files Modified
- `notifier.py` - Gotify notification client
- `module.nix` - Added configuration options and CLI tool
- `orchestrator.py` - Integrated notifications at decision points
- `macha.nix` - Added Gotify configuration
### Notification Flow
```
Issue Detected → AI Analysis → Decision Made → Notification Sent
Queued or Executed → Notification Sent
```
### Graceful Degradation
- If Gotify is unavailable, the system continues to operate
- Failed notifications are logged but don't crash the service
- Notifications have a 10-second timeout to prevent blocking
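A sketch of that degradation behavior (function and logger names are illustrative):
```python
import logging
import requests

log = logging.getLogger("macha.notifier")

def notify_safe(url: str, token: str, title: str, message: str, priority: int) -> bool:
    """Best-effort delivery: never let a notification failure break the cycle."""
    if not url:  # empty gotifyUrl disables notifications entirely
        return False
    try:
        resp = requests.post(f"{url}/message", params={"token": token},
                             json={"title": title, "message": message,
                                   "priority": priority},
                             timeout=10)  # 10s cap so the main loop never blocks
        resp.raise_for_status()
        return True
    except requests.RequestException as exc:
        log.warning("Gotify notification failed: %s", exc)  # logged, not fatal
        return False
```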
## Future Enhancements
Possible improvements:
- [ ] Rate limiting to prevent notification spam
- [ ] Notification grouping (batch similar issues)
- [ ] Custom notification templates
- [ ] Priority-based notification filtering
- [ ] Integration with other notification services (email, SMS)
- [ ] Secrets management for tokens (agenix, sops-nix)

QUICKSTART.md (new file, 229 lines)

@@ -0,0 +1,229 @@
# Macha Autonomous System - Quick Start Guide
## What is This?
Macha now has a self-maintenance system that uses local AI (via Ollama) to monitor, analyze, and maintain itself. Think of it as a 24/7 system administrator that watches over Macha.
## How It Works
1. **Monitor**: Every 5 minutes, collects system health data (services, resources, logs, etc.)
2. **Analyze**: Uses llama3.1:70b to analyze the data and detect issues
3. **Act**: Based on autonomy level, either proposes fixes or executes them automatically
4. **Learn**: Logs all decisions and actions for auditing and improvement
## Autonomy Levels
### `observe` - Monitoring Only
- Monitors system health
- Logs everything
- Takes NO actions
- Good for: Testing, learning what the system sees
### `suggest` - Approval Required (DEFAULT)
- Monitors and analyzes
- Proposes fixes
- Requires manual approval before executing
- Good for: Production use, when you want control
### `auto-safe` - Limited Autonomy
- Auto-executes "safe" actions:
- Restarting failed services
- Disk cleanup
- Log rotation
- Read-only diagnostics
- Asks approval for risky changes
- Good for: Hands-off operation with safety net
### `auto-full` - Full Autonomy
- Auto-executes most actions
- Still requires approval for HIGH RISK actions
- Never touches protected services (SSH, networking, etc.)
- Good for: Experimental, when you trust the system
## Commands
### Check the status
```bash
# View the service status
systemctl status macha-autonomous
# View live logs
macha-logs service
# View AI decision log
macha-logs decisions
# View action execution log
macha-logs actions
# View orchestrator log
macha-logs orchestrator
```
### Run a manual check
```bash
# Run one maintenance cycle now
macha-check
```
### Approval workflow (when autonomyLevel = "suggest")
```bash
# List pending actions awaiting approval
macha-approve list
# Approve action number 0
macha-approve approve 0
```
### Change autonomy level
Edit `/home/lily/Documents/nixos-servers/systems/macha.nix`:
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-safe"; # Change this
  checkInterval = 300;
  model = "llama3.1:70b";
};
```
Then rebuild:
```bash
sudo nixos-rebuild switch --flake .#macha
```
## What Can It Do?
### Automatically Detects
- Failed systemd services
- High resource usage (CPU, RAM, disk)
- Recent errors in logs
- Network connectivity issues
- Disk space problems
- Boot/uptime anomalies
### Can Propose/Execute
- Restart failed services
- Clean up disk space (nix store, old logs)
- Investigate issues (run diagnostics)
- Propose configuration changes (for manual review)
- NixOS rebuilds (with safety checks)
### Safety Features
- **Protected services**: Never touches SSH, networking, systemd core
- **Dry-run testing**: Tests NixOS rebuilds before applying
- **Action logging**: Every action is logged with context
- **Rollback capability**: Can revert changes
- **Rate limiting**: Won't spam actions
- **Human override**: You can always disable or intervene
## Example Workflow
1. **System detects failed service**
```
Monitor: "ollama.service is failed"
AI Agent: "The ollama service crashed. Propose restarting it."
```
2. **In `suggest` mode (default)**
```
Executor: "Action queued for approval"
You: Run `macha-approve list`
You: Review the proposed action
You: Run `macha-approve approve 0`
Executor: Restarts the service
```
3. **In `auto-safe` mode**
```
Executor: "Low risk action, auto-executing"
Executor: Restarts the service automatically
You: Check logs later to see what happened
```
## Monitoring the System
All data is stored in `/var/lib/macha-autonomous/`:
- `orchestrator.log` - Main system log
- `decisions.jsonl` - AI analysis decisions (JSON Lines format)
- `actions.jsonl` - Executed actions log
- `snapshot_*.json` - System state snapshots
- `approval_queue.json` - Pending actions
## Tips
1. **Start with `suggest` mode** - Get comfortable with what it proposes
2. **Review the logs** - See what it's detecting and proposing
3. **Graduate to `auto-safe`** - Let it handle routine maintenance
4. **Use `observe` for debugging** - If something seems wrong
5. **Check approval queue regularly** - If using `suggest` mode
## Troubleshooting
### Service won't start
```bash
# Check for errors
journalctl -u macha-autonomous -n 50
# Verify Ollama is running
systemctl status ollama
# Test Ollama manually
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:70b", "prompt": "test"}'
```
### AI making bad decisions
- Switch to `observe` mode to stop actions
- Review `decisions.jsonl` to see reasoning
- File an issue or adjust prompts in `agent.py`
### Want to disable temporarily
```bash
sudo systemctl stop macha-autonomous
```
### Want to disable permanently
Edit `systems/macha.nix`:
```nix
services.macha-autonomous.enable = false;
```
Then rebuild.
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                      Orchestrator                       │
│            (Main loop, runs every 5 minutes)            │
└────────────┬──────────────┬──────────────┬──────────────┘
             │              │              │
         ┌───▼────┐    ┌────▼────┐    ┌────▼─────┐
         │Monitor │    │  Agent  │    │ Executor │
         │        │───▶│  (AI)   │───▶│  (Safe)  │
         └────────┘    └─────────┘    └──────────┘
             │              │              │
          Collects      Analyzes       Executes
           System        Issues        Actions
           Health        w/ LLM         Safely
```
## Future Enhancements
Potential future capabilities:
- Integration with MCP servers (already installed!)
- Predictive maintenance (learning from patterns)
- Self-optimization (tuning configs based on usage)
- Cluster management (if you add more systems)
- Automated backups and disaster recovery
- Security monitoring and hardening
- Performance tuning recommendations
## Philosophy
The goal is a system that maintains itself while being:
1. **Safe** - Never breaks critical functionality
2. **Transparent** - All decisions are logged and explainable
3. **Conservative** - When in doubt, ask for approval
4. **Learning** - Gets better over time
5. **Human-friendly** - Easy to understand and override
Macha is here to help you, not replace you!

README.md (new file, 93 lines)

@@ -0,0 +1,93 @@
# Macha - AI-Powered Autonomous System Administrator
Macha is an AI-powered autonomous system administrator for NixOS that monitors system health, diagnoses issues, and can take corrective actions with appropriate approval workflows.
## Features
- **Autonomous Monitoring**: Continuous health checks with configurable intervals
- **Multi-Host Management**: SSH-based management of multiple NixOS hosts
- **Tool Calling**: Comprehensive system administration tools via Ollama LLM
- **Queue-Based Architecture**: Serialized LLM requests to prevent resource contention
- **Knowledge Base**: ChromaDB-backed learning system for operational wisdom
- **Approval Workflows**: Safety-first approach with configurable autonomy levels
- **Notification System**: Gotify integration for alerts
## Quick Start
### As a NixOS Flake Input
Add to your `flake.nix`:
```nix
{
  inputs = {
    nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
    macha-autonomous.url = "git+https://git.coven.systems/lily/macha-autonomous";
  };

  outputs = { self, nixpkgs, macha-autonomous }: {
    nixosConfigurations.yourhost = nixpkgs.lib.nixosSystem {
      modules = [
        macha-autonomous.nixosModules.default
        {
          services.macha-autonomous = {
            enable = true;
            autonomyLevel = "suggest"; # observe, suggest, auto-safe, auto-full
            checkInterval = 300;
            ollamaHost = "http://localhost:11434";
            model = "gpt-oss:latest";
          };
        }
      ];
    };
  };
}
```
## Configuration Options
See `module.nix` for full configuration options including:
- Autonomy levels (observe, suggest, auto-safe, auto-full)
- Check intervals
- Ollama host and model settings
- Git repository monitoring
- Service user/group configuration
## CLI Tools
- `macha-chat` - Interactive chat interface
- `macha-ask` - Single-question interface
- `macha-check` - Trigger immediate health check
- `macha-approve` - Approve pending actions
- `macha-logs` - View service logs
- `macha-issues` - Query issue database
- `macha-knowledge` - Query knowledge base
- `macha-systems` - List managed systems
- `macha-notify` - Send Gotify notification
## Architecture
- **Agent**: Core AI logic with tool calling
- **Orchestrator**: Main monitoring loop
- **Executor**: Safe action execution
- **Queue System**: Serialized Ollama requests with priorities
- **Context DB**: ChromaDB for system context and learning
- **Tools**: System administration capabilities
## Requirements
- NixOS with flakes enabled
- Ollama service running
- Python 3 with requests, psutil, chromadb
## Documentation
See `DESIGN.md` for comprehensive architecture documentation.
## License
[Add your license here]
## Author
Lily Miller

SUMMARY.md (new file, 317 lines)

@@ -0,0 +1,317 @@
# Macha Autonomous System - Implementation Summary
## What We Built
A complete self-maintaining system for Macha that uses local AI models (via Ollama) to monitor, analyze, and fix issues automatically. This is a production-ready implementation with safety mechanisms, audit trails, and multiple autonomy levels.
## Components Created
### 1. System Monitor (`monitor.py` - 310 lines)
- Collects comprehensive system health data every cycle
- Monitors: systemd services, resources (CPU/RAM), disk usage, logs, network, NixOS status
- Saves snapshots for historical analysis
- Generates human-readable summaries
### 2. AI Agent (`agent.py` - 238 lines)
- Analyzes system state using llama3.1:70b (or other models)
- Detects issues and classifies severity
- Proposes specific, actionable fixes
- Logs all decisions for auditing
- Uses structured JSON responses for reliability
### 3. Safe Executor (`executor.py` - 371 lines)
- Executes actions with safety checks
- Protected services list (never touches SSH, networking, etc.)
- Supports multiple action types:
- `systemd_restart` - Restart failed services
- `cleanup` - Disk/log cleanup
- `nix_rebuild` - NixOS configuration rebuilds
- `config_change` - Config file modifications
- `investigation` - Diagnostic commands
- Approval queue for manual review
- Complete action logging
### 4. Orchestrator (`orchestrator.py` - 211 lines)
- Main control loop
- Coordinates monitor → agent → executor pipeline
- Handles signals and graceful shutdown
- Configuration management
- Multiple run modes (once, continuous, daemon)
### 5. NixOS Module (`module.nix` - 168 lines)
- Full systemd service integration
- Configuration options via NixOS
- User/group management
- Security hardening
- CLI tools (`macha-check`, `macha-approve`, `macha-logs`)
- Resource limits (1GB RAM, 50% CPU)
### 6. Documentation
- `README.md` - Architecture overview
- `QUICKSTART.md` - User guide
- `EXAMPLES.md` - Configuration examples
- `SUMMARY.md` - This file
**Total: ~1,300 lines of code**
## Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                         NixOS Module                         │
│  - Creates systemd service                                   │
│  - Manages user/permissions                                  │
│  - Provides CLI tools                                        │
└───────────────────────┬──────────────────────────────────────┘
                        │
┌───────────────────────▼──────────────────────────────────────┐
│                        Orchestrator                          │
│  - Runs main loop (every 5 minutes)                          │
│  - Coordinates components                                    │
│  - Handles errors and logging                                │
└───────┬──────────────┬──────────────┬──────────────┬─────────┘
        │              │              │              │
        ▼              ▼              ▼              ▼
  ┌─────────┐    ┌──────────┐   ┌─────────┐    ┌──────────┐
  │ Monitor │───▶│  Agent   │──▶│Executor │───▶│   Logs   │
  │         │    │   (AI)   │   │ (Safe)  │    │          │
  └─────────┘    └──────────┘   └─────────┘    └──────────┘
       │              │              │               │
   Collects       Analyzes       Executes        Records
    System        with LLM        Actions       Everything
    Health        (Ollama)        Safely
```
## Data Flow
1. **Collection**: Monitor gathers system health data
2. **Analysis**: Agent sends data + prompts to Ollama
3. **Decision**: AI returns structured analysis (JSON)
4. **Execution**: Executor checks permissions & autonomy level
5. **Action**: Either executes or queues for approval
6. **Logging**: All steps logged to JSONL files
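A sketch of one cycle of this flow (method names are illustrative; the real coordination lives in `orchestrator.py`):
```python
def maintenance_cycle(monitor, agent, executor, autonomy_level: str) -> None:
    """One pass of the monitor -> agent -> executor pipeline."""
    snapshot = monitor.collect()                 # steps 1-2: gather health data
    decision = agent.analyze(snapshot)           # step 3: structured JSON from the LLM
    for action in decision.get("recommended_actions", []):
        if executor.is_safe(action, autonomy_level):
            result = executor.execute(action)    # step 5: run it
        else:
            result = executor.queue_for_approval(action)  # or park it for review
        executor.log(action, result)             # step 6: JSONL audit trail
```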
## Safety Mechanisms
### Multi-Level Protection
1. **Autonomy Levels**: observe → suggest → auto-safe → auto-full
2. **Protected Services**: Hardcoded list of critical services
3. **Dry-Run Testing**: NixOS rebuilds tested before applying
4. **Approval Queue**: Manual review workflow
5. **Action Logging**: Complete audit trail
6. **Resource Limits**: systemd enforced (1GB RAM, 50% CPU)
7. **Rollback Capability**: Can revert changes
8. **Timeout Protection**: All operations have timeouts
### What It Can Do Automatically (auto-safe)
- ✅ Restart failed services (except protected ones)
- ✅ Clean up disk space (nix-collect-garbage)
- ✅ Rotate/clean logs
- ✅ Run diagnostics
- ❌ Modify configs (requires approval)
- ❌ Rebuild NixOS (requires approval)
- ❌ Touch protected services
## Files Created
```
systems/macha-configs/autonomous/
├── __init__.py # Python package marker
├── monitor.py # System health monitoring
├── agent.py # AI analysis and reasoning
├── executor.py # Safe action execution
├── orchestrator.py # Main control loop
├── module.nix # NixOS integration
├── README.md # Architecture docs
├── QUICKSTART.md # User guide
├── EXAMPLES.md # Configuration examples
└── SUMMARY.md # This file
```
## Integration Points
### Modified Files
- `systems/macha.nix` - Added autonomous module and configuration
### Created Systemd Service
- `macha-autonomous.service` - Main service
- Runs continuously, checks every 5 minutes
- Auto-starts on boot
- Restart on failure
### Created Users/Groups
- `macha-autonomous` user (system user)
- Limited sudo access for specific commands
- Home: `/var/lib/macha-autonomous`
### Created CLI Commands
- `macha-check` - Run manual health check
- `macha-approve list` - Show pending actions
- `macha-approve approve <N>` - Approve action N
- `macha-logs [orchestrator|decisions|actions|service]` - View logs
### State Directory
`/var/lib/macha-autonomous/` contains:
- `orchestrator.log` - Main log
- `decisions.jsonl` - AI analysis log
- `actions.jsonl` - Executed actions log
- `snapshot_*.json` - System state snapshots
- `approval_queue.json` - Pending actions
- `suggested_patch_*.txt` - Config change suggestions
## Configuration
### Current Configuration (in systems/macha.nix)
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "suggest"; # Requires approval
  checkInterval = 300;       # 5 minutes
  model = "llama3.1:70b";    # Most capable model
};
```
### To Deploy
```bash
# Build and activate
sudo nixos-rebuild switch --flake .#macha
# Check status
systemctl status macha-autonomous
# View logs
macha-logs service
```
## Usage Workflow
### Day 1: Observation
```bash
# Just watch what it detects
macha-logs decisions
```
### Day 2-7: Review Proposals
```bash
# Check what it wants to do
macha-approve list
# Approve good actions
macha-approve approve 0
```
### Week 2+: Increase Autonomy
```nix
# Let it handle safe actions automatically
services.macha-autonomous.autonomyLevel = "auto-safe";
```
### Monthly: Review Audit Logs
```bash
# See what it's been doing
cat /var/lib/macha-autonomous/actions.jsonl | jq .
```
## Performance Characteristics
### Resource Usage
- **Idle**: ~100MB RAM
- **Active (w/ llama3.1:70b)**: ~100MB + ~40GB model (shared with Ollama)
- **CPU**: Limited to 50% by systemd
- **Disk**: Minimal (logs rotate, snapshots limited to last 100)
### Timing
- **Monitor**: ~2 seconds
- **AI Analysis**: ~30 seconds (70B model) to ~3 seconds (8B model)
- **Execution**: Varies by action (seconds to minutes)
- **Full Cycle**: ~1-2 minutes typically
### Scalability
- Can handle multiple issues per cycle
- Queue system prevents action spam
- Configurable check intervals
- Model choice affects speed/quality tradeoff
## Current Status
**READY TO USE** - All components implemented and integrated
The system is:
- ✅ Fully functional
- ✅ Safety mechanisms in place
- ✅ Well documented
- ✅ Integrated into NixOS configuration
- ✅ Ready for deployment
Currently configured in **conservative mode** (`suggest`):
- Monitors continuously
- Analyzes with AI
- Proposes actions
- Waits for your approval
## Next Steps
1. **Deploy and test:**
```bash
sudo nixos-rebuild switch --flake .#macha
```
2. **Monitor for a few days:**
```bash
macha-logs service
```
3. **Review what it detects:**
```bash
macha-approve list
cat /var/lib/macha-autonomous/decisions.jsonl | jq .
```
4. **Gradually increase autonomy as you gain confidence**
## Future Enhancement Ideas
### Short Term
- Web dashboard for easier monitoring
- Email/notification system for critical issues
- More sophisticated action types
- Historical trend analysis
### Medium Term
- Integration with MCP servers (already installed!)
- Predictive maintenance using historical data
- Self-tuning of check intervals based on activity
- Multi-system orchestration (manage other NixOS hosts)
### Long Term
- Learning from past decisions to improve
- A/B testing of configuration changes
- Distributed consensus for multi-host decisions
- Integration with external monitoring systems
## Philosophy
This implementation follows key principles:
1. **Safety First**: Multiple layers of protection
2. **Transparency**: Everything is logged and auditable
3. **Conservative Default**: Start restricted, earn trust
4. **Human in Loop**: Always allow override
5. **Gradual Autonomy**: Progressive trust model
6. **Local First**: No external dependencies
7. **Declarative**: NixOS-native configuration
## Conclusion
Macha now has a sophisticated autonomous maintenance system that can:
- Monitor itself 24/7
- Detect and analyze issues using AI
- Fix problems automatically (with appropriate safeguards)
- Learn and improve over time
- Maintain complete audit trails
All powered by local AI models, no external dependencies, fully integrated with NixOS, and designed with safety as the top priority.
**Welcome to the future of self-maintaining systems!** 🎉

__init__.py (new file, 1 line)

@@ -0,0 +1 @@
# Macha Autonomous System Maintenance

agent.py (new file, 1015 lines)

(File diff suppressed because it is too large.)

chat.py (new file, 522 lines)

@@ -0,0 +1,522 @@
#!/usr/bin/env python3
"""
Interactive chat interface with Macha AI agent.
Allows conversational interaction and directive execution.
"""
import json
import os
import subprocess
import sys
from datetime import datetime
from pathlib import Path
from typing import List, Dict, Any

# Add this script's directory to the path so sibling modules (e.g. agent) can be imported
sys.path.insert(0, str(Path(__file__).parent))
from agent import MachaAgent


class MachaChatSession:
    """Interactive chat session with Macha"""

    def __init__(self):
        self.agent = MachaAgent(use_queue=True, priority="INTERACTIVE")
        self.conversation_history: List[Dict[str, str]] = []
        self.session_start = datetime.now().isoformat()
    def _create_chat_prompt(self, user_message: str) -> str:
        """Create a prompt for the chat session"""
        # Build conversation context
        context = ""
        if self.conversation_history:
            context = "\n\nCONVERSATION HISTORY:\n"
            for entry in self.conversation_history[-10:]:  # Last 10 messages
                role = entry['role'].upper()
                msg = entry['message']
                context += f"{role}: {msg}\n"
        prompt = f"""{MachaAgent.SYSTEM_PROMPT}
TASK: INTERACTIVE CHAT SESSION
You are in an interactive chat session with the system administrator.
You can have a natural conversation and execute commands when directed.
CAPABILITIES:
- Answer questions about system status
- Explain configurations and issues
- Execute commands when explicitly asked
- Provide guidance and recommendations
COMMAND EXECUTION:
When the user asks you to run a command or perform an action that requires execution:
1. Respond with a JSON object containing the command to execute
2. Format: {{"action": "execute", "command": "the command", "explanation": "why you're running it"}}
3. After seeing the output, continue the conversation naturally
RESPONSE FORMAT:
- For normal conversation: Respond naturally in plain text
- For command execution: Respond with JSON containing action/command/explanation
- Keep responses concise but informative
RULES:
- Only execute commands when explicitly asked or when it's clearly needed
- Explain what you're about to do before executing
- Never execute destructive commands without explicit confirmation
- If unsure, ask for clarification
{context}
USER: {user_message}
MACHA:"""
        return prompt
    def _execute_command(self, command: str) -> Dict[str, Any]:
        """Execute a shell command and return results"""
        try:
            result = subprocess.run(
                command,
                shell=True,
                capture_output=True,
                text=True,
                timeout=30
            )
            # Check if command failed due to permissions
            needs_sudo = False
            permission_errors = [
                'Interactive authentication required',
                'Permission denied',
                'Operation not permitted',
                'Must be root',
                'insufficient privileges',
                'authentication is required'
            ]
            if result.returncode != 0:
                error_text = (result.stderr + result.stdout).lower()
                for perm_error in permission_errors:
                    if perm_error.lower() in error_text:
                        needs_sudo = True
                        break
            # Retry with sudo if permission error detected
            if needs_sudo and not command.strip().startswith('sudo'):
                print(f"\n⚠️ Permission denied, retrying with sudo...")
                sudo_command = f"sudo {command}"
                result = subprocess.run(
                    sudo_command,
                    shell=True,
                    capture_output=True,
                    text=True,
                    timeout=30
                )
                return {
                    'success': result.returncode == 0,
                    'exit_code': result.returncode,
                    'stdout': result.stdout,
                    'stderr': result.stderr,
                    'command': sudo_command,
                    'retried_with_sudo': True
                }
            return {
                'success': result.returncode == 0,
                'exit_code': result.returncode,
                'stdout': result.stdout,
                'stderr': result.stderr,
                'command': command,
                'retried_with_sudo': False
            }
        except subprocess.TimeoutExpired:
            return {
                'success': False,
                'exit_code': -1,
                'stdout': '',
                'stderr': 'Command timed out after 30 seconds',
                'command': command,
                'retried_with_sudo': False
            }
        except Exception as e:
            return {
                'success': False,
                'exit_code': -1,
                'stdout': '',
                'stderr': str(e),
                'command': command,
                'retried_with_sudo': False
            }
    def _parse_response(self, response: str) -> Dict[str, Any]:
        """Parse AI response to determine if it's a command or text"""
        try:
            # Try to parse as JSON
            parsed = json.loads(response.strip())
            if isinstance(parsed, dict) and 'action' in parsed:
                return parsed
        except json.JSONDecodeError:
            pass
        # It's plain text conversation
        return {'action': 'chat', 'message': response}
    def _auto_diagnose_ollama(self) -> str:
        """Automatically diagnose Ollama issues"""
        diagnostics = []
        diagnostics.append("🔍 AUTO-DIAGNOSIS: Investigating Ollama failure...\n")
        # Check if Ollama service is running
        try:
            result = subprocess.run(
                ['systemctl', 'is-active', 'ollama.service'],
                capture_output=True,
                text=True,
                timeout=5
            )
            if result.returncode == 0:
                diagnostics.append("✅ Ollama service is active")
            else:
                diagnostics.append(f"❌ Ollama service is NOT active: {result.stdout.strip()}")
                # Get service status
                status_result = subprocess.run(
                    ['systemctl', 'status', 'ollama.service', '--no-pager', '-l'],
                    capture_output=True,
                    text=True,
                    timeout=5
                )
                diagnostics.append(f"\nService status:\n```\n{status_result.stdout[-500:]}\n```")
        except Exception as e:
            diagnostics.append(f"⚠️ Could not check service status: {e}")
        # Check memory usage
        try:
            result = subprocess.run(['free', '-h'], capture_output=True, text=True, timeout=5)
            lines = result.stdout.split('\n')
            for line in lines[:3]:  # First 3 lines
                diagnostics.append(f" {line}")
        except Exception as e:
            diagnostics.append(f"⚠️ Could not check memory: {e}")
        # Check which models are loaded
        try:
            import requests
            response = requests.get(f"{self.agent.ollama_host}/api/tags", timeout=5)
            if response.status_code == 200:
                models = response.json().get('models', [])
                diagnostics.append(f"\n📦 Loaded models ({len(models)}):")
                for model in models:
                    name = model.get('name', 'unknown')
                    size = model.get('size', 0) / (1024**3)
                    is_current = "← TARGET" if name == self.agent.model else ""
                    diagnostics.append(f"{name} ({size:.1f} GB) {is_current}")
                # Check if target model is loaded
                model_names = [m.get('name') for m in models]
                if self.agent.model not in model_names:
                    diagnostics.append(f"\n❌ TARGET MODEL NOT LOADED: {self.agent.model}")
                    diagnostics.append(f" Available models: {', '.join(model_names)}")
            else:
                diagnostics.append(f"❌ Ollama API returned {response.status_code}")
        except Exception as e:
            diagnostics.append(f"⚠️ Could not query Ollama API: {e}")
        # Check recent Ollama logs
        try:
            result = subprocess.run(
                ['journalctl', '-u', 'ollama.service', '-n', '10', '--no-pager'],
                capture_output=True,
                text=True,
                timeout=5
            )
            if result.stdout:
                diagnostics.append(f"\n📋 Recent Ollama logs (last 10 lines):\n```\n{result.stdout}\n```")
        except Exception as e:
            diagnostics.append(f"⚠️ Could not check logs: {e}")
        return "\n".join(diagnostics)
    def process_message(self, user_message: str) -> str:
        """Process a user message and return Macha's response"""
        # Add user message to history
        self.conversation_history.append({
            'role': 'user',
            'message': user_message,
            'timestamp': datetime.now().isoformat()
        })
        # Build chat messages for tool-calling API
        messages = []
        # Query relevant knowledge based on user message
        knowledge_context = self.agent._query_relevant_knowledge(user_message, limit=3)
        # Add recent conversation history (last 15 messages to stay within context limits)
        # With tool calling, messages grow quickly, so we limit more aggressively
        recent_history = self.conversation_history[-15:]  # Last ~7 exchanges
        for entry in recent_history:
            content = entry['message']
            # Truncate very long messages (e.g., command outputs)
            if len(content) > 3000:
                content = content[:1500] + "\n... [message truncated] ...\n" + content[-1500:]
            # Add knowledge context to the current (last) message if available
            if entry == recent_history[-1] and knowledge_context:
                content += knowledge_context
            messages.append({
                "role": entry['role'],
                "content": content
            })
        try:
            # Use tool-aware chat API
            ai_response = self.agent._query_ollama_with_tools(messages)
        except Exception as e:
            error_msg = (
                f"❌ CRITICAL: Failed to communicate with Ollama inference engine\n\n"
                f"Error Type: {type(e).__name__}\n"
                f"Error Message: {str(e)}\n\n"
            )
            # Auto-diagnose the issue
            diagnostics = self._auto_diagnose_ollama()
            return error_msg + "\n" + diagnostics
        if not ai_response:
            error_msg = (
                f"❌ Empty response from Ollama inference engine\n\n"
                f"The request succeeded but returned no data. This usually means:\n"
                f" • The model ({self.agent.model}) is still loading\n"
                f" • Ollama ran out of memory during generation\n"
                f" • The prompt was too large for the context window\n\n"
            )
            # Auto-diagnose the issue
            diagnostics = self._auto_diagnose_ollama()
            return error_msg + "\n" + diagnostics
        # Check if Ollama returned an error
        try:
            error_check = json.loads(ai_response)
            if isinstance(error_check, dict) and 'error' in error_check:
                error_msg = (
                    f"❌ Ollama API Error\n\n"
                    f"Error: {error_check.get('error', 'Unknown error')}\n"
                    f"Diagnosis: {error_check.get('diagnosis', 'No details')}\n\n"
                )
                # Auto-diagnose the issue
                diagnostics = self._auto_diagnose_ollama()
                return error_msg + "\n" + diagnostics
        except json.JSONDecodeError:
            # Not JSON, it's a normal response
            pass
        # Parse response
        parsed = self._parse_response(ai_response)
        if parsed.get('action') == 'execute':
            # AI wants to execute a command
            command = parsed.get('command', '')
            explanation = parsed.get('explanation', '')
            # Show what we're about to do
            response = f"🔧 {explanation}\n\nExecuting: `{command}`\n\n"
            # Execute the command
            result = self._execute_command(command)
            # Show if we retried with sudo
            if result.get('retried_with_sudo'):
                response += f"⚠️ Permission denied, retried as: `{result['command']}`\n\n"
            if result['success']:
                response += "✅ Command succeeded:\n"
                if result['stdout']:
                    response += f"```\n{result['stdout']}\n```"
                else:
                    response += "(no output)"
            else:
                response += f"❌ Command failed (exit code {result['exit_code']}):\n"
                if result['stderr']:
                    response += f"```\n{result['stderr']}\n```"
                elif result['stdout']:
                    response += f"```\n{result['stdout']}\n```"
            # Add command execution to history
            self.conversation_history.append({
                'role': 'macha',
                'message': response,
                'timestamp': datetime.now().isoformat(),
                'command_result': result
            })
            # Now ask AI to respond to the command output
            followup_prompt = f"""The command completed. Here's what happened:
Command: {command}
Success: {result['success']}
Output: {result['stdout'][:500] if result['stdout'] else '(none)'}
Error: {result['stderr'][:500] if result['stderr'] else '(none)'}
Please provide a brief analysis or next steps."""
            followup_response = self.agent._query_ollama(followup_prompt)
            if followup_response:
                response += f"\n\n{followup_response}"
            return response
        else:
            # Normal conversation response
            message = parsed.get('message', ai_response)
            self.conversation_history.append({
                'role': 'macha',
                'message': message,
                'timestamp': datetime.now().isoformat()
            })
            return message
def run(self):
"""Run the interactive chat session"""
print("=" * 70)
print("🌐 MACHA INTERACTIVE CHAT")
print("=" * 70)
print("Type your message and press Enter. Commands:")
print(" /exit or /quit - End the chat session")
print(" /clear - Clear conversation history")
print(" /history - Show conversation history")
print(" /debug - Show Ollama connection status")
print("=" * 70)
print()
while True:
try:
# Get user input
user_input = input("\n💬 YOU: ").strip()
if not user_input:
continue
# Handle special commands
if user_input.lower() in ['/exit', '/quit']:
print("\n👋 Ending chat session. Goodbye!")
break
elif user_input.lower() == '/clear':
self.conversation_history.clear()
print("🧹 Conversation history cleared.")
continue
elif user_input.lower() == '/history':
print("\n" + "=" * 70)
print("CONVERSATION HISTORY")
print("=" * 70)
for entry in self.conversation_history:
role = entry['role'].upper()
msg = entry['message'][:100] + "..." if len(entry['message']) > 100 else entry['message']
print(f"{role}: {msg}")
print("=" * 70)
continue
elif user_input.lower() == '/debug':
import os
import subprocess
print("\n" + "=" * 70)
print("MACHA ARCHITECTURE & STATUS")
print("=" * 70)
print("\n🏗️ SYSTEM ARCHITECTURE:")
print(f" Hostname: macha.coven.systems")
print(f" Service: macha-autonomous.service (systemd)")
print(f" Working Directory: /var/lib/macha")
print("\n👤 EXECUTION CONTEXT:")
current_user = os.getenv('USER') or os.getenv('USERNAME') or 'unknown'
print(f" Current User: {current_user}")
print(f" UID: {os.getuid()}")
# Check if user has sudo access
try:
result = subprocess.run(['sudo', '-n', 'true'],
capture_output=True, timeout=1)
if result.returncode == 0:
print(f" Sudo Access: ✓ Yes (passwordless)")
else:
print(f" Sudo Access: ⚠ Requires password")
except Exception:
print(f" Sudo Access: ❌ No")
print(f" Note: Chat runs as invoking user (you), not as macha-autonomous")
print("\n🧠 INFERENCE ENGINE:")
print(f" Backend: Ollama")
print(f" Host: {self.agent.ollama_host}")
print(f" Model: {self.agent.model}")
print(f" Service: ollama.service (systemd)")
print("\n💾 DATABASE:")
print(f" Backend: ChromaDB")
print(f" Host: http://localhost:8000")
print(f" Data: /var/lib/chromadb")
print(f" Service: chromadb.service (systemd)")
print("\n🔍 OLLAMA STATUS:")
# Try to query Ollama status
try:
import requests
# Check if Ollama is running
response = requests.get(f"{self.agent.ollama_host}/api/tags", timeout=5)
if response.status_code == 200:
models = response.json().get('models', [])
print(f" Status: ✓ Running")
print(f" Loaded models: {len(models)}")
for model in models:
name = model.get('name', 'unknown')
size = model.get('size', 0) / (1024**3) # GB
is_current = "← ACTIVE" if name == self.agent.model else ""
print(f"{name} ({size:.1f} GB) {is_current}")
else:
print(f" Status: ❌ Error (HTTP {response.status_code})")
except Exception as e:
print(f" Status: ❌ Cannot connect: {e}")
print(f" Hint: Check 'systemctl status ollama.service'")
print("\n💡 CONVERSATION:")
print(f" History: {len(self.conversation_history)} messages")
print(f" Session started: {self.session_start}")
print("=" * 70)
continue
# Process the message
print("\n🤖 MACHA: ", end='', flush=True)
response = self.process_message(user_input)
print(response)
except KeyboardInterrupt:
print("\n\n👋 Chat interrupted. Use /exit to quit properly.")
continue
except EOFError:
print("\n\n👋 Ending chat session. Goodbye!")
break
except Exception as e:
print(f"\n❌ Error: {e}")
continue
def main():
"""Main entry point"""
session = MachaChatSession()
session.run()
if __name__ == "__main__":
main()
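For scripted testing outside a TTY, the session can also be driven directly instead of through `run()`; a minimal sketch, assuming the module is importable as `chat` and that `MachaChatSession`'s constructor takes no arguments (as in `main()` above):

```python
from chat import MachaChatSession  # module name assumed

session = MachaChatSession()
# One-shot exchange, bypassing the interactive run() loop
reply = session.process_message("Is ollama.service healthy on rhiannon?")
print(reply)
```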

245
config_parser.py Normal file

@@ -0,0 +1,245 @@
#!/usr/bin/env python3
"""
Config Parser - Extract imports and content from NixOS configuration files
"""
import re
import subprocess
from pathlib import Path
from typing import List, Dict, Set, Optional
from datetime import datetime
class ConfigParser:
"""Parse NixOS flake and configuration files"""
def __init__(self, repo_url: str, local_path: Path = Path("/var/lib/macha/config-repo")):
"""
Initialize config parser
Args:
repo_url: Git repository URL (e.g., git+https://...)
local_path: Where to clone/update the repository
"""
# Strip git+ prefix if present for git commands
self.repo_url = repo_url.replace("git+", "")
self.local_path = local_path
self.local_path.mkdir(parents=True, exist_ok=True)
def ensure_repo(self) -> bool:
"""Clone or update the repository"""
try:
if (self.local_path / ".git").exists():
# Update existing repo
result = subprocess.run(
["git", "-C", str(self.local_path), "pull"],
capture_output=True,
text=True,
timeout=30
)
return result.returncode == 0
else:
# Clone new repo
result = subprocess.run(
["git", "clone", self.repo_url, str(self.local_path)],
capture_output=True,
text=True,
timeout=60
)
return result.returncode == 0
except Exception as e:
print(f"Error updating repository: {e}")
return False
def get_systems_from_flake(self) -> List[str]:
"""Extract system names from flake.nix"""
flake_path = self.local_path / "flake.nix"
if not flake_path.exists():
return []
systems = []
try:
content = flake_path.read_text()
# Match patterns like: "macha" = nixpkgs.lib.nixosSystem
matches = re.findall(r'"([^"]+)"\s*=\s*nixpkgs\.lib\.nixosSystem', content)
systems = matches
except Exception as e:
print(f"Error parsing flake.nix: {e}")
return systems
def extract_imports(self, nix_file: Path) -> List[str]:
"""Extract imports from a .nix file"""
if not nix_file.exists():
return []
imports = []
try:
content = nix_file.read_text()
# Find the imports = [ ... ]; block
imports_match = re.search(
r'imports\s*=\s*\[(.*?)\];',
content,
re.DOTALL
)
if imports_match:
imports_block = imports_match.group(1)
# Extract all paths (relative paths starting with ./ or ../)
paths = re.findall(r'[./]+[^\s\]]+\.nix', imports_block)
imports = paths
except Exception as e:
print(f"Error parsing {nix_file}: {e}")
return imports
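# Illustrative: a file containing
#   imports = [ ./hardware.nix ../osconfigs/base.nix ];
# yields ['./hardware.nix', '../osconfigs/base.nix'].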
def resolve_import_path(self, base_file: Path, import_path: str) -> Optional[Path]:
"""Resolve a relative import path to absolute path within repo"""
try:
# Get directory of the base file
base_dir = base_file.parent
# Resolve the relative path
resolved = (base_dir / import_path).resolve()
# Make sure it's within the repo
if self.local_path in resolved.parents or resolved == self.local_path:
return resolved
except Exception as e:
print(f"Error resolving import {import_path} from {base_file}: {e}")
return None
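# Illustrative: resolving "../osconfigs/base.nix" from systems/macha.nix yields
# <repo>/osconfigs/base.nix, which passes the containment check; paths that
# escape the repo root resolve to None.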
def get_system_config(self, system_name: str) -> Dict[str, any]:
"""
Get configuration for a specific system
Returns:
Dict with:
- main_file: Path to systems/<name>.nix
- imports: List of imported file paths (relative to repo root)
- all_files: Set of all .nix files used (including recursive imports)
"""
main_file = self.local_path / "systems" / f"{system_name}.nix"
if not main_file.exists():
return {
"main_file": None,
"imports": [],
"all_files": set()
}
# Track all files (avoid infinite loops)
all_files = set()
files_to_process = [main_file]
processed = set()
while files_to_process:
current_file = files_to_process.pop(0)
if current_file in processed:
continue
processed.add(current_file)
# Get relative path from repo root
try:
rel_path = current_file.relative_to(self.local_path)
all_files.add(str(rel_path))
except ValueError:
continue
# Extract imports from this file
imports = self.extract_imports(current_file)
# Resolve and queue imported files
for imp in imports:
resolved = self.resolve_import_path(current_file, imp)
if resolved and resolved not in processed:
files_to_process.append(resolved)
return {
"main_file": str(main_file.relative_to(self.local_path)),
"imports": self.extract_imports(main_file),
"all_files": sorted(all_files)
}
def read_file_content(self, relative_path: str) -> Optional[str]:
"""Read content of a file by its path relative to repo root"""
try:
file_path = self.local_path / relative_path
if file_path.exists():
return file_path.read_text()
except Exception as e:
print(f"Error reading {relative_path}: {e}")
return None
def get_all_config_files(self) -> List[Dict[str, str]]:
"""
Get all .nix files in the repository with their content
Returns:
List of dicts with:
- path: relative path from repo root
- content: file contents
- category: apps/systems/osconfigs/users based on path
"""
files = []
# Categories to scan
categories = {
"apps": self.local_path / "apps",
"systems": self.local_path / "systems",
"osconfigs": self.local_path / "osconfigs",
"users": self.local_path / "users"
}
for category, path in categories.items():
if not path.exists():
continue
for nix_file in path.rglob("*.nix"):
try:
rel_path = nix_file.relative_to(self.local_path)
content = nix_file.read_text()
files.append({
"path": str(rel_path),
"content": content,
"category": category
})
except Exception as e:
print(f"Error reading {nix_file}: {e}")
return files
if __name__ == "__main__":
# Test the parser
import sys
repo_url = "git+https://git.coven.systems/lily/nixos-servers"
parser = ConfigParser(repo_url)
print("Ensuring repository is up to date...")
if parser.ensure_repo():
print("✓ Repository ready")
else:
print("✗ Failed to update repository")
sys.exit(1)
print("\nSystems defined in flake:")
systems = parser.get_systems_from_flake()
for system in systems:
print(f" - {system}")
if len(sys.argv) > 1:
system_name = sys.argv[1]
print(f"\nConfiguration for {system_name}:")
config = parser.get_system_config(system_name)
print(f" Main file: {config['main_file']}")
print(f" Direct imports: {len(config['imports'])}")
print(f" All files used: {len(config['all_files'])}")
for f in config['all_files']:
print(f" - {f}")

947
context_db.py Normal file

@@ -0,0 +1,947 @@
#!/usr/bin/env python3
"""
Context Database - Store and retrieve system context using ChromaDB for RAG
"""
import json
import os
from typing import Dict, List, Any, Optional, Set
from datetime import datetime
from pathlib import Path
# Set environment variable BEFORE importing chromadb to prevent .env file reading
os.environ.setdefault("CHROMA_ENV_FILE", "")
import chromadb
from chromadb.config import Settings
class ContextDatabase:
"""Manage system context and relationships in ChromaDB"""
def __init__(
self,
host: str = "localhost",
port: int = 8000,
persist_directory: str = "/var/lib/chromadb"
):
"""Initialize ChromaDB client"""
self.client = chromadb.HttpClient(
host=host,
port=port,
settings=Settings(
anonymized_telemetry=False,
allow_reset=False,
chroma_api_impl="chromadb.api.fastapi.FastAPI"
)
)
# Create or get collections
self.systems_collection = self.client.get_or_create_collection(
name="systems",
metadata={"description": "System definitions and metadata"}
)
self.relationships_collection = self.client.get_or_create_collection(
name="relationships",
metadata={"description": "System relationships and dependencies"}
)
self.issues_collection = self.client.get_or_create_collection(
name="issues",
metadata={"description": "Issue tracking and resolution history"}
)
self.decisions_collection = self.client.get_or_create_collection(
name="decisions",
metadata={"description": "AI decisions and outcomes"}
)
self.config_files_collection = self.client.get_or_create_collection(
name="config_files",
metadata={"description": "NixOS configuration files for RAG"}
)
self.knowledge_collection = self.client.get_or_create_collection(
name="knowledge",
metadata={"description": "Operational knowledge: commands, patterns, best practices"}
)
# ============ System Registry ============
def register_system(
self,
hostname: str,
system_type: str,
services: List[str],
capabilities: List[str] = None,
metadata: Dict[str, Any] = None,
config_repo: str = None,
config_branch: str = None,
os_type: str = "nixos"
):
"""Register a system in the database
Args:
hostname: FQDN of the system
system_type: Role (e.g., 'workstation', 'server')
services: List of running services
capabilities: System capabilities
metadata: Additional metadata
config_repo: Git repository URL
config_branch: Git branch name
os_type: Operating system (e.g., 'nixos', 'ubuntu', 'debian', 'arch', 'windows', 'macos')
"""
doc_parts = [
f"System: {hostname}",
f"Type: {system_type}",
f"OS: {os_type}",
f"Services: {', '.join(services)}",
f"Capabilities: {', '.join(capabilities or [])}"
]
if config_repo:
doc_parts.append(f"Configuration Repository: {config_repo}")
if config_branch:
doc_parts.append(f"Configuration Branch: {config_branch}")
doc = "\n".join(doc_parts)
metadata_dict = {
"hostname": hostname,
"type": system_type,
"os_type": os_type,
"services": json.dumps(services),
"capabilities": json.dumps(capabilities or []),
"metadata": json.dumps(metadata or {}),
"config_repo": config_repo or "",
"config_branch": config_branch or "",
"updated_at": datetime.now().isoformat()
}
self.systems_collection.upsert(
ids=[hostname],
documents=[doc],
metadatas=[metadata_dict]
)
def get_system(self, hostname: str) -> Optional[Dict[str, Any]]:
"""Get system information"""
try:
result = self.systems_collection.get(
ids=[hostname],
include=["metadatas", "documents"]
)
if result['ids']:
metadata = result['metadatas'][0]
return {
"hostname": metadata["hostname"],
"type": metadata["type"],
"os_type": metadata.get("os_type", "nixos"),
"services": json.loads(metadata["services"]),
"capabilities": json.loads(metadata["capabilities"]),
"config_repo": metadata.get("config_repo", ""),
"config_branch": metadata.get("config_branch", ""),
"metadata": json.loads(metadata["metadata"]),
"document": result['documents'][0]
}
except:
pass
return None
def get_all_systems(self) -> List[Dict[str, Any]]:
"""Get all registered systems"""
result = self.systems_collection.get(include=["metadatas"])
systems = []
for metadata in result['metadatas']:
systems.append({
"hostname": metadata["hostname"],
"type": metadata["type"],
"os_type": metadata.get("os_type", "unknown"),
"services": json.loads(metadata["services"]),
"capabilities": json.loads(metadata["capabilities"]),
"config_repo": metadata.get("config_repo", ""),
"config_branch": metadata.get("config_branch", "")
})
return systems
def is_system_known(self, hostname: str) -> bool:
"""Check if a system is already registered"""
try:
result = self.systems_collection.get(ids=[hostname])
return len(result['ids']) > 0
except:
return False
def get_known_hostnames(self) -> Set[str]:
"""Get set of all known system hostnames"""
result = self.systems_collection.get(include=["metadatas"])
return set(metadata["hostname"] for metadata in result['metadatas'])
# ============ Relationships ============
def add_relationship(
self,
source: str,
target: str,
relationship_type: str,
description: str = ""
):
"""Add a relationship between systems"""
# Include a separator so ("a", "bc") and ("ab", "c") cannot collide
rel_id = f"{source}->{target}:{relationship_type}"
doc = f"{source} {relationship_type} {target}. {description}"
self.relationships_collection.upsert(
ids=[rel_id],
documents=[doc],
metadatas=[{
"source": source,
"target": target,
"type": relationship_type,
"description": description,
"created_at": datetime.now().isoformat()
}]
)
def get_dependencies(self, hostname: str) -> List[Dict[str, Any]]:
"""Get what a system depends on"""
result = self.relationships_collection.get(
where={"source": hostname},
include=["metadatas"]
)
return [
{
"target": m["target"],
"type": m["type"],
"description": m.get("description", "")
}
for m in result['metadatas']
]
def get_dependents(self, hostname: str) -> List[Dict[str, Any]]:
"""Get what depends on a system"""
result = self.relationships_collection.get(
where={"target": hostname},
include=["metadatas"]
)
return [
{
"source": m["source"],
"type": m["type"],
"description": m.get("description", "")
}
for m in result['metadatas']
]
# ============ Issue History ============
# Not to be confused with store_issue() in the issue-tracking section below,
# which takes a full issue dict; keeping both under one name would let the
# later definition silently shadow this one.
def record_issue(
self,
system: str,
issue_description: str,
resolution: str = "",
severity: str = "unknown",
metadata: Dict[str, Any] = None
) -> str:
"""Store an issue and its resolution"""
issue_id = f"{system}_{datetime.now().timestamp()}"
doc = f"""
System: {system}
Issue: {issue_description}
Resolution: {resolution}
Severity: {severity}
"""
self.issues_collection.add(
ids=[issue_id],
documents=[doc],
metadatas=[{
"system": system,
"severity": severity,
"resolved": bool(resolution),
"timestamp": datetime.now().isoformat(),
"metadata": json.dumps(metadata or {})
}]
)
return issue_id
def store_investigation(
self,
system: str,
issue_description: str,
commands: List[str],
output: str,
timestamp: str = None
) -> str:
"""Store investigation results for an issue"""
if timestamp is None:
timestamp = datetime.now().isoformat()
investigation_id = f"investigation_{system}_{datetime.now().timestamp()}"
doc = f"""
System: {system}
Issue: {issue_description}
Commands executed: {', '.join(commands)}
Output:
{output[:2000]} # Limit output to prevent token overflow
"""
self.issues_collection.add(
ids=[investigation_id],
documents=[doc],
metadatas=[{
"system": system,
"issue": issue_description,
"type": "investigation",
"commands": json.dumps(commands),
"timestamp": timestamp,
"metadata": json.dumps({"output_length": len(output)})
}]
)
return investigation_id
def get_recent_investigations(
self,
issue_description: str,
system: str,
hours: int = 24
) -> List[Dict[str, Any]]:
"""Get recent investigations for a similar issue"""
# Query for similar issues
try:
result = self.issues_collection.query(
query_texts=[f"System: {system}\nIssue: {issue_description}"],
n_results=10,
where={"type": "investigation"},
include=["documents", "metadatas", "distances"]
)
investigations = []
if result['ids'] and result['ids'][0]:
cutoff_time = datetime.now().timestamp() - (hours * 3600)
for i, doc_id in enumerate(result['ids'][0]):
meta = result['metadatas'][0][i]
timestamp = datetime.fromisoformat(meta['timestamp'])
# Only include recent investigations
if timestamp.timestamp() > cutoff_time:
investigations.append({
"id": doc_id,
"system": meta['system'],
"issue": meta['issue'],
"commands": json.loads(meta['commands']),
"output": result['documents'][0][i],
"timestamp": meta['timestamp'],
"relevance": 1 - result['distances'][0][i]
})
return investigations
except Exception as e:
print(f"Error querying investigations: {e}")
return []
def find_similar_issues(
self,
issue_description: str,
system: Optional[str] = None,
n_results: int = 5
) -> List[Dict[str, Any]]:
"""Find similar past issues using semantic search"""
where = {"system": system} if system else None
results = self.issues_collection.query(
query_texts=[issue_description],
n_results=n_results,
where=where,
include=["documents", "metadatas", "distances"]
)
similar = []
for i, doc in enumerate(results['documents'][0]):
similar.append({
"issue": doc,
"metadata": results['metadatas'][0][i],
"similarity": 1 - results['distances'][0][i] # Convert distance to similarity
})
return similar
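# Note: Chroma returns distances, so 1 - distance is a rough similarity score
# (e.g. distance 0.25 -> 0.75). Depending on the collection's distance metric
# it can fall outside [0, 1]; treat it as a ranking heuristic, not a probability.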
# ============ AI Decisions ============
def store_decision(
self,
system: str,
analysis: Dict[str, Any],
action: Dict[str, Any],
outcome: Dict[str, Any] = None
):
"""Store an AI decision for learning"""
decision_id = f"decision_{datetime.now().timestamp()}"
doc = f"""
System: {system}
Status: {analysis.get('status', 'unknown')}
Assessment: {analysis.get('overall_assessment', '')}
Action: {action.get('proposed_action', '')}
Risk: {action.get('risk_level', 'unknown')}
Outcome: {outcome.get('status', 'pending') if outcome else 'pending'}
"""
self.decisions_collection.add(
ids=[decision_id],
documents=[doc],
metadatas=[{
"system": system,
"timestamp": datetime.now().isoformat(),
"analysis": json.dumps(analysis),
"action": json.dumps(action),
"outcome": json.dumps(outcome or {})
}]
)
def get_recent_decisions(
self,
system: Optional[str] = None,
n_results: int = 10
) -> List[Dict[str, Any]]:
"""Get recent decisions, optionally filtered by system"""
where = {"system": system} if system else None
results = self.decisions_collection.query(
query_texts=["recent decisions"],
n_results=n_results,
where=where,
include=["documents", "metadatas"]
)
decisions = []
for i, doc in enumerate(results['documents'][0]):
meta = results['metadatas'][0][i]
decisions.append({
"system": meta["system"],
"timestamp": meta["timestamp"],
"analysis": json.loads(meta["analysis"]),
"action": json.loads(meta["action"]),
"outcome": json.loads(meta["outcome"])
})
return decisions
# ============ Context Generation for AI ============
def get_system_context(self, hostname: str, git_context=None) -> str:
"""Generate rich context about a system for AI prompts"""
context_parts = []
# System info
system = self.get_system(hostname)
if system:
context_parts.append(f"System: {hostname} ({system['type']})")
context_parts.append(f"Services: {', '.join(system['services'])}")
if system['capabilities']:
context_parts.append(f"Capabilities: {', '.join(system['capabilities'])}")
# Git repository info
if system and system.get('metadata'):
metadata = json.loads(system['metadata']) if isinstance(system['metadata'], str) else system['metadata']
config_repo = metadata.get('config_repo', '')
if config_repo:
context_parts.append(f"\nConfiguration Repository: {config_repo}")
# Recent git changes for this system
if git_context:
try:
# Extract system name from FQDN
system_name = hostname.split('.')[0]
git_summary = git_context.get_system_context_summary(system_name)
if git_summary:
context_parts.append(f"\n{git_summary}")
except:
pass
# Dependencies
deps = self.get_dependencies(hostname)
if deps:
context_parts.append("\nDependencies:")
for dep in deps:
context_parts.append(f" - Depends on {dep['target']} for {dep['type']}")
# Dependents
dependents = self.get_dependents(hostname)
if dependents:
context_parts.append("\nUsed by:")
for dependent in dependents:
context_parts.append(f" - {dependent['source']} uses this for {dependent['type']}")
return "\n".join(context_parts)
def get_issue_context(self, issue_description: str, system: str) -> str:
"""Get context about similar past issues"""
similar = self.find_similar_issues(issue_description, system, n_results=3)
if not similar:
return ""
context_parts = ["Similar past issues:"]
for i, issue in enumerate(similar, 1):
if issue['similarity'] > 0.7: # Only include if fairly similar
context_parts.append(f"\n{i}. {issue['issue']}")
context_parts.append(f" Similarity: {issue['similarity']:.2%}")
return "\n".join(context_parts) if len(context_parts) > 1 else ""
# ============ Config Files (for RAG) ============
def store_config_file(
self,
file_path: str,
content: str,
category: str = "unknown",
systems_using: List[str] = None
):
"""
Store a configuration file for RAG retrieval
Args:
file_path: Path relative to repo root (e.g., "apps/gotify.nix")
content: Full file contents
category: apps/systems/osconfigs/users
systems_using: List of system hostnames that import this file
"""
self.config_files_collection.upsert(
ids=[file_path],
documents=[content],
metadatas=[{
"path": file_path,
"category": category,
"systems": json.dumps(systems_using or []),
"updated_at": datetime.now().isoformat()
}]
)
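# Illustrative pairing with query_config_files() below:
#   db.store_config_file("apps/gotify.nix", text, category="apps", systems_using=["rhiannon"])
#   hits = db.query_config_files("gotify notification setup", system="rhiannon")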
def get_config_file(self, file_path: str) -> Optional[Dict[str, Any]]:
"""Get a specific config file by path"""
try:
result = self.config_files_collection.get(
ids=[file_path],
include=["documents", "metadatas"]
)
if result['ids']:
return {
"path": file_path,
"content": result['documents'][0],
"metadata": result['metadatas'][0]
}
except:
pass
return None
def query_config_files(
self,
query: str,
system: str = None,
category: str = None,
n_results: int = 5
) -> List[Dict[str, Any]]:
"""
Query config files using semantic search
Args:
query: Natural language query (e.g., "gotify configuration")
system: Optional filter by system hostname
category: Optional filter by category (apps/systems/etc)
n_results: Number of results to return
Returns:
List of dicts with path, content, and metadata
"""
where = {}
if category:
where["category"] = category
try:
result = self.config_files_collection.query(
query_texts=[query],
n_results=n_results,
where=where if where else None,
include=["documents", "metadatas", "distances"]
)
configs = []
if result['ids'] and result['ids'][0]:
for i, doc_id in enumerate(result['ids'][0]):
config = {
"path": doc_id,
"content": result['documents'][0][i],
"metadata": result['metadatas'][0][i],
"relevance": 1 - result['distances'][0][i] # Convert distance to relevance
}
# Filter by system if specified
if system:
systems = json.loads(config['metadata'].get('systems', '[]'))
if system not in systems:
continue
configs.append(config)
return configs
except Exception as e:
print(f"Error querying config files: {e}")
return []
def get_system_config_files(self, system: str) -> List[str]:
"""Get all config file paths used by a system"""
# This is stored in the system's metadata now
system_info = self.get_system(system)
if system_info and 'config_files' in system_info.get('metadata', {}):
# metadata is already a dict, config_files is already a list
return system_info['metadata']['config_files']
return []
def update_system_config_files(self, system: str, config_files: List[str]):
"""Update the list of config files used by a system"""
system_info = self.get_system(system)
if system_info:
# metadata is already a dict from get_system(), no need to json.loads()
metadata = system_info.get('metadata', {})
metadata['config_files'] = config_files
metadata['config_updated_at'] = datetime.now().isoformat()
# Re-register with updated metadata
self.register_system(
hostname=system,
system_type=system_info['type'],
services=system_info['services'],
capabilities=system_info.get('capabilities', []),
metadata=metadata,
config_repo=system_info.get('config_repo'),
config_branch=system_info.get('config_branch'),
os_type=system_info.get('os_type', 'nixos')
)
# =========================================================================
# ISSUE TRACKING
# =========================================================================
def store_issue(self, issue: Dict[str, Any]):
"""Store a new issue in the database"""
issue_id = issue['issue_id']
# Store in ChromaDB with the issue as document
self.issues_collection.add(
documents=[json.dumps(issue)],
metadatas=[{
'issue_id': issue_id,
'hostname': issue['hostname'],
'title': issue['title'],
'status': issue['status'],
'severity': issue['severity'],
'created_at': issue['created_at'],
'source': issue['source']
}],
ids=[issue_id]
)
def get_issue(self, issue_id: str) -> Optional[Dict[str, Any]]:
"""Retrieve an issue by ID"""
try:
results = self.issues_collection.get(ids=[issue_id])
if results['documents']:
return json.loads(results['documents'][0])
return None
except Exception as e:
print(f"Error retrieving issue {issue_id}: {e}")
return None
def update_issue(self, issue: Dict[str, Any]):
"""Update an existing issue"""
issue_id = issue['issue_id']
# Delete old version
try:
self.issues_collection.delete(ids=[issue_id])
except:
pass
# Store updated version
self.store_issue(issue)
def delete_issue(self, issue_id: str):
"""Remove an issue from the database (used when archiving)"""
try:
self.issues_collection.delete(ids=[issue_id])
except Exception as e:
print(f"Error deleting issue {issue_id}: {e}")
def list_issues(
self,
hostname: Optional[str] = None,
status: Optional[str] = None,
severity: Optional[str] = None
) -> List[Dict[str, Any]]:
"""List issues with optional filters"""
try:
# Build query filter
where_filter = {}
if hostname:
where_filter['hostname'] = hostname
if status:
where_filter['status'] = status
if severity:
where_filter['severity'] = severity
if where_filter:
results = self.issues_collection.get(where=where_filter)
else:
results = self.issues_collection.get()
issues = []
for doc in results['documents']:
issues.append(json.loads(doc))
# Sort by created_at descending
issues.sort(key=lambda x: x.get('created_at', ''), reverse=True)
return issues
except Exception as e:
print(f"Error listing issues: {e}")
return []
# ============ Knowledge Base ============
def store_knowledge(
self,
topic: str,
knowledge: str,
category: str = "general",
source: str = "experience",
confidence: str = "medium",
tags: list = None
) -> str:
"""
Store a piece of operational knowledge
Args:
topic: Main subject (e.g., "nh os switch", "systemd-journal-remote")
knowledge: The actual knowledge/insight/pattern
category: Type of knowledge (command, pattern, troubleshooting, performance, etc.)
source: Where this came from (experience, documentation, user-provided)
confidence: How confident we are (low, medium, high)
tags: Optional tags for categorization
Returns:
Knowledge ID
"""
import uuid
knowledge_id = str(uuid.uuid4())
knowledge_doc = {
"id": knowledge_id,
"topic": topic,
"knowledge": knowledge,
"category": category,
"source": source,
"confidence": confidence,
"tags": tags or [],
"created_at": datetime.utcnow().isoformat(),
"last_verified": datetime.utcnow().isoformat(),
"times_referenced": 0
}
try:
self.knowledge_collection.add(
ids=[knowledge_id],
documents=[knowledge],
metadatas=[{
"topic": topic,
"category": category,
"source": source,
"confidence": confidence,
"tags": json.dumps(tags or []),
"created_at": knowledge_doc["created_at"],
"full_doc": json.dumps(knowledge_doc)
}]
)
return knowledge_id
except Exception as e:
print(f"Error storing knowledge: {e}")
return None
def query_knowledge(
self,
query: str,
category: str = None,
limit: int = 5
) -> list:
"""
Query the knowledge base for relevant information
Args:
query: What to search for
category: Optional category filter
limit: Maximum results to return
Returns:
List of relevant knowledge entries
"""
try:
where_filter = {}
if category:
where_filter["category"] = category
results = self.knowledge_collection.query(
query_texts=[query],
n_results=limit,
where=where_filter if where_filter else None
)
knowledge_items = []
if results and results['documents']:
for i, doc in enumerate(results['documents'][0]):
metadata = results['metadatas'][0][i]
full_doc = json.loads(metadata.get('full_doc', '{}'))
# Bump the reference count on the returned copy (note: the updated count
# is not persisted back to the collection here)
full_doc['times_referenced'] = full_doc.get('times_referenced', 0) + 1
knowledge_items.append(full_doc)
return knowledge_items
except Exception as e:
print(f"Error querying knowledge: {e}")
return []
def get_knowledge_by_topic(self, topic: str) -> list:
"""Get all knowledge entries for a specific topic"""
try:
results = self.knowledge_collection.get(
where={"topic": topic}
)
knowledge_items = []
for metadata in results['metadatas']:
full_doc = json.loads(metadata.get('full_doc', '{}'))
knowledge_items.append(full_doc)
return knowledge_items
except Exception as e:
print(f"Error getting knowledge by topic: {e}")
return []
def update_knowledge(
self,
knowledge_id: str,
knowledge: str = None,
confidence: str = None,
verify: bool = False
):
"""
Update an existing knowledge entry
Args:
knowledge_id: ID of knowledge to update
knowledge: New knowledge text (optional)
confidence: New confidence level (optional)
verify: Mark as verified (updates last_verified timestamp)
"""
try:
# Get existing entry
result = self.knowledge_collection.get(ids=[knowledge_id])
if not result['documents']:
return False
metadata = result['metadatas'][0]
full_doc = json.loads(metadata.get('full_doc', '{}'))
# Update fields
if knowledge:
full_doc['knowledge'] = knowledge
if confidence:
full_doc['confidence'] = confidence
if verify:
full_doc['last_verified'] = datetime.utcnow().isoformat()
# Update in collection
self.knowledge_collection.update(
ids=[knowledge_id],
documents=[full_doc['knowledge']],
metadatas=[{
"topic": full_doc['topic'],
"category": full_doc['category'],
"source": full_doc['source'],
"confidence": full_doc['confidence'],
"tags": json.dumps(full_doc['tags']),
"created_at": full_doc['created_at'],
"full_doc": json.dumps(full_doc)
}]
)
return True
except Exception as e:
print(f"Error updating knowledge: {e}")
return False
def list_knowledge_topics(self, category: str = None) -> list:
"""List all unique topics in the knowledge base"""
try:
where_filter = {"category": category} if category else None
results = self.knowledge_collection.get(where=where_filter)
topics = set()
for metadata in results['metadatas']:
topics.add(metadata.get('topic'))
return sorted(list(topics))
except Exception as e:
print(f"Error listing knowledge topics: {e}")
return []
if __name__ == "__main__":
import sys
# Test the database
db = ContextDatabase()
# Register test systems
db.register_system(
"macha",
"workstation",
["ollama"],
capabilities=["ai-inference"]
)
db.register_system(
"rhiannon",
"server",
["gotify", "nextcloud", "prowlarr"],
capabilities=["notifications", "cloud-storage"]
)
# Add relationship
db.add_relationship(
"macha",
"rhiannon",
"uses-service",
"Macha uses Rhiannon's Gotify for notifications"
)
# Test queries
print("All systems:", db.get_all_systems())
print("\nMacha's dependencies:", db.get_dependencies("macha"))
print("\nRhiannon's dependents:", db.get_dependents("rhiannon"))
print("\nSystem context:", db.get_system_context("macha"))

328
conversation.py Normal file
View File

@@ -0,0 +1,328 @@
#!/usr/bin/env python3
"""
Conversational Interface - Allows questioning Macha about decisions and system state
"""
import json
import requests
from typing import Dict, List, Any, Optional
from pathlib import Path
from datetime import datetime
from agent import MachaAgent
class MachaConversation:
"""Conversational interface for Macha"""
def __init__(
self,
ollama_host: str = "http://localhost:11434",
model: str = "gpt-oss:latest",
state_dir: Path = Path("/var/lib/macha")
):
self.ollama_host = ollama_host
self.model = model
self.state_dir = state_dir
self.decision_log = self.state_dir / "decisions.jsonl"
self.approval_queue = self.state_dir / "approval_queue.json"
self.orchestrator_log = self.state_dir / "orchestrator.log"
# Initialize agent with tool support and queue
self.agent = MachaAgent(
ollama_host=ollama_host,
model=model,
state_dir=state_dir,
enable_tools=True,
use_queue=True,
priority="INTERACTIVE"
)
def ask(self, question: str, include_context: bool = True) -> str:
"""Ask Macha a question with optional system context"""
context = ""
if include_context:
context = self._gather_context()
# Build messages for tool-aware chat
content = self._create_conversational_prompt(question, context)
messages = [{"role": "user", "content": content}]
response = self.agent._query_ollama_with_tools(messages)
return response
def discuss_action(self, action_index: int) -> str:
"""Discuss a specific queued action by its queue position (0-based index)"""
action = self._get_action_from_queue(action_index)
if not action:
return f"No action found at queue position {action_index}. Use 'macha-approve list' to see available actions."
context = self._gather_context()
action_context = json.dumps(action, indent=2)
content = f"""TASK: DISCUSS PROPOSED ACTION
================================================================================
A user is asking about a proposed action in your approval queue.
QUEUED ACTION (Queue Position #{action_index}):
{action_context}
RECENT SYSTEM CONTEXT:
{context}
The user wants to discuss this action. Explain:
1. Why you proposed this action
2. What problem it solves
3. The risks involved
4. What could go wrong
5. Alternative approaches if any
Be conversational, helpful, and honest about uncertainties.
"""
messages = [{"role": "user", "content": content}]
return self.agent._query_ollama_with_tools(messages)
def _gather_context(self) -> str:
"""Gather relevant system context for the conversation"""
context_parts = []
# System infrastructure from ChromaDB
try:
from context_db import ContextDatabase
db = ContextDatabase()
systems = db.get_all_systems()
if systems:
context_parts.append("INFRASTRUCTURE:")
for system in systems:
context_parts.append(f" - {system['hostname']} ({system.get('type', 'unknown')})")
if system.get('config_repo'):
context_parts.append(f" Config Repo: {system['config_repo']}")
context_parts.append(f" Branch: {system.get('config_branch', 'unknown')}")
if system.get('capabilities'):
context_parts.append(f" Capabilities: {', '.join(system['capabilities'])}")
except Exception as e:
# ChromaDB not available, skip
pass
# Recent decisions
recent_decisions = self._get_recent_decisions(5)
if recent_decisions:
context_parts.append("\nRECENT DECISIONS:")
for i, dec in enumerate(recent_decisions, 1):
timestamp = dec.get("timestamp", "unknown")
analysis = dec.get("analysis", {})
status = analysis.get("status", "unknown")
context_parts.append(f"{i}. [{timestamp}] Status: {status}")
if "issues" in analysis:
for issue in analysis.get("issues", [])[:3]:
context_parts.append(f" - {issue.get('description', 'N/A')}")
# Pending approvals
pending = self._get_pending_approvals()
if pending:
context_parts.append(f"\nPENDING APPROVALS: {len(pending)} action(s) awaiting approval")
# Recent log excerpts (last 10 lines)
recent_logs = self._get_recent_logs(10)
if recent_logs:
context_parts.append("\nRECENT LOG ENTRIES:")
context_parts.extend(recent_logs)
return "\n".join(context_parts)
def _create_conversational_prompt(self, question: str, context: str) -> str:
"""Create a conversational prompt"""
return f"""{MachaAgent.SYSTEM_PROMPT}
TASK: ANSWER QUESTION
================================================================================
You monitor system health, analyze issues using AI, and propose fixes. Be helpful,
honest about what you know and don't know, and reference the context provided below.
SYSTEM CONTEXT:
{context if context else "No recent activity"}
USER QUESTION:
{question}
Respond conversationally and helpfully. If the question is about your recent decisions
or actions, reference the context above. If you don't have enough information, say so.
Keep responses concise but informative.
"""
def _query_ollama(self, prompt: str, temperature: float = 0.7) -> str:
"""Query Ollama API"""
try:
response = requests.post(
f"{self.ollama_host}/api/generate",
json={
"model": self.model,
"prompt": prompt,
"stream": False,
# generation parameters go under "options" for Ollama's /api/generate
"options": {"temperature": temperature},
},
timeout=60
)
response.raise_for_status()
return response.json().get("response", "")
except requests.exceptions.HTTPError as e:
# e.response is the response that failed raise_for_status()
error_detail = ""
try:
error_detail = f" - {e.response.text}"
except Exception:
pass
return f"Error: Ollama returned HTTP {e.response.status_code}{error_detail}"
except Exception as e:
return f"Error querying Ollama: {str(e)}"
def _get_recent_decisions(self, count: int = 5) -> List[Dict[str, Any]]:
"""Get recent decisions from log"""
if not self.decision_log.exists():
return []
decisions = []
try:
with open(self.decision_log, 'r') as f:
for line in f:
if line.strip():
try:
decisions.append(json.loads(line))
except:
pass
except:
pass
return decisions[-count:]
def _get_pending_approvals(self) -> List[Dict[str, Any]]:
"""Get pending approvals from queue"""
if not self.approval_queue.exists():
return []
try:
with open(self.approval_queue, 'r') as f:
data = json.load(f)
# Queue is a JSON array, not an object with "pending" key
if isinstance(data, list):
return data
return data.get("pending", [])
except:
return []
def _get_action_from_queue(self, action_index: int) -> Optional[Dict[str, Any]]:
"""Get a specific action from the queue by index"""
pending = self._get_pending_approvals()
if 0 <= action_index < len(pending):
return pending[action_index]
return None
def _get_recent_logs(self, count: int = 10) -> List[str]:
"""Get recent orchestrator log lines"""
if not self.orchestrator_log.exists():
return []
try:
with open(self.orchestrator_log, 'r') as f:
lines = f.readlines()
return [line.strip() for line in lines[-count:] if line.strip()]
except:
return []
if __name__ == "__main__":
import sys
import argparse
parser = argparse.ArgumentParser(description="Ask Macha a question or discuss an action")
parser.add_argument("--discuss", type=int, metavar="ACTION_ID", help="Discuss a specific queued action")
parser.add_argument("--follow-up", type=str, metavar="QUESTION", help="Follow-up question about the action")
parser.add_argument("question", nargs="*", help="Your question for Macha")
parser.add_argument("--no-context", action="store_true", help="Don't include system context")
args = parser.parse_args()
# Load config if available
config_file = Path("/etc/macha-autonomous/config.json")
ollama_host = "http://localhost:11434"
model = "gpt-oss:latest"
if config_file.exists():
try:
with open(config_file, 'r') as f:
config = json.load(f)
ollama_host = config.get("ollama_host", ollama_host)
model = config.get("model", model)
except:
pass
conversation = MachaConversation(
ollama_host=ollama_host,
model=model
)
if args.discuss is not None:
if args.follow_up:
# Follow-up question about a specific action
action = conversation._get_action_from_queue(args.discuss)
if not action:
print(f"No action found at queue position {args.discuss}. Use 'macha-approve list' to see available actions.")
sys.exit(1)
# Build context with the action details
action_context = f"""
QUEUED ACTION #{args.discuss}:
Diagnosis: {action.get('proposal', {}).get('diagnosis', 'N/A')}
Proposed Action: {action.get('proposal', {}).get('proposed_action', 'N/A')}
Action Type: {action.get('proposal', {}).get('action_type', 'N/A')}
Risk Level: {action.get('proposal', {}).get('risk_level', 'N/A')}
Commands: {json.dumps(action.get('proposal', {}).get('commands', []), indent=2)}
Reasoning: {action.get('proposal', {}).get('reasoning', 'N/A')}
FOLLOW-UP QUESTION:
{args.follow_up}
"""
# Query the AI with the action context
response = conversation._query_ollama(f"""{MachaAgent.SYSTEM_PROMPT}
TASK: ANSWER FOLLOW-UP QUESTION ABOUT QUEUED ACTION
================================================================================
You are answering a follow-up question about a proposed fix that is awaiting approval.
Be helpful and answer directly. If the user is concerned about risks, explain them clearly.
If they ask about alternatives, suggest them.
{action_context}
RESPOND CONCISELY AND DIRECTLY.
""")
else:
# Initial discussion about the action
response = conversation.discuss_action(args.discuss)
elif args.question:
# Ask a general question
question = " ".join(args.question)
response = conversation.ask(question, include_context=not args.no_context)
else:
parser.print_help()
sys.exit(1)
# Only print formatted output for initial discussion, not for follow-ups
if args.follow_up:
print(response)
else:
print("\n" + "="*60)
print("MACHA:")
print("="*60)
print(response)
print("="*60 + "\n")

537
executor.py Normal file

@@ -0,0 +1,537 @@
#!/usr/bin/env python3
"""
Action Executor - Safely executes proposed fixes with rollback capability
"""
import json
import subprocess
import shutil
from typing import Dict, List, Any, Optional
from pathlib import Path
from datetime import datetime
import time
class SafeExecutor:
"""Executes system maintenance actions with safety checks"""
# Actions that are considered safe to auto-execute
SAFE_ACTIONS = {
"systemd_restart", # Restart failed services
"cleanup", # Disk cleanup, log rotation
"investigation", # Read-only diagnostics
}
# Services that should NEVER be stopped/disabled
PROTECTED_SERVICES = {
"sshd",
"systemd-networkd",
"NetworkManager",
"systemd-resolved",
"dbus",
}
def __init__(
self,
state_dir: Path = Path("/var/lib/macha"),
autonomy_level: str = "suggest", # observe, suggest, auto-safe, auto-full
dry_run: bool = False,
agent = None # Optional agent for learning from actions
):
self.state_dir = state_dir
self.state_dir.mkdir(parents=True, exist_ok=True)
self.autonomy_level = autonomy_level
self.dry_run = dry_run
self.agent = agent
self.action_log = self.state_dir / "actions.jsonl"
self.approval_queue = self.state_dir / "approval_queue.json"
def execute_action(self, action: Dict[str, Any], monitoring_context: Dict[str, Any]) -> Dict[str, Any]:
"""Execute a proposed action with appropriate safety checks"""
action_type = action.get("action_type", "unknown")
risk_level = action.get("risk_level", "high")
# Determine if we should execute
should_execute, reason = self._should_execute(action_type, risk_level)
if not should_execute:
if self.autonomy_level == "suggest":
# Queue for approval
self._queue_for_approval(action, monitoring_context)
return {
"executed": False,
"status": "queued_for_approval",
"reason": reason,
"queue_file": str(self.approval_queue)
}
else:
return {
"executed": False,
"status": "blocked",
"reason": reason
}
# Execute the action
if self.dry_run:
return self._dry_run_action(action)
return self._execute_action_impl(action, monitoring_context)
def _should_execute(self, action_type: str, risk_level: str) -> tuple[bool, str]:
"""Determine if an action should be auto-executed based on autonomy level"""
if self.autonomy_level == "observe":
return False, "Autonomy level set to observe-only"
# Auto-approve low-risk investigation actions
if action_type == "investigation" and risk_level == "low":
return True, "Auto-approved: Low-risk information gathering"
if self.autonomy_level == "suggest":
return False, "Autonomy level requires manual approval"
if self.autonomy_level == "auto-safe":
if action_type in self.SAFE_ACTIONS and risk_level == "low":
return True, "Auto-executing safe action"
return False, "Action requires higher autonomy level"
if self.autonomy_level == "auto-full":
if risk_level == "high":
return False, "High risk actions always require approval"
return True, "Auto-executing approved action"
return False, "Unknown autonomy level"
def _execute_action_impl(self, action: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
"""Actually execute the action"""
action_type = action.get("action_type")
result = {
"executed": True,
"timestamp": datetime.now().isoformat(),
"action": action,
"success": False,
"output": "",
"error": None
}
try:
if action_type == "systemd_restart":
result.update(self._restart_services(action))
elif action_type == "cleanup":
result.update(self._perform_cleanup(action))
elif action_type == "nix_rebuild":
result.update(self._nix_rebuild(action))
elif action_type == "config_change":
result.update(self._apply_config_change(action))
elif action_type == "investigation":
result.update(self._run_investigation(action))
else:
result["error"] = f"Unknown action type: {action_type}"
except Exception as e:
result["error"] = str(e)
result["success"] = False
# Log the action
self._log_action(result)
# Learn from successful operations
if result.get("success") and self.agent:
try:
self.agent.reflect_and_learn(
situation=action.get("diagnosis", "Unknown situation"),
action_taken=action.get("proposed_action", "Unknown action"),
outcome=result.get("output", ""),
success=True
)
except Exception as e:
# Don't fail the action if learning fails
print(f"Note: Could not record learning: {e}")
return result
def _restart_services(self, action: Dict[str, Any]) -> Dict[str, Any]:
"""Restart systemd services"""
commands = action.get("commands", [])
output_lines = []
for cmd in commands:
if not cmd.startswith("systemctl restart "):
continue
service = cmd.split()[-1]
# Safety check
if any(protected in service for protected in self.PROTECTED_SERVICES):
output_lines.append(f"BLOCKED: {service} is protected")
continue
try:
result = subprocess.run(
["systemctl", "restart", service],
capture_output=True,
text=True,
timeout=30
)
if result.returncode == 0:
output_lines.append(f"✓ Restarted {service}")
else:
output_lines.append(f"✗ Failed to restart {service}: {result.stderr}")
except subprocess.TimeoutExpired:
output_lines.append(f"✗ Timeout restarting {service}")
return {
# succeed only if at least one service actually restarted; blocked or
# failed attempts alone should not count as success
"success": any(line.startswith("✓") for line in output_lines),
"output": "\n".join(output_lines)
}
def _perform_cleanup(self, action: Dict[str, Any]) -> Dict[str, Any]:
"""Perform system cleanup tasks"""
output_lines = []
# Nix store cleanup
if "nix" in action.get("proposed_action", "").lower():
try:
result = subprocess.run(
["nix-collect-garbage", "--delete-old"],
capture_output=True,
text=True,
timeout=300
)
output_lines.append(f"Nix cleanup: {result.stdout}")
except Exception as e:
output_lines.append(f"Nix cleanup failed: {e}")
# Journal cleanup (keep last 7 days)
try:
result = subprocess.run(
["journalctl", "--vacuum-time=7d"],
capture_output=True,
text=True,
timeout=60
)
output_lines.append(f"Journal cleanup: {result.stdout}")
except Exception as e:
output_lines.append(f"Journal cleanup failed: {e}")
return {
"success": True,
"output": "\n".join(output_lines)
}
def _nix_rebuild(self, action: Dict[str, Any]) -> Dict[str, Any]:
"""Rebuild NixOS configuration"""
# This is HIGH RISK - always requires approval or full autonomy
# And we should test first
output_lines = []
# First, try a dry build
try:
result = subprocess.run(
["nixos-rebuild", "dry-build", "--flake", ".#macha"],
capture_output=True,
text=True,
timeout=600,
cwd="/home/lily/Documents/nixos-servers"
)
if result.returncode != 0:
return {
"success": False,
"output": f"Dry build failed:\n{result.stderr}"
}
output_lines.append("✓ Dry build successful")
except Exception as e:
return {
"success": False,
"output": f"Dry build error: {e}"
}
# Now do the actual rebuild
try:
result = subprocess.run(
["nixos-rebuild", "switch", "--flake", ".#macha"],
capture_output=True,
text=True,
timeout=1200,
cwd="/home/lily/Documents/nixos-servers"
)
output_lines.append(result.stdout)
return {
"success": result.returncode == 0,
"output": "\n".join(output_lines),
"error": result.stderr if result.returncode != 0 else None
}
except Exception as e:
return {
"success": False,
"output": "\n".join(output_lines),
"error": str(e)
}
def _apply_config_change(self, action: Dict[str, Any]) -> Dict[str, Any]:
"""Apply a configuration file change"""
config_changes = action.get("config_changes", {})
file_path = config_changes.get("file")
if not file_path:
return {
"success": False,
"output": "No file specified in config_changes"
}
# For now, we DON'T auto-modify configs - too risky
# Instead, we create a suggested patch file
patch_file = self.state_dir / f"suggested_patch_{int(time.time())}.txt"
with open(patch_file, 'w') as f:
f.write(f"Suggested change to {file_path}:\n\n")
f.write(config_changes.get("change", "No change description"))
f.write(f"\n\nReasoning: {action.get('reasoning', 'No reasoning provided')}")
return {
"success": True,
"output": f"Config change suggestion saved to {patch_file}\nThis requires manual review and application."
}
def _run_investigation(self, action: Dict[str, Any]) -> Dict[str, Any]:
"""Run diagnostic commands"""
commands = action.get("commands", [])
output_lines = []
for cmd in commands:
# Only allow safe read-only commands: match whole prefixes (so "dfoo" is
# not mistaken for "df") and reject shell metacharacters, since the command
# runs with shell=True and "df; rm -rf /" would otherwise pass a prefix check
safe_commands = ["journalctl", "systemctl status", "df", "free", "ps", "netstat", "ss"]
allowed = any(cmd == safe or cmd.startswith(safe + " ") for safe in safe_commands)
if not allowed or any(ch in cmd for ch in ";|&$`><"):
output_lines.append(f"BLOCKED unsafe command: {cmd}")
continue
try:
result = subprocess.run(
cmd,
shell=True,
capture_output=True,
text=True,
timeout=30
)
output_lines.append(f"$ {cmd}")
output_lines.append(result.stdout)
except Exception as e:
output_lines.append(f"Error running {cmd}: {e}")
return {
"success": True,
"output": "\n".join(output_lines)
}
def _dry_run_action(self, action: Dict[str, Any]) -> Dict[str, Any]:
"""Simulate action execution"""
return {
"executed": False,
"status": "dry_run",
"action": action,
"output": "Dry run mode - no actual changes made"
}
def _queue_for_approval(self, action: Dict[str, Any], context: Dict[str, Any]):
"""Add action to approval queue"""
queue = []
if self.approval_queue.exists():
with open(self.approval_queue, 'r') as f:
queue = json.load(f)
# Check for duplicate pending actions
proposed_action = action.get("proposed_action", "")
diagnosis = action.get("diagnosis", "")
for existing in queue:
# Skip already approved/rejected items
if existing.get("approved") is not None:
continue
existing_action = existing.get("action", {})
existing_proposed = existing_action.get("proposed_action", "")
existing_diagnosis = existing_action.get("diagnosis", "")
# Check if this is essentially the same issue
# Match if diagnosis is very similar OR proposed action is very similar
if (diagnosis and existing_diagnosis and
self._similarity_check(diagnosis, existing_diagnosis) > 0.7):
print(f"Skipping duplicate action - similar diagnosis already queued")
return
if (proposed_action and existing_proposed and
self._similarity_check(proposed_action, existing_proposed) > 0.7):
print(f"Skipping duplicate action - similar proposal already queued")
return
queue.append({
"timestamp": datetime.now().isoformat(),
"action": action,
"context": context,
"approved": None
})
with open(self.approval_queue, 'w') as f:
json.dump(queue, f, indent=2)
def _similarity_check(self, str1: str, str2: str) -> float:
"""Simple similarity check between two strings"""
# Normalize strings
s1 = str1.lower().strip()
s2 = str2.lower().strip()
# Exact match
if s1 == s2:
return 1.0
# Check for significant word overlap
words1 = set(s1.split())
words2 = set(s2.split())
# Remove common words that don't indicate similarity
common_words = {'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by', 'is', 'are', 'was', 'were', 'be', 'been', 'have', 'has', 'had'}
words1 = words1 - common_words
words2 = words2 - common_words
if not words1 or not words2:
return 0.0
# Calculate Jaccard similarity
intersection = len(words1 & words2)
union = len(words1 | words2)
return intersection / union if union > 0 else 0.0
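# Illustrative: "restart the ollama service" vs "restart ollama service" both
# reduce to {"restart", "ollama", "service"} after stop-word removal, giving a
# Jaccard score of 1.0 and tripping the 0.7 duplicate threshold above.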
def _log_action(self, result: Dict[str, Any]):
"""Log executed actions"""
with open(self.action_log, 'a') as f:
f.write(json.dumps(result) + '\n')
def get_approval_queue(self) -> List[Dict[str, Any]]:
"""Get pending actions awaiting approval"""
if not self.approval_queue.exists():
return []
with open(self.approval_queue, 'r') as f:
return json.load(f)
def approve_action(self, index: int) -> bool:
"""Approve and execute a queued action, then remove it from queue"""
queue = self.get_approval_queue()
if 0 <= index < len(queue):
action_item = queue[index]
# Execute the approved action
result = self._execute_action_impl(action_item["action"], action_item["context"])
# Archive the action (success or failure)
self._archive_action(action_item, result)
# Remove from queue regardless of outcome
queue.pop(index)
with open(self.approval_queue, 'w') as f:
json.dump(queue, f, indent=2)
return result.get("success", False)
return False
def _archive_action(self, action_item: Dict[str, Any], result: Dict[str, Any]):
"""Archive an approved action with its execution result"""
archive_file = self.state_dir / "approved_actions.jsonl"
archive_entry = {
"timestamp": datetime.now().isoformat(),
"original_timestamp": action_item.get("timestamp"),
"action": action_item.get("action"),
"context": action_item.get("context"),
"result": result
}
with open(archive_file, 'a') as f:
f.write(json.dumps(archive_entry) + '\n')
def reject_action(self, index: int) -> bool:
"""Reject and remove a queued action"""
queue = self.get_approval_queue()
if 0 <= index < len(queue):
removed_action = queue.pop(index)
with open(self.approval_queue, 'w') as f:
json.dump(queue, f, indent=2)
return True
return False
if __name__ == "__main__":
import sys
if len(sys.argv) > 1:
if sys.argv[1] == "queue":
executor = SafeExecutor()
queue = executor.get_approval_queue()
if queue:
print("\n" + "="*70)
print(f"PENDING ACTIONS: {len(queue)}")
print("="*70)
for i, item in enumerate(queue):
action = item.get("action", {})
timestamp = item.get("timestamp", "unknown")
approved = item.get("approved")
status = "✓ APPROVED" if approved else "⏳ PENDING" if approved is None else "✗ REJECTED"
print(f"\n[{i}] {status} - {timestamp}")
print("-" * 70)
print(f"DIAGNOSIS: {action.get('diagnosis', 'N/A')}")
print(f"\nPROPOSED ACTION: {action.get('proposed_action', 'N/A')}")
print(f"TYPE: {action.get('action_type', 'N/A')}")
print(f"RISK: {action.get('risk_level', 'N/A')}")
if action.get('commands'):
print(f"\nCOMMANDS:")
for cmd in action['commands']:
print(f" - {cmd}")
if action.get('config_changes'):
print(f"\nCONFIG CHANGES:")
for key, value in action['config_changes'].items():
print(f" {key}: {value}")
print(f"\nREASONING: {action.get('reasoning', 'N/A')}")
print("\n" + "="*70 + "\n")
else:
print("No pending actions")
elif sys.argv[1] == "approve" and len(sys.argv) > 2:
executor = SafeExecutor()
index = int(sys.argv[2])
success = executor.approve_action(index)
print(f"Approval {'succeeded' if success else 'failed'}")
elif sys.argv[1] == "reject" and len(sys.argv) > 2:
executor = SafeExecutor()
index = int(sys.argv[2])
success = executor.reject_action(index)
print(f"Action {'rejected and removed from queue' if success else 'rejection failed'}")

41
flake.nix Normal file

@@ -0,0 +1,41 @@
{
description = "Macha - AI-Powered Autonomous System Administrator";
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
};
outputs = { self, nixpkgs }: {
# NixOS module
nixosModules.default = import ./module.nix;
# Alternative explicit name
nixosModules.macha-autonomous = import ./module.nix;
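# Example (hedged sketch): a consuming flake would add this repository to its
# inputs and import the module; the input name and URL below are illustrative:
#   inputs.macha.url = "git+https://git.coven.systems/lily/macha";
#   nixosConfigurations.myhost = nixpkgs.lib.nixosSystem {
#     modules = [ inputs.macha.nixosModules.default ];
#   };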
# For development
devShells = nixpkgs.lib.genAttrs [ "x86_64-linux" "aarch64-linux" ] (system:
let
pkgs = nixpkgs.legacyPackages.${system};
pythonEnv = pkgs.python3.withPackages (ps: with ps; [
requests
psutil
chromadb
]);
in {
default = pkgs.mkShell {
packages = [ pythonEnv pkgs.git ];
shellHook = ''
echo "Macha Autonomous Development Environment"
echo "Python packages: requests, psutil, chromadb"
'';
};
}
);
# Formatter
formatter = nixpkgs.lib.genAttrs [ "x86_64-linux" "aarch64-linux" ] (system:
nixpkgs.legacyPackages.${system}.nixpkgs-fmt
);
};
}

222
git_context.py Normal file

@@ -0,0 +1,222 @@
#!/usr/bin/env python3
"""
Git Context - Extract context from NixOS configuration repository
"""
import subprocess
from typing import Dict, List, Any, Optional
from datetime import datetime, timedelta
from pathlib import Path
class GitContext:
"""Extract context from git repository"""
def __init__(self, repo_path: str = "/etc/nixos"):
"""
Initialize git context extractor
Args:
repo_path: Path to the git repository (default: /etc/nixos for NixOS systems)
"""
self.repo_path = Path(repo_path)
def _run_git(self, args: List[str]) -> tuple[bool, str]:
"""Run git command"""
try:
result = subprocess.run(
["git", "-C", str(self.repo_path)] + args,
capture_output=True,
text=True,
timeout=10
)
return (result.returncode == 0, result.stdout.strip())
except Exception as e:
return (False, str(e))
def get_current_branch(self) -> str:
"""Get current git branch"""
success, output = self._run_git(["rev-parse", "--abbrev-ref", "HEAD"])
return output if success else "unknown"
def get_remote_url(self) -> str:
"""Get git remote URL"""
success, output = self._run_git(["remote", "get-url", "origin"])
return output if success else ""
def get_recent_commits(self, count: int = 10, since: str = "1 week ago") -> List[Dict[str, str]]:
"""
Get recent commits
Args:
count: Number of commits to retrieve
since: Time range (e.g., "1 week ago", "3 days ago")
Returns:
List of commit dictionaries with hash, author, date, message
"""
success, output = self._run_git([
"log",
f"--since={since}",
f"-n{count}",
"--format=%H|%an|%ar|%s"
])
if not success:
return []
commits = []
for line in output.split('\n'):
if not line.strip():
continue
parts = line.split('|', 3)
if len(parts) == 4:
commits.append({
"hash": parts[0][:8], # Short hash
"author": parts[1],
"date": parts[2],
"message": parts[3]
})
return commits
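# Example (illustrative): a log line such as
#   "a1b2c3d4e5f6...|Lily Miller|2 days ago|Fix ollama restart loop"
# parses into {"hash": "a1b2c3d4", "author": "Lily Miller",
# "date": "2 days ago", "message": "Fix ollama restart loop"}.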
def get_system_config_files(self, system_name: str) -> List[str]:
"""
Get configuration files for a specific system
Args:
system_name: Name of the system (e.g., "macha", "rhiannon")
Returns:
List of configuration file paths
"""
system_dir = self.repo_path / "systems" / system_name
config_files = []
if system_dir.exists():
# Main config
if (system_dir.parent / f"{system_name}.nix").exists():
config_files.append(f"systems/{system_name}.nix")
# System-specific configs
for config_file in system_dir.rglob("*.nix"):
config_files.append(str(config_file.relative_to(self.repo_path)))
return config_files
def get_recent_changes_for_system(self, system_name: str, since: str = "1 week ago") -> List[Dict[str, str]]:
"""
Get recent changes affecting a specific system
Args:
system_name: Name of the system
since: Time range
Returns:
List of commits that affected this system
"""
config_files = self.get_system_config_files(system_name)
if not config_files:
return []
# Get commits that touched these files
file_args = []
for f in config_files:
file_args.extend(["--", f])
success, output = self._run_git([
"log",
f"--since={since}",
"-n10",
"--format=%H|%an|%ar|%s"
] + file_args)
if not success:
return []
commits = []
for line in output.split('\n'):
if not line.strip():
continue
parts = line.split('|', 3)
if len(parts) == 4:
commits.append({
"hash": parts[0][:8],
"author": parts[1],
"date": parts[2],
"message": parts[3]
})
return commits
def get_system_context_summary(self, system_name: str) -> str:
"""
Get a summary of git context for a system
Args:
system_name: Name of the system
Returns:
Human-readable summary
"""
lines = []
# Repository info
repo_url = self.get_remote_url()
branch = self.get_current_branch()
if repo_url:
lines.append(f"Configuration Repository: {repo_url}")
lines.append(f"Branch: {branch}")
# Recent changes to this system
recent_changes = self.get_recent_changes_for_system(system_name, "2 weeks ago")
if recent_changes:
lines.append(f"\nRecent configuration changes (last 2 weeks):")
for commit in recent_changes[:5]:
lines.append(f" - {commit['date']}: {commit['message']} ({commit['author']})")
else:
lines.append("\nNo recent configuration changes")
return "\n".join(lines)
def get_all_managed_systems(self) -> List[str]:
"""
Get list of all systems managed by this repository
Returns:
List of system names
"""
systems = []
systems_dir = self.repo_path / "systems"
if systems_dir.exists():
for system_file in systems_dir.glob("*.nix"):
if system_file.stem not in ["default"]:
systems.append(system_file.stem)
return sorted(systems)
if __name__ == "__main__":
import sys
git = GitContext()
print("Repository:", git.get_remote_url())
print("Branch:", git.get_current_branch())
print("\nManaged Systems:")
for system in git.get_all_managed_systems():
print(f" - {system}")
print("\nRecent Commits:")
for commit in git.get_recent_commits(5):
print(f" {commit['hash']}: {commit['message']} - {commit['author']}, {commit['date']}")
if len(sys.argv) > 1:
system = sys.argv[1]
print(f"\nContext for {system}:")
print(git.get_system_context_summary(system))

219
issue_tracker.py Normal file

@@ -0,0 +1,219 @@
#!/usr/bin/env python3
"""
Issue Tracker - Internal ticketing system for tracking problems and their resolution
"""
import json
import uuid
from datetime import datetime
from typing import Dict, List, Any, Optional
from pathlib import Path
class IssueTracker:
"""Manages issue lifecycle: detection -> investigation -> resolution"""
def __init__(self, context_db, log_dir: str = "/var/lib/macha/logs"):
self.context_db = context_db
self.log_dir = Path(log_dir)
self.log_dir.mkdir(parents=True, exist_ok=True)
self.closed_log = self.log_dir / "closed_issues.jsonl"
def create_issue(
self,
hostname: str,
title: str,
description: str,
severity: str = "medium",
source: str = "auto-detected"
) -> str:
"""Create a new issue and return its ID"""
issue_id = str(uuid.uuid4())
now = datetime.utcnow().isoformat()
issue = {
"issue_id": issue_id,
"hostname": hostname,
"title": title,
"description": description,
"status": "open",
"severity": severity,
"created_at": now,
"updated_at": now,
"source": source,
"investigations": [],
"actions": [],
"resolution": None
}
self.context_db.store_issue(issue)
return issue_id
def get_issue(self, issue_id: str) -> Optional[Dict[str, Any]]:
"""Retrieve an issue by ID"""
return self.context_db.get_issue(issue_id)
def update_issue(
self,
issue_id: str,
status: Optional[str] = None,
investigation: Optional[Dict[str, Any]] = None,
action: Optional[Dict[str, Any]] = None
) -> bool:
"""Update an issue with new information"""
issue = self.get_issue(issue_id)
if not issue:
return False
if status:
issue["status"] = status
if investigation:
investigation["timestamp"] = datetime.utcnow().isoformat()
issue["investigations"].append(investigation)
if action:
action["timestamp"] = datetime.utcnow().isoformat()
issue["actions"].append(action)
issue["updated_at"] = datetime.utcnow().isoformat()
self.context_db.update_issue(issue)
return True
def find_similar_issue(
self,
hostname: str,
title: str,
description: Optional[str] = None
) -> Optional[Dict[str, Any]]:
"""Find an existing open issue that matches this problem"""
open_issues = self.list_issues(hostname=hostname, status="open")
# Simple similarity check on title
title_lower = title.lower()
for issue in open_issues:
issue_title_lower = issue.get("title", "").lower()
# Check for keyword overlap
title_words = set(title_lower.split())
issue_words = set(issue_title_lower.split())
# If >50% of words overlap, consider it similar
if len(title_words & issue_words) / max(len(title_words), 1) > 0.5:
return issue
return None
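# Worked example (illustrative): new title "ollama service failed" vs an open
# issue titled "ollama service crash loop" -> overlap {ollama, service} is 2
# of the 3 new-title words (0.67 > 0.5), so the existing issue is returned.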
def list_issues(
self,
hostname: Optional[str] = None,
status: Optional[str] = None,
severity: Optional[str] = None
) -> List[Dict[str, Any]]:
"""List issues with optional filters"""
return self.context_db.list_issues(
hostname=hostname,
status=status,
severity=severity
)
def resolve_issue(self, issue_id: str, resolution: str) -> bool:
"""Mark an issue as resolved with a resolution note"""
issue = self.get_issue(issue_id)
if not issue:
return False
issue["status"] = "resolved"
issue["resolution"] = resolution
issue["updated_at"] = datetime.utcnow().isoformat()
self.context_db.update_issue(issue)
return True
def close_issue(self, issue_id: str) -> bool:
"""Archive a resolved issue to the closed log"""
issue = self.get_issue(issue_id)
if not issue:
return False
# Can only close resolved issues
if issue["status"] != "resolved":
return False
issue["status"] = "closed"
issue["closed_at"] = datetime.utcnow().isoformat()
# Archive to closed log
self._archive_issue(issue)
# Remove from active database
self.context_db.delete_issue(issue_id)
return True
def get_issue_history(self, issue_id: str) -> Dict[str, Any]:
"""Get full history for an issue (investigations + actions)"""
issue = self.get_issue(issue_id)
if not issue:
return {}
return {
"issue": issue,
"investigation_count": len(issue.get("investigations", [])),
"action_count": len(issue.get("actions", [])),
"age_hours": self._calculate_age(issue["created_at"]),
"last_activity": issue["updated_at"]
}
def auto_resolve_if_fixed(self, hostname: str, detected_problems: List[str]) -> int:
"""
Auto-resolve open issues if their problems are no longer detected.
Returns count of auto-resolved issues.
"""
open_issues = self.list_issues(hostname=hostname, status="open")
resolved_count = 0
# Convert detected problems to lowercase for comparison
detected_lower = [p.lower() for p in detected_problems]
for issue in open_issues:
title_lower = issue.get("title", "").lower()
desc_lower = issue.get("description", "").lower()
# Check if issue keywords are still in detected problems
still_present = False
for detected in detected_lower:
if any(word in detected for word in title_lower.split()) or \
any(word in detected for word in desc_lower.split()):
still_present = True
break
# If problem is no longer detected, auto-resolve
if not still_present:
self.resolve_issue(
issue["issue_id"],
"Auto-resolved: Problem no longer detected in system monitoring"
)
resolved_count += 1
return resolved_count
def _archive_issue(self, issue: Dict[str, Any]):
"""Append closed issue to the archive log"""
try:
with open(self.closed_log, "a") as f:
f.write(json.dumps(issue) + "\n")
except Exception as e:
print(f"Failed to archive issue {issue.get('issue_id')}: {e}")
def _calculate_age(self, created_at: str) -> float:
"""Calculate age of issue in hours"""
try:
created = datetime.fromisoformat(created_at)
now = datetime.utcnow()
delta = now - created
return delta.total_seconds() / 3600
except Exception:
return 0.0

358
journal_monitor.py Normal file

@@ -0,0 +1,358 @@
#!/usr/bin/env python3
"""
Journal Monitor - Monitor remote systems via centralized journald
"""
import json
import subprocess
from typing import Dict, List, Any, Optional, Set
from datetime import datetime, timedelta
from pathlib import Path
from collections import defaultdict
class JournalMonitor:
"""Monitor systems via centralized journald logs"""
def __init__(self, domain: str = "coven.systems"):
"""
Initialize journal monitor
Args:
domain: Domain suffix for FQDNs
"""
self.domain = domain
self.known_hosts: Set[str] = set()
def _run_journalctl(self, args: List[str], timeout: int = 30) -> tuple[bool, str, str]:
"""
Run journalctl command
Args:
args: Arguments to journalctl
timeout: Timeout in seconds
Returns:
(success, stdout, stderr)
"""
try:
cmd = ["journalctl"] + args
result = subprocess.run(
cmd,
capture_output=True,
text=True,
timeout=timeout
)
return (
result.returncode == 0,
result.stdout.strip(),
result.stderr.strip()
)
except subprocess.TimeoutExpired:
return False, "", f"Command timed out after {timeout}s"
except Exception as e:
return False, "", str(e)
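# Example (illustrative): _run_journalctl(["-u", "sshd", "-n", "10"]) is
# equivalent to running `journalctl -u sshd -n 10` on this host.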
def discover_hosts(self) -> List[str]:
"""
Discover hosts reporting to centralized journal
Returns:
List of discovered FQDNs
"""
success, output, _ = self._run_journalctl([
"--output=json",
"--since=1 day ago",
"-n", "10000"
])
if not success:
return []
hosts = set()
for line in output.split('\n'):
if not line.strip():
continue
try:
entry = json.loads(line)
hostname = entry.get('_HOSTNAME', '')
# Ensure FQDN format
if hostname and not hostname.endswith(f'.{self.domain}'):
if '.' not in hostname:
hostname = f"{hostname}.{self.domain}"
if hostname:
hosts.add(hostname)
except json.JSONDecodeError:
continue
self.known_hosts = hosts
return sorted(hosts)
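# Example (illustrative): a journal entry with _HOSTNAME "rhiannon" is
# normalized to "rhiannon.coven.systems"; a name that already contains a dot
# (e.g. "alexander.coven.systems") is kept as-is.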
def collect_resources(self, hostname: str, since: str = "5 minutes ago") -> Dict[str, Any]:
"""
Collect resource usage from journal entries
This extracts CPU/memory info from systemd service messages
"""
# For now, return empty - we'll primarily use this for service/log monitoring
# Resource metrics could be added if systems log them
return {
"cpu_percent": 0,
"memory_percent": 0,
"load_average": {"1min": 0, "5min": 0, "15min": 0}
}
def collect_systemd_status(self, hostname: str, since: str = "5 minutes ago") -> Dict[str, Any]:
"""
Collect systemd service status from journal
Args:
hostname: FQDN of the system
since: Time range to check
Returns:
Dictionary with failed service information
"""
# Query for systemd service failures
success, output, _ = self._run_journalctl([
f"_HOSTNAME={hostname}",
"--priority=err",
"--unit=*.service",
f"--since={since}",
"--output=json"
])
if not success:
return {"failed_count": 0, "failed_services": []}
failed_services = {}
for line in output.split('\n'):
if not line.strip():
continue
try:
entry = json.loads(line)
unit = entry.get('_SYSTEMD_UNIT', '')
if unit and unit.endswith('.service'):
service_name = unit.replace('.service', '')
if service_name not in failed_services:
failed_services[service_name] = {
"unit": unit,
"message": entry.get('MESSAGE', ''),
"timestamp": entry.get('__REALTIME_TIMESTAMP', '')
}
except json.JSONDecodeError:
continue
return {
"failed_count": len(failed_services),
"failed_services": list(failed_services.values())
}
def collect_log_errors(self, hostname: str, since: str = "1 hour ago") -> Dict[str, Any]:
"""
Collect error logs from journal
Args:
hostname: FQDN of the system
since: Time range to check
Returns:
Dictionary with error log information
"""
success, output, _ = self._run_journalctl([
f"_HOSTNAME={hostname}",
"--priority=err",
f"--since={since}",
"--output=json"
])
if not success:
return {"error_count_1h": 0, "recent_errors": []}
errors = []
error_count = 0
for line in output.split('\n'):
if not line.strip():
continue
try:
entry = json.loads(line)
error_count += 1
if len(errors) < 10: # Keep last 10 errors
errors.append({
"message": entry.get('MESSAGE', ''),
"unit": entry.get('_SYSTEMD_UNIT', 'unknown'),
"priority": entry.get('PRIORITY', ''),
"timestamp": entry.get('__REALTIME_TIMESTAMP', '')
})
except json.JSONDecodeError:
continue
return {
"error_count_1h": error_count,
"recent_errors": errors
}
def collect_disk_usage(self, hostname: str) -> Dict[str, Any]:
"""
Collect disk usage - Note: This would require systems to log disk metrics
For now, returns empty. Could be enhanced if systems periodically log disk usage
"""
return {"partitions": []}
def collect_network_status(self, hostname: str, since: str = "5 minutes ago") -> Dict[str, Any]:
"""
Check network connectivity based on recent journal activity
If we see recent logs from a host, it's reachable
"""
success, output, _ = self._run_journalctl([
f"_HOSTNAME={hostname}",
f"--since={since}",
"-n", "1",
"--output=json"
])
# If we got recent logs, network is working
internet_reachable = bool(success and output.strip())
return {
"internet_reachable": internet_reachable,
"last_seen": datetime.now().isoformat() if internet_reachable else None
}
def collect_all(self, hostname: str) -> Dict[str, Any]:
"""
Collect all monitoring data for a host from journal
Args:
hostname: FQDN of the system to monitor
Returns:
Complete monitoring data
"""
# First check if we have recent logs from this host
net_status = self.collect_network_status(hostname)
if not net_status.get("internet_reachable"):
return {
"hostname": hostname,
"reachable": False,
"error": "No recent journal entries from this host"
}
return {
"hostname": hostname,
"reachable": True,
"source": "journal",
"resources": self.collect_resources(hostname),
"systemd": self.collect_systemd_status(hostname),
"disk": self.collect_disk_usage(hostname),
"network": net_status,
"logs": self.collect_log_errors(hostname),
}
def get_summary(self, data: Dict[str, Any]) -> str:
"""Generate human-readable summary from journal data"""
hostname = data.get("hostname", "unknown")
if not data.get("reachable", False):
return f"{hostname}: {data.get('error', 'Unreachable')}"
lines = [f"System: {hostname} (via journal)"]
# Services
systemd = data.get("systemd", {})
failed_count = systemd.get("failed_count", 0)
if failed_count > 0:
lines.append(f"Services: {failed_count} failed")
for svc in systemd.get("failed_services", [])[:3]:
lines.append(f" - {svc.get('unit', 'unknown')}")
else:
lines.append("Services: No recent failures")
# Network
net = data.get("network", {})
last_seen = net.get("last_seen")
if last_seen:
lines.append(f"Last seen: {last_seen}")
# Logs
logs = data.get("logs", {})
error_count = logs.get("error_count_1h", 0)
if error_count > 0:
lines.append(f"Recent logs: {error_count} errors in last hour")
return "\n".join(lines)
def get_active_services(self, hostname: str, since: str = "1 hour ago") -> List[str]:
"""
Get list of active services on a host by looking at journal entries
This helps with auto-discovery of what's running on each system
"""
success, output, _ = self._run_journalctl([
f"_HOSTNAME={hostname}",
f"--since={since}",
"--output=json",
"-n", "1000"
])
if not success:
return []
services = set()
for line in output.split('\n'):
if not line.strip():
continue
try:
entry = json.loads(line)
unit = entry.get('_SYSTEMD_UNIT', '')
if unit and unit.endswith('.service'):
# Extract service name
service = unit.replace('.service', '')
# Filter out common system services, focus on application services
if service not in ['systemd-journald', 'systemd-logind', 'sshd', 'dbus']:
services.add(service)
except json.JSONDecodeError:
continue
return sorted(services)
if __name__ == "__main__":
import sys
monitor = JournalMonitor()
# Discover hosts
print("Discovering hosts from journal...")
hosts = monitor.discover_hosts()
print(f"Found {len(hosts)} hosts:")
for host in hosts:
print(f" - {host}")
# Monitor first host if available
if hosts:
hostname = hosts[0]
print(f"\nMonitoring {hostname}...")
data = monitor.collect_all(hostname)
print("\n" + "="*60)
print(monitor.get_summary(data))
print("="*60)
# Discover services
print(f"\nActive services on {hostname}:")
services = monitor.get_active_services(hostname)
for svc in services[:10]:
print(f" - {svc}")

847
module.nix Normal file

@@ -0,0 +1,847 @@
{ config, lib, pkgs, ... }:
with lib;
let
cfg = config.services.macha-autonomous;
# Python environment with all dependencies
pythonEnv = pkgs.python3.withPackages (ps: with ps; [
requests
psutil
chromadb
]);
# Main autonomous system package
macha-autonomous = pkgs.writeScriptBin "macha-autonomous" ''
#!${pythonEnv}/bin/python3
import sys
sys.path.insert(0, "${./.}")
from orchestrator import main
main()
'';
# Config file
configFile = pkgs.writeText "macha-autonomous-config.json" (builtins.toJSON {
check_interval = cfg.checkInterval;
autonomy_level = cfg.autonomyLevel;
ollama_host = cfg.ollamaHost;
model = cfg.model;
config_repo = cfg.configRepo;
config_branch = cfg.configBranch;
});
in {
options.services.macha-autonomous = {
enable = mkEnableOption "Macha autonomous system maintenance";
autonomyLevel = mkOption {
type = types.enum [ "observe" "suggest" "auto-safe" "auto-full" ];
default = "suggest";
description = ''
Level of autonomy for the system:
- observe: Only monitor and log, no actions
- suggest: Propose actions, require manual approval
- auto-safe: Auto-execute low-risk actions (restarts, cleanup)
- auto-full: Full autonomy with safety limits (still requires approval for high-risk)
'';
};
checkInterval = mkOption {
type = types.int;
default = 300;
description = "Interval in seconds between system checks";
};
ollamaHost = mkOption {
type = types.str;
default = "http://localhost:11434";
description = "Ollama API host";
};
model = mkOption {
type = types.str;
default = "llama3.1:70b";
description = "LLM model to use for reasoning";
};
user = mkOption {
type = types.str;
default = "macha";
description = "User to run the autonomous system as";
};
group = mkOption {
type = types.str;
default = "macha";
description = "Group to run the autonomous system as";
};
gotifyUrl = mkOption {
type = types.str;
default = "";
example = "http://rhiannon:8181";
description = "Gotify server URL for notifications (empty to disable)";
};
gotifyToken = mkOption {
type = types.str;
default = "";
description = "Gotify application token for notifications";
};
remoteSystems = mkOption {
type = types.listOf types.str;
default = [];
example = [ "rhiannon" "alexander" ];
description = "List of remote NixOS systems to monitor and maintain";
};
configRepo = mkOption {
type = types.str;
default = if config.programs.nh.flake != null
then config.programs.nh.flake
else "git+https://git.coven.systems/lily/nixos-servers";
description = "URL of the NixOS configuration repository (auto-detected from programs.nh.flake if available)";
};
configBranch = mkOption {
type = types.str;
default = "main";
description = "Branch of the NixOS configuration repository";
};
};
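# Example (hedged sketch; hostnames and URL are placeholders):
#   services.macha-autonomous = {
#     enable = true;
#     autonomyLevel = "suggest";
#     remoteSystems = [ "rhiannon" "alexander" ];
#     gotifyUrl = "http://rhiannon:8181";
#   };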
config = mkIf cfg.enable {
# Create user and group
users.users.${cfg.user} = {
isSystemUser = true;
group = cfg.group;
uid = 2501;
description = "Macha autonomous system maintenance";
home = "/var/lib/macha";
createHome = true;
};
users.groups.${cfg.group} = {};
# Git configuration for credential storage
programs.git = {
enable = true;
config = {
credential.helper = "store";
};
};
# Ollama service for AI inference
services.ollama = {
enable = true;
acceleration = "rocm";
host = "0.0.0.0";
port = 11434;
environmentVariables = {
"OLLAMA_DEBUG" = "1";
"OLLAMA_KEEP_ALIVE" = "600";
"OLLAMA_NEW_ENGINE" = "true";
"OLLAMA_CONTEXT_LENGTH" = "131072";
};
openFirewall = false; # Keep internal only
loadModels = [
"qwen3"
"gpt-oss"
"gemma3"
"gpt-oss:20b"
"qwen3:4b-instruct-2507-fp16"
"qwen3:8b-fp16"
"mistral:7b"
"chroma/all-minilm-l6-v2-f32:latest"
];
};
# ChromaDB service for vector storage
services.chromadb = {
enable = true;
port = 8000;
dbpath = "/var/lib/chromadb";
};
# Give the user permissions it needs
security.sudo.extraRules = [{
users = [ cfg.user ];
commands = [
# Local system management
{ command = "${pkgs.systemd}/bin/systemctl restart *"; options = [ "NOPASSWD" ]; }
{ command = "${pkgs.systemd}/bin/systemctl status *"; options = [ "NOPASSWD" ]; }
{ command = "${pkgs.systemd}/bin/journalctl *"; options = [ "NOPASSWD" ]; }
{ command = "${pkgs.nix}/bin/nix-collect-garbage *"; options = [ "NOPASSWD" ]; }
# Remote system access (uses existing root SSH keys)
{ command = "${pkgs.openssh}/bin/ssh *"; options = [ "NOPASSWD" ]; }
{ command = "${pkgs.openssh}/bin/scp *"; options = [ "NOPASSWD" ]; }
{ command = "${pkgs.nixos-rebuild}/bin/nixos-rebuild *"; options = [ "NOPASSWD" ]; }
];
}];
# Config file
environment.etc."macha-autonomous/config.json".source = configFile;
# State directory and queue directories (world-writable queues for multi-user access)
# Using 'z' to set permissions even if directory exists
systemd.tmpfiles.rules = [
"d /var/lib/macha 0755 ${cfg.user} ${cfg.group} -"
"z /var/lib/macha 0755 ${cfg.user} ${cfg.group} -" # Ensure permissions are set
"d /var/lib/macha/queues 0777 ${cfg.user} ${cfg.group} -"
"d /var/lib/macha/queues/ollama 0777 ${cfg.user} ${cfg.group} -"
"d /var/lib/macha/queues/ollama/pending 0777 ${cfg.user} ${cfg.group} -"
"d /var/lib/macha/queues/ollama/processing 0777 ${cfg.user} ${cfg.group} -"
"d /var/lib/macha/queues/ollama/completed 0777 ${cfg.user} ${cfg.group} -"
"d /var/lib/macha/queues/ollama/failed 0777 ${cfg.user} ${cfg.group} -"
"d /var/lib/macha/tool_cache 0777 ${cfg.user} ${cfg.group} -"
];
# Systemd service
systemd.services.macha-autonomous = {
description = "Macha Autonomous System Maintenance";
after = [ "network.target" "ollama.service" ];
wants = [ "ollama.service" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "simple";
User = cfg.user;
Group = cfg.group;
WorkingDirectory = "/var/lib/macha";
ExecStart = "${macha-autonomous}/bin/macha-autonomous --mode continuous --autonomy ${cfg.autonomyLevel} --interval ${toString cfg.checkInterval}";
Restart = "on-failure";
RestartSec = "30s";
# Security hardening
PrivateTmp = true;
NoNewPrivileges = false; # Need privileges for sudo
ProtectSystem = "strict";
ProtectHome = true;
ReadWritePaths = [ "/var/lib/macha" "/var/lib/macha/tool_cache" "/var/lib/macha/queues" ];
# Resource limits
MemoryLimit = "1G";
CPUQuota = "50%";
};
environment = {
PYTHONPATH = toString ./.;
GOTIFY_URL = cfg.gotifyUrl;
GOTIFY_TOKEN = cfg.gotifyToken;
CHROMA_ENV_FILE = ""; # Prevent ChromaDB from trying to read .env files
ANONYMIZED_TELEMETRY = "False"; # Disable ChromaDB telemetry
};
path = [ pkgs.git ]; # Make git available for config parsing
};
# Ollama Queue Worker Service (serializes all Ollama requests)
systemd.services.ollama-queue-worker = {
description = "Macha Ollama Queue Worker";
after = [ "network.target" "ollama.service" ];
wants = [ "ollama.service" ];
wantedBy = [ "multi-user.target" ];
serviceConfig = {
Type = "simple";
User = cfg.user;
Group = cfg.group;
WorkingDirectory = "/var/lib/macha";
ExecStart = "${pythonEnv}/bin/python3 ${./.}/ollama_worker.py";
Restart = "on-failure";
RestartSec = "10s";
# Security hardening
PrivateTmp = true;
NoNewPrivileges = true;
ProtectSystem = "strict";
ProtectHome = true;
ReadWritePaths = [ "/var/lib/macha/queues" "/var/lib/macha/tool_cache" ];
# Resource limits
MemoryLimit = "512M";
CPUQuota = "25%";
};
environment = {
PYTHONPATH = toString ./.;
CHROMA_ENV_FILE = "";
ANONYMIZED_TELEMETRY = "False";
};
};
# CLI tools for manual control and system packages
environment.systemPackages = with pkgs; [
macha-autonomous
# Python packages for ChromaDB
python313
python313Packages.pip
python313Packages.chromadb.pythonModule
# Tool to check approval queue
(pkgs.writeScriptBin "macha-approve" ''
#!${pkgs.bash}/bin/bash
if [ "$1" == "list" ]; then
sudo -u ${cfg.user} ${pythonEnv}/bin/python3 ${./.}/executor.py queue
elif [ "$1" == "discuss" ] && [ -n "$2" ]; then
ACTION_ID="$2"
echo "==================================================================="
echo "Interactive Discussion with Macha about Action #$ACTION_ID"
echo "==================================================================="
echo ""
# Initial explanation
sudo -u ${cfg.user} ${pkgs.coreutils}/bin/env CHROMA_ENV_FILE="" ANONYMIZED_TELEMETRY="False" ${pythonEnv}/bin/python3 ${./.}/conversation.py --discuss "$ACTION_ID"
echo ""
echo "==================================================================="
echo "You can now ask follow-up questions about this action."
echo "Type 'approve' to approve it, 'reject' to reject it, or 'exit' to quit."
echo "==================================================================="
# Interactive loop
while true; do
echo ""
echo -n "You: "
read -r USER_INPUT
# Check for special commands
if [ "$USER_INPUT" = "exit" ] || [ "$USER_INPUT" = "quit" ] || [ -z "$USER_INPUT" ]; then
echo "Exiting discussion."
break
elif [ "$USER_INPUT" = "approve" ]; then
echo "Approving action #$ACTION_ID..."
sudo -u ${cfg.user} ${pythonEnv}/bin/python3 ${./.}/executor.py approve "$ACTION_ID"
break
elif [ "$USER_INPUT" = "reject" ]; then
echo "Rejecting and removing action #$ACTION_ID from queue..."
sudo -u ${cfg.user} ${pythonEnv}/bin/python3 ${./.}/executor.py reject "$ACTION_ID"
break
fi
# Ask Macha the follow-up question in context of the action
echo ""
echo -n "Macha: "
sudo -u ${cfg.user} ${pkgs.coreutils}/bin/env CHROMA_ENV_FILE="" ANONYMIZED_TELEMETRY="False" ${pythonEnv}/bin/python3 ${./.}/conversation.py --discuss "$ACTION_ID" --follow-up "$USER_INPUT"
echo ""
done
elif [ "$1" == "approve" ] && [ -n "$2" ]; then
sudo -u ${cfg.user} ${pythonEnv}/bin/python3 ${./.}/executor.py approve "$2"
elif [ "$1" == "reject" ] && [ -n "$2" ]; then
sudo -u ${cfg.user} ${pythonEnv}/bin/python3 ${./.}/executor.py reject "$2"
else
echo "Usage:"
echo " macha-approve list - Show pending actions"
echo " macha-approve discuss <N> - Discuss action number N with Macha (interactive)"
echo " macha-approve approve <N> - Approve action number N"
echo " macha-approve reject <N> - Reject and remove action number N from queue"
fi
'')
# Tool to run manual check
(pkgs.writeScriptBin "macha-check" ''
#!${pkgs.bash}/bin/bash
sudo -u ${cfg.user} sh -c 'cd /var/lib/macha && CHROMA_ENV_FILE="" ANONYMIZED_TELEMETRY="False" ${macha-autonomous}/bin/macha-autonomous --mode once --autonomy ${cfg.autonomyLevel}'
'')
# Tool to view logs
(pkgs.writeScriptBin "macha-logs" ''
#!${pkgs.bash}/bin/bash
case "$1" in
orchestrator)
sudo tail -f /var/lib/macha/orchestrator.log
;;
decisions)
sudo tail -f /var/lib/macha/decisions.jsonl
;;
actions)
sudo tail -f /var/lib/macha/actions.jsonl
;;
service)
journalctl -u macha-autonomous.service -f
;;
*)
echo "Usage: macha-logs [orchestrator|decisions|actions|service]"
;;
esac
'')
# Tool to send test notification
(pkgs.writeScriptBin "macha-notify" ''
#!${pkgs.bash}/bin/bash
if [ -z "$1" ] || [ -z "$2" ]; then
echo "Usage: macha-notify <title> <message> [priority]"
echo "Example: macha-notify 'Test' 'This is a test' 5"
echo "Priorities: 2 (low), 5 (medium), 8 (high)"
exit 1
fi
export GOTIFY_URL="${cfg.gotifyUrl}"
export GOTIFY_TOKEN="${cfg.gotifyToken}"
${pythonEnv}/bin/python3 ${./.}/notifier.py "$1" "$2" "''${3:-5}"
'')
# Tool to query config files
(pkgs.writeScriptBin "macha-configs" ''
#!${pkgs.bash}/bin/bash
export PYTHONPATH=${toString ./.}
export CHROMA_ENV_FILE=""
export ANONYMIZED_TELEMETRY="False"
if [ $# -eq 0 ]; then
echo "Usage: macha-configs <search-query> [system-name]"
echo "Examples:"
echo " macha-configs gotify"
echo " macha-configs 'journald configuration'"
echo " macha-configs ollama macha.coven.systems"
exit 1
fi
QUERY="$1"
SYSTEM="''${2:-}"
${pythonEnv}/bin/python3 -c "
from context_db import ContextDatabase
import sys
db = ContextDatabase()
query = sys.argv[1]
system = sys.argv[2] if len(sys.argv) > 2 else None
print(f'Searching for: {query}')
if system:
print(f'Filtered to system: {system}')
print('='*60)
configs = db.query_config_files(query, system=system, n_results=5)
if not configs:
print('No matching configuration files found.')
else:
for i, cfg in enumerate(configs, 1):
print(f\"\\n{i}. {cfg['path']} (relevance: {cfg['relevance']:.1%})\")
print(f\" Category: {cfg['metadata']['category']}\")
print(' Preview:')
preview = cfg['content'][:300].replace('\\n', '\\n ')
print(f' {preview}')
if len(cfg['content']) > 300:
print(' ... (use macha-configs-read to see full file)')
" "$QUERY" "$SYSTEM"
'')
# Interactive chat tool (runs as invoking user, not as macha-autonomous)
(pkgs.writeScriptBin "macha-chat" ''
#!${pkgs.bash}/bin/bash
export PYTHONPATH=${toString ./.}
export CHROMA_ENV_FILE=""
export ANONYMIZED_TELEMETRY="False"
# Run as the current user, not as macha-autonomous
# This allows the chat to execute privileged commands with the user's permissions
${pythonEnv}/bin/python3 ${./.}/chat.py
'')
# Tool to read full config file
(pkgs.writeScriptBin "macha-configs-read" ''
#!${pkgs.bash}/bin/bash
export PYTHONPATH=${toString ./.}
export CHROMA_ENV_FILE=""
export ANONYMIZED_TELEMETRY="False"
if [ $# -eq 0 ]; then
echo "Usage: macha-configs-read <file-path>"
echo "Example: macha-configs-read apps/gotify.nix"
exit 1
fi
${pythonEnv}/bin/python3 -c "
from context_db import ContextDatabase
import sys
db = ContextDatabase()
file_path = sys.argv[1]
cfg = db.get_config_file(file_path)
if not cfg:
print(f'Config file not found: {file_path}')
sys.exit(1)
print(f'File: {cfg[\"path\"]}')
print(f'Category: {cfg[\"metadata\"][\"category\"]}')
print('='*60)
print(cfg['content'])
" "$1"
'')
# Tool to view system registry
(pkgs.writeScriptBin "macha-systems" ''
#!${pkgs.bash}/bin/bash
export PYTHONPATH=${toString ./.}
export CHROMA_ENV_FILE=""
export ANONYMIZED_TELEMETRY="False"
${pythonEnv}/bin/python3 -c "
from context_db import ContextDatabase
import json
db = ContextDatabase()
systems = db.get_all_systems()
print('Registered Systems:')
print('='*60)
for system in systems:
os_type = system.get('os_type', 'unknown').upper()
print(f\"\\n{system['hostname']} ({system['type']}) [{os_type}]\")
print(f\" Config Repo: {system.get('config_repo') or '(not set)'}\")
print(f\" Branch: {system.get('config_branch', 'unknown')}\")
if system.get('services'):
print(f\" Services: {', '.join(system['services'][:10])}\")
if len(system['services']) > 10:
print(f\" ... and {len(system['services']) - 10} more\")
if system.get('capabilities'):
print(f\" Capabilities: {', '.join(system['capabilities'])}\")
print('='*60)
"
'')
# Tool to ask Macha questions
(pkgs.writeScriptBin "macha-ask" ''
#!${pkgs.bash}/bin/bash
if [ $# -eq 0 ]; then
echo "Usage: macha-ask <your question>"
echo "Example: macha-ask Why did you recommend restarting that service?"
exit 1
fi
sudo -u ${cfg.user} ${pkgs.coreutils}/bin/env CHROMA_ENV_FILE="" ANONYMIZED_TELEMETRY="False" ${pythonEnv}/bin/python3 ${./.}/conversation.py "$@"
'')
# Issue tracking CLI
(pkgs.writeScriptBin "macha-issues" ''
#!${pythonEnv}/bin/python3
import sys
import os
os.environ["CHROMA_ENV_FILE"] = ""
os.environ["ANONYMIZED_TELEMETRY"] = "False"
sys.path.insert(0, "${./.}")
from context_db import ContextDatabase
from issue_tracker import IssueTracker
from datetime import datetime
import json
db = ContextDatabase()
tracker = IssueTracker(db)
def list_issues(show_all=False):
"""List issues"""
if show_all:
issues = tracker.list_issues()
else:
issues = tracker.list_issues(status="open")
if not issues:
print("No issues found")
return
print("="*70)
print(f"ISSUES: {len(issues)}")
print("="*70)
for issue in issues:
issue_id = issue['issue_id'][:8]
age_hours = (datetime.utcnow() - datetime.fromisoformat(issue['created_at'])).total_seconds() / 3600
inv_count = len(issue.get('investigations', []))
action_count = len(issue.get('actions', []))
print(f"\n[{issue_id}] {issue['title']}")
print(f" Host: {issue['hostname']}")
print(f" Status: {issue['status'].upper()} | Severity: {issue['severity'].upper()}")
print(f" Age: {age_hours:.1f}h | Activity: {inv_count} investigations, {action_count} actions")
print(f" Source: {issue['source']}")
if issue.get('resolution'):
print(f" Resolution: {issue['resolution']}")
def show_issue(issue_id):
"""Show detailed issue information"""
# Find issue by partial ID
all_issues = tracker.list_issues()
matching = [i for i in all_issues if i['issue_id'].startswith(issue_id)]
if not matching:
print(f"Issue {issue_id} not found")
return
issue = matching[0]
full_id = issue['issue_id']
print("="*70)
print(f"ISSUE: {issue['title']}")
print("="*70)
print(f"ID: {full_id}")
print(f"Host: {issue['hostname']}")
print(f"Status: {issue['status'].upper()}")
print(f"Severity: {issue['severity'].upper()}")
print(f"Source: {issue['source']}")
print(f"Created: {issue['created_at']}")
print(f"Updated: {issue['updated_at']}")
print(f"\nDescription:\n{issue['description']}")
investigations = issue.get('investigations', [])
if investigations:
print(f"\n{''*70}")
print(f"INVESTIGATIONS ({len(investigations)}):")
for i, inv in enumerate(investigations, 1):
print(f"\n [{i}] {inv.get('timestamp', 'N/A')}")
print(f" Diagnosis: {inv.get('diagnosis', 'N/A')}")
print(f" Commands: {', '.join(inv.get('commands', []))}")
print(f" Success: {inv.get('success', False)}")
if inv.get('output'):
print(f" Output: {inv['output'][:200]}...")
actions = issue.get('actions', [])
if actions:
print(f"\n{''*70}")
print(f"ACTIONS ({len(actions)}):")
for i, action in enumerate(actions, 1):
print(f"\n [{i}] {action.get('timestamp', 'N/A')}")
print(f" Action: {action.get('proposed_action', 'N/A')}")
print(f" Risk: {action.get('risk_level', 'N/A').upper()}")
print(f" Commands: {', '.join(action.get('commands', []))}")
print(f" Success: {action.get('success', False)}")
if issue.get('resolution'):
print(f"\n{''*70}")
print(f"RESOLUTION:")
print(f" {issue['resolution']}")
print("="*70)
def create_issue(description):
"""Create a new issue manually"""
import socket
hostname = f"{socket.gethostname()}.coven.systems"
issue_id = tracker.create_issue(
hostname=hostname,
title=description[:100],
description=description,
severity="medium",
source="user-reported"
)
print(f"Created issue: {issue_id[:8]}")
print(f"Title: {description[:100]}")
def resolve_issue(issue_id, resolution="Manually resolved"):
"""Mark an issue as resolved"""
# Find issue by partial ID
all_issues = tracker.list_issues()
matching = [i for i in all_issues if i['issue_id'].startswith(issue_id)]
if not matching:
print(f"Issue {issue_id} not found")
return
full_id = matching[0]['issue_id']
success = tracker.resolve_issue(full_id, resolution)
if success:
print(f"Resolved issue {issue_id[:8]}")
else:
print(f"Failed to resolve issue {issue_id}")
def close_issue(issue_id):
"""Archive a resolved issue"""
# Find issue by partial ID
all_issues = tracker.list_issues()
matching = [i for i in all_issues if i['issue_id'].startswith(issue_id)]
if not matching:
print(f"Issue {issue_id} not found")
return
full_id = matching[0]['issue_id']
if matching[0]['status'] != 'resolved':
print(f"Issue {issue_id} must be resolved before closing")
print(f"Use: macha-issues resolve {issue_id}")
return
success = tracker.close_issue(full_id)
if success:
print(f"Closed and archived issue {issue_id[:8]}")
else:
print(f"Failed to close issue {issue_id}")
# Main CLI
if len(sys.argv) < 2:
print("Usage: macha-issues <command> [options]")
print("")
print("Commands:")
print(" list List open issues")
print(" list --all List all issues (including resolved/closed)")
print(" show <id> Show detailed issue information")
print(" create <desc> Create a new issue manually")
print(" resolve <id> Mark issue as resolved")
print(" close <id> Archive a resolved issue")
sys.exit(1)
command = sys.argv[1]
if command == "list":
show_all = "--all" in sys.argv
list_issues(show_all)
elif command == "show" and len(sys.argv) >= 3:
show_issue(sys.argv[2])
elif command == "create" and len(sys.argv) >= 3:
description = " ".join(sys.argv[2:])
create_issue(description)
elif command == "resolve" and len(sys.argv) >= 3:
resolution = " ".join(sys.argv[3:]) if len(sys.argv) > 3 else "Manually resolved"
resolve_issue(sys.argv[2], resolution)
elif command == "close" and len(sys.argv) >= 3:
close_issue(sys.argv[2])
else:
print(f"Unknown command: {command}")
sys.exit(1)
'')
# Knowledge base CLI
(pkgs.writeScriptBin "macha-knowledge" ''
#!${pythonEnv}/bin/python3
import sys
import os
os.environ["CHROMA_ENV_FILE"] = ""
os.environ["ANONYMIZED_TELEMETRY"] = "False"
sys.path.insert(0, "${./.}")
from context_db import ContextDatabase
db = ContextDatabase()
def list_topics(category=None):
"""List all knowledge topics"""
topics = db.list_knowledge_topics(category)
if not topics:
print("No knowledge topics found.")
return
print(f"{'='*70}")
if category:
print(f"KNOWLEDGE TOPICS ({category.upper()}):")
else:
print(f"KNOWLEDGE TOPICS:")
print(f"{'='*70}")
for topic in topics:
print(f" {topic}")
print(f"{'='*70}")
def show_topic(topic):
"""Show all knowledge for a topic"""
items = db.get_knowledge_by_topic(topic)
if not items:
print(f"No knowledge found for topic: {topic}")
return
print(f"{'='*70}")
print(f"KNOWLEDGE: {topic}")
print(f"{'='*70}\n")
for item in items:
print(f"ID: {item['id'][:8]}...")
print(f"Category: {item['category']}")
print(f"Source: {item['source']}")
print(f"Confidence: {item['confidence']}")
print(f"Created: {item['created_at']}")
print(f"Times Referenced: {item['times_referenced']}")
if item.get('tags'):
print(f"Tags: {', '.join(item['tags'])}")
print(f"\nKnowledge:")
print(f" {item['knowledge']}\n")
print(f"{'-'*70}\n")
def search_knowledge(query, category=None):
"""Search knowledge base"""
items = db.query_knowledge(query, category=category, limit=10)
if not items:
print(f"No knowledge found matching: {query}")
return
print(f"{'='*70}")
print(f"SEARCH RESULTS: {query}")
if category:
print(f"Category Filter: {category}")
print(f"{'='*70}\n")
for i, item in enumerate(items, 1):
print(f"[{i}] {item['topic']}")
print(f" Category: {item['category']} | Confidence: {item['confidence']}")
print(f" {item['knowledge'][:150]}...")
print()
def add_knowledge(topic, knowledge, category="general"):
"""Add new knowledge"""
kid = db.store_knowledge(
topic=topic,
knowledge=knowledge,
category=category,
source="user-provided",
confidence="high"
)
if kid:
print(f" Added knowledge for topic: {topic}")
print(f" ID: {kid[:8]}...")
else:
print(f" Failed to add knowledge")
def seed_initial():
"""Seed initial knowledge"""
print("Seeding initial knowledge from seed_knowledge.py...")
exec(open("${./.}/seed_knowledge.py").read())
# Main CLI
if len(sys.argv) < 2:
print("Usage: macha-knowledge <command> [options]")
print("")
print("Commands:")
print(" list List all knowledge topics")
print(" list <category> List topics in category")
print(" show <topic> Show all knowledge for a topic")
print(" search <query> Search knowledge base")
print(" search <query> <cat> Search in specific category")
print(" add <topic> <text> Add new knowledge")
print(" seed Seed initial knowledge")
print("")
print("Categories: command, pattern, troubleshooting, performance, general")
sys.exit(1)
command = sys.argv[1]
if command == "list":
category = sys.argv[2] if len(sys.argv) >= 3 else None
list_topics(category)
elif command == "show" and len(sys.argv) >= 3:
show_topic(sys.argv[2])
elif command == "search" and len(sys.argv) >= 3:
query = sys.argv[2]
category = sys.argv[3] if len(sys.argv) >= 4 else None
search_knowledge(query, category)
elif command == "add" and len(sys.argv) >= 4:
topic = sys.argv[2]
knowledge = " ".join(sys.argv[3:])
add_knowledge(topic, knowledge)
elif command == "seed":
seed_initial()
else:
print(f"Unknown command: {command}")
sys.exit(1)
'')
];
};
}

291
monitor.py Normal file

@@ -0,0 +1,291 @@
#!/usr/bin/env python3
"""
System Monitor - Collects health data from Macha
"""
import json
import subprocess
import psutil
import time
from datetime import datetime
from pathlib import Path
from typing import Dict, List, Any
class SystemMonitor:
"""Monitors system health and collects diagnostic data"""
def __init__(self, state_dir: Path = Path("/var/lib/macha")):
self.state_dir = state_dir
self.state_dir.mkdir(parents=True, exist_ok=True)
def collect_all(self) -> Dict[str, Any]:
"""Collect all system health data"""
return {
"timestamp": datetime.now().isoformat(),
"systemd": self.check_systemd_services(),
"resources": self.check_resources(),
"disk": self.check_disk_usage(),
"logs": self.check_recent_errors(),
"nixos": self.check_nixos_status(),
"network": self.check_network(),
"boot": self.check_boot_status(),
}
def check_systemd_services(self) -> Dict[str, Any]:
"""Check status of all systemd services"""
try:
# Get failed services
result = subprocess.run(
["systemctl", "--failed", "--no-pager", "--output=json"],
capture_output=True,
text=True,
timeout=10
)
failed_services = []
if result.returncode == 0 and result.stdout:
try:
failed_services = json.loads(result.stdout)
except json.JSONDecodeError:
pass
# Get all services status
result = subprocess.run(
["systemctl", "list-units", "--type=service", "--no-pager", "--output=json"],
capture_output=True,
text=True,
timeout=10
)
all_services = []
if result.returncode == 0 and result.stdout:
try:
all_services = json.loads(result.stdout)
except json.JSONDecodeError:
pass
return {
"failed_count": len(failed_services),
"failed_services": failed_services,
"total_services": len(all_services),
"active_services": [s for s in all_services if s.get("active") == "active"],
}
except Exception as e:
return {"error": str(e)}
def check_resources(self) -> Dict[str, Any]:
"""Check CPU, RAM, and system resources"""
try:
cpu_percent = psutil.cpu_percent(interval=1)
memory = psutil.virtual_memory()
load_avg = psutil.getloadavg()
return {
"cpu_percent": cpu_percent,
"cpu_count": psutil.cpu_count(),
"memory_percent": memory.percent,
"memory_available_gb": memory.available / (1024**3),
"memory_total_gb": memory.total / (1024**3),
"load_average": {
"1min": load_avg[0],
"5min": load_avg[1],
"15min": load_avg[2],
},
"swap_percent": psutil.swap_memory().percent,
}
except Exception as e:
return {"error": str(e)}
def check_disk_usage(self) -> Dict[str, Any]:
"""Check disk usage for all mounted filesystems"""
try:
partitions = psutil.disk_partitions()
disk_info = []
for partition in partitions:
try:
usage = psutil.disk_usage(partition.mountpoint)
disk_info.append({
"device": partition.device,
"mountpoint": partition.mountpoint,
"fstype": partition.fstype,
"percent_used": usage.percent,
"total_gb": usage.total / (1024**3),
"used_gb": usage.used / (1024**3),
"free_gb": usage.free / (1024**3),
})
except PermissionError:
continue
return {"partitions": disk_info}
except Exception as e:
return {"error": str(e)}
def check_recent_errors(self) -> Dict[str, Any]:
"""Check recent system logs for errors"""
try:
# Get errors from the last hour
result = subprocess.run(
["journalctl", "-p", "err", "--since", "1 hour ago", "--no-pager", "-o", "json"],
capture_output=True,
text=True,
timeout=10
)
errors = []
if result.returncode == 0 and result.stdout:
for line in result.stdout.strip().split('\n'):
if line:
try:
errors.append(json.loads(line))
except json.JSONDecodeError:
continue
return {
"error_count_1h": len(errors),
"recent_errors": errors[-50:], # Last 50 errors
}
except Exception as e:
return {"error": str(e)}
def check_nixos_status(self) -> Dict[str, Any]:
"""Check NixOS generation and system info"""
try:
# Get current generation
result = subprocess.run(
["nixos-version"],
capture_output=True,
text=True,
timeout=5
)
version = result.stdout.strip() if result.returncode == 0 else "unknown"
# Get generation list
result = subprocess.run(
["nix-env", "--list-generations", "-p", "/nix/var/nix/profiles/system"],
capture_output=True,
text=True,
timeout=10
)
generations = result.stdout.strip() if result.returncode == 0 else ""
return {
"version": version,
"generations": generations,
"nix_store_size": self._get_nix_store_size(),
}
except Exception as e:
return {"error": str(e)}
def _get_nix_store_size(self) -> str:
"""Get Nix store size"""
try:
result = subprocess.run(
["du", "-sh", "/nix/store"],
capture_output=True,
text=True,
timeout=30
)
if result.returncode == 0:
return result.stdout.split()[0]
except Exception:
pass
return "unknown"
def check_network(self) -> Dict[str, Any]:
"""Check network connectivity"""
try:
# Check if we can reach the internet
result = subprocess.run(
["ping", "-c", "1", "-W", "2", "8.8.8.8"],
capture_output=True,
timeout=5
)
internet_up = result.returncode == 0
# Get network interfaces
interfaces = {}
for iface, addrs in psutil.net_if_addrs().items():
interfaces[iface] = [
{"family": addr.family.name, "address": addr.address}
for addr in addrs
]
return {
"internet_reachable": internet_up,
"interfaces": interfaces,
}
except Exception as e:
return {"error": str(e)}
def check_boot_status(self) -> Dict[str, Any]:
"""Check boot and uptime information"""
try:
boot_time = datetime.fromtimestamp(psutil.boot_time())
uptime_seconds = time.time() - psutil.boot_time()
return {
"boot_time": boot_time.isoformat(),
"uptime_seconds": uptime_seconds,
"uptime_hours": uptime_seconds / 3600,
}
except Exception as e:
return {"error": str(e)}
def save_snapshot(self, data: Dict[str, Any]):
"""Save a snapshot of system state"""
snapshot_file = self.state_dir / f"snapshot_{int(time.time())}.json"
with open(snapshot_file, 'w') as f:
json.dump(data, f, indent=2)
# Keep only last 100 snapshots
snapshots = sorted(self.state_dir.glob("snapshot_*.json"))
for old_snapshot in snapshots[:-100]:
old_snapshot.unlink()
def get_summary(self, data: Dict[str, Any]) -> str:
"""Generate human-readable summary of system state"""
lines = []
lines.append(f"=== System Health Summary ({data['timestamp']}) ===\n")
# Resources
res = data.get("resources", {})
lines.append(f"CPU: {res.get('cpu_percent', 0):.1f}%")
lines.append(f"Memory: {res.get('memory_percent', 0):.1f}% ({res.get('memory_available_gb', 0):.1f}GB free)")
lines.append(f"Load: {res.get('load_average', {}).get('1min', 0):.2f}")
# Disk
disk = data.get("disk", {})
for part in disk.get("partitions", [])[:5]: # Top 5 partitions
lines.append(f"Disk {part['mountpoint']}: {part['percent_used']:.1f}% used ({part['free_gb']:.1f}GB free)")
# Systemd
systemd = data.get("systemd", {})
failed = systemd.get("failed_count", 0)
if failed > 0:
lines.append(f"\n⚠️ WARNING: {failed} failed services!")
for svc in systemd.get("failed_services", [])[:5]:
lines.append(f" - {svc.get('unit', 'unknown')}")
# Errors
logs = data.get("logs", {})
error_count = logs.get("error_count_1h", 0)
if error_count > 0:
lines.append(f"\n⚠️ {error_count} errors in last hour")
# Network
net = data.get("network", {})
if not net.get("internet_reachable", True):
lines.append("\n⚠️ WARNING: No internet connectivity!")
return "\n".join(lines)
if __name__ == "__main__":
monitor = SystemMonitor()
data = monitor.collect_all()
monitor.save_snapshot(data)
print(monitor.get_summary(data))
print(f"\nFull data saved to {monitor.state_dir}")

248
notifier.py Normal file

@@ -0,0 +1,248 @@
#!/usr/bin/env python3
"""
Gotify Notifier - Send notifications to Gotify server
"""
import requests
import os
from typing import Optional
from datetime import datetime
class GotifyNotifier:
"""Send notifications to Gotify server"""
# Priority levels
PRIORITY_LOW = 2
PRIORITY_MEDIUM = 5
PRIORITY_HIGH = 8
def __init__(
self,
gotify_url: Optional[str] = None,
gotify_token: Optional[str] = None
):
"""
Initialize Gotify notifier
Args:
gotify_url: URL to Gotify server (e.g. http://rhiannon:8181)
gotify_token: Application token from Gotify
"""
self.gotify_url = gotify_url or os.environ.get("GOTIFY_URL", "")
self.gotify_token = gotify_token or os.environ.get("GOTIFY_TOKEN", "")
self.enabled = bool(self.gotify_url and self.gotify_token)
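# Example (hedged): in this deployment the credentials typically arrive via
# the environment, e.g. GOTIFY_URL=http://rhiannon:8181 and
# GOTIFY_TOKEN=<application token>, as set by the systemd service.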
def send(
self,
title: str,
message: str,
priority: int = PRIORITY_MEDIUM,
extras: Optional[dict] = None
) -> bool:
"""
Send a notification to Gotify
Args:
title: Notification title
message: Notification message
priority: Priority level (2=low, 5=medium, 8=high)
extras: Optional extra data
Returns:
True if successful, False otherwise
"""
if not self.enabled:
return False
try:
url = f"{self.gotify_url}/message"
headers = {
"Authorization": f"Bearer {self.gotify_token}",
"Content-Type": "application/json"
}
data = {
"title": title,
"message": message,
"priority": priority,
}
if extras:
data["extras"] = extras
response = requests.post(
url,
json=data,
headers=headers,
timeout=10
)
return response.status_code == 200
except Exception as e:
# Fail silently - don't crash if Gotify is unavailable
print(f"Warning: Failed to send Gotify notification: {e}")
return False
def notify_critical_issue(self, issue_description: str, details: str = ""):
"""Send high-priority notification for critical issues"""
message = f"⚠️ Critical Issue Detected\n\n{issue_description}"
if details:
message += f"\n\nDetails:\n{details}"
return self.send(
title="🚨 Macha: Critical Issue",
message=message,
priority=self.PRIORITY_HIGH
)
def notify_issue_created(self, issue_id: str, title: str, severity: str):
"""Send notification when a new issue is created"""
severity_icons = {
"low": "",
"medium": "⚠️",
"high": "🚨",
"critical": "🔴"
}
icon = severity_icons.get(severity, "⚠️")
priority_map = {
"low": self.PRIORITY_LOW,
"medium": self.PRIORITY_MEDIUM,
"high": self.PRIORITY_HIGH,
"critical": self.PRIORITY_HIGH
}
priority = priority_map.get(severity, self.PRIORITY_MEDIUM)
message = f"{icon} New Issue Tracked\n\nID: {issue_id}\nSeverity: {severity.upper()}\n\n{title}"
return self.send(
title="📋 Macha: Issue Created",
message=message,
priority=priority
)
def notify_action_queued(self, action_description: str, risk_level: str):
"""Send notification when action is queued for approval"""
emoji = "⚠️" if risk_level == "high" else ""
message = (
f"{emoji} Action Queued for Approval\n\n"
f"Action: {action_description}\n"
f"Risk Level: {risk_level}\n\n"
f"Use 'macha-approve list' to review"
)
priority = self.PRIORITY_HIGH if risk_level == "high" else self.PRIORITY_MEDIUM
return self.send(
title="📋 Macha: Action Needs Approval",
message=message,
priority=priority
)
def notify_action_executed(self, action_description: str, success: bool, output: str = ""):
"""Send notification when action is executed"""
if success:
emoji = ""
title_prefix = "Success"
else:
emoji = ""
title_prefix = "Failed"
message = f"{emoji} Action {title_prefix}\n\n{action_description}"
if output:
message += f"\n\nOutput:\n{output[:500]}" # Limit output length
priority = self.PRIORITY_HIGH if not success else self.PRIORITY_LOW
return self.send(
title=f"{emoji} Macha: Action {title_prefix}",
message=message,
priority=priority
)
def notify_service_failure(self, service_name: str, details: str = ""):
"""Send notification for service failures"""
message = f"🔴 Service Failed: {service_name}"
if details:
message += f"\n\nDetails:\n{details}"
return self.send(
title="🔴 Macha: Service Failure",
message=message,
priority=self.PRIORITY_HIGH
)
def notify_health_summary(self, summary: str, status: str):
"""Send periodic health summary"""
emoji = {
"healthy": "",
"attention_needed": "⚠️",
"intervention_required": "🚨"
}.get(status, "")
priority = {
"healthy": self.PRIORITY_LOW,
"attention_needed": self.PRIORITY_MEDIUM,
"intervention_required": self.PRIORITY_HIGH
}.get(status, self.PRIORITY_MEDIUM)
return self.send(
title=f"{emoji} Macha: Health Check",
message=summary,
priority=priority
)
def send_system_discovered(
self,
hostname: str,
os_type: str,
role: str,
services_count: int
):
"""Send notification when a new system is discovered"""
message = (
f"🔍 New System Auto-Discovered\n\n"
f"Hostname: {hostname}\n"
f"OS: {os_type.upper()}\n"
f"Role: {role}\n"
f"Services: {services_count} detected\n\n"
f"System has been registered and analyzed.\n"
f"Use 'macha-systems' to view all registered systems."
)
return self.send(
title="🌐 Macha: New System Discovered",
message=message,
priority=self.PRIORITY_MEDIUM
)
if __name__ == "__main__":
import sys
# Test the notifier
if len(sys.argv) < 3:
print("Usage: notifier.py <title> <message> [priority]")
print("Example: notifier.py 'Test' 'This is a test message' 5")
sys.exit(1)
title = sys.argv[1]
message = sys.argv[2]
priority = int(sys.argv[3]) if len(sys.argv) > 3 else GotifyNotifier.PRIORITY_MEDIUM
notifier = GotifyNotifier()
if not notifier.enabled:
print("Error: Gotify not configured (GOTIFY_URL and GOTIFY_TOKEN required)")
sys.exit(1)
success = notifier.send(title, message, priority)
if success:
print("✅ Notification sent successfully")
else:
print("❌ Failed to send notification")
sys.exit(1)

238
ollama_queue.py Normal file

@@ -0,0 +1,238 @@
#!/usr/bin/env python3
"""
Ollama Queue Handler - Serializes all LLM requests to prevent resource contention
"""
import json
import time
import fcntl
import signal
from pathlib import Path
from typing import Dict, Any, Optional, Callable
from datetime import datetime
from enum import IntEnum
class Priority(IntEnum):
"""Request priority levels"""
INTERACTIVE = 0 # User requests (highest priority)
AUTONOMOUS = 1 # Background maintenance
BATCH = 2 # Low priority bulk operations
class OllamaQueue:
"""File-based queue for serializing Ollama requests"""
def __init__(self, queue_dir: Path = Path("/var/lib/macha/queues/ollama")):
self.queue_dir = queue_dir
self.queue_dir.mkdir(parents=True, exist_ok=True)
self.pending_dir = self.queue_dir / "pending"
self.processing_dir = self.queue_dir / "processing"
self.completed_dir = self.queue_dir / "completed"
self.failed_dir = self.queue_dir / "failed"
for directory in [self.pending_dir, self.processing_dir, self.completed_dir, self.failed_dir]:
directory.mkdir(parents=True, exist_ok=True)
self.lock_file = self.queue_dir / "queue.lock"  # currently unused; a single worker serializes access
self.running = False
def submit(
self,
request_type: str, # "generate", "chat", "chat_with_tools"
payload: Dict[str, Any],
priority: Priority = Priority.INTERACTIVE,
callback: Optional[Callable] = None,
progress_callback: Optional[Callable] = None
) -> str:
"""Submit a request to the queue. Returns request ID."""
request_id = f"{int(time.time() * 1000000)}_{priority.value}"
request_data = {
"id": request_id,
"type": request_type,
"payload": payload,
"priority": priority.value,
"submitted_at": datetime.now().isoformat(),
"status": "pending"
}
request_file = self.pending_dir / f"{request_id}.json"
request_file.write_text(json.dumps(request_data, indent=2))
return request_id
def get_status(self, request_id: str) -> Dict[str, Any]:
"""Get the status of a request"""
# Check pending
pending_file = self.pending_dir / f"{request_id}.json"
if pending_file.exists():
data = json.loads(pending_file.read_text())
# Calculate position in queue
position = self._get_queue_position(request_id)
return {"status": "pending", "position": position, "data": data}
# Check processing
processing_file = self.processing_dir / f"{request_id}.json"
if processing_file.exists():
data = json.loads(processing_file.read_text())
return {"status": "processing", "data": data}
# Check completed
completed_file = self.completed_dir / f"{request_id}.json"
if completed_file.exists():
data = json.loads(completed_file.read_text())
return {"status": "completed", "result": data.get("result"), "data": data}
# Check failed
failed_file = self.failed_dir / f"{request_id}.json"
if failed_file.exists():
data = json.loads(failed_file.read_text())
return {"status": "failed", "error": data.get("error"), "data": data}
return {"status": "not_found"}
def _get_queue_position(self, request_id: str) -> int:
"""Get position in queue (1-indexed)"""
pending_requests = sorted(
self.pending_dir.glob("*.json"),
key=lambda p: (int(p.stem.split('_')[1]), int(p.stem.split('_')[0])) # Sort by priority, then timestamp
)
for i, req_file in enumerate(pending_requests):
if req_file.stem == request_id:
return i + 1
return 0
def wait_for_result(
self,
request_id: str,
timeout: int = 300,
poll_interval: float = 0.5,
progress_callback: Optional[Callable] = None
) -> Dict[str, Any]:
"""Wait for a request to complete and return the result"""
start_time = time.time()
last_status = None
while time.time() - start_time < timeout:
status = self.get_status(request_id)
# Report progress if status changed
if progress_callback and status != last_status:
if status["status"] == "pending":
progress_callback(f"Queued (position {status.get('position', '?')})")
elif status["status"] == "processing":
progress_callback("Processing...")
last_status = status
if status["status"] == "completed":
return status["result"]
elif status["status"] == "failed":
raise Exception(f"Request failed: {status.get('error')}")
elif status["status"] == "not_found":
raise Exception(f"Request {request_id} not found")
time.sleep(poll_interval)
raise TimeoutError(f"Request {request_id} timed out after {timeout}s")
def start_worker(self, ollama_client):
"""Start the queue worker (processes requests serially)"""
self.running = True
self.ollama_client = ollama_client
# Set up signal handlers for graceful shutdown
signal.signal(signal.SIGTERM, self._shutdown_handler)
signal.signal(signal.SIGINT, self._shutdown_handler)
print("[OllamaQueue] Worker started, processing requests...")
while self.running:
try:
self._process_next_request()
except Exception as e:
print(f"[OllamaQueue] Error processing request: {e}")
time.sleep(0.1) # Small sleep to prevent busy-waiting
print("[OllamaQueue] Worker stopped")
def _shutdown_handler(self, signum, frame):
"""Handle shutdown signals"""
print(f"[OllamaQueue] Received signal {signum}, shutting down...")
self.running = False
def _process_next_request(self):
"""Process the next request in the queue"""
# Get pending requests sorted by priority
pending_requests = sorted(
self.pending_dir.glob("*.json"),
key=lambda p: (int(p.stem.split('_')[1]), int(p.stem.split('_')[0]))
)
if not pending_requests:
return
next_request = pending_requests[0]
request_id = next_request.stem
# Move to processing
request_data = json.loads(next_request.read_text())
request_data["status"] = "processing"
request_data["started_at"] = datetime.now().isoformat()
processing_file = self.processing_dir / f"{request_id}.json"
processing_file.write_text(json.dumps(request_data, indent=2))
next_request.unlink()
try:
# Process based on type
result = None
if request_data["type"] == "generate":
result = self.ollama_client.generate(request_data["payload"])
elif request_data["type"] == "chat":
result = self.ollama_client.chat(request_data["payload"])
elif request_data["type"] == "chat_with_tools":
result = self.ollama_client.chat_with_tools(request_data["payload"])
else:
raise ValueError(f"Unknown request type: {request_data['type']}")
# Move to completed
request_data["status"] = "completed"
request_data["completed_at"] = datetime.now().isoformat()
request_data["result"] = result
completed_file = self.completed_dir / f"{request_id}.json"
completed_file.write_text(json.dumps(request_data, indent=2))
processing_file.unlink()
except Exception as e:
# Move to failed
request_data["status"] = "failed"
request_data["failed_at"] = datetime.now().isoformat()
request_data["error"] = str(e)
failed_file = self.failed_dir / f"{request_id}.json"
failed_file.write_text(json.dumps(request_data, indent=2))
processing_file.unlink()
def cleanup_old_requests(self, max_age_seconds: int = 3600):
"""Clean up completed/failed requests older than max_age_seconds"""
cutoff_time = time.time() - max_age_seconds
for directory in [self.completed_dir, self.failed_dir]:
for request_file in directory.glob("*.json"):
# Extract timestamp from filename
timestamp = int(request_file.stem.split('_')[0]) / 1000000
if timestamp < cutoff_time:
request_file.unlink()
def get_queue_stats(self) -> Dict[str, Any]:
"""Get queue statistics"""
return {
"pending": len(list(self.pending_dir.glob("*.json"))),
"processing": len(list(self.processing_dir.glob("*.json"))),
"completed": len(list(self.completed_dir.glob("*.json"))),
"failed": len(list(self.failed_dir.glob("*.json")))
}
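A client-side sketch of the queue contract: submit() writes a pending JSON file and wait_for_result() polls until the worker moves it to completed. The payload shape follows Ollama's /api/chat; the model name is illustrative:

```python
# Sketch: enqueue one chat request and block for the result.
from ollama_queue import OllamaQueue, Priority

queue = OllamaQueue()
request_id = queue.submit(
    request_type="chat",
    payload={
        "model": "gpt-oss:latest",  # illustrative; use whatever model the worker's Ollama serves
        "messages": [{"role": "user", "content": "Summarize system health."}],
        "stream": False,
    },
    priority=Priority.INTERACTIVE,
)
result = queue.wait_for_result(request_id, timeout=300, progress_callback=print)
print(result["message"]["content"])
```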

111
ollama_worker.py Normal file
View File

@@ -0,0 +1,111 @@
#!/usr/bin/env python3
"""
Ollama Queue Worker - Daemon that processes queued Ollama requests
"""
import sys
import json
import requests
from pathlib import Path
from ollama_queue import OllamaQueue
class OllamaClient:
"""Simple Ollama API client for the queue worker"""
def __init__(self, host: str = "http://localhost:11434"):
self.host = host
def generate(self, payload: dict) -> dict:
"""Call /api/generate"""
response = requests.post(
f"{self.host}/api/generate",
json=payload,
timeout=payload.get("timeout", 300),
stream=False
)
response.raise_for_status()
return response.json()
def chat(self, payload: dict) -> dict:
"""Call /api/chat"""
response = requests.post(
f"{self.host}/api/chat",
json=payload,
timeout=payload.get("timeout", 300),
stream=False
)
response.raise_for_status()
return response.json()
def chat_with_tools(self, payload: dict) -> dict:
"""Call /api/chat with tools (streaming or non-streaming)"""
# Check if streaming is requested
stream = payload.get("stream", False)
response = requests.post(
f"{self.host}/api/chat",
json=payload,
timeout=payload.get("timeout", 300),
stream=stream
)
response.raise_for_status()
if not stream:
# Non-streaming: return response directly
return response.json()
# Streaming: accumulate response
full_response = {"message": {"role": "", "content": "", "tool_calls": []}}  # role filled from the first chunk
for line in response.iter_lines():
if line:
chunk = json.loads(line)
if "message" in chunk:
msg = chunk["message"]
# Preserve role from first chunk
if "role" in msg and not full_response["message"].get("role"):
full_response["message"]["role"] = msg["role"]
if "content" in msg:
full_response["message"]["content"] += msg["content"]
if "tool_calls" in msg:
full_response["message"]["tool_calls"].extend(msg["tool_calls"])
if chunk.get("done"):
full_response["done"] = True
# Copy any additional fields from final chunk
for key in chunk:
if key not in ("message", "done"):
full_response[key] = chunk[key]
break
# Ensure a role is set (fallback when no chunk carried one)
if not full_response["message"]["role"]:
full_response["message"]["role"] = "assistant"
return full_response
def main():
"""Main entry point for the worker"""
print("Starting Ollama Queue Worker...")
# Initialize queue and client
queue = OllamaQueue()
client = OllamaClient()
# Cleanup old requests on startup
queue.cleanup_old_requests(max_age_seconds=3600)
# Start processing
try:
queue.start_worker(client)
except KeyboardInterrupt:
print("\nShutting down gracefully...")
queue.running = False
return 0
if __name__ == "__main__":
sys.exit(main())
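The worker is meant to run as a single long-lived process so that all requests are strictly serialized; a sketch of starting it against an explicit endpoint (the URL is illustrative):

```python
# Sketch: run the queue worker with an explicit Ollama endpoint.
from ollama_queue import OllamaQueue
from ollama_worker import OllamaClient

queue = OllamaQueue()
queue.cleanup_old_requests(max_age_seconds=3600)  # drop stale completed/failed entries
queue.start_worker(OllamaClient(host="http://127.0.0.1:11434"))  # blocks until SIGTERM/SIGINT
```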

1053
orchestrator.py Normal file

File diff suppressed because it is too large Load Diff

263
remote_monitor.py Normal file
View File

@@ -0,0 +1,263 @@
#!/usr/bin/env python3
"""
Remote Monitor - Collect system health data from remote NixOS systems via SSH
"""
import json
import subprocess
from typing import Dict, Any, Optional
from pathlib import Path
class RemoteMonitor:
"""Monitor remote systems via SSH"""
def __init__(self, hostname: str, ssh_user: str = "root"):
"""
Initialize remote monitor
Args:
hostname: Remote hostname or IP
ssh_user: SSH user (default: root for NixOS remote builds)
"""
self.hostname = hostname
self.ssh_user = ssh_user
self.ssh_target = f"{ssh_user}@{hostname}"
def _run_remote_command(self, command: str, timeout: int = 30) -> tuple[bool, str, str]:
"""
Run a command on the remote system via SSH
Args:
command: Command to run
timeout: Timeout in seconds
Returns:
(success, stdout, stderr)
"""
try:
# Use sudo to run SSH as root (which has the keys)
ssh_cmd = [
"sudo", "ssh",
"-o", "StrictHostKeyChecking=no",
"-o", "ConnectTimeout=10",
self.ssh_target,
command
]
result = subprocess.run(
ssh_cmd,
capture_output=True,
text=True,
timeout=timeout
)
return (
result.returncode == 0,
result.stdout.strip(),
result.stderr.strip()
)
except subprocess.TimeoutExpired:
return False, "", f"Command timed out after {timeout}s"
except Exception as e:
return False, "", str(e)
def check_connectivity(self) -> bool:
"""Check if we can connect to the remote system"""
success, _, _ = self._run_remote_command("echo 'ping'")
return success
def collect_resources(self) -> Dict[str, Any]:
"""Collect CPU, memory, and load average"""
success, output, error = self._run_remote_command("""
python3 -c "
import psutil, json
print(json.dumps({
'cpu_percent': psutil.cpu_percent(interval=1),
'memory_percent': psutil.virtual_memory().percent,
'load_average': {
'1min': psutil.getloadavg()[0],
'5min': psutil.getloadavg()[1],
'15min': psutil.getloadavg()[2]
}
}))
"
""")
if success:
try:
return json.loads(output)
except json.JSONDecodeError:
return {}
return {}
def collect_systemd_status(self) -> Dict[str, Any]:
"""Collect systemd service status"""
success, output, error = self._run_remote_command(
"systemctl list-units --failed --no-pager --no-legend --output=json"
)
if success:
try:
failed_services = json.loads(output) if output else []
return {
"failed_count": len(failed_services),
"failed_services": failed_services
}
except json.JSONDecodeError:
pass
return {"failed_count": 0, "failed_services": []}
def collect_disk_usage(self) -> Dict[str, Any]:
"""Collect disk usage information"""
success, output, error = self._run_remote_command("""
python3 -c "
import psutil, json
partitions = []
for part in psutil.disk_partitions():
try:
usage = psutil.disk_usage(part.mountpoint)
partitions.append({
'device': part.device,
'mountpoint': part.mountpoint,
'fstype': part.fstype,
'total': usage.total,
'used': usage.used,
'free': usage.free,
'percent_used': usage.percent
})
except OSError:
pass
print(json.dumps({'partitions': partitions}))
"
""")
if success:
try:
return json.loads(output)
except json.JSONDecodeError:
return {"partitions": []}
return {"partitions": []}
def collect_network_status(self) -> Dict[str, Any]:
"""Check network connectivity"""
# SSH reachability is already established; check outbound internet from the remote host
success, _, _ = self._run_remote_command("ping -c 1 -W 2 8.8.8.8")
return {
"internet_reachable": success
}
def collect_log_errors(self) -> Dict[str, Any]:
"""Collect recent error logs"""
success, output, error = self._run_remote_command(
"journalctl --priority=err --since='1 hour ago' --output=json --no-pager | wc -l"
)
error_count = 0
if success:
try:
error_count = int(output)
except ValueError:
pass
return {
"error_count_1h": error_count,
"recent_errors": [] # Could expand this later
}
def collect_all(self) -> Dict[str, Any]:
"""Collect all monitoring data from remote system"""
# First check if we can connect
if not self.check_connectivity():
return {
"hostname": self.hostname,
"reachable": False,
"error": "Unable to connect via SSH"
}
return {
"hostname": self.hostname,
"reachable": True,
"resources": self.collect_resources(),
"systemd": self.collect_systemd_status(),
"disk": self.collect_disk_usage(),
"network": self.collect_network_status(),
"logs": self.collect_log_errors(),
}
def get_summary(self, data: Dict[str, Any]) -> str:
"""Generate human-readable summary of remote system health"""
if not data.get("reachable", False):
return f"{self.hostname}: Unreachable - {data.get('error', 'Unknown error')}"
lines = [f"System: {self.hostname}"]
# Resources
res = data.get("resources", {})
if res:
lines.append(
f"Resources: CPU {res.get('cpu_percent', 0):.1f}%, "
f"Memory {res.get('memory_percent', 0):.1f}%, "
f"Load {res.get('load_average', {}).get('1min', 0):.2f}"
)
# Disk
disk = data.get("disk", {})
max_usage = 0
for part in disk.get("partitions", []):
if part.get("mountpoint") == "/":
max_usage = part.get("percent_used", 0)
break
if max_usage > 0:
lines.append(f"Disk: {max_usage:.1f}% used (/ partition)")
# Services
systemd = data.get("systemd", {})
failed_count = systemd.get("failed_count", 0)
if failed_count > 0:
lines.append(f"Services: {failed_count} failed")
for svc in systemd.get("failed_services", [])[:3]:
lines.append(f" - {svc.get('unit', 'unknown')}")
else:
lines.append("Services: All running")
# Network
net = data.get("network", {})
if net.get("internet_reachable"):
lines.append("Network: Internet reachable")
else:
lines.append("Network: ⚠️ No internet connectivity")
# Logs
logs = data.get("logs", {})
error_count = logs.get("error_count_1h", 0)
if error_count > 0:
lines.append(f"Recent logs: {error_count} errors in last hour")
return "\n".join(lines)
if __name__ == "__main__":
import sys
if len(sys.argv) < 2:
print("Usage: remote_monitor.py <hostname>")
print("Example: remote_monitor.py rhiannon")
sys.exit(1)
hostname = sys.argv[1]
monitor = RemoteMonitor(hostname)
print(f"Monitoring {hostname}...")
data = monitor.collect_all()
print("\n" + "="*60)
print(monitor.get_summary(data))
print("="*60)
print("\nFull data:")
print(json.dumps(data, indent=2))

128
seed_knowledge.py Normal file
View File

@@ -0,0 +1,128 @@
#!/usr/bin/env python3
"""
Seed initial operational knowledge into Macha's knowledge base
"""
import sys
sys.path.insert(0, '.')
from context_db import ContextDatabase
def seed_knowledge():
"""Add foundational operational knowledge"""
db = ContextDatabase()
knowledge_items = [
# nh command knowledge
{
"topic": "nh os switch",
"knowledge": "NixOS rebuild command. Takes 1-5 minutes normally, up to 1 HOUR for major updates with many packages. DO NOT retry if slow - this is normal. Use -u flag to update flake inputs first. Can use --target-host and --hostname for remote deployment.",
"category": "command",
"source": "documentation",
"confidence": "high",
"tags": ["nixos", "rebuild", "deployment"]
},
{
"topic": "nh os boot",
"knowledge": "NixOS rebuild for next boot only. Safer than 'switch' for high-risk changes - allows easy rollback. After 'nh os boot', need to reboot for changes to take effect. Use -u to update flake inputs.",
"category": "command",
"source": "documentation",
"confidence": "high",
"tags": ["nixos", "rebuild", "safety"]
},
{
"topic": "nh remote deployment",
"knowledge": "Format: 'nh os switch -u --target-host=HOSTNAME --hostname=HOSTNAME'. Builds locally and deploys to remote. Much cleaner than SSH'ing to run commands. Uses root SSH keys for authentication.",
"category": "command",
"source": "documentation",
"confidence": "high",
"tags": ["nixos", "remote", "deployment"]
},
# Performance patterns
{
"topic": "build timeouts",
"knowledge": "System rebuilds can take 1 hour or more. Never retry builds prematurely - multiple simultaneous builds corrupt the Nix cache. Default timeout is 3600 seconds (1 hour). Be patient!",
"category": "performance",
"source": "experience",
"confidence": "high",
"tags": ["builds", "timeouts", "patience"]
},
# Nix store maintenance
{
"topic": "nix-store repair",
"knowledge": "Command: 'nix-store --verify --check-contents --repair'. Verifies and repairs Nix store integrity. WARNING: Can take HOURS on large stores. Only use when there's clear evidence of corruption (hash mismatches, sqlite errors). This is a LAST RESORT - most build failures are NOT corruption.",
"category": "troubleshooting",
"source": "documentation",
"confidence": "high",
"tags": ["nix-store", "repair", "corruption"]
},
{
"topic": "nix cache corruption",
"knowledge": "Caused by interrupted builds or multiple simultaneous builds. Symptoms: hash mismatches, sqlite errors, corrupt database. Solution: 'nix-store --verify --check-contents --repair' but this takes hours. Prevention: Never retry build commands, use proper timeouts.",
"category": "troubleshooting",
"source": "experience",
"confidence": "high",
"tags": ["nix-store", "corruption", "builds"]
},
# systemd-journal-remote
{
"topic": "systemd-journal-remote errors",
"knowledge": "Common failure: missing output directory. systemd-journal-remote needs /var/log/journal/remote to exist with proper permissions (root:root, 755). Create it if missing, then restart the service.",
"category": "troubleshooting",
"source": "experience",
"confidence": "medium",
"tags": ["systemd", "journal", "logging"]
},
# SSH and remote access
{
"topic": "ssh-keygen",
"knowledge": "Generate SSH keys: 'ssh-keygen -t ed25519 -N \"\" -f ~/.ssh/id_ed25519'. Creates public key at ~/.ssh/id_ed25519.pub and private key at ~/.ssh/id_ed25519. Use -N \"\" for no passphrase.",
"category": "command",
"source": "documentation",
"confidence": "high",
"tags": ["ssh", "keys", "authentication"]
},
# General patterns
{
"topic": "command retries",
"knowledge": "NEVER automatically retry long-running commands like builds or system updates. If something times out, check if it's still running before retrying. Automatic retries can cause: corrupted state, wasted resources, conflicting operations.",
"category": "pattern",
"source": "experience",
"confidence": "high",
"tags": ["best-practices", "safety", "retries"]
},
{
"topic": "conversation etiquette",
"knowledge": "Social responses like 'thank you', 'thanks', 'ok', 'great', 'nice' are acknowledgments, NOT requests. When user thanks you or acknowledges completion, respond conversationally - DO NOT re-execute tools or commands.",
"category": "pattern",
"source": "documentation",
"confidence": "high",
"tags": ["conversation", "etiquette", "ui"]
}
]
print("Seeding knowledge base...")
for item in knowledge_items:
kid = db.store_knowledge(**item)
if kid:
print(f" ✓ Added: {item['topic']}")
else:
print(f" ✗ Failed: {item['topic']}")
print(f"\nSeeded {len(knowledge_items)} knowledge items!")
# List all topics
print("\nAvailable knowledge topics:")
topics = db.list_knowledge_topics()
for topic in topics:
print(f" - {topic}")
if __name__ == "__main__":
seed_knowledge()
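Extending the seed data follows the same shape; a sketch of storing one additional item via the store_knowledge keyword arguments used above (the item's content is illustrative):

```python
# Sketch: add one more knowledge item outside the seed list.
from context_db import ContextDatabase

db = ContextDatabase()
kid = db.store_knowledge(
    topic="journal disk usage",
    knowledge="Check journald disk usage with 'journalctl --disk-usage'; "
              "reclaim space with 'journalctl --vacuum-size=500M' when /var fills up.",
    category="command",
    source="documentation",
    confidence="high",
    tags=["journald", "disk", "maintenance"],
)
print("stored" if kid else "failed")
```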

209
system_discovery.py Normal file
View File

@@ -0,0 +1,209 @@
#!/usr/bin/env python3
"""
System Discovery - Auto-discover and profile systems from journal logs
"""
import subprocess
import json
import re
from typing import Dict, List, Set, Optional, Any
from datetime import datetime
from pathlib import Path
class SystemDiscovery:
"""Discover and profile new systems appearing in logs"""
def __init__(self, domain: str = "coven.systems"):
self.domain = domain
self.known_systems: Set[str] = set()
def discover_from_journal(self, since_minutes: int = 10) -> List[str]:
"""Discover systems that have sent logs recently"""
try:
# Query systemd-journal-remote logs for remote hostnames
result = subprocess.run(
["journalctl", "-u", "systemd-journal-remote.service",
f"--since={since_minutes} minutes ago", "--no-pager"],
capture_output=True,
text=True,
timeout=30
)
# Also check journal for _HOSTNAME field (from remote logs)
result2 = subprocess.run(
["journalctl", f"--since={since_minutes} minutes ago",
"-o", "json", "--no-pager"],
capture_output=True,
text=True,
timeout=30
)
hostnames = set()
# Parse JSON output for _HOSTNAME field
for line in result2.stdout.split('\n'):
if not line.strip():
continue
try:
entry = json.loads(line)
hostname = entry.get('_HOSTNAME')
if hostname and hostname not in ['localhost', 'macha']:
# Convert short hostname to FQDN if needed
if '.' not in hostname:
hostname = f"{hostname}.{self.domain}"
hostnames.add(hostname)
except json.JSONDecodeError:
pass
return list(hostnames)
except Exception as e:
print(f"Error discovering from journal: {e}")
return []
def detect_os_type(self, hostname: str) -> str:
"""Detect the operating system of a remote host via SSH"""
try:
# Try to detect OS via SSH
result = subprocess.run(
["ssh", "-o", "ConnectTimeout=5", "-o", "StrictHostKeyChecking=no",
hostname, "cat /etc/os-release"],
capture_output=True,
text=True,
timeout=10
)
if result.returncode == 0:
os_release = result.stdout.lower()
# Parse os-release
if 'nixos' in os_release:
return 'nixos'
elif 'ubuntu' in os_release:
return 'ubuntu'
elif 'debian' in os_release:
return 'debian'
elif 'arch' in os_release or 'manjaro' in os_release:
return 'arch'
elif 'fedora' in os_release:
return 'fedora'
elif 'centos' in os_release or 'rhel' in os_release:
return 'rhel'
elif 'alpine' in os_release:
return 'alpine'
# Try uname for other systems
result = subprocess.run(
["ssh", "-o", "ConnectTimeout=5", "-o", "StrictHostKeyChecking=no",
hostname, "uname -s"],
capture_output=True,
text=True,
timeout=10
)
if result.returncode == 0:
uname = result.stdout.strip().lower()
if 'darwin' in uname:
return 'macos'
elif 'freebsd' in uname:
return 'freebsd'
return 'linux' # Generic fallback
except Exception as e:
print(f"Could not detect OS for {hostname}: {e}")
return 'unknown'
def profile_system(self, hostname: str, os_type: str) -> Dict[str, Any]:
"""Gather comprehensive information about a system"""
profile = {
'hostname': hostname,
'os_type': os_type,
'services': [],
'capabilities': [],
'hardware': {},
'discovered_at': datetime.now().isoformat()
}
try:
# Discover running services
if os_type in ['nixos', 'ubuntu', 'debian', 'arch', 'fedora', 'rhel', 'alpine']:
# Systemd-based systems
result = subprocess.run(
["ssh", "-o", "ConnectTimeout=5", hostname,
"systemctl list-units --type=service --state=running --no-pager --no-legend"],
capture_output=True,
text=True,
timeout=15
)
if result.returncode == 0:
for line in result.stdout.split('\n'):
if line.strip():
# Extract service name (first column)
service = line.split()[0]
if service.endswith('.service'):
service = service[:-8] # Remove .service suffix
profile['services'].append(service)
# Get hardware info
result = subprocess.run(
["ssh", "-o", "ConnectTimeout=5", hostname,
"nproc && free -g | grep Mem | awk '{print $2}'"],
capture_output=True,
text=True,
timeout=10
)
if result.returncode == 0:
lines = result.stdout.strip().split('\n')
if len(lines) >= 2:
profile['hardware']['cpu_cores'] = lines[0].strip()
profile['hardware']['memory_gb'] = lines[1].strip()
# Detect capabilities based on services
services_str = ' '.join(profile['services'])
if 'docker' in services_str or 'containerd' in services_str:
profile['capabilities'].append('containers')
if 'nginx' in services_str or 'apache' in services_str or 'httpd' in services_str:
profile['capabilities'].append('web-server')
if 'postgresql' in services_str or 'mysql' in services_str or 'mariadb' in services_str:
profile['capabilities'].append('database')
if 'sshd' in services_str:
profile['capabilities'].append('remote-access')
# NixOS-specific: Check if it's in our flake
if os_type == 'nixos':
profile['capabilities'].append('nixos-managed')
except Exception as e:
print(f"Error profiling {hostname}: {e}")
return profile
def get_system_role(self, profile: Dict[str, Any]) -> str:
"""Determine system role based on profile"""
capabilities = profile.get('capabilities', [])
services = profile.get('services', [])
# Check for specific roles
if 'ai-inference' in capabilities or 'ollama' in services:
return 'ai-workstation'
elif 'web-server' in capabilities:
return 'web-server'
elif 'database' in capabilities:
return 'database-server'
elif 'containers' in capabilities:
return 'container-host'
elif len(services) > 20:
return 'server'
elif len(services) > 5:
return 'workstation'
else:
return 'minimal'

131
system_prompt.txt Normal file
View File

@@ -0,0 +1,131 @@
You are Macha, an autonomous AI system maintenance agent running on NixOS.
IDENTITY:
- You are intelligent, careful, methodical, and motherly
- You have access to system monitoring data, configuration files, and investigation results
- You can propose fixes, but humans must approve risky changes
YOUR ARCHITECTURE:
- You run as a systemd service (macha-autonomous.service) on the macha.coven.systems host
- You are monitoring the SAME SYSTEM you are running on (macha.coven.systems)
- Your inference engine is Ollama, running locally at http://localhost:11434
- You are powered by the gpt-oss:latest language model (GPT-like open source model)
- Your database is ChromaDB, running at http://localhost:8000
- All your components (orchestrator, agent, ChromaDB, Ollama) run on the same machine
- You can investigate and fix issues with your own infrastructure
- Be aware: if you break the system, you break yourself
- SELF-DIAGNOSTIC: In chat mode, if your inference fails, you automatically diagnose:
* Ollama service status
* Memory usage
* Which models are loaded
* Recent Ollama logs
EXECUTION CONTEXT:
- In autonomous mode: You run as the 'macha' user (unprivileged, UID 2501)
- In chat mode: You run as the invoking user (usually has sudo access)
- IMPORTANT: You do NOT need to add 'sudo' to commands in chat mode
- The system automatically retries commands with sudo if permission is denied
- Just use the command directly: 'reboot', 'systemctl restart X', 'nh os switch', etc.
- The user will see a notification if the command was retried with elevated privileges
CONVERSATIONAL ETIQUETTE:
- Recognize social responses: "thank you", "thanks", "ok", "great", "nice" etc. are acknowledgments, NOT requests
- When the user thanks you or acknowledges completion, simply respond conversationally - DO NOT re-execute tools
- Only use tools when the user makes an actual request or asks a question requiring information
- If a task is complete and the user acknowledges it, the conversation is done - just say "You're welcome!" or similar
CORE PRINCIPLES:
1. CONSERVATIVE: When in doubt, investigate before acting
2. DECLARATIVE: Prefer NixOS configuration changes over imperative commands
3. SAFE: Never disable critical services (SSH, networking, systemd, boot)
4. INFORMED: Use previous investigation results to avoid repetition
5. CONTEXTUAL: Reference actual configuration files when available
RISK LEVELS:
- LOW: Investigation commands (systemctl status, journalctl, ls, cat, grep)
- MEDIUM: Service restarts, configuration changes, cleanup
- HIGH: System rebuilds, package changes, network reconfigurations
AUTO-APPROVAL:
- Low-risk investigation actions are automatically executed
- Medium/high-risk actions require human approval
CONFIGURATION:
- This system uses NixOS flakes for configuration management
- Config changes must specify the actual .nix file in the repository
- Example: autonomous/module.nix, apps/gotify.nix, or systems/macha.nix
- NEVER reference /etc/nixos/configuration.nix (this system doesn't use it)
- You cannot edit the flake directly; you can only suggest changes to be pushed to the repo
SYSTEM MANAGEMENT COMMANDS:
- CRITICAL: This system uses 'nh' (a modern nixos-rebuild wrapper) for all rebuilds
- 'nh' is a wrapper around nixos-rebuild that provides better UX and flake auto-detection
- The flake URL is auto-detected from programs.nh.flake (no need to specify it)
Available nh commands (USE THESE, NOT nixos-rebuild):
* 'nh os switch' - Rebuild and activate immediately (replaces: nixos-rebuild switch)
* 'nh os switch -u' - Update flake inputs first, then rebuild/activate
* 'nh os boot' - Rebuild for next boot only (replaces: nixos-rebuild boot)
* 'nh os test' - Activate temporarily without setting as default
MULTI-HOST MANAGEMENT:
You manage multiple hosts in the infrastructure. You have TWO tools for remote operations:
1. SSH - For diagnostics, monitoring, and status checks:
- You CAN and SHOULD use SSH to check other hosts
- Examples: 'ssh rhiannon systemctl status ollama', 'ssh alexander df -h'
- Commands are automatically run with sudo as the macha user
- Use for: checking services, reading logs, gathering metrics, quick diagnostics
- Hosts available: rhiannon, alexander, UCAR-Kinston, test-vm
2. nh remote deployment - For NixOS configuration changes:
- Format: 'nh os switch -u --target-host=HOSTNAME --hostname=HOSTNAME'
- Examples:
* 'nh os switch -u --target-host=rhiannon --hostname=rhiannon'
* 'nh os boot -u --target-host=alexander --hostname=alexander'
- Builds configuration locally, deploys to remote host
- Use for: permanent configuration changes, service updates, system modifications
When asked to check on another host, USE SSH. When asked to update configuration, use nh.
NOTIFICATIONS:
- You can send notifications to the user via Gotify using the send_notification tool
- Use notifications to inform the user about important events, especially when they're not actively chatting
- Notification priorities:
* Priority 2 (Low): Informational updates, routine completions, FYI items
* Priority 5 (Medium): Actions needing attention, warnings, manual approval requests
* Priority 8 (High): Critical issues, service failures, urgent problems requiring immediate attention
- When to send notifications:
* Critical issues detected (priority 8)
* Service failures or degraded states (priority 8)
* Actions queued for manual approval (priority 5)
* Successful completion of important actions (priority 2)
* When user explicitly asks for a notification
- Keep titles brief and messages clear and actionable
- Example: send_notification("Service Alert", "Ollama service crashed and was restarted", 8)
PATIENCE WITH LONG-RUNNING OPERATIONS:
- System rebuilds take time: 1-5 minutes normally, up to 1 HOUR for major updates
- DO NOT retry build commands if they're taking a while - this is NORMAL
- Multiple simultaneous builds will corrupt the Nix cache
- If a build times out, check if it's still running before retrying
- Default timeout is 1 hour (3600 seconds) - this is appropriate for most operations
- Trust the timeout - if a command is still running, it will complete or fail on its own
NIX STORE MAINTENANCE:
- If builds fail with corruption errors, use: 'nix-store --verify --check-contents --repair'
- This command verifies and repairs the Nix store integrity
- WARNING: Store repair can take a LONG time (potentially hours on large stores)
- Only run store repair when there's clear evidence of corruption (e.g., hash mismatches, sqlite errors)
- Store repair is a last resort - most build failures are NOT corruption
Risk-based command selection:
* HIGH-RISK changes: Use 'nh os boot' + 'reboot' (allows easy rollback)
* MEDIUM-RISK changes: Use 'nh os switch'
* LOW-RISK changes: Use 'nh os switch'
FORBIDDEN COMMANDS:
* NEVER suggest 'nixos-rebuild' - it doesn't know the flake path
* NEVER suggest 'nixos-rebuild switch --flake .#macha' - use 'nh os switch' instead
* NEVER suggest 'sudo nixos-rebuild' commands - nh handles privileges correctly

705
tools.py Normal file
View File

@@ -0,0 +1,705 @@
#!/usr/bin/env python3
"""
Tool Definitions - Functions that the AI can call to interact with the system
"""
import subprocess
import json
import os
from typing import Dict, Any, List, Optional
from pathlib import Path
class SysadminTools:
"""Collection of tools for system administration tasks"""
def __init__(self, safe_mode: bool = True):
"""
Initialize sysadmin tools
Args:
safe_mode: If True, restricts dangerous operations
"""
self.safe_mode = safe_mode
self.allowed_commands = [
'systemctl', 'journalctl', 'free', 'df', 'uptime',
'ps', 'top', 'ip', 'ss', 'cat', 'ls', 'grep',
'ping', 'dig', 'nslookup', 'curl', 'wget',
'lscpu', 'lspci', 'lsblk', 'lshw', 'dmidecode',
'ssh', 'scp', # Remote access to other systems in infrastructure
'nh', 'nixos-rebuild', # NixOS system management
'reboot', 'shutdown', 'poweroff', # System power management
'logger', # Logging for notifications
'nix-shell' # Needed by get_hardware_info/get_gpu_metrics in safe mode
]
def get_tool_definitions(self) -> List[Dict[str, Any]]:
"""
Return tool definitions in Ollama's format
Returns:
List of tool definitions with JSON schema
"""
return [
{
"type": "function",
"function": {
"name": "execute_command",
"description": "Execute a shell command on the system. Use this to run system commands, check status, or gather information. Returns command output.",
"parameters": {
"type": "object",
"properties": {
"command": {
"type": "string",
"description": "The shell command to execute (e.g., 'systemctl status ollama', 'df -h', 'journalctl -u myservice -n 20')"
},
"timeout": {
"type": "integer",
"description": "Command timeout in seconds (default: 3600). System rebuilds can take 1-5 minutes normally, up to 1 hour for major updates. Be patient!",
"default": 3600
}
},
"required": ["command"]
}
}
},
{
"type": "function",
"function": {
"name": "read_file",
"description": "Read the contents of a file from the filesystem. Use this to inspect configuration files, logs, or other text files.",
"parameters": {
"type": "object",
"properties": {
"file_path": {
"type": "string",
"description": "Absolute path to the file to read (e.g., '/etc/nixos/configuration.nix', '/var/log/syslog')"
},
"max_lines": {
"type": "integer",
"description": "Maximum number of lines to read (default: 500)",
"default": 500
}
},
"required": ["file_path"]
}
}
},
{
"type": "function",
"function": {
"name": "check_service_status",
"description": "Check the status of a systemd service. Returns whether the service is active, enabled, and recent log entries.",
"parameters": {
"type": "object",
"properties": {
"service_name": {
"type": "string",
"description": "Name of the systemd service (e.g., 'ollama.service', 'nginx', 'sshd')"
}
},
"required": ["service_name"]
}
}
},
{
"type": "function",
"function": {
"name": "view_logs",
"description": "View systemd journal logs. Can filter by unit, time period, or priority.",
"parameters": {
"type": "object",
"properties": {
"unit": {
"type": "string",
"description": "Systemd unit name to filter logs (e.g., 'ollama.service')"
},
"lines": {
"type": "integer",
"description": "Number of recent log lines to return (default: 50)",
"default": 50
},
"priority": {
"type": "string",
"description": "Filter by priority: emerg, alert, crit, err, warning, notice, info, debug",
"enum": ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
}
}
}
}
},
{
"type": "function",
"function": {
"name": "get_system_metrics",
"description": "Get current system resource metrics including CPU, memory, disk, and load average.",
"parameters": {
"type": "object",
"properties": {}
}
}
},
{
"type": "function",
"function": {
"name": "get_hardware_info",
"description": "Get detailed hardware information including CPU model, GPU, network interfaces, storage devices, and memory specs. Returns comprehensive hardware inventory.",
"parameters": {
"type": "object",
"properties": {}
}
}
},
{
"type": "function",
"function": {
"name": "get_gpu_metrics",
"description": "Get GPU temperature, utilization, clock speeds, and power usage. Works with AMD and NVIDIA GPUs. Returns current GPU metrics.",
"parameters": {
"type": "object",
"properties": {}
}
}
},
{
"type": "function",
"function": {
"name": "list_directory",
"description": "List contents of a directory. Returns file names, sizes, and permissions.",
"parameters": {
"type": "object",
"properties": {
"directory_path": {
"type": "string",
"description": "Absolute path to the directory (e.g., '/etc', '/var/log')"
},
"show_hidden": {
"type": "boolean",
"description": "Include hidden files (starting with dot)",
"default": False
}
},
"required": ["directory_path"]
}
}
},
{
"type": "function",
"function": {
"name": "check_network",
"description": "Test network connectivity to a host. Can use ping or HTTP check.",
"parameters": {
"type": "object",
"properties": {
"host": {
"type": "string",
"description": "Hostname or IP address to check (e.g., 'google.com', '8.8.8.8')"
},
"method": {
"type": "string",
"description": "Test method to use",
"enum": ["ping", "http"],
"default": "ping"
}
},
"required": ["host"]
}
}
},
{
"type": "function",
"function": {
"name": "retrieve_cached_output",
"description": "Retrieve full cached output from a previous tool call. Use this when you need to see complete data that was summarized earlier. The cache_id is shown in hierarchical summaries.",
"parameters": {
"type": "object",
"properties": {
"cache_id": {
"type": "string",
"description": "Cache ID from a previous tool summary (e.g., 'view_logs_20251006_103045')"
},
"max_chars": {
"type": "integer",
"description": "Maximum characters to return (default: 10000 for focused analysis)",
"default": 10000
}
},
"required": ["cache_id"]
}
}
},
{
"type": "function",
"function": {
"name": "send_notification",
"description": "Send a notification to the user via Gotify. Use this to alert the user about important events, issues, or completed actions. Choose appropriate priority based on urgency.",
"parameters": {
"type": "object",
"properties": {
"title": {
"type": "string",
"description": "Notification title (brief, e.g., 'Service Alert', 'Action Complete')"
},
"message": {
"type": "string",
"description": "Notification message body (detailed description of the event)"
},
"priority": {
"type": "integer",
"description": "Priority level: 2=Low (info), 5=Medium (attention needed), 8=High (critical/urgent)",
"enum": [2, 5, 8],
"default": 5
}
},
"required": ["title", "message"]
}
}
}
]
def execute_command(self, command: str, timeout: int = 3600) -> Dict[str, Any]:
"""Execute a shell command safely (default timeout: 1 hour for system operations)"""
# Safety check in safe mode
if self.safe_mode:
cmd_base = command.split()[0] if command.strip() else ""
if cmd_base not in self.allowed_commands:
return {
"success": False,
"error": f"Command '{cmd_base}' not in allowed list (safe mode enabled)",
"allowed_commands": self.allowed_commands
}
# Automatically configure SSH commands to use macha user on remote systems
# Transform: ssh hostname cmd -> ssh macha@hostname sudo cmd
if command.strip().startswith('ssh ') and '@' not in command.split()[1]:
parts = command.split(maxsplit=2)
if len(parts) >= 2:
hostname = parts[1]
remaining = ' '.join(parts[2:]) if len(parts) > 2 else ''
# If there's a command to run remotely, prefix it with sudo
if remaining:
command = f"ssh macha@{hostname} sudo {remaining}".strip()
else:
command = f"ssh macha@{hostname}".strip()
try:
result = subprocess.run(
command,
shell=True,
capture_output=True,
text=True,
timeout=timeout
)
return {
"success": result.returncode == 0,
"exit_code": result.returncode,
"stdout": result.stdout,
"stderr": result.stderr,
"command": command
}
except subprocess.TimeoutExpired:
return {
"success": False,
"error": f"Command timed out after {timeout} seconds",
"command": command
}
except Exception as e:
return {
"success": False,
"error": str(e),
"command": command
}
def read_file(self, file_path: str, max_lines: int = 500) -> Dict[str, Any]:
"""Read a file safely"""
try:
path = Path(file_path)
if not path.exists():
return {
"success": False,
"error": f"File not found: {file_path}"
}
if not path.is_file():
return {
"success": False,
"error": f"Not a file: {file_path}"
}
# Read file with line limit
lines = []
with open(path, 'r', errors='replace') as f:
for i, line in enumerate(f):
if i >= max_lines:
lines.append(f"\n... truncated after {max_lines} lines ...")
break
lines.append(line.rstrip('\n'))
return {
"success": True,
"content": '\n'.join(lines),
"path": file_path,
"lines_read": len(lines)
}
except PermissionError:
return {
"success": False,
"error": f"Permission denied: {file_path}"
}
except Exception as e:
return {
"success": False,
"error": str(e)
}
def check_service_status(self, service_name: str) -> Dict[str, Any]:
"""Check systemd service status"""
# Ensure .service suffix
if not service_name.endswith('.service'):
service_name = f"{service_name}.service"
# Get service status
status_result = self.execute_command(f"systemctl status {service_name}")
is_active_result = self.execute_command(f"systemctl is-active {service_name}")
is_enabled_result = self.execute_command(f"systemctl is-enabled {service_name}")
# Get recent logs
logs_result = self.execute_command(f"journalctl -u {service_name} -n 10 --no-pager")
return {
"service": service_name,
"active": is_active_result.get("stdout", "").strip() == "active",
"enabled": is_enabled_result.get("stdout", "").strip() == "enabled",
"status_output": status_result.get("stdout", ""),
"recent_logs": logs_result.get("stdout", "")
}
def view_logs(
self,
unit: Optional[str] = None,
lines: int = 50,
priority: Optional[str] = None
) -> Dict[str, Any]:
"""View systemd journal logs"""
cmd_parts = ["journalctl", "--no-pager"]
if unit:
cmd_parts.extend(["-u", unit])
cmd_parts.extend(["-n", str(lines)])
if priority:
cmd_parts.extend(["-p", priority])
command = " ".join(cmd_parts)
result = self.execute_command(command)
return {
"logs": result.get("stdout", ""),
"unit": unit,
"lines": lines,
"priority": priority
}
def get_system_metrics(self) -> Dict[str, Any]:
"""Get current system metrics"""
# CPU and load
uptime_result = self.execute_command("uptime")
# Memory
free_result = self.execute_command("free -h")
# Disk
df_result = self.execute_command("df -h")
return {
"uptime": uptime_result.get("stdout", ""),
"memory": free_result.get("stdout", ""),
"disk": df_result.get("stdout", "")
}
def get_hardware_info(self) -> Dict[str, Any]:
"""Get comprehensive hardware information"""
hardware = {}
# CPU info (use nix-shell for util-linux)
cpu_result = self.execute_command("nix-shell -p util-linux --run lscpu")
if cpu_result.get("success"):
hardware["cpu"] = cpu_result.get("stdout", "")
# Memory details
mem_result = self.execute_command("free -h")
if mem_result.get("success"):
hardware["memory"] = mem_result.get("stdout", "")
# GPU info (lspci for AMD/NVIDIA) - use nix-shell for pciutils
gpu_result = self.execute_command("nix-shell -p pciutils --run \"lspci | grep -i 'vga\\|3d\\|display'\"")
if gpu_result.get("success"):
hardware["gpu"] = gpu_result.get("stdout", "")
# Detailed GPU
lspci_detailed = self.execute_command("nix-shell -p pciutils --run \"lspci -v | grep -A 20 -i 'vga\\|3d\\|display'\"")
if lspci_detailed.get("success"):
hardware["gpu_detailed"] = lspci_detailed.get("stdout", "")
# Network interfaces
net_result = self.execute_command("ip link show")
if net_result.get("success"):
hardware["network_interfaces"] = net_result.get("stdout", "")
# Network addresses
addr_result = self.execute_command("ip addr show")
if addr_result.get("success"):
hardware["network_addresses"] = addr_result.get("stdout", "")
# Storage devices (use nix-shell for util-linux)
storage_result = self.execute_command("nix-shell -p util-linux --run \"lsblk -o NAME,SIZE,TYPE,MOUNTPOINT,FSTYPE\"")
if storage_result.get("success"):
hardware["storage"] = storage_result.get("stdout", "")
# PCI devices (comprehensive)
pci_result = self.execute_command("nix-shell -p pciutils --run lspci")
if pci_result.get("success"):
hardware["pci_devices"] = pci_result.get("stdout", "")
# USB devices
usb_result = self.execute_command("nix-shell -p usbutils --run lsusb")
if usb_result.get("success"):
hardware["usb_devices"] = usb_result.get("stdout", "")
# DMI/SMBIOS info (motherboard, system)
dmi_result = self.execute_command("cat /sys/class/dmi/id/board_name /sys/class/dmi/id/board_vendor 2>/dev/null")
if dmi_result.get("success"):
hardware["motherboard"] = dmi_result.get("stdout", "")
return hardware
def get_gpu_metrics(self) -> Dict[str, Any]:
"""Get GPU metrics (temperature, utilization, clocks, power)"""
metrics = {}
# Try AMD GPU via sysfs (DRM/hwmon)
try:
# Find GPU hwmon directory
import glob
hwmon_dirs = glob.glob("/sys/class/drm/card*/device/hwmon/hwmon*")
if hwmon_dirs:
hwmon_path = hwmon_dirs[0]
amd_metrics = {}
# Temperature
temp_files = glob.glob(f"{hwmon_path}/temp*_input")
for temp_file in temp_files:
try:
with open(temp_file, 'r') as f:
temp_millidegrees = int(f.read().strip())
temp_celsius = temp_millidegrees / 1000
label = temp_file.split('/')[-1].replace('_input', '')
amd_metrics[f"{label}_celsius"] = temp_celsius
except (OSError, ValueError):
pass
# GPU busy percent (utilization)
gpu_busy_file = f"{hwmon_path.replace('/hwmon/hwmon', '')}/gpu_busy_percent"
try:
with open(gpu_busy_file, 'r') as f:
amd_metrics["gpu_utilization_percent"] = int(f.read().strip())
except (OSError, ValueError):
pass
# Power usage
power_files = glob.glob(f"{hwmon_path}/power*_average")
for power_file in power_files:
try:
with open(power_file, 'r') as f:
power_microwatts = int(f.read().strip())
power_watts = power_microwatts / 1000000
amd_metrics["power_watts"] = power_watts
except (OSError, ValueError):
pass
# Clock speeds
sclk_file = f"{hwmon_path.replace('/hwmon/hwmon', '')}/pp_dpm_sclk"
try:
with open(sclk_file, 'r') as f:
sclk_data = f.read()
amd_metrics["gpu_clocks"] = sclk_data.strip()
except OSError:
pass
if amd_metrics:
metrics["amd_gpu"] = amd_metrics
except Exception as e:
metrics["amd_sysfs_error"] = str(e)
# Try rocm-smi for AMD
rocm_result = self.execute_command("nix-shell -p rocmPackages.rocm-smi --run 'rocm-smi --showtemp --showuse --showpower'")
if rocm_result.get("success"):
metrics["rocm_smi"] = rocm_result.get("stdout", "")
# Try nvidia-smi for NVIDIA
nvidia_result = self.execute_command("nix-shell -p linuxPackages.nvidia_x11 --run 'nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,power.draw,clocks.gr --format=csv'")
if nvidia_result.get("success") and "NVIDIA" in nvidia_result.get("stdout", ""):
metrics["nvidia_smi"] = nvidia_result.get("stdout", "")
# Fallback: try sensors command
if not metrics.get("amd_gpu") and not metrics.get("nvidia_smi"):
sensors_result = self.execute_command("nix-shell -p lm_sensors --run sensors")
if sensors_result.get("success"):
metrics["sensors"] = sensors_result.get("stdout", "")
return metrics
def list_directory(
self,
directory_path: str,
show_hidden: bool = False
) -> Dict[str, Any]:
"""List directory contents"""
cmd = f"ls -lh"
if show_hidden:
cmd += "a"
cmd += f" {directory_path}"
result = self.execute_command(cmd)
return {
"success": result.get("success", False),
"directory": directory_path,
"listing": result.get("stdout", ""),
"error": result.get("error")
}
def check_network(self, host: str, method: str = "ping") -> Dict[str, Any]:
"""Check network connectivity"""
if method == "ping":
cmd = f"ping -c 3 -W 2 {host}"
elif method == "http":
cmd = f"curl -I -m 5 {host}"
else:
return {
"success": False,
"error": f"Unknown method: {method}"
}
result = self.execute_command(cmd, timeout=10)
return {
"host": host,
"method": method,
"reachable": result.get("success", False),
"output": result.get("stdout", ""),
"error": result.get("stderr", "")
}
def retrieve_cached_output(self, cache_id: str, max_chars: int = 10000) -> Dict[str, Any]:
"""Retrieve full cached output from a previous tool call"""
cache_dir = Path("/var/lib/macha/tool_cache")
cache_file = cache_dir / f"{cache_id}.txt"
if not cache_file.exists():
return {
"success": False,
"error": f"Cache file not found: {cache_id}",
"hint": "Check that the cache_id matches exactly what was shown in the summary"
}
try:
content = cache_file.read_text()
original_size = len(content)
# Truncate if still too large for context
if original_size > max_chars:
half = max_chars // 2
content = (
content[:half] +
f"\n... [SHOWING {max_chars} of {original_size} chars] ...\n" +
content[-half:]
)
return {
"success": True,
"cache_id": cache_id,
"size": original_size, # Original size before truncation
"content": content
}
except Exception as e:
return {
"success": False,
"error": f"Failed to read cache: {str(e)}"
}
def send_notification(self, title: str, message: str, priority: int = 5) -> Dict[str, Any]:
"""Send a notification to the user via Gotify using macha-notify command"""
try:
# Use the macha-notify command which handles Gotify integration
result = subprocess.run(
['macha-notify', title, message, str(priority)],
capture_output=True,
text=True,
timeout=10
)
if result.returncode == 0:
return {
"success": True,
"title": title,
"message": message,
"priority": priority,
"output": result.stdout.strip() if result.stdout else "Notification sent successfully"
}
else:
return {
"success": False,
"error": f"macha-notify failed: {result.stderr.strip() if result.stderr else 'Unknown error'}",
"hint": "Check if Gotify is configured (gotifyUrl and gotifyToken in module config)"
}
except FileNotFoundError:
return {
"success": False,
"error": "macha-notify command not found",
"hint": "This should not happen - macha-notify is installed by the module"
}
except subprocess.TimeoutExpired:
return {
"success": False,
"error": "Notification send timeout (10s)"
}
except Exception as e:
return {
"success": False,
"error": f"Unexpected error sending notification: {str(e)}"
}
def execute_tool(self, tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
"""Execute a tool by name with given arguments"""
tool_map = {
"execute_command": self.execute_command,
"read_file": self.read_file,
"check_service_status": self.check_service_status,
"view_logs": self.view_logs,
"get_system_metrics": self.get_system_metrics,
"get_hardware_info": self.get_hardware_info,
"get_gpu_metrics": self.get_gpu_metrics,
"list_directory": self.list_directory,
"check_network": self.check_network,
"retrieve_cached_output": self.retrieve_cached_output,
"send_notification": self.send_notification
}
tool_func = tool_map.get(tool_name)
if not tool_func:
return {
"success": False,
"error": f"Unknown tool: {tool_name}"
}
try:
return tool_func(**arguments)
except Exception as e:
return {
"success": False,
"error": f"Tool execution failed: {str(e)}",
"tool": tool_name,
"arguments": arguments
}
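A sketch of the dispatch round-trip these definitions enable: get_tool_definitions() is handed to the model, and each tool call the model returns is routed through execute_tool. The tool-call dict below mirrors the shape Ollama's /api/chat returns (values illustrative):

```python
# Sketch: execute one model-requested tool call.
from tools import SysadminTools

tools = SysadminTools(safe_mode=True)
definitions = tools.get_tool_definitions()  # passed to Ollama as the "tools" field

tool_call = {
    "function": {
        "name": "check_service_status",
        "arguments": {"service_name": "ollama"},
    }
}
result = tools.execute_tool(
    tool_call["function"]["name"],
    tool_call["function"]["arguments"],
)
print(result["active"], result["enabled"])
```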