Initial commit: Split Macha autonomous system into separate flake

Macha is now a standalone NixOS flake that can be imported into other
systems. This provides:

- Independent versioning
- Easier reusability
- Cleaner separation of concerns
- Better development workflow

Includes:
- Complete autonomous system code
- NixOS module with full configuration options
- Queue-based architecture with priority system
- Chunked map-reduce for large outputs
- ChromaDB knowledge base
- Tool calling system
- Multi-host SSH management
- Gotify notification integration

All capabilities from DESIGN.md are preserved.
Lily Miller
2025-10-06 14:32:37 -06:00
commit 22ba493d9e
30 changed files with 10306 additions and 0 deletions

QUICKSTART.md Normal file

@@ -0,0 +1,229 @@
# Macha Autonomous System - Quick Start Guide
## What is This?
Macha now has a self-maintenance system that uses local AI (via Ollama) to monitor, analyze, and maintain itself. Think of it as a 24/7 system administrator that watches over Macha.
## How It Works
1. **Monitor**: Every 5 minutes, collects system health data (services, resources, logs, etc.)
2. **Analyze**: Uses llama3.1:70b to analyze the data and detect issues
3. **Act**: Based on autonomy level, either proposes fixes or executes them automatically
4. **Learn**: Logs all decisions and actions for auditing and improvement
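The four steps above can be sketched as a single cycle function. This is a minimal illustration, not Macha's actual orchestrator: `run_cycle`, `CycleResult`, and the three stub callables are hypothetical names, and the snapshot layout is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CycleResult:
    """Outcome of one maintenance cycle (hypothetical structure)."""
    snapshot: dict
    issues: list
    log: list = field(default_factory=list)


def run_cycle(
    collect: Callable[[], dict],
    analyze: Callable[[dict], list],
    act: Callable[[str], str],
) -> CycleResult:
    """Run one monitor -> analyze -> act pass and record every outcome."""
    snapshot = collect()            # 1. Monitor: gather health data
    issues = analyze(snapshot)      # 2. Analyze: in Macha this is the LLM call
    log = []
    for issue in issues:
        outcome = act(issue)        # 3. Act: propose or execute
        log.append({"issue": issue, "outcome": outcome})  # 4. Learn: audit trail
    return CycleResult(snapshot, issues, log)


# Stubs standing in for the real monitor, agent, and executor (assumptions).
def fake_collect() -> dict:
    return {"failed_services": ["ollama.service"]}


def fake_analyze(snapshot: dict) -> list:
    return [f"restart {name}" for name in snapshot["failed_services"]]


def fake_act(issue: str) -> str:
    return "queued"


result = run_cycle(fake_collect, fake_analyze, fake_act)
print(result.log)
```

Passing the three stages in as callables keeps the loop testable without Ollama or systemd; the real service would schedule this pass at the configured check interval.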
## Autonomy Levels
### `observe` - Monitoring Only
- Monitors system health
- Logs everything
- Takes NO actions
- Good for: Testing, learning what the system sees
### `suggest` - Approval Required (DEFAULT)
- Monitors and analyzes
- Proposes fixes
- Requires manual approval before executing
- Good for: Production use, when you want control
### `auto-safe` - Limited Autonomy
- Auto-executes "safe" actions:
  - Restarting failed services
  - Disk cleanup
  - Log rotation
  - Read-only diagnostics
- Requests approval for risky changes
- Good for: Hands-off operation with safety net
### `auto-full` - Full Autonomy
- Auto-executes most actions
- Still requires approval for HIGH RISK actions
- Never touches protected services (SSH, networking, etc.)
- Good for: Experimentation, when you trust the system
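One way to read the four levels is as a gate that maps an autonomy level and a risk rating to an outcome. The sketch below is an interpretation of the descriptions above, not the executor's real logic: the `Risk` scale, the `decide` function, the outcome strings, and the assumption that "safe" means low risk are all hypothetical.

```python
from enum import IntEnum


class Risk(IntEnum):
    """Hypothetical risk scale; the real system may grade actions differently."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


LEVELS = ("observe", "suggest", "auto-safe", "auto-full")


def decide(autonomy_level: str, risk: Risk, touches_protected: bool = False) -> str:
    """Return 'log_only', 'skip', 'queue' (needs approval), or 'execute'."""
    if autonomy_level not in LEVELS:
        raise ValueError(f"unknown autonomy level: {autonomy_level!r}")
    if autonomy_level == "observe":
        return "log_only"           # observe: takes no actions at all
    if touches_protected:
        return "skip"               # protected services are never touched
    if autonomy_level == "suggest":
        return "queue"              # suggest: everything waits for approval
    if autonomy_level == "auto-safe":
        # Assumption: only LOW-risk actions count as "safe".
        return "execute" if risk == Risk.LOW else "queue"
    # auto-full: everything except HIGH risk runs automatically
    return "queue" if risk == Risk.HIGH else "execute"


for level in LEVELS:
    print(level, [decide(level, risk) for risk in Risk])
```

Checking the protected flag before the level-specific branches means no level can reach a protected service, matching the safety features listed later in this guide.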
## Commands
### Check the status
```bash
# View the service status
systemctl status macha-autonomous
# View live logs
macha-logs service
# View AI decision log
macha-logs decisions
# View action execution log
macha-logs actions
# View orchestrator log
macha-logs orchestrator
```
### Run a manual check
```bash
# Run one maintenance cycle now
macha-check
```
### Approval workflow (when autonomyLevel = "suggest")
```bash
# List pending actions awaiting approval
macha-approve list
# Approve action number 0
macha-approve approve 0
```
### Change autonomy level
Edit `/home/lily/Documents/nixos-servers/systems/macha.nix`:
```nix
services.macha-autonomous = {
  enable = true;
  autonomyLevel = "auto-safe";  # Change this
  checkInterval = 300;
  model = "llama3.1:70b";
};
```
Then rebuild:
```bash
sudo nixos-rebuild switch --flake .#macha
```
## What Can It Do?
### Automatically Detects
- Failed systemd services
- High resource usage (CPU, RAM, disk)
- Recent errors in logs
- Network connectivity issues
- Disk space problems
- Boot/uptime anomalies
### Can Propose/Execute
- Restart failed services
- Clean up disk space (nix store, old logs)
- Investigate issues (run diagnostics)
- Propose configuration changes (for manual review)
- NixOS rebuilds (with safety checks)
### Safety Features
- **Protected services**: Never touches SSH, networking, systemd core
- **Dry-run testing**: Tests NixOS rebuilds before applying
- **Action logging**: Every action is logged with context
- **Rollback capability**: Can revert changes
- **Rate limiting**: Won't spam actions
- **Human override**: You can always disable or intervene
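Two of these safeguards, protected services and rate limiting, are easy to illustrate. The following is a sketch under stated assumptions: the unit names in `PROTECTED_UNITS`, the `is_protected` helper, and the `ActionRateLimiter` class with its limits are hypothetical and are not taken from Macha's code.

```python
import time
from collections import deque

# Assumption: example unit names for "SSH, networking"; the real list may differ.
PROTECTED_UNITS = frozenset({
    "sshd.service",
    "systemd-networkd.service",
    "NetworkManager.service",
})


def is_protected(unit: str) -> bool:
    """True if the executor should refuse to touch this unit."""
    return unit in PROTECTED_UNITS


class ActionRateLimiter:
    """Allow at most `max_actions` within any sliding `window_seconds`."""

    def __init__(self, max_actions: int, window_seconds: float, clock=time.monotonic):
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self.clock = clock              # injectable so tests need not sleep
        self.timestamps = deque()

    def allow(self) -> bool:
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.max_actions:
            return False
        self.timestamps.append(now)
        return True


limiter = ActionRateLimiter(max_actions=3, window_seconds=300)
for unit in ("ollama.service", "sshd.service"):
    if is_protected(unit):
        print(f"refusing to touch {unit}")
    elif limiter.allow():
        print(f"would restart {unit}")
```

A sliding window is only one possible policy; a fixed per-cycle cap or a per-service cooldown would serve the same goal of preventing action spam.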
## Example Workflow
1. **System detects failed service**
   ```
   Monitor: "ollama.service is failed"
   AI Agent: "The ollama service crashed. Propose restarting it."
   ```
2. **In `suggest` mode (default)**
   ```
   Executor: "Action queued for approval"
   You: Run `macha-approve list`
   You: Review the proposed action
   You: Run `macha-approve approve 0`
   Executor: Restarts the service
   ```
3. **In `auto-safe` mode**
   ```
   Executor: "Low risk action, auto-executing"
   Executor: Restarts the service automatically
   You: Check logs later to see what happened
   ```
## Monitoring the System
All data is stored in `/var/lib/macha-autonomous/`:
- `orchestrator.log` - Main system log
- `decisions.jsonl` - AI analysis decisions (JSON Lines format)
- `actions.jsonl` - Executed actions log
- `snapshot_*.json` - System state snapshots
- `approval_queue.json` - Pending actions
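Because `decisions.jsonl` and `actions.jsonl` use JSON Lines (one JSON object per line), they can be read with the standard library alone. The helper below is a minimal sketch; `read_jsonl` is a hypothetical name, and no assumption is made about which fields each record contains.

```python
import json
from pathlib import Path


def read_jsonl(path, last=None):
    """Parse a JSON Lines file; return all records or only the `last` N."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue                # tolerate blank lines
            records.append(json.loads(line))
    if last is None:
        return records
    return records[-last:] if last > 0 else []


# Print the five most recent decisions if the log exists on this machine.
log_path = Path("/var/lib/macha-autonomous/decisions.jsonl")
if log_path.exists():
    for record in read_jsonl(log_path, last=5):
        print(json.dumps(record, indent=2))
```

The same function works for `actions.jsonl`; reading the files may require root or membership in whatever group owns the state directory.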
## Tips
1. **Start with `suggest` mode** - Get comfortable with what it proposes
2. **Review the logs** - See what it's detecting and proposing
3. **Graduate to `auto-safe`** - Let it handle routine maintenance
4. **Use `observe` for debugging** - If something seems wrong
5. **Check approval queue regularly** - If using `suggest` mode
## Troubleshooting
### Service won't start
```bash
# Check for errors
journalctl -u macha-autonomous -n 50
# Verify Ollama is running
systemctl status ollama
# Test Ollama manually
curl http://localhost:11434/api/generate -d '{"model": "llama3.1:70b", "prompt": "test"}'
```
### AI making bad decisions
- Switch to `observe` mode to stop actions
- Review `decisions.jsonl` to see reasoning
- File an issue or adjust prompts in `agent.py`
### Want to disable temporarily
```bash
sudo systemctl stop macha-autonomous
```
### Want to disable permanently
Edit `systems/macha.nix`:
```nix
services.macha-autonomous.enable = false;
```
Then rebuild.
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                      Orchestrator                       │
│            (Main loop, runs every 5 minutes)            │
└────────────┬──────────────┬──────────────┬──────────────┘
             │              │              │
         ┌───▼────┐    ┌────▼────┐    ┌────▼─────┐
         │Monitor │    │  Agent  │    │ Executor │
         │        │───▶│  (AI)   │───▶│  (Safe)  │
         └────────┘    └─────────┘    └──────────┘
             │              │              │
         Collects       Analyzes       Executes
         System         Issues         Actions
         Health         w/ LLM         Safely
```
## Future Enhancements
Potential future capabilities:
- Integration with MCP servers (already installed!)
- Predictive maintenance (learning from patterns)
- Self-optimization (tuning configs based on usage)
- Cluster management (if you add more systems)
- Automated backups and disaster recovery
- Security monitoring and hardening
- Performance tuning recommendations
## Philosophy
The goal is a system that maintains itself while being:
1. **Safe** - Never breaks critical functionality
2. **Transparent** - All decisions are logged and explainable
3. **Conservative** - When in doubt, ask for approval
4. **Learning** - Gets better over time
5. **Human-friendly** - Easy to understand and override
Macha is here to help you, not replace you!