## Problem
Capturing tasks and ideas on the go means unlocking your phone, opening an app, navigating to the right list, and typing. By the time you are done, the thought is half-forgotten. You need a single physical button press that records your voice, transcribes it, and routes it into your knowledge base automatically.
## Solution
Map the iPhone Action Button to a Shortcut that records audio, saves it to iCloud, triggers a bot to transcribe and process it, syncs the result to Obsidian, and cleans up the voice file.
## 1. Create the iPhone Shortcut
```text
Shortcut: "Quick Capture"

1. Record Audio (stop on tap)
2. Save File
   - Service: iCloud Drive
   - Path: /Shortcuts/voice-inbox/
   - Filename: capture-{Current Date}.m4a
3. Show Notification: "Voice captured"
```
Assign to the Action Button: Settings > Action Button > Shortcut > "Quick Capture"
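iCloud Drive represents files that have not yet downloaded to the Mac as hidden `.<name>.icloud` placeholders. A minimal sketch for spotting captures still mid-sync before the watcher tries to transcribe them (the function name and folder constant are illustrative, not part of the Shortcut):

```python
from pathlib import Path

# Default iCloud Drive location of the Shortcut's save folder on macOS
VOICE_DIR = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/Shortcuts/voice-inbox"

def pending_downloads(folder: Path) -> list:
    """List captures iCloud has not finished downloading.

    Undownloaded files appear as '.<original>.icloud' placeholders,
    so strip the leading dot and trailing '.icloud' to recover the name.
    """
    return [p.name[1:-len(".icloud")] for p in folder.glob(".*.icloud")]
```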
## 2. Watch the voice inbox folder and transcribe
```python
#!/usr/bin/env python3
# watch_voice_inbox.py
import time
from pathlib import Path

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
import whisper

VOICE_DIR = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/Shortcuts/voice-inbox"
OUTPUT_DIR = Path.home() / "vault/00-inbox/monologues"

model = whisper.load_model("base")


class VoiceHandler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.src_path.endswith(".m4a"):
            return
        filepath = Path(event.src_path)
        time.sleep(2)  # Give iCloud a moment to finish writing the file

        # Transcribe locally with Whisper
        result = model.transcribe(str(filepath))
        text = result["text"].strip()

        # Write a markdown note with frontmatter into the vault inbox
        date = time.strftime("%Y-%m-%d")
        note_name = f"{date}-{filepath.stem}.md"
        note_path = OUTPUT_DIR / note_name
        note_path.write_text(
            f"---\ntitle: Voice capture {date}\ntags: [voice, inbox]\n"
            f"created: {date}\n---\n\n{text}\n"
        )

        # Clean up the voice file
        filepath.unlink()
        print(f"Processed: {note_name}")


if __name__ == "__main__":
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    observer = Observer()
    observer.schedule(VoiceHandler(), str(VOICE_DIR), recursive=False)
    observer.start()
    print(f"Watching {VOICE_DIR}")
    try:
        observer.join()
    except KeyboardInterrupt:
        observer.stop()
        observer.join()
```
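The fixed two-second sleep in the handler is a heuristic: iCloud may still be downloading when the filesystem event fires. A more robust sketch polls until the file size stops changing before transcribing (the function name, interval, and timeout here are assumptions, not part of the script above):

```python
import time
from pathlib import Path

def wait_until_stable(path: Path, interval: float = 1.0, timeout: float = 60.0) -> bool:
    """Poll until the file's size stops changing, i.e. the sync has likely finished.

    Returns True once two consecutive polls see the same non-zero size,
    False if the deadline passes first (e.g. the file never appears).
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        size = path.stat().st_size if path.exists() else -1
        if size == last_size and size > 0:
            return True
        last_size = size
        time.sleep(interval)
    return False
```

In the handler, `time.sleep(2)` would be replaced with `if not wait_until_stable(filepath): return`.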
## 3. Let the AI agent process inbox items
```markdown
<!-- skills/process-inbox.md -->
---
name: process-inbox
description: Process voice captures from the inbox
---
Check ~/vault/00-inbox/monologues/ for new voice transcriptions.

For each unprocessed note:

1. Extract actionable tasks and add to 00-inbox/tasks/
2. Extract ideas and add to 02-concepts/ in the right subdirectory
3. Move the original note to 01-archive/
4. Commit changes to git
```
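Step 3 of the skill amounts to moving a triaged note out of the inbox while keeping its filename. A minimal sketch of that archive step (the function name and directory layout are illustrative; the agent performs this itself when running the skill):

```python
import shutil
from pathlib import Path

def archive_note(note: Path, archive_dir: Path) -> Path:
    """Move a processed inbox note into the archive, preserving its filename."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / note.name
    shutil.move(str(note), dest)
    return dest
```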
## Why It Works
The iPhone Action Button provides a hardware shortcut that works from the lock screen: zero taps to start recording. iCloud Drive syncs the audio file to your Mac within seconds. The watcher script runs continuously, transcribing locally with Whisper (no API needed) and writing structured markdown into your Obsidian vault. The AI agent then triages each transcription into tasks, ideas, or notes during its scheduled processing cycle.
## Context
- Aaron Vanston uses Wispr Flow for voice-to-text when prompting AI, combining it with screenshots for faster iteration than browser MCPs
- Nvidia's voice models are reported as better than ElevenLabs or Whisper for transcription quality
- The same pattern works with a wrist tap on Apple Watch using a Shortcut complication
- For faster transcription, swap Whisper for `mlx-whisper`, which runs optimized on Apple Silicon
- ClawdBot users can trigger this via WhatsApp voice messages instead of the Shortcut approach
- The key insight is separating capture (instant, low-friction) from processing (async, AI-powered)