## Problem
Capturing tasks and ideas on the go means unlocking your phone, opening an app, navigating to the right list, and typing. By the time you are done, the thought is half-forgotten. You need a single physical button press that records your voice, transcribes it, and routes it into your knowledge base automatically.
## Solution
Map the iPhone Action Button to a Shortcut that records audio, saves it to iCloud, triggers a bot to transcribe and process it, syncs the result to Obsidian, and cleans up the voice file.
## 1. Create the iPhone Shortcut
```text
Shortcut: "Quick Capture"

1. Record Audio (stop on tap)
2. Save File
   - Service: iCloud Drive
   - Path: /Shortcuts/voice-inbox/
   - Filename: capture-{Current Date}.m4a
3. Show Notification: "Voice captured"
```
Assign to the Action Button: Settings > Action Button > Shortcut > "Quick Capture"
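iCloud Drive represents files that have not yet downloaded to the Mac as hidden `.<name>.icloud` placeholders. A minimal sketch for spotting captures still mid-sync before the watcher tries to transcribe them (the function name and folder constant are illustrative, not part of the Shortcut):

```python
from pathlib import Path

# Default iCloud Drive location of the Shortcut's save folder on macOS
VOICE_DIR = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/Shortcuts/voice-inbox"

def pending_downloads(folder: Path) -> list:
    """List captures iCloud has not finished downloading.

    Undownloaded files appear as '.<original>.icloud' placeholders,
    so strip the leading dot and trailing '.icloud' to recover the name.
    """
    return [p.name[1:-len(".icloud")] for p in folder.glob(".*.icloud")]
```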
## 2. Watch the voice inbox folder and transcribe
```python
#!/usr/bin/env python3
# watch_voice_inbox.py
import time
from pathlib import Path

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler
import whisper

VOICE_DIR = Path.home() / "Library/Mobile Documents/com~apple~CloudDocs/Shortcuts/voice-inbox"
OUTPUT_DIR = Path.home() / "vault/00-inbox/monologues"

model = whisper.load_model("base")


class VoiceHandler(FileSystemEventHandler):
    def on_created(self, event):
        if not event.src_path.endswith(".m4a"):
            return
        filepath = Path(event.src_path)
        time.sleep(2)  # Give iCloud a moment to finish writing the file

        # Transcribe locally with Whisper
        result = model.transcribe(str(filepath))
        text = result["text"].strip()

        # Write a markdown note with frontmatter into the vault inbox
        date = time.strftime("%Y-%m-%d")
        note_name = f"{date}-{filepath.stem}.md"
        note_path = OUTPUT_DIR / note_name
        note_path.write_text(
            f"---\ntitle: Voice capture {date}\ntags: [voice, inbox]\n"
            f"created: {date}\n---\n\n{text}\n"
        )

        # Clean up the voice file
        filepath.unlink()
        print(f"Processed: {note_name}")


if __name__ == "__main__":
    OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
    observer = Observer()
    observer.schedule(VoiceHandler(), str(VOICE_DIR), recursive=False)
    observer.start()
    print(f"Watching {VOICE_DIR}")
    try:
        observer.join()
    except KeyboardInterrupt:
        observer.stop()
        observer.join()
```
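The fixed two-second sleep in the handler is a heuristic: iCloud may still be downloading when the filesystem event fires. A more robust sketch polls until the file size stops changing before transcribing (the function name, interval, and timeout here are assumptions, not part of the script above):

```python
import time
from pathlib import Path

def wait_until_stable(path: Path, interval: float = 1.0, timeout: float = 60.0) -> bool:
    """Poll until the file's size stops changing, i.e. the sync has likely finished.

    Returns True once two consecutive polls see the same non-zero size,
    False if the deadline passes first (e.g. the file never appears).
    """
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        size = path.stat().st_size if path.exists() else -1
        if size == last_size and size > 0:
            return True
        last_size = size
        time.sleep(interval)
    return False
```

In the handler, `time.sleep(2)` would be replaced with `if not wait_until_stable(filepath): return`.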
## 3. Let the AI agent process inbox items
```markdown
<!-- skills/process-inbox.md -->
---
name: process-inbox
description: Process voice captures from the inbox
---
Check ~/vault/00-inbox/monologues/ for new voice transcriptions.

For each unprocessed note:

1. Extract actionable tasks and add to 00-inbox/tasks/
2. Extract ideas and add to 02-concepts/ in the right subdirectory
3. Move the original note to 01-archive/
4. Commit changes to git
```
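Step 3 of the skill amounts to moving a triaged note out of the inbox while keeping its filename. A minimal sketch of that archive step (the function name and directory layout are illustrative; the agent performs this itself when running the skill):

```python
import shutil
from pathlib import Path

def archive_note(note: Path, archive_dir: Path) -> Path:
    """Move a processed inbox note into the archive, preserving its filename."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    dest = archive_dir / note.name
    shutil.move(str(note), dest)
    return dest
```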
## Why It Works
The iPhone Action Button provides a hardware shortcut that works from the lock screen: zero taps to start recording. iCloud Drive syncs the audio file to your Mac within seconds. The watcher script runs continuously, transcribing locally with Whisper (no API needed) and writing structured markdown into your Obsidian vault. The AI agent then triages each transcription into tasks, ideas, or notes during its scheduled processing cycle.
## Context
- Aaron Vanston uses Wispr Flow for voice-to-text when prompting AI, combining it with screenshots for faster iteration than browser MCPs
- Nvidia's voice models are reported as better than ElevenLabs or Whisper for transcription quality
- The same pattern works with a wrist tap on Apple Watch using a Shortcut complication
- For faster transcription, swap Whisper for `mlx-whisper`, which runs optimized on Apple Silicon
- ClawdBot users can trigger this via WhatsApp voice messages instead of the Shortcut approach
- The key insight is separating capture (instant, low-friction) from processing (async, AI-powered)