Archive Claude Code session logs and query them with DuckDB

pattern

Claude Code purges session JSONL files after 30 days, losing valuable history for cost tracking and self-reflection

Tags: claude-code, duckdb, session-logs, analytics, crontab

Problem

Claude Code stores session data as JSONL files at ~/.claude/projects/{project-path-kebab-cased}/{session-uuid}.jsonl. These files contain every message, tool call, and token-usage metric for each session. However, Claude Code purges transcripts older than 30 days by default (the cleanupPeriodDays setting), so you lose the long-term history needed for cost analysis, productivity tracking, and building self-improving agent workflows.

Solution

Step 1: Set up automated backup with crontab and rsync

# Add to crontab with: crontab -e
# Runs every hour, preserving all session files indefinitely
0 * * * * rsync -a --ignore-existing ~/.claude/projects/ ~/claude-session-archive/
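Where cron or rsync isn't available, the same ignore-existing copy can be sketched in stdlib Python. This is a minimal sketch, not part of Claude Code; the function name and paths are illustrative:

```python
import shutil
from pathlib import Path

def archive_new_sessions(src: str, dst: str) -> int:
    """Copy any session file not yet in the archive, mirroring
    `rsync -a --ignore-existing`: files already archived are never
    overwritten, so reruns are cheap and idempotent."""
    copied = 0
    src_root, dst_root = Path(src).expanduser(), Path(dst).expanduser()
    for f in src_root.glob("**/*.jsonl"):
        target = dst_root / f.relative_to(src_root)
        if not target.exists():
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(f, target)  # copy2 preserves timestamps, like -a
            copied += 1
    return copied
```

Run it on a schedule of your choosing; a second call right after the first copies nothing.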

Step 2: Query archived sessions with DuckDB

DuckDB can glob an entire directory tree of JSONL files and read it as a single table, making it easy to run analytics across all sessions.

-- Load all session logs as a single table
SELECT *
FROM read_json(
  '~/claude-session-archive/**/*.jsonl',
  format = 'newline_delimited',
  union_by_name = true
)
LIMIT 10;
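Before pointing DuckDB at the archive, it can help to confirm which files the recursive glob will match. A stdlib sketch of the same `**/*.jsonl` pattern (the helper name is ours, not DuckDB's):

```python
from pathlib import Path

def list_session_files(root: str) -> list[Path]:
    """Return every .jsonl file under root, at any depth --
    the same set of files DuckDB's '**/*.jsonl' glob resolves to."""
    return sorted(Path(root).expanduser().glob("**/*.jsonl"))
```

For example, `list_session_files("~/claude-session-archive")` lists each archived session file.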

Step 3: Build useful analytics queries

-- Total tokens and session count per project
-- (token usage sits at message.usage on assistant-type entries)
SELECT
  regexp_extract(filename, 'claude-session-archive/([^/]+)/', 1) AS project,
  SUM(message.usage.input_tokens) AS total_input_tokens,
  SUM(message.usage.output_tokens) AS total_output_tokens,
  COUNT(DISTINCT regexp_extract(filename, '([^/]+)\.jsonl$', 1)) AS sessions
FROM read_json(
  '~/claude-session-archive/**/*.jsonl',
  format = 'newline_delimited',
  union_by_name = true,
  filename = true
)
WHERE type = 'assistant'
GROUP BY project
ORDER BY total_input_tokens DESC;
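For environments without DuckDB, the same aggregation can be sketched in stdlib Python. This assumes the common session-entry shape where usage sits at `message.usage` on `type: "assistant"` entries; the function name is illustrative:

```python
import json
from collections import defaultdict
from pathlib import Path

def tokens_per_project(archive: str) -> dict[str, dict[str, int]]:
    """Sum input/output tokens per project directory.
    Assumes usage lives at message.usage on assistant entries;
    adjust if your session format differs."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"input": 0, "output": 0})
    for path in Path(archive).expanduser().glob("**/*.jsonl"):
        project = path.parent.name  # project dir is the file's parent
        for line in path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            entry = json.loads(line)
            if entry.get("type") != "assistant":
                continue
            msg = entry.get("message")
            usage = msg.get("usage") if isinstance(msg, dict) else None
            if usage:
                totals[project]["input"] += usage.get("input_tokens", 0)
                totals[project]["output"] += usage.get("output_tokens", 0)
    return dict(totals)
```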

Step 4: Create a Claude Code slash command for self-reflection

# .claude/commands/introspect.md
# Run DuckDB queries against your own session history to find patterns,
# repeated mistakes, and areas for improvement.

Query the session archive at ~/claude-session-archive/ using DuckDB.
Analyze recent sessions for: repeated errors, tool usage patterns,
average turns per task, and common failure modes.
Use `duckdb -c "<query>"` to run queries.
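One concrete metric such a command might compute is average user turns per session. A hedged stdlib sketch, counting `type: "user"` entries per file (field names assumed from the session format above; the helper name is ours):

```python
import json
from pathlib import Path

def avg_user_turns(archive: str) -> float:
    """Average number of user-authored turns per session file.
    Assumes each JSONL entry carries a top-level 'type' field."""
    counts = []
    for path in Path(archive).expanduser().glob("**/*.jsonl"):
        n = sum(
            1
            for line in path.read_text(encoding="utf-8").splitlines()
            if line.strip() and json.loads(line).get("type") == "user"
        )
        counts.append(n)
    return sum(counts) / len(counts) if counts else 0.0
```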

Why It Works

DuckDB treats any directory tree of JSONL files as a queryable table with zero setup -- no database server, no import step, no schema definition. The read_json function with union_by_name = true handles schema variations across different session file formats gracefully. By using rsync --ignore-existing, the cron job only copies new files and never overwrites, keeping the backup idempotent and fast. This gives you a permanent, queryable archive that tools like ccusage can also read for cost metrics.

Context

  • ccusage (a community tool for Claude Code cost tracking) reads the same JSONL session files, so the archive is compatible with it
  • The --ignore-existing flag on rsync ensures already-archived files are never modified, even if the source changes
  • DuckDB runs as a single binary with no server process, making it ideal for local analytics
  • Session files contain tool calls, messages, thinking blocks, and usage metadata -- enough data for detailed productivity analysis
  • The introspect command pattern lets Claude Code query its own history for self-improving agent loops
About this share

Contributor: mblode
Repository: mblode/shares
Created: Feb 10, 2026