Problem
Claude Code stores session data as JSONL files at ~/.claude/projects/{project-path-kebab-cased}/{session-uuid}.jsonl. Each file records every message, tool call, and token usage metric for a session. However, Claude Code purges session files older than 30 days by default, so you lose the long-term history needed for cost analysis, productivity tracking, and building self-improving agent workflows.
Solution
Step 1: Set up automated backup with crontab and rsync
# Add to crontab with: crontab -e
# Runs every hour, preserving all session files indefinitely
0 * * * * rsync -a --ignore-existing ~/.claude/projects/ ~/claude-session-archive/
Step 2: Query archived sessions with DuckDB
DuckDB can read an entire directory tree of JSONL files as a single table using a recursive glob, making it easy to run analytics across all sessions.
-- Load all archived session logs as a single table
SELECT *
FROM read_json(
  '~/claude-session-archive/**/*.jsonl',
  format = 'newline_delimited',
  union_by_name = true
)
LIMIT 10;
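Session files mix several record shapes (user turns, assistant turns, summaries), which is why `union_by_name` matters. Before writing analytics queries, it helps to see which record types actually appear in your archive. A small Python sketch, assuming only that each line is a JSON object that may carry a top-level `type` field, tallies them without DuckDB:

```python
import json
from collections import Counter
from pathlib import Path

def record_type_counts(archive: str) -> Counter:
    """Count top-level 'type' values across all archived JSONL records."""
    counts: Counter = Counter()
    for path in Path(archive).expanduser().rglob("*.jsonl"):
        with path.open() as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines
                record = json.loads(line)
                counts[record.get("type", "<none>")] += 1
    return counts

# Usage: record_type_counts("~/claude-session-archive").most_common()
```

The keys this returns are the values you will filter on in WHERE clauses, so running it once saves guessing at the schema.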
Step 3: Build useful analytics queries
-- Total tokens and session counts per project
-- (usage metadata lives under message.usage on assistant records)
SELECT
  regexp_extract(filename, 'claude-session-archive/([^/]+)/', 1) AS project,
  SUM(COALESCE(message.usage.input_tokens, 0)) AS total_input_tokens,
  SUM(COALESCE(message.usage.output_tokens, 0)) AS total_output_tokens,
  COUNT(DISTINCT regexp_extract(filename, '([^/]+)\.jsonl$', 1)) AS sessions
FROM read_json(
  '~/claude-session-archive/**/*.jsonl',
  format = 'newline_delimited',
  union_by_name = true,
  filename = true
)
WHERE type = 'assistant'
GROUP BY project
ORDER BY total_input_tokens DESC;
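For a quick sanity check of the SQL, the same aggregation can be done in plain Python. This sketch makes the same assumption as the query above, namely that token counts sit under a `message.usage` object when present, and treats the first directory under the archive as the project name:

```python
import json
from collections import defaultdict
from pathlib import Path

def tokens_per_project(archive: str) -> dict[str, dict[str, int]]:
    """Sum input/output tokens per project directory, mirroring the SQL query."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: {"input": 0, "output": 0})
    root = Path(archive).expanduser()
    for path in root.rglob("*.jsonl"):
        project = path.relative_to(root).parts[0]  # first dir under the archive
        with path.open() as fh:
            for line in fh:
                if not line.strip():
                    continue
                record = json.loads(line)
                # Records without usage metadata contribute zero
                usage = (record.get("message") or {}).get("usage") or {}
                totals[project]["input"] += usage.get("input_tokens", 0) or 0
                totals[project]["output"] += usage.get("output_tokens", 0) or 0
    return dict(totals)
```

If the Python totals and the DuckDB totals disagree, the schema assumption is wrong for your Claude Code version and the WHERE clause or field paths need adjusting.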
Step 4: Create a Claude Code slash command for self-reflection
# .claude/commands/introspect.md
# Run DuckDB queries against your own session history to find patterns,
# repeated mistakes, and areas for improvement.
Query the session archive at ~/claude-session-archive/ using DuckDB.
Analyze recent sessions for: repeated errors, tool usage patterns,
average turns per task, and common failure modes.
Use `duckdb -c "<query>"` to run queries.
Why It Works
DuckDB treats any directory tree of JSONL files as a queryable table with zero setup -- no database server, no import step, no schema definition. The read_json function with union_by_name = true handles schema variations across different session file formats gracefully. By using rsync --ignore-existing, the cron job only copies new files and never overwrites, keeping the backup idempotent and fast. This gives you a permanent, queryable archive that tools like ccusage can also read for cost metrics.
Context
- ccusage (a community tool for Claude Code cost tracking) reads the same JSONL session files, so the archive is compatible with it
- The `--ignore-existing` flag on rsync ensures already-archived files are never modified, even if the source changes
- DuckDB runs as a single binary with no server process, making it ideal for local analytics
- Session files contain tool calls, messages, thinking blocks, and usage metadata -- enough data for detailed productivity analysis
- The introspect skill pattern lets Claude Code query its own history for self-improving agent loops