Bulk face detection and cropping with macOS Vision Framework and Python

Processing hundreds of images for face detection, cropping, and background removal manually takes weeks of designer time

python · image-processing · macos · vision-framework · swift · face-detection

Problem

You have 250+ headshot images that need consistent processing: detect the face, crop to a specific aspect ratio keeping the face centered, and optionally remove the background. Doing this manually in Photoshop or Figma would take a designer two weeks. You need an automated pipeline that handles the full batch in minutes.

Solution

Combine two tools: a Swift binary using macOS Vision Framework for fast face detection and cropping, and a Python script using HuggingFace models for background removal.

Step 1: Face detection and cropping with Swift

Compile a Swift binary that uses the native Vision framework:

import Vision
import AppKit

enum ImageError: Error {
    case loadFailed
    case cropFailed
}

func detectAndCrop(imagePath: String, outputPath: String, size: CGSize) throws {
    guard let image = NSImage(contentsOfFile: imagePath),
          let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
        throw ImageError.loadFailed
    }

    let request = VNDetectFaceRectanglesRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    guard let face = (request.results as? [VNFaceObservation])?.first else {
        print("No face found in \(imagePath)")
        return
    }

    let imageWidth = CGFloat(cgImage.width)
    let imageHeight = CGFloat(cgImage.height)

    // Vision reports the bounding box in normalized, bottom-left-origin
    // coordinates; convert to top-left-origin pixels for CGImage cropping.
    let faceCenterX = face.boundingBox.midX * imageWidth
    let faceCenterY = (1 - face.boundingBox.midY) * imageHeight

    // Center the crop on the face, then clamp the origin so the rect
    // stays inside the image (CGRect has no built-in clamped(to:) method).
    var cropRect = CGRect(
        x: faceCenterX - size.width / 2,
        y: faceCenterY - size.height / 2,
        width: size.width,
        height: size.height
    )
    cropRect.origin.x = max(0, min(cropRect.origin.x, imageWidth - size.width))
    cropRect.origin.y = max(0, min(cropRect.origin.y, imageHeight - size.height))

    guard let cropped = cgImage.cropping(to: cropRect),
          let data = NSBitmapImageRep(cgImage: cropped)
              .representation(using: .png, properties: [:]) else {
        throw ImageError.cropFailed
    }
    try data.write(to: URL(fileURLWithPath: outputPath))
}

Build and run:

swiftc face_crop.swift -o face_crop -framework Vision -framework AppKit
./face_crop --input ./photos --output ./cropped --size 400x500
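The crop geometry in the Swift snippet is easy to sanity-check outside of Swift. Here is a minimal Python sketch of the same math (`centered_crop` is an illustrative helper, not part of the pipeline), assuming a normalized, bottom-left-origin bounding box as Vision reports it:

```python
def centered_crop(box, img_w, img_h, crop_w, crop_h):
    """Compute a crop rect centered on a face bounding box.

    box is (x, y, w, h) in Vision's normalized, bottom-left-origin
    coordinates; the result is (x, y, w, h) in top-left-origin pixels,
    clamped so the crop stays inside the image.
    """
    bx, by, bw, bh = box
    # Face center in pixels, flipping the y axis to a top-left origin
    cx = (bx + bw / 2) * img_w
    cy = (1 - (by + bh / 2)) * img_h
    x = cx - crop_w / 2
    y = cy - crop_h / 2
    # Clamp so the crop never leaves the image bounds
    x = max(0, min(x, img_w - crop_w))
    y = max(0, min(y, img_h - crop_h))
    return (x, y, crop_w, crop_h)

# A face centered at (0.5, 0.5) in a 1000x800 image lands dead center
print(centered_crop((0.4, 0.4, 0.2, 0.2), 1000, 800, 400, 500))
# → (300.0, 150.0, 400, 500)
```

A face near the image edge exercises the clamping: the crop slides inward instead of running off the frame.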

Step 2: Background removal with Python

Use a PEP-723 script with embedded dependencies for zero-config setup:

# /// script
# requires-python = ">=3.11"
# dependencies = ["transformers", "torch", "pillow"]
# ///

from transformers import pipeline
from PIL import Image
from pathlib import Path
from multiprocessing import Pool
import argparse

remover = None

def init_worker() -> None:
    # Load the model once per worker process (the Pool initializer runs
    # in each worker), rather than at import time in the parent too
    global remover
    remover = pipeline("image-segmentation", model="briaai/RMBG-1.4", trust_remote_code=True)

def remove_bg(path: str) -> None:
    img = Image.open(path).convert("RGB")
    # The RMBG-1.4 custom pipeline returns a PIL mask with return_mask=True
    mask = remover(img, return_mask=True)
    img.putalpha(mask)
    output = Path(path).parent / f"{Path(path).stem}_nobg.png"
    img.save(output)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("input_dir", type=Path)
    parser.add_argument("--workers", type=int, default=4)
    args = parser.parse_args()

    files = [str(p) for p in args.input_dir.glob("*.png")]
    with Pool(args.workers, initializer=init_worker) as pool:
        pool.map(remove_bg, files)

Run with:

uv run remove_bg.py ./cropped --workers 8

Why It Works

The macOS Vision framework runs face detection on-device using Apple's optimized neural engine -- it processes hundreds of images in seconds without any external API calls or model downloads. The face bounding box coordinates let you compute a centered crop programmatically. For background removal, the HuggingFace segmentation model runs locally on CPU, keeping the entire pipeline offline and free. Using multiprocessing.Pool parallelizes the slower ML inference across CPU cores. Together, the two scripts replace two weeks of manual designer work with a few minutes of compute.
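The multiprocessing gain depends on each worker paying the model-load cost once, not once per image. A minimal sketch of that pattern with a stand-in for the real model (the lambda here is a placeholder; a real worker would construct the HuggingFace pipeline in `init_worker`):

```python
from multiprocessing import Pool

model = None

def init_worker() -> None:
    # Runs once in each worker process; load the expensive model here
    global model
    model = lambda x: x * 2  # stand-in for real ML inference

def infer(x: int) -> int:
    # Each call reuses the already-loaded per-worker model
    return model(x)

if __name__ == "__main__":
    with Pool(2, initializer=init_worker) as pool:
        print(pool.map(infer, [1, 2, 3]))  # [2, 4, 6]
```

Without the initializer, putting the model load at module top level means the parent process also loads it, and spawned workers pay the cost before the first task regardless; the initializer makes that per-worker setup explicit.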

Context

  • The Vision framework only detects human faces -- it will not work for animal photos
  • Swift binary compiles in seconds and can be distributed via Homebrew for team use
  • The background removal model downloads 200-300MB on first run, then caches locally
  • PEP-723 inline metadata means uv run handles dependency installation automatically
  • For very large batches, add --page and --page-size CLI args to process in paginated sets
  • The Swift face detection approach also works for generating consistent avatar crops for user profiles
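The pagination bullet above can be sketched with plain list slicing. The `--page`/`--page-size` flag names follow the bullet; the `paginate` helper and the defaults are assumptions:

```python
import argparse

def paginate(items: list, page: int, page_size: int) -> list:
    # Page numbers are 1-based; the last page may come back short
    start = (page - 1) * page_size
    return items[start:start + page_size]

parser = argparse.ArgumentParser()
parser.add_argument("--page", type=int, default=1)
parser.add_argument("--page-size", type=int, default=100)

# 250 files split into pages of 100: page 3 holds the trailing 50
files = [f"img_{i}.png" for i in range(250)]
args = parser.parse_args(["--page", "3", "--page-size", "100"])
print(len(paginate(files, args.page, args.page_size)))  # 50
```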
About this share

Contributor: mblode
Repository: mblode/shares
Created: Feb 10, 2026