Problem
You have 250+ headshot images that need consistent processing: detect the face, crop to a specific aspect ratio keeping the face centered, and optionally remove the background. Doing this manually in Photoshop or Figma would take a designer two weeks. You need an automated pipeline that handles the full batch in minutes.
Solution
Combine two tools: a Swift binary using macOS Vision Framework for fast face detection and cropping, and a Python script using HuggingFace models for background removal.
Step 1: Face detection and cropping with Swift
Compile a Swift binary that uses the native Vision framework:
```swift
import Vision
import AppKit

enum ImageError: Error {
    case loadFailed, cropFailed, encodeFailed
}

extension CGRect {
    // Shift the rect so it stays inside `bounds` without changing its size
    // (assumes the crop is no larger than the image).
    func clamped(to bounds: CGRect) -> CGRect {
        var rect = self
        rect.origin.x = min(max(rect.origin.x, bounds.minX), bounds.maxX - rect.width)
        rect.origin.y = min(max(rect.origin.y, bounds.minY), bounds.maxY - rect.height)
        return rect
    }
}

func detectAndCrop(imagePath: String, outputPath: String, size: CGSize) throws {
    guard let image = NSImage(contentsOfFile: imagePath),
          let cgImage = image.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
        throw ImageError.loadFailed
    }

    let request = VNDetectFaceRectanglesRequest()
    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    try handler.perform([request])

    guard let face = request.results?.first else {
        print("No face found in \(imagePath)")
        return
    }

    let imageWidth = CGFloat(cgImage.width)
    let imageHeight = CGFloat(cgImage.height)

    // Vision returns normalized coordinates with a bottom-left origin;
    // flip Y to match CGImage's top-left origin before cropping.
    let faceCenterX = face.boundingBox.midX * imageWidth
    let faceCenterY = (1 - face.boundingBox.midY) * imageHeight

    // Center the crop on the face, then clamp it inside the image bounds
    let cropRect = CGRect(
        x: faceCenterX - size.width / 2,
        y: faceCenterY - size.height / 2,
        width: size.width,
        height: size.height
    ).clamped(to: CGRect(origin: .zero, size: CGSize(width: imageWidth, height: imageHeight)))

    guard let cropped = cgImage.cropping(to: cropRect) else {
        throw ImageError.cropFailed
    }
    let rep = NSBitmapImageRep(cgImage: cropped)
    guard let data = rep.representation(using: .png, properties: [:]) else {
        throw ImageError.encodeFailed
    }
    try data.write(to: URL(fileURLWithPath: outputPath))
}
```
Build and run (the snippet above shows the core function; the entry point that parses the flags below and walks the input directory is omitted for brevity):

```sh
swiftc face_crop.swift -o face_crop -framework Vision -framework AppKit
./face_crop --input ./photos --output ./cropped --size 400x500
```
Step 2: Background removal with Python
Use a PEP-723 script with embedded dependencies for zero-config setup:
```python
# /// script
# requires-python = ">=3.11"
# dependencies = ["transformers", "torch", "pillow"]
# ///
from transformers import pipeline
from PIL import Image
from pathlib import Path
from multiprocessing import Pool
import argparse

remover = None  # one model instance per worker process

def init_worker() -> None:
    # Load the model once per worker instead of at import time, so the
    # parent process never loads a copy it doesn't use.
    global remover
    remover = pipeline("image-segmentation", model="briaai/RMBG-1.4",
                       trust_remote_code=True)

def remove_bg(path: str) -> None:
    img = Image.open(path).convert("RGB")
    # RMBG-1.4's custom pipeline returns a PIL mask when return_mask=True
    mask = remover(path, return_mask=True)
    img.putalpha(mask)
    output = Path(path).parent / f"{Path(path).stem}_nobg.png"
    img.save(output)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("input_dir", type=Path)
    parser.add_argument("--workers", type=int, default=4)
    args = parser.parse_args()
    files = [str(p) for p in args.input_dir.glob("*.png")]
    with Pool(args.workers, initializer=init_worker) as pool:
        pool.map(remove_bg, files)
```
Run with:

```sh
uv run remove_bg.py ./cropped --workers 8
```
Why It Works
The macOS Vision framework runs face detection entirely on-device using Apple's hardware-accelerated ML stack -- it processes hundreds of images in seconds without external API calls or model downloads. The face bounding box lets you compute a centered crop programmatically; the only subtlety is that Vision reports normalized coordinates with a bottom-left origin, so the Y axis must be flipped before cropping a CGImage. For background removal, the HuggingFace segmentation model runs locally, keeping the entire pipeline offline and free, and multiprocessing.Pool parallelizes the slower ML inference across CPU cores. Together, the two tools replace two weeks of manual designer work with a few minutes of compute.
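The coordinate conversion is worth spelling out: Vision's boundingBox is normalized to [0, 1] with a bottom-left origin, while CGImage cropping uses top-left pixel coordinates. A language-neutral sketch of the same math as the Swift code (`face_crop_rect` is a hypothetical helper for illustration):

```python
def face_crop_rect(bbox, image_w, image_h, crop_w, crop_h):
    """bbox = (x, y, w, h), normalized, bottom-left origin (Vision's convention).
    Returns a top-left-origin pixel rect of size (crop_w, crop_h) centered on
    the face and clamped inside the image."""
    bx, by, bw, bh = bbox
    center_x = (bx + bw / 2) * image_w
    center_y = (1 - (by + bh / 2)) * image_h  # flip Y to a top-left origin
    x = min(max(center_x - crop_w / 2, 0), image_w - crop_w)
    y = min(max(center_y - crop_h / 2, 0), image_h - crop_h)
    return (x, y, crop_w, crop_h)

# A face centered in a 1000x1000 image, cropped to 400x500:
print(face_crop_rect((0.4, 0.4, 0.2, 0.2), 1000, 1000, 400, 500))
# → (300.0, 250.0, 400, 500)
```

The clamping on the last two lines is what keeps the crop inside the frame when a face sits near an image edge.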
Context
- The Vision framework only detects human faces -- it will not work for animal photos
- Swift binary compiles in seconds and can be distributed via Homebrew for team use
- The background removal model downloads 200-300MB on first run, then caches locally
- PEP-723 inline metadata means `uv run` handles dependency installation automatically
- For very large batches, add `--page` and `--page-size` CLI args to process in paginated sets
- The Swift face detection approach also works for generating consistent avatar crops for user profiles
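The pagination suggested above can be as simple as slicing the file list before handing it to the pool (the `--page`/`--page-size` flags and this `page_slice` helper are a sketch, not part of the script shown earlier):

```python
def page_slice(files: list[str], page: int, page_size: int) -> list[str]:
    """Return the files for 0-indexed `page`, with `page_size` items per page."""
    start = page * page_size
    return files[start:start + page_size]

# 10 files in pages of 4: page 2 holds the final two
files = [f"img_{i:02}.png" for i in range(10)]
print(page_slice(files, 2, 4))  # → ['img_08.png', 'img_09.png']
```

Running each page as a separate `uv run` invocation also bounds memory use, since worker processes exit between pages.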