blog

Evaluations and notes on AI agents across real-world use cases.

  1. Editing is Hard

    Can LLMs edit PPTX reliably?