Three LLMs, One App: Balancing Speed, Privacy, and Power

I spent a weekend fine-tuning a model for my knowledge management app, designed to handle notes, PDFs, and presentations with Oracle Database 23ai’s vector search (see my management AI post). It aced testing on my RTX 5090 server, but on my M2 MacBook Pro? Barely usable. A query like “Summarize last week’s customer meetings and identify risks” took over a minute, leaving me staring at a spinning wheel while my coffee got cold. ...

October 28, 2025 · 6 min · Brian Hengen

CLIP Inside Oracle AI Database 26ai: Fast, Multimodal RAG

After the 3-way LLM toggle went live, I turned my attention to embeddings, the invisible glue that powers search and RAG. Oracle OCI GenAI’s Cohere endpoint had been rock-solid in my testing: fast and reliable, with an 80K-token context. But every chunk still meant a network round-trip, and images were stuck behind OCR, so text-only embeddings left photos, diagrams, and whiteboards as blind spots in my knowledge base. ...

November 11, 2024 · 11 min · Brian Hengen
