Overview
Unico Connect is seeking a Senior AI Engineer to architect and deliver production AI systems for enterprise clients across the United States, European Union, and Asia-Pacific markets. This role leads multi-agent orchestration, large-scale Retrieval-Augmented Generation (RAG), evaluation infrastructure, and the engineering practices that take artificial intelligence from proof of concept to compliance-ready production. The position suits engineers with a demonstrated track record of shipping production LLM systems and the technical maturity to debug them when they fail.
Key Responsibilities
- Architect production AI systems including multi-agent orchestration, retrieval pipelines, evaluation infrastructure, and observability, with documented design rationale prior to implementation
- Lead RAG implementations at enterprise scale: chunking strategy, hybrid search, reranking, caching layers, and multi-tenant index architecture
- Design model-routing and tiering logic that balances cost, latency, and quality, including build-versus-buy decisions for self-hosted versus frontier model APIs
- Build evaluation harnesses combining golden test sets, LLM-as-judge pipelines, behavioural metrics, and regression detection that gate production deployments
- Mentor 2-3 AI Engineers; conduct AI-specific code reviews where evaluation scores, not unit test pass rates, define the quality bar
- Drive infrastructure decisions across Google Cloud Vertex AI, AWS Bedrock, Modal, and self-hosted serving platforms based on workload, compliance, and cost constraints
- Partner with the Tech Lead, Product Managers, and Senior Backend Engineers on the AI roadmap; negotiate scope when engineering reality requires it
- Deliver voice AI, computer vision, and multimodal features as project requirements demand; the role is not limited to text-based LLMs
- Lead production incident response when agents misroute tool calls or retrieval pipelines surface poisoned context
- Drive AI engineering standards across the firm: prompt versioning, evaluation coverage thresholds, observability baselines, incident post-mortems
- Contribute at least one open-source patch, technical write-up, or internal evaluation framework improvement each quarter
- Represent Unico Connect's AI engineering capabilities in technical conversations with clients, prospects, and architecture review sessions
Required Qualifications
- 5+ years of professional software engineering experience, with at least 2 years dedicated to production AI/ML systems
- Multiple production LLM products shipped with measurable business outcomes; able to articulate the metrics moved and architecture trade-offs accepted
- Deep proficiency in Python including FastAPI, async patterns, streaming responses, and typed code with Pydantic at production scale
- Strong fundamentals in vector retrieval, embedding models, hybrid search architectures, and reranking strategies
- Hands-on experience with at least 2 of LangChain, LangGraph, LlamaIndex, Haystack, or comparable custom orchestration frameworks
- Demonstrated track record with evaluation tooling such as LangSmith, Ragas, or Phoenix, with the discipline to catch regressions before users report them
- Practical experience with fine-tuning techniques (LoRA, QLoRA, DPO) and self-hosted model deployment using vLLM, TGI, or comparable serving platforms
- Strong fundamentals in distributed systems and asynchronous backend design
- Excellent written communication, with architecture decision records, design documents, and incident post-mortems as the default working medium
- Comfort delivering under regulated industry constraints such as SOC 2, HIPAA, GDPR, and FERPA when project scopes require it
- Bachelor's degree in Computer Science, Artificial Intelligence, or a related engineering field, or equivalent demonstrated experience
Preferred Qualifications
- MLOps experience with MLflow, Weights & Biases, Kubeflow, model registries, and drift detection systems
- Voice AI experience including Whisper, ElevenLabs, Deepgram, and real-time streaming pipelines
- Computer vision experience with multimodal LLMs and models such as YOLO and SAM
- On-device AI development using Core ML, MLKit, or llama.cpp for mobile-first products
- Open-source contributions, technical conference presentations, or a published technical blog featuring original work
- Experience leading interview loops for AI engineering hires and calibrating technical assessment rubrics
Technical Stack
- Large Language Models: Anthropic Claude, OpenAI GPT, Google Gemini, and open-source families including Llama, Mistral, and Qwen
- Orchestration Frameworks: LangChain, LangGraph, custom orchestrators
- Vector Databases: pgvector as default, with Pinecone, Weaviate, and Qdrant when workload demands
- Backend: Python with FastAPI, TypeScript with Node.js and NestJS, PostgreSQL, Redis
- Cloud Platforms: GCP Vertex AI, AWS Bedrock, self-hosted deployments on GKE and EKS, Modal for one-off experiments
- Evaluation and Observability: LangSmith, Langfuse, Helicone, custom Grafana dashboards
AI Tools Proficiency
Production engineering at Unico Connect assumes AI tools form part of the daily workflow rather than an experimental augmentation. For this role specifically:
- Claude Code or Cursor as the default AI-assisted development environment
- LangSmith, Langfuse, or Helicone for production AI observability
- Modal or Replicate for short-cycle model hosting experiments
- OpenAI Whisper for transcription and voice workflows
- Perplexity or Claude for research paper digestion and prior-art investigation
What we look for at Unico Connect
Every Unico role expects the same underlying traits, regardless of department or seniority. If these resonate, apply.
- Fluent with Claude, ChatGPT, Cursor, Figma AI, or whatever is relevant to your craft. We expect AI tools in the loop, not as a novelty.
- Fast cycles, real ownership, low ceremony. You will not be a cog.
- Output and outcomes matter more than process. You ship work that moves a metric.
- You treat the codebase, the deliverable, and the client relationship as your own.
- You joined because you want to ship amazing tech products, not warm a seat.