ML Engineer — New Grad 2026

San Francisco (Hybrid) · Remote OK · Full-time · $155K–$210K + equity · Start date flexible, Summer/Fall 2026

About the role

Arcline builds AI-powered data tools that help K-12 school districts turn fragmented student data into clear, actionable decisions. We work with superintendents and district leaders across Alabama, California, Kentucky, Texas, Wisconsin, and more — replacing months of manual reporting with instant, AI-driven answers.

You'll own the entire intelligence layer that powers Arcline — our natural language query engine, the retrieval and ranking systems behind it, and the evaluation infrastructure that keeps it honest. When an educator asks "which 3rd graders are below benchmark in reading and also flagged for chronic absenteeism?", your systems are what turn that into an accurate, cited answer from their district's data.

This is a foundational ML role at a company with no existing ML team to plug into. You'll make the core technical decisions — retrieval architecture, evaluation methodology, model selection, agent design — and own the outcomes. If you want to build ML systems from scratch with real users and real stakes, this is the role.

Day to day

Own the design and evolution of Arcline's RAG architecture — retrieval strategy, ranking, chunking, reranking, and citation generation across heterogeneous education data sources
Build and operate the agent framework that decomposes complex educator queries into multi-step data retrieval and reasoning workflows
Design and maintain evaluation infrastructure: automated benchmarks, regression testing, human-in-the-loop evaluation pipelines, and quality metrics dashboards
Make model selection and integration decisions across the stack — choosing when to use frontier models vs. fine-tuned smaller models, managing cost/latency/quality tradeoffs in production
Build data preprocessing pipelines that normalize messy, inconsistent education data from dozens of district sources into clean representations for retrieval and inference
Drive prompt engineering and optimization as a disciplined practice — version-controlled prompts, A/B testing, systematic iteration
Collaborate with the engineering team on production infrastructure: API design, caching, latency optimization, and monitoring for ML-powered features

Requirements

B.S. or M.S. in Computer Science, Machine Learning, or a related field (graduating by Summer 2026)
Deep understanding of retrieval systems — you can reason about embedding models, vector search tradeoffs, hybrid retrieval, and reranking from first principles
Proficiency in Python and solid experience with SQL and data manipulation at scale
Hands-on experience building with LLM APIs (OpenAI, Anthropic) and retrieval frameworks (LangChain, LlamaIndex, or custom)
Experience designing experiments and evaluating ML system quality beyond just vibes — you've built or used structured evaluation pipelines

Bonus qualifications

Experience building RAG or agent systems that served real users (not just demos)
Research or coursework in information retrieval, NLP, or knowledge representation
Experience with fine-tuning, RLHF, or model distillation techniques
Familiarity with ML infrastructure: experiment tracking, model serving, feature stores
Experience with Postgres, FastAPI, or data pipeline tools (Dagster, dbt)
AI-native development habits — you use LLM tools like Cursor, Claude Code, GitHub Copilot, Codex, or anything else to write, debug, and ship code faster

Compensation

$155K–$210K + equity. Compensation is determined based on experience, skills, and location.

You don't need to match every listed expectation to apply. We know the best candidates come from diverse backgrounds and experiences — if this sounds exciting to you, we'd love to hear from you.