ML Engineer — New Grad 2026

San Francisco (Hybrid) · Remote OK · Full-time · $155K–$210K + equity · Start date flexible, Summer/Fall 2026

About the role

Arcline builds AI-powered data tools that help K-12 school districts turn fragmented student data into clear, actionable decisions. We work with superintendents and district leaders across Alabama, California, Kentucky, Texas, Wisconsin, and more — replacing months of manual reporting with instant, AI-driven answers.

You'll own the entire intelligence layer that powers Arcline — our natural language query engine, the retrieval and ranking systems behind it, and the evaluation infrastructure that keeps it honest. When an educator asks "which 3rd graders are below benchmark in reading and also flagged for chronic absenteeism?", your systems are what turn that into an accurate, cited answer from their district's data.

This is a foundational ML role at a company with no existing ML team to plug into. You'll make the core technical decisions — retrieval architecture, evaluation methodology, model selection, agent design — and own the outcomes. If you want to build ML systems from scratch with real users and real stakes, this is the role.

Day to day

  • Own the design and evolution of Arcline's RAG architecture — retrieval strategy, ranking, chunking, reranking, and citation generation across heterogeneous education data sources
  • Build and operate the agent framework that decomposes complex educator queries into multi-step data retrieval and reasoning workflows
  • Design and maintain evaluation infrastructure: automated benchmarks, regression testing, human-in-the-loop evaluation pipelines, and quality metrics dashboards
  • Make model selection and integration decisions across the stack — choosing when to use frontier models vs. fine-tuned smaller models, managing cost/latency/quality tradeoffs in production
  • Build data preprocessing pipelines that normalize messy, inconsistent education data from dozens of district sources into clean representations for retrieval and inference
  • Drive prompt engineering and optimization as a disciplined practice — version-controlled prompts, A/B testing, systematic iteration
  • Collaborate with the engineering team on production infrastructure: API design, caching, latency optimization, and monitoring for ML-powered features

Requirements

  • B.S. or M.S. in Computer Science, Machine Learning, or a related field (graduating by Summer 2026)
  • Deep understanding of retrieval systems — you can reason about embedding models, vector search tradeoffs, hybrid retrieval, and reranking from first principles
  • Proficiency in Python and solid experience with SQL and data manipulation at scale
  • Hands-on experience building with LLM APIs (OpenAI, Anthropic) and retrieval frameworks (LangChain, LlamaIndex, or custom)
  • Experience designing experiments and evaluating ML system quality beyond just vibes — you've built or used structured evaluation pipelines
  • Authorized to work in the United States

Bonus qualifications

  • Experience building RAG or agent systems that served real users (not just demos)
  • Research or coursework in information retrieval, NLP, or knowledge representation
  • Experience with fine-tuning, RLHF, or model distillation techniques
  • Familiarity with ML infrastructure: experiment tracking, model serving, feature stores
  • Experience with Postgres, FastAPI, or data pipeline tools (Dagster, dbt)
  • AI-native development habits — you use tools like Cursor, Claude Code, GitHub Copilot, or Codex to ship faster

Compensation

$155K–$210K + equity. Compensation is determined based on experience, skills, and location.