Agentic AI R&D Intern — Summer 2026
About the role
Arcline builds AI-powered data tools that help K-12 school districts turn fragmented student data into clear, actionable decisions. We work with superintendents and district leaders across Alabama, California, Kentucky, Texas, Wisconsin, and more — replacing months of manual reporting with instant, AI-driven answers.
We're a small, AI-native team that ships fast and builds with real users. Our interns ship real features to real classrooms.
An educator might ask, "Which Title I schools have declining math scores and rising chronic absenteeism?" That question can't be answered with a single database call. It requires decomposing the question into sub-tasks, routing each to the right data source, reasoning about how the pieces fit together, and synthesizing a cited answer. That's an agent orchestration problem, and it's the core of what you'll work on.
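To give a flavor of that decompose, route, and synthesize loop, here is a minimal Python sketch. It is illustrative only and does not show Arcline's actual pipeline: the data sources, school names, and function names (`decompose`, `route`, `synthesize`, `answer`) are all hypothetical, and the planner is hard-coded where a real system would likely call an LLM.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class SubTask:
    source: str    # name of the (hypothetical) data source that should handle it
    question: str  # the narrower question sent to that source


@dataclass
class Finding:
    source: str
    schools: set[str]


# Hypothetical in-memory data sources with made-up school names.
# A real system would query district databases here.
def title_i_schools(_question: str) -> set[str]:
    return {"Adams ES", "Baker MS"}


def declining_math(_question: str) -> set[str]:
    return {"Adams ES", "Baker MS", "Cole HS"}


def rising_absenteeism(_question: str) -> set[str]:
    return {"Baker MS", "Cole HS", "Dale ES"}


SOURCES: dict[str, Callable[[str], set[str]]] = {
    "funding": title_i_schools,
    "assessments": declining_math,
    "attendance": rising_absenteeism,
}


def decompose(query: str) -> list[SubTask]:
    """Stand-in for an LLM planner: returns a fixed plan for the example query."""
    return [
        SubTask("funding", "Which schools receive Title I funds?"),
        SubTask("assessments", "Which schools have declining math scores?"),
        SubTask("attendance", "Which schools have rising chronic absenteeism?"),
    ]


def route(task: SubTask) -> Finding:
    """Send a sub-task to the source named in the plan."""
    return Finding(task.source, SOURCES[task.source](task.question))


def synthesize(findings: list[Finding]) -> dict:
    """Keep only schools present in every finding, and cite each source used."""
    schools = set.intersection(*(f.schools for f in findings))
    return {"schools": sorted(schools), "citations": [f.source for f in findings]}


def answer(query: str) -> dict:
    return synthesize([route(task) for task in decompose(query)])


result = answer(
    "Which Title I schools have declining math scores "
    "and rising chronic absenteeism?"
)
print(result)
```

In this toy version, the answer is simply the intersection of three lookups. The research questions in this role start where the sketch stops: how to plan when the decomposition isn't known in advance, and how to verify that the cited answer is correct.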
This is an R&D role. You'll prototype and evaluate multi-agent architectures that push beyond our current single-pass RAG pipeline — designing systems where specialized agents collaborate to handle the hardest queries our districts throw at us. You'll also explore proactive agent workflows: systems that autonomously surface funding gaps, compliance risks, and at-risk students before anyone thinks to ask.
Day to day
- Design and prototype multi-agent orchestration systems that decompose complex educator queries into coordinated sub-tasks across multiple data sources
- Build and evaluate agent planning and routing architectures — deciding which tools, data sources, and reasoning strategies each sub-task requires
- Develop proactive agent workflows for Arcline's Decision Engine: autonomous systems that monitor district data and surface funding gaps, attendance risks, and staffing anomalies without being asked
- Implement tool-use and function-calling patterns that let agents query databases, generate reports, cross-reference student records, and trigger downstream workflows
- Build evaluation harnesses for agent reliability — testing that multi-step agent chains produce correct, cited answers across diverse query types and district configurations
- Experiment with agent memory, state management, and context-passing strategies for long-running and multi-turn workflows
- Benchmark orchestration approaches (ReAct, plan-and-execute, hierarchical agents, LLM-as-judge routing) against real district query workloads and publish internal findings
Requirements
- Currently pursuing a B.S./B.A. or M.S./Ph.D. in Computer Science, Machine Learning, AI, or a related field
- Strong Python skills and hands-on experience building with LLM APIs (OpenAI, Anthropic)
- Demonstrated experience with agent frameworks — LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, or custom agent architectures in projects or research
- Understanding of agentic design patterns: tool use, planning, reflection, multi-agent communication, and state management
- A research mindset — you're comfortable running experiments, tracking metrics, and writing up what worked and what didn't
- Able to commit to a 10-week internship from June 1 to August 10, 2026. The role is based in San Francisco, with remote positions also available. Relocation assistance is provided for on-site roles
Bonus qualifications
- Published research or substantial projects involving multi-agent systems, agent planning, or autonomous AI workflows
- Experience with agent evaluation and benchmarking — measuring reliability, tool-use accuracy, and reasoning quality
- Familiarity with RAG systems, vector databases (pgvector, Pinecone), or retrieval-augmented agent architectures
- Experience with structured output, function calling, or constrained generation from LLMs
- Coursework or research in planning, reasoning, multi-agent systems, or reinforcement learning
- Interest in education, public sector tech, or applying AI research to real-world product problems
- AI-native development habits — you use tools like Cursor, Claude Code, GitHub Copilot, or Codex to ship faster
Compensation
The hourly rate ranges from $45 to $60, depending on role track and experience level.