Engineering

What We're Building and What We're Deliberately Not Building

Marcus Rivera, Head of Engineering
June 30, 2025 · 10 min read

I'm Marcus, Arcline's head of engineering. We've been heads-down on foundational engineering work for the past few weeks, and I want to share what we're building, what we're not building, and why both lists matter.

The honest version of this update is less glamorous than most startup engineering blogs. We don't have a product you can click through. We don't have screenshots. What we have is infrastructure — the plumbing that everything else will eventually sit on top of. If you've worked in data engineering, you know that getting the foundation right is the difference between a system that scales and one that collapses under its own weight at the worst possible time. If you haven't worked in data engineering, bear with me. I'll try to keep this accessible.

What We're Building

1. The Connector Framework

Arcline's core promise is that it connects to a district's existing systems and unifies their data. That means we need a way to talk to PowerSchool, Infinite Campus, Skyward, Canvas, Schoology, NWEA MAP, iReady, and dozens of other platforms — each with its own API (if it has one), its own data format, its own authentication mechanism, and its own set of quirks.

We could build each integration as a one-off. That's the fast approach: write custom code for PowerSchool, write custom code for Infinite Campus, repeat for every system. It works until you have fifteen connectors and a bug fix in one doesn't apply to any of the others. We've seen this pattern at previous companies. You end up maintaining fifteen separate integration codebases, each with its own bugs and its own tribal knowledge about why that one weird edge case is handled the way it is.

Instead, we built a connector framework — a plugin architecture where each integration is a module that follows a common interface. Every connector implements the same set of operations: authenticate, discover schema, extract records, handle incremental changes, and report status. The specifics differ — PowerSchool's API uses a different authentication flow than Infinite Campus's, and Skyward's data model is structured differently from either — but the contract is the same.

Think of it like electrical outlets. The devices are different, but they all plug into the same kind of socket. When we need to add a new connector — say, for a district running Tyler SIS or Aeries — we implement the interface for that system. The rest of the pipeline doesn't need to change.
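To make the contract concrete, here's a minimal sketch of what that common interface could look like. All names here are illustrative — Arcline's actual interface isn't public, and the toy PowerSchool connector contains no real PowerSchool logic:

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime
from typing import Iterator


@dataclass
class SyncStatus:
    """Illustrative status report every connector can produce."""
    connector: str
    last_sync: datetime | None = None
    errors: list[str] = field(default_factory=list)


class Connector(ABC):
    """The shared contract: five operations, regardless of vendor."""

    @abstractmethod
    def authenticate(self, credentials: dict) -> None: ...

    @abstractmethod
    def discover_schema(self) -> dict: ...

    @abstractmethod
    def extract_records(self, entity: str) -> Iterator[dict]: ...

    @abstractmethod
    def extract_changes(self, entity: str, since: datetime) -> Iterator[dict]: ...

    @abstractmethod
    def status(self) -> SyncStatus: ...


class FakePowerSchoolConnector(Connector):
    """Toy stand-in showing how a vendor module fills in the contract."""

    def authenticate(self, credentials: dict) -> None:
        self._token = "ok"  # a real connector would run the vendor's auth flow

    def discover_schema(self) -> dict:
        return {"students": ["id", "enroll_status"]}

    def extract_records(self, entity: str) -> Iterator[dict]:
        yield {"id": "s1", "enroll_status": 0}

    def extract_changes(self, entity: str, since: datetime) -> Iterator[dict]:
        return iter([])  # incremental sync: nothing changed since `since`

    def status(self) -> SyncStatus:
        return SyncStatus(connector="powerschool")
```

The payoff is that the rest of the pipeline only ever talks to `Connector`; adding Tyler SIS or Aeries means writing one new module, not touching the pipeline.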

We started with PowerSchool and Infinite Campus. Between them, those two SIS platforms cover roughly 60% of K-12 districts in the US. PowerSchool alone serves about 45 million students across 90+ countries. Infinite Campus is the second-largest SIS provider domestically. If we can reliably connect to these two, we can serve the majority of districts on day one.

The PowerSchool connector is working. We can authenticate against a district's PowerSchool instance, enumerate students, pull enrollment records, and detect changes since the last sync. Infinite Campus is about 70% done — their API handles pagination differently and their student record structure has some nested fields that required us to rethink part of our extraction logic. We'll have it finished within the next sprint.

2. The Normalization Layer

This is the hard part, and I want to be honest about exactly how hard it is.

Every student information system stores data differently. In PowerSchool, a student's enrollment status might be a numeric code: 0 for active, 1 for inactive, 2 for pre-registered, 3 for transferred. In Infinite Campus, it's a string field: "Active," "Inactive," "No Show." In Skyward, it's a combination of an enrollment status code and a separate withdrawal code that together determine the student's current standing.

That's one field. Now multiply that by every field in a student record — name, demographics, grade level, school assignment, program eligibility, attendance, disciplinary records — and multiply that by every system a district runs. The same student exists in six systems, described six different ways, often with six different identifiers.

Our normalization layer translates all of these vendor-specific schemas into a unified data model. We're building on the Ed-Fi data standard, which was designed specifically for K-12 education data interoperability. Ed-Fi gives us a shared vocabulary: a student enrollment record has a defined structure, with defined fields, with defined value sets. When our PowerSchool connector extracts a student record, the normalization layer translates it into Ed-Fi's format. When our Infinite Campus connector extracts the same kind of record, it translates into the same format.
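In its simplest form, that translation step is a lookup from each vendor's representation into one shared vocabulary. The sketch below uses the enrollment-status example from above; the target values stand in for Ed-Fi descriptors and are not Ed-Fi's actual value set:

```python
# Vendor-specific enrollment statuses, as described in the examples above.
POWERSCHOOL_STATUS = {0: "Active", 1: "Inactive", 2: "PreRegistered", 3: "Transferred"}
INFINITE_CAMPUS_STATUS = {"Active": "Active", "Inactive": "Inactive", "No Show": "NoShow"}


def normalize_enrollment_status(source_system: str, raw_value) -> str:
    """Translate one vendor-specific value into the unified vocabulary."""
    if source_system == "powerschool":
        return POWERSCHOOL_STATUS[int(raw_value)]
    if source_system == "infinite_campus":
        return INFINITE_CAMPUS_STATUS[str(raw_value)]
    raise ValueError(f"no mapping for source system {source_system!r}")
```

Downstream code only ever sees the unified values, which is what lets "actively enrolled" mean one thing regardless of which SIS a district runs.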

The goal is that downstream — when a coordinator asks "how many students are actively enrolled at Jefferson Elementary" — the system doesn't need to know whether Jefferson's district runs PowerSchool or Infinite Campus. "Actively enrolled" means the same thing regardless of source system, because the normalization layer has already resolved the semantic differences.

The tricky part is that the real world is messier than any data standard. Ed-Fi defines an enrollment status with a specific set of values, but districts use enrollment statuses in ways that don't always map cleanly. A student who's been expelled but is attending an alternative program — are they "active"? It depends on the district's interpretation, their state's reporting rules, and which system you're asking. Our normalization layer has to handle these ambiguities, and in some cases, it has to surface the ambiguity rather than silently resolving it.
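One way to surface ambiguity instead of silently resolving it is to make the mapping return a result that carries a flag and a reason, so the record can be routed to a district conversation rather than quietly counted. This is entirely illustrative — the field names and the expelled-student rule are assumptions for the sketch:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MappedValue:
    """A translated value plus an explicit ambiguity signal."""
    value: str
    ambiguous: bool = False
    note: Optional[str] = None


def map_enrollment(record: dict) -> MappedValue:
    status = record.get("status")
    program = record.get("program")
    # Expelled but attending an alternative program: the "right" answer
    # depends on district interpretation and state reporting rules, so we
    # flag the record for review instead of guessing.
    if status == "Expelled" and program == "alternative":
        return MappedValue("Active", ambiguous=True,
                           note="expelled but attending alternative program")
    if status == "Expelled":
        return MappedValue("Inactive")
    return MappedValue(status or "Unknown")
```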

We're being methodical about this. Right now, we've mapped the core student enrollment and demographics entities. Attendance normalization is next. Assessment data is after that. Each domain has its own set of edge cases and semantic mismatches. We're documenting every mapping decision so that when a district asks "why does your count differ from ours," we can trace it back to a specific translation rule and have an honest conversation about whether our mapping is right or theirs is.

3. Tenant Isolation

Every district gets a completely isolated data environment. Not row-level filtering on a shared database. Not a tenant ID column with a WHERE clause that we hope never fails. Actual isolation. Each district's data lives in its own store, with its own encryption keys, with its own access controls. A query running against District A's data physically cannot return records from District B, because District B's records aren't in the same store.
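The structural idea can be sketched with per-district SQLite connections: each tenant gets its own store, and a query is executed against exactly one of them. In production this would be separate databases with separate encryption keys; SQLite just makes the point runnable:

```python
import sqlite3


class TenantRouter:
    """Each district gets its own store; there is no shared table to filter."""

    def __init__(self) -> None:
        self._stores: dict = {}

    def store_for(self, district_id: str) -> sqlite3.Connection:
        if district_id not in self._stores:
            # One isolated store per tenant, created on first use.
            conn = sqlite3.connect(":memory:")
            conn.execute("CREATE TABLE students (id TEXT, name TEXT)")
            self._stores[district_id] = conn
        return self._stores[district_id]

    def query(self, district_id: str, sql: str, params=()):
        # The query runs against exactly one district's store. Records from
        # any other district are physically absent from this connection, so
        # a missing WHERE clause cannot leak them.
        return self.store_for(district_id).execute(sql, params).fetchall()
```

Note the contrast with a shared-database design: there is no tenant ID column to forget in a filter, because the boundary lives in the infrastructure, not in the query.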

I've written about this before in the context of FERPA compliance, so I won't repeat all the reasoning here. The short version: when you're handling the education records of minors, logical isolation isn't good enough. A single bug in a query filter could leak data across district boundaries. Physical isolation makes that bug impossible at the infrastructure level.

This is more expensive to operate. It adds complexity to deployment, monitoring, and maintenance. It means we can't do certain things that are easy in a shared-database architecture, like running aggregate queries across all our districts for internal analytics. We made that tradeoff deliberately. The cost of a cross-tenant data leak in K-12 isn't just a regulatory fine. It's the trust of every family whose child's records were exposed. You can't undo that.

4. Audit Logging

Every data access event is logged. Not as a feature we'll add later. Not as a compliance checkbox we'll address when a district asks for it. It's a core infrastructure component that we built from day one.

The audit log captures who accessed what data, when, through what query, and what the result set contained. The logs are written to an append-only store that can't be modified or deleted — not by district admins, not by our engineering team, not by anyone. If a parent files a FERPA request asking who has viewed their child's records, the district can answer that question with a query instead of a research project.
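The record shape and the append-only discipline can be sketched like this. The real tamper-resistance has to come from the storage layer (an append-only store, as described above); this in-memory version just shows the event fields and an API that deliberately offers no update or delete:

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditEvent:
    actor: str        # who accessed the data
    district_id: str
    query: str        # through what query
    record_ids: tuple  # what the result set contained
    at: str           # when, as ISO 8601 UTC


class AuditLog:
    def __init__(self) -> None:
        self._events = []

    def append(self, actor: str, district_id: str, query: str, record_ids) -> None:
        event = AuditEvent(actor, district_id, query, tuple(record_ids),
                           datetime.now(timezone.utc).isoformat())
        self._events.append(json.dumps(asdict(event)))

    def events(self) -> tuple:
        # Read-only view; there is intentionally no mutation API.
        return tuple(self._events)
```

With events shaped this way, answering a parent's FERPA request becomes a filter over the log — find every event whose result set contained their child's record ID.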

We built this first because retrofitting audit logging onto an existing system is one of the most painful things in software engineering. You end up with gaps — periods where access happened but wasn't logged. You end up with inconsistencies — some access paths logged at one level of detail, others at a different level. You end up with a system that's auditable in theory but unreliable in practice. We've been through that at previous companies and we're not doing it again.

FERPA requires that districts maintain records of who has accessed student education records. If Arcline is handling that data, the audit capability needs to be airtight from the first day a district connects. Not from some future version.

What We're Deliberately Not Building

This list is as important as the first one. Maybe more important, because saying no is harder than saying yes, and the things you defer define your priorities as much as the things you build.

1. A User Interface

We don't have a dashboard. We don't have a web app. Our first pilot districts will interact with Arcline through a simple query interface — closer to a command line than an application. Type a question, get an answer, see where the data came from.

That sounds backwards. Every startup investor we've talked to wants to see the product, and "the product" in their mind is a screen they can click through. But here's the thing: building a polished UI on top of a data layer that isn't right is exactly how every other edtech analytics company has failed districts. The dashboard looks great in the demo. Then the district connects their real systems, the data doesn't match, the numbers are stale, the definitions are inconsistent, and the beautiful UI becomes expensive shelf-ware.

We're going the other direction. Get the data layer right. Make sure the numbers are accurate, current, and traceable. Then build the interface. If the data layer works, a good UI is a matter of design. If the data layer doesn't work, no amount of UI polish will save it.

2. AI and ML Features

We have ideas. Lots of them. Natural language queries so a principal can type a question in plain English instead of selecting filters from dropdown menus. Predictive models that identify students at risk of chronic absence before they cross the threshold. Anomaly detection that flags when a data feed goes stale or a record count diverges unexpectedly.

All of that comes later. AI and ML are only as good as the data they operate on. A predictive model trained on dirty, inconsistent, or incomplete data doesn't produce insights — it produces confident-sounding nonsense. The normalization layer and the connector framework have to be solid before any machine learning model touches district data. Otherwise, we're building the same "garbage in, garbage out" system that districts have been burned by before, just with a fancier label.

3. State Reporting Automation

This was one of the most frequently requested features during our listening tour. Every state has its own reporting requirements — PEIMS in Texas, EMIS in Ohio, CALPADS in California. Districts spend enormous amounts of time preparing these submissions: pulling data from multiple systems, formatting it to state specifications, running validation checks, fixing errors, and resubmitting. A data coordinator in Texas told us she spends six weeks a year on PEIMS submissions alone.

We want to automate this. It's on the roadmap. But state reporting automation depends on the normalization layer being essentially bulletproof. If the mapping from vendor-specific data to our unified model has any errors, those errors will flow directly into state submissions. Getting state reporting wrong isn't a minor inconvenience — it affects funding, compliance status, and accountability ratings. We're not going to ship this until the foundation it depends on has been tested, validated, and proven across multiple districts.

The Tradeoff We're Making

I want to be direct about what this approach costs us.

Going foundation-first means we won't have impressive demos for months. When other edtech companies are showing polished dashboards with colorful charts and drag-and-drop report builders, we'll be showing a query interface that returns text. When they're demoing AI-powered insights, we'll be talking about data normalization rules and connector reliability metrics. That's a hard sell in a market where buying decisions are often influenced by how a product looks in a 30-minute demo.

We're betting that districts are tired of products that look great and don't work. We're betting that when we tell a data coordinator "the chronic absence number you see came from your SIS, it was synced 4 hours ago, here's exactly how the count was calculated, and here are the 47 students in that count," that accuracy and transparency will matter more than a colorful gauge chart.

Maybe we're wrong. It wouldn't be the first time an engineering team overweighted technical correctness and underweighted user experience. We're watching for that. Sarah, our head of product, keeps a sign on her monitor that says "Would a coordinator use this?" — and she's not shy about telling us when the answer is no.

But the alternative — building a slick interface on top of a data layer that silently returns wrong numbers — is how this industry has failed districts for a decade. We're not going to out-design our way past bad data. Nobody can. The foundation comes first.

What's Next

Next up, we're finishing the Infinite Campus connector, starting on Canvas and NWEA MAP connectors, and extending the normalization layer to handle attendance records. We're also beginning conversations with our first pilot districts — the ones who told us during the listening tour that they'd be willing to connect their systems to an early version of Arcline and give us candid feedback.

Those conversations will be the real test. It's one thing to build a connector against a PowerSchool sandbox instance. It's another thing entirely to connect to a live district with 15,000 students and see whether our normalization rules hold up against real data with real edge cases.

We'll report back. The good parts and the parts that break. That's the deal with building in public — you don't get to only share the wins.
