From 0 to 1: Cracking the ARC Prize in Nine Hours
The Thrill of an Impossible Challenge
The dream of artificial general intelligence (AGI) is not just to build machines that perform tasks, but to build machines that can reason: intuit patterns, make leaps, and generalize the way humans do.
The ARC Prize, a million-dollar challenge launched by François Chollet, is one of the few benchmarks designed to test exactly that frontier. The puzzles look deceptively simple: tiny grids of colors, each offering just two or three “before and after” examples. Your task is to deduce the hidden transformation rule and apply it to a new test grid.
It sounds like child’s play. It’s anything but.
Most AI thrives on oceans of data. ARC is a desert: no big training set, no statistical crutches. Just a handful of clues and the expectation of abstraction.
When I stumbled across ARC on Reddit, something inside me lit up. With a background in art, a career in IT support, and a degree-in-progress, I am not a software engineer. But I wondered: could I, a non-coder, build a working Python-based solver in a single day?
Nine hours later, I had an answer.
Why ARC Is Brutally Difficult
The Abstraction and Reasoning Corpus (ARC) is designed to be hostile to shortcuts.
Sparse data. Just two or three example pairs per puzzle.
Hidden transformations. Rotate a shape, mirror colors, repeat a pattern.
Human-favored design. Tasks deliberately resemble the kinds of reasoning humans excel at but machines stumble on.
Where neural networks thrive on interpolation, ARC demands extrapolation. Where most machine learning memorizes, ARC insists on reasoning.
This is what makes it a genuine test of AGI: not “more data, more compute,” but the elusive capacity to infer rules from almost nothing.
Building REAP: Recursive Emergent Abstraction Program
In a nine-hour sprint, I built a solver I called “REAP” — Recursive Emergent Abstraction Program. The core idea was simple: treat puzzles like detective work.
The Three Main Parts
A Language for Puzzles. A domain-specific vocabulary: rotate, flip, map colors, tile patterns.
Seeing Shapes, Not Pixels. Parsing grids into objects and relationships instead of raw squares.
A Detective Searcher. A synthesizer that tries out different action sequences, testing against training examples until something clicks.
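To make the "language for puzzles" idea concrete, here is a minimal sketch of what such a vocabulary might look like in Python. The operator names mirror the ones mentioned above; the implementations are my own illustration, not REAP's actual code:

```python
# A toy grid DSL: each operator takes a grid (a list of lists of ints,
# where the ints are ARC color codes) and returns a new grid.

def rotate(grid):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip(grid):
    """Mirror the grid left-to-right."""
    return [row[::-1] for row in grid]

def map_color(grid, mapping):
    """Recolor cells according to a color-to-color dictionary."""
    return [[mapping.get(c, c) for c in row] for row in grid]

def tile(grid, ny, nx):
    """Repeat the grid in an ny-by-nx arrangement."""
    return [list(row) * nx for _ in range(ny) for row in grid]

# Example: rotate a 2x2 grid clockwise, then recolor 1 -> 3.
g = [[1, 0],
     [0, 2]]
out = map_color(rotate(g), {1: 3})  # [[0, 3], [2, 0]]
```

Because every operator maps grid to grid, candidate solutions are just sequences of these functions, which is what makes a brute-force-style search over them tractable.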
At first, REAP failed spectacularly. Iteration after iteration returned nothing. And then, suddenly, a breakthrough: it solved a task.
I froze, then scribbled notes. This wasn’t luck. It was reasoning.
The Breakthrough Moment
When I ran REAP against the full set of 1,000 training tasks, I braced myself for disappointment.
Instead, I watched it solve 11 puzzles, earning a score of 1.10%.
That may sound small. But in a field where even PhD-led teams often scrape single digits, 1.10% from a solo, non-coder sprint felt like a victory.
The solved tasks involved shifts and rotations, modest but real forms of abstraction. REAP wasn’t guessing. It was reasoning within the rules of its own language.
From zero to one. Proof of concept.
Under the Hood: REAP’s Architecture
Behind the sprint story lies the machinery:
DSL (Domain-Specific Language). A compact set of symbolic operators (rotate, flip, map_color, pad, tile).
Search Engine. A beam search over DSL space, guided by heuristics and fit to training pairs.
Planner Module. High-level pattern matchers that decompose recognized patterns into sequences of low-level ops.
Object Graph Extractor. Parsing grids into adjacency and containment graphs.
Failure-Driven Synthesizer. Harvesting insights from failed runs to expand the operator set.
Template Memory. Reusing successful programs across similar puzzles.
The workflow is recursive: failures generate templates, templates feed back into the search, the system refines itself.
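The "detective searcher" step can be sketched as a beam search over operator sequences, scored by how many training pairs each candidate program reproduces exactly. This is a simplified stand-in under my own assumptions (single-argument operators, exact-match scoring), not REAP's actual code:

```python
def rotate(grid):
    """Rotate the grid 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def flip(grid):
    """Mirror the grid left-to-right."""
    return [row[::-1] for row in grid]

def apply_program(program, grid):
    """Run a sequence of DSL operators on a grid."""
    for op in program:
        grid = op(grid)
    return grid

def beam_search(ops, train_pairs, max_depth=3, beam_width=50):
    """Grow programs one operator at a time, keeping only the
    best `beam_width` candidates at each depth."""
    def score(program):
        # Fraction of training pairs the program reproduces exactly.
        return sum(apply_program(program, inp) == out
                   for inp, out in train_pairs) / len(train_pairs)

    beam = [()]  # start with the empty program
    for _ in range(max_depth):
        candidates = [prog + (op,) for prog in beam for op in ops]
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_width]
        if score(beam[0]) == 1.0:  # solves every training pair
            return beam[0]
    return None

# Example: one training pair whose hidden rule is "rotate clockwise".
pairs = [([[1, 2], [3, 4]], [[3, 1], [4, 2]])]
program = beam_search([rotate, flip], pairs)  # finds (rotate,)
```

Template memory would slot in naturally here: seeding `beam` with previously successful programs instead of only the empty one.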
Lessons From Nine Hours of Reasoning
REAP showed strengths: object shifts, simple rotations, basic color mapping.
It also showed limits: complex conditional rules, nested structures, multi-step abstractions. Its synthesizer often wasted time testing dead ends.
Still, the point wasn’t mastery. It was motion. To go from zero to one, from blank screen to working system, was enough to prove a principle: curiosity plus symbolic structure can open doors even in the hardest spaces.
What’s Next
I’ve saved the code and notes for a return sprint. The next steps are clear:
Expand the DSL to handle conditional logic.
Optimize the search to prune dead ends faster.
Explore hybridization: symbolic scaffolding plus small neural modules.
Aim for 20 tasks solved, 2% on the leaderboard.
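As an illustration of what "conditional logic" in the DSL might mean, one option is a higher-order operator that applies a sub-transformation only when a predicate about the grid holds. This is my own sketch of a possible design, not a committed plan:

```python
def when(predicate, op):
    """Build a conditional operator: apply `op` only if
    `predicate(grid)` is true, otherwise pass the grid through."""
    def conditional(grid):
        return op(grid) if predicate(grid) else grid
    return conditional

# Illustrative predicate and operator.
def has_color(c):
    """Predicate: does the grid contain color code c anywhere?"""
    return lambda grid: any(c in row for row in grid)

def flip(grid):
    """Mirror the grid left-to-right."""
    return [row[::-1] for row in grid]

# Flip the grid only when it contains color 3.
flip_if_3 = when(has_color(3), flip)
```

Because `when(...)` returns an ordinary grid-to-grid operator, the existing search could treat conditionals like any other DSL primitive.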
But even if REAP never clears 2%, the lesson stands: reasoning systems can be built by outsiders. You don’t need to be a research lab. You just need curiosity, patience, and tools that translate human intuition into symbolic play.
Closing Reflection
ARC is a strange and beautiful test. It reduces AGI to its essence: a handful of examples, an unspoken rule, and the demand to abstract.
For me, it was also a reminder: expertise is not a monopoly. With art-trained pattern recognition, IT-schooled systems thinking, and AI coding assistants, I could build something real in nine hours.
From zero to one.
That leap matters, not because 1.10% is a high score, but because it proves the door is open.
Key Concepts and Working Terms
ARC (Abstraction and Reasoning Corpus): A benchmark of grid-based puzzles designed by François Chollet to test reasoning and abstraction, not statistical pattern-matching.
ARC Prize: A $1M+ competition challenging researchers to solve ARC tasks as a proxy for progress toward AGI.
Sparse Clues: Each puzzle gives only 2–3 examples, making memorization impossible and abstraction essential.
REAP (Recursive Emergent Abstraction Program): My experimental Python-based solver that uses a symbolic DSL, search, and template reuse to tackle ARC.
DSL (Domain-Specific Language): A compact vocabulary of operations (rotate, flip, map colors) for composing candidate solutions.
Template Memory: The reuse of successful transformation programs across different tasks, an analogue to meta-learning.
From Zero to One: The leap from nothing working to something working, however small, the proof of possibility that changes the game.