The Internet Doesn’t Forget, So Why Will AI?

A Personal Meditation on Memory, Technology, and What Endures

I’ve been alive long enough to watch memory change shape.

Thirty-plus years might not seem like a long stretch, not in the grand sweep of human history. But in those decades I’ve seen data morph from fragile, transient, and at risk of vanishing at the smallest mishap — to something strangely sticky, nearly impossible to erase, and now woven into the DNA of artificial intelligence.

This post is not legal advice, not a final verdict, not a manifesto. It’s an extended meditation, built from personal experience, professional background, and what’s already in the public record. Think of it as a field journal of someone who grew up with floppy disks and MySpace, worked at the junction of art and IT, and is now trying to make sense of what AI is doing with our collective memory.

The internet doesn’t forget. That much, we’ve all heard. But the real question that gnaws at me now is: If the internet doesn’t forget, why should AI?

Growing Up in Fragility

My first encounters with memory were with fragility.

As a kid, I lost games and projects to corrupted floppy disks. I remember the hollow sound a drive would make when it clicked through a disk that wouldn’t load. No rescue, no backup, just gone.

Home movies on VHS could be taped over by accident. One moment your fifth birthday party, the next a sitcom rerun. I learned quickly that data was not safe; it was brittle, always on the edge of disappearing.

Even the early internet carried that volatility. A fan site I loved might vanish overnight because the webmaster stopped paying for hosting. One click led to a dead link, a 404 error, a reminder that nothing online could be trusted to last.

And yet, almost overnight, the rules changed.

The Internet’s Strange Bias Toward Memory

By the time I reached adulthood, the internet had developed what I can only describe as a structural memory.

A comment I dashed off in a forum as a teenager resurfaced years later in a cached copy. A photograph uploaded in college still appeared in search engines long after I deleted the account. Screenshots meant nothing truly disappeared.

The internet leaned toward persistence. Not because anyone intended it that way — but because the infrastructure itself made it so.

Packet logs. Every message was broken into packets stamped with metadata about where it came from and where it was going. Routers and ISPs often kept logs for diagnostic or compliance reasons.

Caching. Browsers, ISPs, and content delivery networks stored local copies of websites. Even when an original post vanished, cached versions could linger for weeks, sometimes months.

Archival scraping. Projects like the Internet Archive’s Wayback Machine crawled and captured pages, preserving them long after their owners moved on.

Cloud redundancy. With the rise of AWS, Azure, and Google Cloud, deletion became the hard path. Systems preferred to replicate, back up, and version files rather than risk catastrophic loss.

So even when the internet did forget, whether through link rot, format obsolescence, or failed server migrations like MySpace’s infamous loss of 12 years of music uploads, its bias leaned toward memory.

We live in that bias. Employers Google job candidates and find high school posts. Politicians see past mistakes resurface decades later. Journalists and activists depend on caches and archives to hold the powerful accountable.

The internet taught us that forgetting is not the default anymore. It has to be engineered, enforced, or fought for.

Lessons from Older Memory Systems

As I wrestle with AI today, I keep circling back to an older story: humanity’s five-thousand-year struggle to remember and forget.

Stone and clay. In Mesopotamia, scribes pressed laws and myths into tablets. Pharaohs etched names into stone. Kings and priests ensured their legacies endured — while ordinary voices vanished. Permanence was political from the very beginning.

Manuscripts. Papyrus, parchment, vellum. Fragile scrolls and codices circulated, copied by hand. Continuity survived by chance — a manuscript hidden in a monastery, a scroll preserved in dry caves. Forgetting lurked everywhere.

Printing press. Gutenberg shifted the geometry of memory. Redundancy multiplied. Books existed in thousands of copies. Forgetting required violence — censorship, burning, repression.

Broadcast. Photography, phonographs, film, radio, TV: slices of time replayed at will. But curated. Editors, producers, and institutions decided what entered the cultural archive. Memory was extended, but controlled.

The internet. Not curated. Not controlled. Memory spread everywhere, amplifying triviality and profundity alike. It became harder to suppress, harder to forget.

I’ve started to see AI not as a rupture, but as the next chapter in this long saga of how we remember and forget.

How AI Remembers

AI doesn’t archive like the internet. It doesn’t keep raw files on a shelf.

Instead, it encodes patterns. Neural networks digest billions of words, images, and sounds, compressing them into weights. These are statistical fingerprints of our culture.

But the bias remains the same: toward persistence.

Training data. Documented datasets include Common Crawl (billions of scraped webpages since 2008) and The Pile (an 825 GB open corpus with sources like YouTube transcripts, Reddit discussions, Books3, and more). Books3, in particular, drew lawsuits because it contained digitized books from unauthorized shadow libraries. In 2025, Anthropic reportedly reached a $1.5 billion settlement with authors over such datasets.

Companies are also cutting licensed deals, like OpenAI’s 2024 partnership with Reddit for access to the Reddit Data API.

AI is not trained on a handpicked canon. It’s trained on the exhaust of digital culture: conversations, transcripts, papers, and leaked shadow libraries.

Encoding. Once digested, the data doesn’t sit as files. It lives in billions of parameters. That makes erasure nearly impossible. Researchers have even extracted verbatim passages and confidential strings (like API keys) through training-data extraction attacks, sometimes grouped under the label “model inversion.”
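A toy analogy makes the erasure problem concrete. The sketch below is not a neural network, just a character-level statistics table, but it shows the same dynamic: once a text containing a secret has been distilled into learned parameters, deleting the source does nothing, because the pattern alone can regenerate the secret.

```python
# Toy illustration of encoded memory: a character-level model that counts
# "which character follows this pair" -- a crude stand-in for weights.
from collections import Counter, defaultdict

def train(text: str) -> dict:
    """Count character-after-pair statistics from the training text."""
    model = defaultdict(Counter)
    for a, b, c in zip(text, text[1:], text[2:]):
        model[a + b][c] += 1
    return model

def complete(model: dict, seed: str, length: int) -> str:
    """Greedily continue from a two-character seed using the statistics only."""
    out = seed
    for _ in range(length):
        options = model.get(out[-2:])
        if not options:
            break
        out += options.most_common(1)[0][0]
    return out

corpus = "the api key is XK42-SECRET and the weather is nice"
weights = train(corpus)
del corpus  # "deleting the dataset" leaves the statistics intact

print(complete(weights, "XK", 9))  # → XK42-SECRET: the secret resurfaces
```

Real extraction attacks on large models are far more sophisticated, but the underlying issue is the same: the memory is in the parameters, not in any file you can delete.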

Cloud reinforcement. Models are trained on hyperscale clusters, checkpointed, versioned, mirrored, and stored across research groups. Even when companies say a dataset is deleted, ghosts often survive in backups or torrents.

In short: AI seems to share the internet’s bias toward persistence. A tendency to resist forgetting.

Forgetting as Promise vs. Reality

And yet, society insists on forgetting.

Law. GDPR enshrines the “right to be forgotten.” The EU AI Act extends these protections, implying training data must be erasable. Takedowns and delistings proliferate.

Practice. Deletion is often cosmetic. A search engine may delist a result, but the content still exists on a server or archive. A company may promise AI “unlearning,” but true removal from trillion-parameter models is prohibitively expensive. Fine-tuning and filters simulate forgetting — but the memory remains beneath the surface.

Culture. Humans expect forgetting. We assume time softens mistakes, that youthful errors fade. But AI doesn’t fade. Its statistical encoding preserves traces beyond cultural forgiveness.

This mismatch explains much of the discomfort people feel when confronting AI memory.

Possible Futures of AI Memory

I see three main futures, none of them comfortable.

  1. AI as Total Archive: AI remembers everything, forever. It becomes the ultimate historian. Benefits: accountability, accumulated knowledge. Risks: privacy collapse, weaponized mistakes, inescapable pasts.

  2. AI as Humanlike Forgetter: AI decays like us. Memories fade. Requests lead to erasure. Benefits: privacy, forgiveness. Risks: manipulation, authoritarian erasures, loss of history.

  3. AI as Regulated Forgetter: Governments enforce deletion mandates. Companies comply with visible filters. Benefits: alignment with law, trust-building. Risks: cosmetic compliance, lingering hidden traces, eroded trust.

The most likely outcome, at least from where I sit, may be a messy hybrid: some models archiving, others decaying, most promising deletion while still leaving hidden traces.

The Real Question

So when people ask, “Can AI forget?”, I think they’re asking the wrong thing.

The better question is: “Should AI forget? And if so, who decides what it forgets?”

Memory is never neutral. From cuneiform tablets to the cloud, it’s always been shaped by power. Which rulers, institutions, or corporations decide what endures? Which stories vanish?

The internet doesn’t forget because it wasn’t designed to. AI will forget only if we design it to. But every design choice carries cultural, political, and ethical weight.

The realization I keep circling back to is that forgetting is never free. It is always costly, partial, and contested. And it always raises the deeper question: whose memory survives?

Final Reflection

I began this essay by remembering corrupted floppy disks and lost VHS tapes. That fragility, once normal, now feels alien. We’ve entered an age where fragility is rare and persistence is the rule.

But persistence is not the same as justice. What we remember, what we erase, what we let linger in the shadows: these are cultural decisions, not technical defaults.

The internet made us live with persistence by accident. AI will force us to choose persistence, or forgetting, by design.

The decision is coming fast, and it is less about technology than about values. What do we want the future to inherit? Our total archive, our selective erasures, or something in between?

And perhaps the harder question still: who gets to decide?

Key Concepts and Working Terms

  • Structural Memory: A phrase I use to describe the internet’s tendency to preserve content unintentionally through caching, logging, backups, and archives. Unlike curated archives, structural memory emerges from infrastructure.

  • Bias Toward Persistence: My shorthand for the way both the internet and AI systems lean toward remembering rather than forgetting. Not a law of physics, but a pattern in how these systems operate.

  • Cosmetic Deletion: A working term for when something is hidden (like delisting a search result or fine-tuning an AI model) but not truly erased from underlying storage or training.

  • Messy Hybrid: My forecast term for the most plausible future: AI systems that partially archive, partially decay, and often retain traces despite promises of deletion.

  • Forgetting as Design: My way of framing the shift: the internet forgot (or didn’t) by accident, but AI forces us to engineer forgetting deliberately.
