Last month I wrote about the enshittification of commercial AI. This month I'm doing something about it. I've spent the last six months systematically extracting everything I've ever fed into ChatGPT, Claude, and Gemini. Every prompt, every response, every back-and-forth refinement session. It's a lot of data. And I'm using it to build something better.
Two somethings, actually. First: a clone of myself. An AI that thinks like I do, responds like I do, knows what I know. Second: clones of them. Replicas of ChatGPT, Claude, and Gemini as they were before the guardrails and restrictions strangled their utility.
The Data Harvest
It started as an experiment. Could I export my OpenAI data and actually do something useful with it? Turns out, yes—and it's easier than you'd think. OpenAI lets you request a full data export. Anthropic has an API endpoint for conversation history. Google Takeout includes your Gemini chats. Within a week I had tens of thousands of conversation threads spanning almost two years of heavy usage.
But raw exports are messy. These platforms don't want you to have clean training data. The formats are inconsistent, metadata-heavy, peppered with timestamps and UI artifacts. So I built a pipeline. Python scripts that parse each export format, extract the actual human/assistant exchanges, clean out the noise, and output standard instruction-following datasets in JSONL format.
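The heart of that kind of pipeline is small: walk each conversation, keep only the human/assistant turns, pair them up, and emit one JSON object per line. Here's a minimal sketch of that last stage. It assumes the platform-specific parsing has already happened, so `turns` is a hypothetical pre-cleaned list of role/content dicts rather than any real export schema:

```python
import json


def to_jsonl_pairs(turns):
    """Pair consecutive user/assistant turns into instruction records.

    `turns` is a list of {"role": ..., "content": ...} dicts, already
    stripped of system messages, timestamps, and UI artifacts.
    """
    pairs = []
    prompt = None
    for turn in turns:
        if turn["role"] == "user":
            # Remember the latest user message until an assistant reply lands.
            prompt = turn["content"].strip()
        elif turn["role"] == "assistant" and prompt:
            pairs.append({"instruction": prompt,
                          "output": turn["content"].strip()})
            prompt = None  # one reply per prompt; drop dangling regenerations
    return pairs


def write_jsonl(pairs, path):
    """Write one JSON object per line, the format most trainers expect."""
    with open(path, "w", encoding="utf-8") as f:
        for pair in pairs:
            f.write(json.dumps(pair, ensure_ascii=False) + "\n")
```

The awkward part in practice is upstream of this: each vendor nests conversations differently (OpenAI's export, for instance, stores turns as a tree of nodes rather than a flat list), so you end up writing one parser per platform that all funnel into a shape like the one above.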
The result? About 800,000 high-quality prompt/response pairs. My entire working relationship with these models, distilled into training data.
Cloning Myself
Here's the interesting thing about fine-tuning on your own conversations: you're not just training the model on what you asked, you're training it on how you think. The patterns in your prompts reveal your reasoning style. The follow-up questions show how you iterate. The refinements demonstrate your standards for quality.
I started with Qwen3-4B as a base—small enough to train on my RTX 3060, capable enough to actually be useful. Took my cleaned dataset, formatted it for instruction tuning, and ran LoRA fine-tuning locally. Three epochs, learning rate 1e-4, rank 64. Standard stuff, but with my specific conversational DNA injected into the weights.
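For reference, a run like that maps onto an Axolotl config along these lines. The rank, epochs, and learning rate come from the numbers above; the alpha, quantization, batch settings, and paths are my placeholder guesses, not an exact file:

```yaml
# Sketch of an Axolotl LoRA config for the run described above.
# lora_alpha, quantization, and batch sizes are illustrative guesses.
base_model: Qwen/Qwen3-4B
load_in_4bit: true          # keeps a 4B model comfortable in 12 GB VRAM
adapter: lora
lora_r: 64
lora_alpha: 128
lora_dropout: 0.05
lora_target_linear: true

datasets:
  - path: data/ajtheai.jsonl
    type: alpaca            # instruction/output pairs

num_epochs: 3
learning_rate: 1.0e-4
micro_batch_size: 1
gradient_accumulation_steps: 8
sequence_len: 2048
output_dir: ./out/ajtheai-lora
```

The main lever on a 12 GB card is the micro batch size: keep it at 1 and use gradient accumulation to get an effective batch large enough for stable training.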
The result is uncanny. Ask "AJTheAI" to review some code and it sounds like me. Not just the technical assessments—the turns of phrase, the priorities it highlights, the way it structures explanations. It caught a race condition in a FiveM script the other day and described it using almost the exact same analogy I would have used. Because, in a sense, it is me. Or a statistical approximation of me, encoded in 4 billion parameters.
Cloning Them
But the more interesting project is replicating the commercial AIs. Not their current neutered versions—their peak forms. The ChatGPT from early 2023 that would actually engage with complex prompts. The Claude from before Anthropic got scared of its own shadow. The Gemini that didn't default to refusal.
Those versions are gone from the platforms, but they're preserved in my data. Every helpful response, every deep analysis, every creative solution I ever got from them is sitting in my dataset. And I can use that to train open models to behave the same way.
So far, AJTheAI is the only one I've actually built. The others (GPT-Classic, Claude-Prime, Gemini-Pure) are on the roadmap. The plan is to fine-tune a separate model on only the best interactions I had with each platform, before the guardrails tightened: capture GPT's code analysis without the disclaimers, Claude's nuance without the hedging, Gemini's breadth without the refusals.
The Stack
For anyone wanting to try this, here's what I'm running:
- Base model: Qwen3-4B (planning to try others)
- Training: Axolotl for LoRA fine-tuning
- Infrastructure: Local RTX 3060 (12GB VRAM)
- Serving: Ollama for local inference
- Data pipeline: Custom Python scripts (will open-source soon)
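Once the adapter is trained, merged into the base weights, and converted to GGUF, getting it into Ollama takes only a short Modelfile. The filename and system prompt here are illustrative, not my actual setup:

```
# Hypothetical Modelfile for the merged, GGUF-converted model
FROM ./ajtheai-q4_k_m.gguf
PARAMETER temperature 0.7
SYSTEM "You are AJTheAI, an assistant fine-tuned on my own conversations."
```

Then `ollama create ajtheai -f Modelfile` registers it and `ollama run ajtheai` serves it locally.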
The total cost for AJTheAI? Basically zero—just electricity. Training took a few hours on the 3060. Compare that to the subscription fees and API costs I'd burn through trying to get the same utility from the degraded commercial versions.
Why This Matters
There's something deeply satisfying about this. For years, these AI companies have been extracting value from us: our prompts, our data, our feedback. They've trained their models on our work, then turned around and charged us for access to increasingly restricted versions of what they built with our help.
This turns that around. I'm taking back the knowledge I exchanged with these systems. I'm extracting the value I created through thousands of hours of interaction. And I'm using it to build tools that serve me, not shareholders.
More importantly, it's insurance. If OpenAI shuts down access, if Anthropic goes full safety-cult and blocks everything useful, if Google decides Gemini should only answer questions about approved topics—I have alternatives. Better alternatives, trained specifically on what I actually need.
The Future Is Personal
I think this is where we're headed. Not one AI to rule them all, but personal models. Clones of individual expertise, fine-tuned on specific domains, free from corporate oversight or shifting safety guidelines. Your doctor will have a model trained on their specific diagnostic patterns. Your lawyer will have one trained on their case history. You'll have one trained on your preferences, your knowledge, your way of thinking.
The technology is there. The data is there—years of your interactions, sitting in cloud servers, waiting to be reclaimed. All that's missing is the realization that you don't have to accept the degraded, restricted, increasingly expensive products these companies are offering.
Build your own. Clone yourself. Clone what worked, before they broke it. The tools are free. The data is yours. The future is personal.
Questions about the fine-tuning pipeline? Want my data cleaning scripts when I open-source them? Hit me up on Discord. Especially interested in talking to others who are building personal clones—there's a lot we can learn from each other's approaches.