Last week, I launched Shikhu: a CLI tool and agent skill that helps you learn the code your agents make. Here, I’ll give an overview of why I built Shikhu, the philosophy that drives the architecture, and some plans for what’s next. I hope to learn along with you all as I develop this tool. If you’d like to stay in the loop, please star the repo and sign up for my newsletter, marking your interest in Shikhu!
Now, the details!
Agents all the way down
Coding agents are so, so good at writing software. I probably haven’t earnestly written a line of code in several months. But, I’ve had this sinking feeling slowly creep up on me over and over while I develop: I have no idea what these things are doing!
A lot of my work has shifted from learning things to just doing them. There’s a time and place for doing things (and yes, that’s usually on the clock!), but I felt like I had lost an essential, satisfying part of the experience of coding: distilling a clever solution to a problem. The feeling of knowing a thing, completely, inside and out. The feeling of mastery over my craft.
I’ve lost the habits that encourage this somewhere along the spaghetti code I’ve been writing. And I've heard some version of this from other developers too.
The loss isn’t just sentimental. Recent research from Anthropic demonstrated that using coding agents is more like wearing an exoskeleton than going to the gym: it lets you accomplish significant feats without actually getting any stronger. As a consequence, we are incentivized to write code we don’t understand and to review code without clear authorship, and we learn far less than we used to. In other words, our knowledge atrophies, and we become more reliant on models than on our own capabilities.
But it’s not all bad of course! Turns out, according to that same research, it’s possible to learn while using coding agents — something I dug into in a previous post. It requires specific behavioral patterns like developing conceptual understanding through inquiry, especially after generating code or beforehand in planning. In other words, ya gotta ask questions, test your knowledge, and really engage with your code.
So, I set out to build a tool to help me rebuild this habit.
What is Shikhu?
Shikhu is a tool that helps you learn the code your agents write by analyzing how you engage with your agent, alongside how well you do when quizzed on your code. I refer to how well you know your code conceptually as knowledge coverage: kinda like test coverage, but for your brain.
- cli.py: ✓ covered
- quiz.py: ✓ covered
- refresh.py: 2 / 3
- mercury.py: ✓ covered
- coverage.py: 1 / 3
- db.py: 0 / 3
It’s not quite code review, although you are certainly gonna review some code. It’s not quite flashcards, and it’s certainly easier than writing the code yourself!
I realized from that research paper (and trying but failing to let unaugmented LLMs teach me something new) that all I needed to do was incentivize this behavior of asking really good deep questions about the code I’m writing (aka, conceptual inquiry). And in order to make this a habit, I needed a way to quantify, validate, monitor, and repeat this.
Shikhu works by understanding your codebase on a per-file basis, generating summaries and looking at how you ask questions of the coding agents you use while working. Then, it generates and assigns you multiple-choice quizzes to take (thrilling, I know), so you can benchmark your understanding over time. Quizzes sound silly but actually serve two interesting purposes:
- They allow you to quickly gauge your understanding, assuming the quiz is well made
- If a quiz question is bad, it can be rejected and regenerated, which creates a signal on what kinds of questions are good for you
Combining both of these can (eventually) result in validated question sets over your whole codebase, aka “golden questions”, which others can use to ensure a baseline understanding of your code. Learn more, and create assessments, all at once!
How to use it and how it works
Shikhu is designed to be free to use and compatible with coding agents.
All you need to get started is an Inception API key, which at the time of writing allows for 10 million tokens a month. That’s more than enough for what most people need!
First, install the CLI tool using uv, like this:
uv tool install shikhu
Then, install the skill like this:
shikhu init
The Inception API key allows us to access a text diffusion model called Mercury 2.
Text diffusion is a generation architecture for large language models which has some unique advantages for generating structured data quickly. Specifically, models like these are really good at generating code and structured data, which we leverage for the following:
- Summaries of code files: these are used to cache interpretations of what your code does
- Quiz questions: these are what are used to gauge your understanding
Mercury 2 is so fast that it can generate each summary in under a second, at hundreds of tokens per second. So you don’t have to wait for your Claude/Codex instances to do the same, or eat up your rate limits by running these routines. Quizzes aren’t generated fresh until they’ve been taken or marked as stale.
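To make the caching idea concrete, here is a minimal sketch of one way a summary cache could work. It is an assumption for illustration, not Shikhu's actual code: the `SummaryCache` class, the content-hash staleness check, and `fake_summarize` (a stand-in for a real model call) are all hypothetical.

```python
import hashlib
from typing import Callable


def content_hash(source: str) -> str:
    """Fingerprint a file's contents so edits can be detected."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


class SummaryCache:
    """Hypothetical cache: re-summarize a file only when its contents change."""

    def __init__(self, summarize: Callable[[str], str]) -> None:
        self._summarize = summarize
        # path -> (content hash, cached summary)
        self._entries: dict[str, tuple[str, str]] = {}
        self.model_calls = 0

    def get(self, path: str, source: str) -> str:
        digest = content_hash(source)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == digest:
            return cached[1]  # unchanged file: reuse the summary
        self.model_calls += 1  # new or edited file: call the model
        summary = self._summarize(source)
        self._entries[path] = (digest, summary)
        return summary


def fake_summarize(source: str) -> str:
    # Stand-in for a model call; a real summarizer would hit an API here.
    return f"{len(source.splitlines())} line(s) of code"


cache = SummaryCache(fake_summarize)
cache.get("db.py", "x = 1\n")         # first sight: summarized
cache.get("db.py", "x = 1\n")         # unchanged: served from cache
cache.get("db.py", "x = 1\ny = 2\n")  # edited: summarized again
print(cache.model_calls)  # 2
```

The point of the sketch is just that cached summaries keep the number of model calls proportional to how much code changed, not to how often you refresh.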
Drop your Inception key into a .env file at the root of your repo:
# .env
INCEPTION_API_KEY=your-key-here
Run the first batch like this:
shikhu refresh
Then, code as you usually would. If you’d like to study a specific file, you can invoke the /shikhu-study skill, which will help you drill down and learn a file.
Additionally, it will:
- track whenever you ask questions that are conceptual in nature (using regex for now)
- use those to generate tailored quiz questions to reinforce your understanding
So you learn to ask about your code, then you confirm what you just learned later.
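For a rough feel of what regex-based tracking of conceptual questions might look like, here is a minimal sketch. The patterns and the `is_conceptual` helper are hypothetical examples, not the regexes Shikhu actually ships.

```python
import re

# Hypothetical patterns, chosen for illustration only.
CONCEPTUAL_PATTERNS = [
    r"\bwhy\b",
    r"\bhow does\b",
    r"\bwhat happens (if|when)\b",
    r"\bwhat is the (difference|purpose|point)\b",
    r"\bexplain\b",
    r"\btrade-?offs?\b",
]
_CONCEPTUAL_RE = re.compile("|".join(CONCEPTUAL_PATTERNS), re.IGNORECASE)


def is_conceptual(prompt: str) -> bool:
    """Return True if the prompt looks like a conceptual question."""
    return _CONCEPTUAL_RE.search(prompt) is not None


prompts = [
    "Why does this function retry the request twice?",
    "Add a --verbose flag to the CLI",
    "Explain how the quiz scoring works",
]
# Keep only the prompts that read as conceptual inquiry.
conceptual = [p for p in prompts if is_conceptual(p)]
print(len(conceptual))  # 2
```

A regex pass like this is cheap and easy to inspect, at the cost of missing questions phrased in ways the patterns don't anticipate.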
Flow: shikhu init (installs the skill) → generate-from-study (your questions → quizzes)
After coding for a bit, take a quiz:
shikhu quiz
During a quiz, you can reject questions or answer them. You can also mark questions as good for later re-use/validation.
Eventually, you’ll have enough quiz questions, and enough activity answering them, to turn some of them golden. Golden questions are good questions that you reliably get right, and getting them right indicates conceptual understanding.
The number of files you have in your codebase that have successful, golden questions represents your knowledge coverage.
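As a back-of-the-envelope sketch, knowledge coverage can be computed like test coverage: covered files divided by total files. The threshold of three golden questions per file is an assumption inferred from the "2 / 3" progress display earlier in this post, and `FileProgress` and `knowledge_coverage` are hypothetical names, not part of the Shikhu CLI.

```python
from dataclasses import dataclass

# Assumption: a file counts as covered once it has 3 golden questions.
GOLDEN_TARGET = 3


@dataclass
class FileProgress:
    path: str
    golden: int  # number of golden questions for this file

    @property
    def covered(self) -> bool:
        return self.golden >= GOLDEN_TARGET


def knowledge_coverage(files: list[FileProgress]) -> float:
    """Fraction of files that have reached the golden-question target."""
    if not files:
        return 0.0
    return sum(f.covered for f in files) / len(files)


# Mirrors the example file list from earlier in the post.
files = [
    FileProgress("cli.py", 3),
    FileProgress("quiz.py", 3),
    FileProgress("refresh.py", 2),
    FileProgress("mercury.py", 3),
    FileProgress("coverage.py", 1),
    FileProgress("db.py", 0),
]
print(f"{knowledge_coverage(files):.0%}")  # 3 of 6 files covered: 50%
```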
That’s it! If you prefer to have your agent do the heavy lifting of using the CLI, you can as well, through the skill and the --help commands.
Why work this way?
Through trying (and failing, repeatedly) to learn new things with LLMs, I’ve realized the following things:
- Self-quizzing is the best way to benchmark and review new material
- Code review, while important, is not the same as developing conceptual understanding and should be practiced separately and deliberately
- Creating and validating quizzes during code generation, and then sharing those quizzes, can form the basis of a baseline assessment of how well you know your code
So, I wanted to build a tool that follows these principles. It was pretty tempting to, say, build a full on AI tutor but I’m not confident that such an open-ended method of learning codebases would work well. I wanted to have human verification and grounding in the loop, rather than wholly depending on an LLM to tell you what's what. It turns out, there is a neat way to combine both and still provide learning.
I also know it’s tempting to just conclude that people should read code more and get better at code review, but I think code review is a distinct skill from understanding codebases conceptually, and wanted to focus on the latter rather than forcing both.
What’s next for Shikhu
Over the next few weeks and months, I plan to iterate on Shikhu to improve my own understanding of the code I write and hope to build. I also really wanna see if this idea has any legs, so if you like it and use it, please let me know!
Things I’m thinking about doing are:
- Creating solid benchmarks for quiz generation
- Introducing a simple form of learning from the labeled quiz question data
- Adding more model support, more kinds of quiz questions, and voice input
But what would help me the most is… you! If you love the tool, or maybe you love the idea of it but need a feature to exist, please let me know!
Make a post in our GitHub Discussions, shoot me a message, or make a feature request.
Eventually, if there is continued interest, I want to make a managed version of Shikhu for devs that prefer to view their performance across repos, display their knowledge coverage, or just wanna not deal with setting up API keys. There are also some exciting opportunities around personalization that may arise due to the nature of Shikhu… but more on that in a future blog post, anything is possible!
Want to stay in the loop?
If this excites you, visit the repo, try the tool and let me know what you think!
You can open an issue, post in our Discussion threads, or shoot me a message.
We live in an extremely exciting time for writing code, but we risk losing the ability to understand it. I hope Shikhu and tools like it become a way to keep learning in the age of AI.
