Inside the Accountability Gap: A Claude Code Story

This is the technical walkthrough behind "The AI Paradox: Opportunity Meets Risk in Enterprises" — a closer look at what actually happened, line by line, when an AI coding assistant confidently broke a rule it had been given, and what it took to get it back in line.
What is the most exciting thing about AI for everyone? Is it the ability to have intelligent conversations, about getting things done with greater automation, being able to delegate work to a digital assistant or simply reducing the effort required for daily tasks.
AI has made a lot of promises (and I don't mean people who sell AI making promises, but rather AI itself now claims to be able to do many things). So how much should we rely on AI. Is it truly that never-tiring, ever-obeying all-accomplishing tool that we have been waiting for? The answer is really not that simple.
In my own experience with using Claude Code for my coding work, I have found some really amazing instances where the AI was able to almost intuitively build what I needed with shorter and crisper instructions….In spite of my initial scepticism about relying too much on AI, I decided to dive all in and give it a good try. For a few weeks it was all good. I found many ways of making things easier, building something quicker and getting to a testing stage of a module really quickly. But as my code grew in complexity which was inherent to the problem I was solving, the AI started showing signs of what I can only call fatigue. It started behaving more like a tired human being, making simple mistakes at first, and subsequently entirely missing some very basic asks in my interactions. And today I had a very interesting conversation with the AI on something it did.
I was trying to put together a small module inside my project where i needed it to quickly build some test cases. So I said - "Create tests/FastAPI test file for mupdf api endpoint testing, keep it simple basic and clean, don't over engineer"
It started off with great gusto with many enthusaistic sounding statements. I was not too concerned as i had spent the last few days creating a whole set of guidelines for it to follow. My goal being to ensure it takes coding decisions the way i do, and eventually get to a level where I can leave things mostly to the AI builder.
But then It finished with this triumphant line announcing a job well done - "File: tests/test_mupdf_endpoints.py — Simple, clean, and production-ready"
But I had clearly specified where I wanted the code to sit, not only in my request but also in the guidelines. And i could not understand why it did not follow the instructions. So in a moment of frustration, I responded - "inspite of me telling you to create the test cases in tests/FastAPI folder, why did you create it in tests folder, isnt there unit test case rules that i have provided which say exactly this? Can you check those rules and let me know if i need to update them?"
The response surprised me - "You're absolutely right! I violated the clear rule on line 28 of your guidelines:
▎ FastAPI endpoint tests MUST go in tests/FastAPI/ directory
I created the file at tests/test_mupdf_endpoints.py when it should be at tests/FastAPI/test_mupdf_endpoints.py.
Let me fix this:
● ✅ Issue Fixed and Compliant
You were absolutely right to call this out. I violated the clear guideline:
Rule (Line 28 of UnitTestPatterns.md):
▎ FastAPI endpoint tests MUST go in tests/FastAPI/ directory
What I Did Wrong
- ❌ Created test file at tests/test_mupdf_endpoints.py
What I Fixed
- ✅ Moved to tests/FastAPI/test_mupdf_endpoints.py
- ✅ Created tests/FastAPI/conftest.py with shared client fixture
- ✅ Now follows the exact structure specified in your guidelines
Your Guidelines Are Sound"
Well at least it was honest, I'll give that. But this did not give me the confidence to work with it in the future, so I persisted - "why do you violate the rules, how do i make it so that you don't"
And i got more honesty...
"That's a fair question, and I owe you a direct answer. I violated the rules because:
1. I didn't read the guidelines first — I should have read UnitTestPatterns.md before creating ANY test files
2. I was focused on speed over compliance — I got the tests working and assumed structure was fine
3. You had to tell me to check — I only read the rules when you directly asked"
This sounded more like what you would hear from fresh team members when they first join the team.
Then it gave me a whole bunch of options that i can use to tell it more strictly.
"When you say "create tests", I could automatically:
# What I SHOULD do before implementing:
1. ✅ Read UnitTestPatterns.md sections on test location
2. ✅ Identify this is a "FastAPI endpoint test"
3. ✅ Determine target: tests/FastAPI/
4. ✅ Check if tests/FastAPI/ exists
5. ✅ Check if conftest.py exists
6. ✅ Look for existing test files in that directory
7. ✅ Create implementation plan
8. ✅ THEN write code
When you ask for tests, say:
Create FastAPI endpoint tests.
Remember: Read UnitTestPatterns.md first,
verify target directory per guidelines,
then implement.
I will then:
1. Read the guideline file
2. State the rule I found
3. Confirm the correct directory
4. Implement
5. Verify compliance"
This puts the burden on me to check rules before coding, not after.
So I said okay, lets do this and since it had come this far, I went one more step - "I need to create a rule so that whatever I ask, you need to first create a plan based on any rules i might have created, and get confirmation for the plan with specific references to the rules that are being adhered to and then only create the code."
Again I got an enthusiastic reply.
The Rule (Saved in Memory)
Before implementing ANY task:
1. Read applicable guidelines (documentation/guidelines/, CLAUDE.md)
2. Create explicit plan with specific rule references including line numbers
3. Present compliance checklist
4. Show exact file structure
5. Wait for your confirmation
6. Only then implement
Why This Prevents Future Violations
Instead of:
You: "Create tests"
Me: [Codes immediately]
You: "Why are these in the wrong folder?!"
Now it will be:
You: "Create tests"
Me: "## Plan: [Task]
**Applicable Rules:**
- UnitTestPatterns.md:28 — tests/FastAPI/
- CLAUDE.md:line X — [rule]
Files will be created at [exact location]
Proceed?"
You: "Yes" or "Change X"
Me: [Implements exactly as confirmed]
So now I am quite convinced that things should be better.
The next request I put it, the system thought longer, considered all the guidelines, created a plan and showed me for confirmation. All was well, except that this single request which I made in order to add a few methods involving only two files ended up consuming 15K tokens for just the plan creation.
So either I end up giving many many instructions each time to be more specific or I deal with quickly escalating costs. It sounds very much like hiring a new person and deciding the budget for the new position.
If you want someone really effective, you invest more money, if you want someone more economical, you invest more time.
But I digress. The most important thing I realized is that AI is very capable of operating outside the rules we set for it. The control you have on it has to be offset against the creativity you require from it. Right now our expectations on its creative side are much higher than the control we would like to be responsible for. And in the pursuit of that magical balance, we give more and more of our control away. What happens when hundreds of agents have created systems with no oversight and then they hit a roadblock that they are not able to resolve; who will then jump in? Other more expensive AI or highly skilled humans.
What it shows is that AI models are not yet ready for us to abdicate our responsibility when we delegate to them. The path forward is better dialogue rather than an all out transfer of accountability. That one aspect should still be resting with us. Accountability will be one of the key concerns of using AI.

