I spent four weeks part-time (probably 80 hours total) building a complete reactive UI framework with 40+ components, a router, and a supporting interactive website using only LLM-generated code. The experience made one thing evident: LLMs can produce quality code—but like human developers, they need the right guidance.
On Code Quality:
On The Development Process:
The bottom line: In many ways LLMs behave like the average of the humans who trained them—they make similar mistakes, but much faster and at greater scale. Hence, you can get six months' worth of maintenance-and-enhancement “slop” six minutes after you generate an initial clean code base and then ask for changes.
Four weeks ago, I set out to answer a question that's been hotly debated in the development community: Can LLMs generate substantive, production-quality code?
Not a toy application. Not a simple CRUD app. A complete, modern reactive UI framework with lots of pre-built components, a router and a supporting website with:
I chose to build Lightview (lightview.dev)—a reactive UI framework combining the best features of Bau.js, HTMX, and Juris.js. The constraint: 100% LLM-generated code using Anthropic's Claude (Opus 4.5, Sonnet 4.5) and Google's Gemini 3 Pro (Flash was not released when I started).
I began with Claude Opus:
Claude didn't initially dive into code. It asked dozens of clarifying questions:
However, at times it started writing code before I thought it was ready, and I had to abort the response and redirect it.
After an hour of back-and-forth, Claude finally said: "No more questions. Would you like me to generate an implementation plan since there will be many steps?"
The resulting plan was comprehensive - a detailed Markdown file with checkboxes, design decisions, and considerations for:
I did not make any substantive changes to this plan except for clarifications on website items and the addition of one major feature, declarative event gating, at the end of development.
With the plan in place, I hit my token limit on Opus. No problem—I switched to Gemini 3 (High), which had full context from the conversation plus the plan file.
Within minutes, Gemini generated lightview.js—the core reactivity engine—along with two example files: a "Hello, World!" demo showing both Bau-like syntax and vDOM-like syntax.
Then I made a mistake.
"Build the website as an SPA," I said, without specifying to use Lightview itself. I left for lunch.
When I returned, there was a beautiful website running in my browser. I looked at the code and my heart sank: React with Tailwind CSS.
Worse, when I asked to rebuild it with Lightview, I forgot to say "delete the existing site first." So it processed and modified all 50+ files one by one, burning through tokens at an alarming rate.
Deleting everything and regenerating from scratch is often more token-efficient for large changes; modifying existing files in place is better for targeted fixes. The LLM won't automatically choose the efficient path—you need to direct it.
One issue caught me off-guard. After Claude generated the website using Lightview components, I noticed it was still full of Tailwind CSS classes. I asked Claude about this.
"Well," Claude effectively explained, "you chose DaisyUI for the UI components, and DaisyUI requires Tailwind as a dependency. I assumed you were okay with Tailwind being used throughout the site."
Fair point—but I wasn't okay with it. I prefer semantic CSS classes and wanted the site to use classic CSS approaches.
I asked Claude to rewrite the site using classic CSS and semantic classes. I liked the design and did not want to delete the files, so once again I suffered through a refactor that consumed a lot of tokens since it touched so many files. I again ran out of tokens, tried GPT-OSS, hit syntax errors, and had to switch to another IDE to keep working.
Over the next few weeks, as I built out the website and tested and iterated on components, I worked across multiple LLMs as token limits reset: Claude, Gemini, back to Claude. Each brought different strengths and weaknesses:
The router also needed work. Claude initially implemented a hash-based router (#/about, #/docs, etc.). This is entirely appropriate for an SPA—it's simple, reliable, and doesn't require server configuration.
But I had additional requirements I hadn't clearly stated: I wanted conventional paths (/about, /docs) for deep linking and SEO. Search engines can handle hash routes now, but path-based routing is still cleaner for indexing and sharing.
When I told Claude I needed conventional paths for SEO and deep linking, it very rapidly rewrote the router and came up with what I consider a clever solution—a hybrid approach that makes the SPA pages both deep-linkable and SEO-indexable without the complexity of server-side rendering. However, it left some of the original code in place, which obscured what was going on and was totally unneeded. I had to tell it to remove this code, which supported the vestiges of hash-based routes. This kind of code retention is exactly what leads to slop. Many people would blame the LLM, but if I had been clear to start with and had also said “completely rewrite,” my guess is the vestiges would not have existed.
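The router internals aren't reproduced here, so the following is only a generic sketch of the underlying technique: path-based client-side routing on the History API, with the static host rewriting every path to index.html so deep links and crawlers land on real URLs. The route table and `renderPage` are placeholders, not lightview-router.js.

```js
// Generic path-based SPA routing sketch (illustrative, not lightview-router.js).
const routes = {
  "/": () => renderPage("home"),
  "/about": () => renderPage("about"),
  "/docs": () => renderPage("docs"),
};

function navigate(path, { push = true } = {}) {
  const handler = routes[path] || routes["/"];
  if (push) history.pushState({}, "", path); // keeps the URL shareable and deep-linkable
  handler();
}

// Intercept in-app link clicks so navigation stays client-side.
document.addEventListener("click", (event) => {
  const link = event.target.closest("a[href^='/']");
  if (!link || link.target === "_blank") return;
  event.preventDefault();
  navigate(link.getAttribute("href"));
});

// Back/forward buttons re-render the matching route without a reload.
window.addEventListener("popstate", () => navigate(location.pathname, { push: false }));

// On first load, the server (or a static-host rewrite rule) must serve index.html
// for every path so crawlers and deep links get a real page.
navigate(location.pathname, { push: false });

// Placeholder for whatever view function the app actually uses.
function renderPage(name) { document.querySelector("#app").dataset.page = name; }
```

The "hybrid" part Claude produced goes beyond this basic pattern, but the deep-linking and SEO benefits come from exactly this combination of real paths plus a catch-all rewrite.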
Project Size:
At this point, the files seemed reasonable - not overly complex. But intuition and my biased feelings about code after more than 40 years of software development aren't enough. I decided to run formal metrics on the core files.
Core Libraries:
| File | Lines | Minified Size |
|----|----|----|
| lightview.js | 603 | 7.75K |
| lightview-x.js | 1,251 | 20.2K |
| lightview-router.js | 182 | 3K |
The website component gallery scored well on Lighthouse for performance without any focused optimization effort.
But then came the complexity metrics.
I asked Gemini Flash to evaluate the code using three formal metrics:
1. Maintainability Index (MI): A combined metric where 0 is unmaintainable and 100 is perfectly documented/clean code. The calculation considers:
Scores above 65 are considered healthy for library code. This metric gives you a single number to track code health over time; the standard formula is sketched just after this list.
2. Cyclomatic Complexity: An older but still valuable metric that measures the number of linearly independent paths through code. High cyclomatic complexity means:
3. Cognitive Complexity: A modern metric that measures the mental effort a human needs to understand code. Unlike cyclomatic complexity (which treats all control flow equally), cognitive complexity penalizes:
The thresholds:
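For reference, the Maintainability Index in item 1 is commonly computed with the Visual Studio normalization of the classic formula. A minimal JavaScript sketch follows; the inputs (Halstead volume, cyclomatic complexity, line count) would come from a parser pass, and the function name is illustrative:

```js
// Maintainability Index, Visual Studio normalization (0 = unmaintainable, 100 = pristine).
// Inputs come from a parser pass (e.g. acorn); this helper name is illustrative.
function maintainabilityIndex(halsteadVolume, cyclomaticComplexity, linesOfCode) {
  const raw =
    171 -
    5.2 * Math.log(halsteadVolume) -    // penalize information volume
    0.23 * cyclomaticComplexity -       // penalize branching
    16.2 * Math.log(linesOfCode);       // penalize sheer length
  return Math.max(0, (raw * 100) / 171); // clamp and rescale to 0..100
}

// A small, simple helper (volume 80, complexity 2, 8 lines) lands in the healthy zone:
maintainabilityIndex(80, 2, 8); // ≈ 66.7
```

Different tools tweak the coefficients and how comments are credited, so the absolute numbers are only comparable within a single analyzer.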
Overall health looked good:
| File | Functions | Avg Maintainability | Avg Cognitive | Status |
|----|----|----|----|----|
| lightview.js | 58 | 65.5 | 3.3 | ⚖️ Good |
| lightview-x.js | 93 | 66.5 | 3.6 | ⚖️ Good |
| lightview-router.js | 27 | 68.6 | 2.1 | ⚖️ Good |
But drilling into individual functions told a different story. Two functions hit "Critical" status:
handleSrcAttribute (lightview-x.js):
Anonymous Template Processing (lightview-x.js):
This was slop. Technical debt waiting to become maintenance nightmares.
Here's where it gets interesting. The code was generated by Claude Opus, Claude Sonnet, and Gemini 3 Pro several weeks earlier. Could the newly released Gemini 3 Flash clean it up?
I asked Flash to refactor handleSrcAttribute to address its complexity. This seemed to take a little longer than necessary. So I aborted and spent some time reviewing its thinking process. There were obvious places it got side-tracked or even went in circles, but I told it to continue. After it completed, I manually inspected the code and thoroughly tested all website areas that use this feature. No bugs found.
After the fixes to handleSrcAttribute, I asked for revised statistics to see the improvement.
Unfortunately, Gemini Flash had deleted its metrics-analysis.js file! It had to recreate the entire analyzer.
When I told Gemini to keep the metrics scripts permanently, another issue surfaced: it had never properly installed dev dependencies like acorn (the JavaScript parser).
Flash simply assumed that because it found packages in node_modules, it could safely use them. The only reason acorn was available was because I'd already installed a Markdown parser that depended on it.
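For context, a metrics script of this kind needs little more than acorn and a tree walk. Below is a minimal sketch of a whole-file cyclomatic complexity counter; the branch list and file name are illustrative, not the actual metrics-analysis.js, and acorn would be installed explicitly with `npm install --save-dev acorn` rather than borrowed from node_modules.

```js
// Minimal sketch of an acorn-based cyclomatic complexity count for one file.
const { readFileSync } = require("node:fs");
const acorn = require("acorn");

// Node types that add an independent path through the code.
const BRANCHES = new Set([
  "IfStatement", "ForStatement", "ForInStatement", "ForOfStatement",
  "WhileStatement", "DoWhileStatement", "ConditionalExpression",
  "SwitchCase", "CatchClause", "LogicalExpression",
]);

// Tiny recursive AST walker (avoids an extra acorn-walk dependency).
function walk(node, visit) {
  if (!node || typeof node.type !== "string") return;
  visit(node);
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) child.forEach((c) => walk(c, visit));
    else if (child && typeof child.type === "string") walk(child, visit);
  }
}

function cyclomaticComplexity(source) {
  const ast = acorn.parse(source, { ecmaVersion: "latest", sourceType: "module" });
  let complexity = 1; // one path through straight-line code
  walk(ast, (node) => {
    if (node.type === "SwitchCase" && !node.test) return; // default: adds no new path
    if (BRANCHES.has(node.type)) complexity += 1;
  });
  return complexity;
}

console.log(cyclomaticComplexity(readFileSync("lightview.js", "utf8")));
```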
With the analyzer recreated, Flash showed how it had decomposed the monolithic function into focused helpers:
- fetchContent (cognitive: 5)
- parseElements (cognitive: 5)
- updateTargetContent (cognitive: 7)
- elementsFromSelector (cognitive: 2)
- handleSrcAttribute orchestrator (cognitive: 10)

| Metric | Before | After | Improvement |
|----|----|----|----|
| Cognitive Complexity | 35 🛑 | 10 ✅ | -71% |
| Cyclomatic Complexity | 22 | 7 | -68% |
| Status | Critical Slop | Clean Code | — |
Manual inspection and thorough website testing revealed zero bugs. The cost? A 0.5K increase in file size - negligible.
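The helper names above come from the metrics report; their signatures and bodies aren't published, so the following is only a hypothetical sketch of the orchestrator pattern they imply, with each helper owning exactly one concern:

```js
// Hypothetical decomposition sketch — helper names from the report,
// signatures and bodies are assumptions, not Lightview source.
async function handleSrcAttribute(el) {
  const src = el.getAttribute("src");
  if (!src) return;

  const html = await fetchContent(src);                     // network concerns only
  const elements = parseElements(html);                     // HTML-to-node concerns only
  const targets = elementsFromSelector(el.getAttribute("target"));
  updateTargetContent(targets.length ? targets : [el], elements); // DOM swap concerns only
}

async function fetchContent(url) {
  const response = await fetch(url);
  if (!response.ok) throw new Error(`Failed to load ${url}: ${response.status}`);
  return response.text();
}

function parseElements(html) {
  return [...new DOMParser().parseFromString(html, "text/html").body.childNodes];
}

function elementsFromSelector(selector) {
  return selector ? [...document.querySelectorAll(selector)] : [];
}

function updateTargetContent(targets, elements) {
  for (const target of targets) {
    target.replaceChildren(...elements.map((node) => node.cloneNode(true)));
  }
}
```

The orchestrator stays short because every branch-heavy concern lives in a helper that can be read, tested, and scored in isolation.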
Emboldened, I tackled the template processing logic. Since it spanned multiple functions, this required more extensive refactoring:
Extracted Functions:
- collectNodesFromMutations - iteration logic
- processAddedNode - scanning logic
- transformTextNode - template interpolation for text
- transformElementNode - attribute interpolation and recursion

Results:
| Function Group | Previous Max | New Max | Status |
|----|----|----|----|
| MutationObserver Logic | 31 🛑 | 6 ✅ | Clean |
| domToElements Logic | 12 ⚠️ | 6 ✅ | Clean |
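Again, only the extracted function names appear in the report; a hypothetical sketch of how such a decomposition typically looks (the `interpolate` placeholder stands in for the real template engine):

```js
// Hypothetical sketch of the MutationObserver decomposition — names from the report,
// bodies are illustrative assumptions, not Lightview source.
const observer = new MutationObserver((mutations) => {
  for (const node of collectNodesFromMutations(mutations)) processAddedNode(node);
});
observer.observe(document.body, { childList: true, subtree: true });

function collectNodesFromMutations(mutations) {
  return mutations.flatMap((m) => [...m.addedNodes]); // iteration logic only
}

function processAddedNode(node) {
  if (node.nodeType === Node.TEXT_NODE) transformTextNode(node);
  else if (node.nodeType === Node.ELEMENT_NODE) transformElementNode(node);
}

function transformTextNode(node) {
  node.textContent = interpolate(node.textContent); // template interpolation for text
}

function transformElementNode(el) {
  for (const attr of Array.from(el.attributes)) {
    attr.value = interpolate(attr.value); // attribute interpolation
  }
  el.childNodes.forEach(processAddedNode); // recursion into children
}

function interpolate(text) { return text; } // placeholder for the real template engine
```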
After refactoring, lightview-x.js improved significantly:
All critical slop eliminated. The increased function count reflects healthier modularity - complex logic delegated to specialized, low-complexity helpers. In fact, it is as good as or better than established frameworks from a metrics perspective:
| File | Functions | Maintainability (min/avg/max) | Cognitive (min/avg/max) | Status |
|----|----|----|----|----|
| lightview.js | 58 | 7.2 / 65.5 / 92.9 | 0 / 3.4 / 25 | ⚖️ Good |
| lightview-x.js | 103 | 0.0 / 66.8 / 93.5 | 0 / 3.2 / 23 | ⚖️ Good |
| lightview-router.js | 27 | 24.8 / 68.6 / 93.5 | 0 / 2.1 / 19 | ⚖️ Good |
| react.development.js | 109 | 0.0 / 65.2 / 91.5 | 0 / 2.2 / 33 | ⚖️ Good |
| bau.js | 79 | 11.2 / 71.3 / 92.9 | 0 / 1.5 / 20 | ⚖️ Good |
| htmx.js | 335 | 0.0 / 65.3 / 92.9 | 0 / 3.4 / 116 | ⚖️ Good |
| juris.js | 360 | 21.2 / 70.1 / 96.5 | 0 / 2.6 / 51 | ⚖️ Good |
LLMs exhibit the same tendencies as average developers:
The difference? They do it faster and at greater volume. They can generate mountains of slop in hours that would take humans weeks.
Extended reasoning (visible in "thinking" modes) shows alternatives, self-corrections, and occasional "oh but" moments. The thinking is usually fruitful, sometimes chaotic. Don't just walk away or do something else while tasks you believe are complex or critical are being carried out. The LLMs rarely say "I give up" or "Please give me guidance" - I wish they would more often. Watch the thinking flow and abort the response if necessary. Read the thinking and either redirect or just say "continue"; you will learn a lot.
When I told a second LLM, "You are a different LLM reviewing this code. What are your thoughts?", magic happened.
This behavior is actually beyond what most humans provide:
I love that formal software metrics can guide LLM development. They're often considered too dull, mechanical, difficult or costly to obtain for human development, but in an LLM-enhanced IDE with an LLM that can write code to do formal source analysis (no need for an IDE plugin subscription), they should get far more attention than they do.
Metrics don't lie. They identified the slop my intuition missed.
After 40,000 lines of LLM-generated code, I'm cautiously optimistic.
Yes, LLMs can generate quality code. But like human developers, they need:
The criticism that LLMs generate slop isn't wrong—but it's incomplete. They generate slop for the same reasons humans do: unclear requirements, insufficient structure, and lack of quality enforcement.
The difference is iteration speed. What might take a human team months to build and refactor, LLMs can accomplish in hours. The cleanup work remains, but the initial generation accelerates dramatically.
I'm skeptical that most humans will tolerate the time required to be clear and specific with LLMs - just as they don't today when product managers or developers push for detailed requirements from business staff. The desire to "vibe code" and iterate will persist.
But here's what's changed: We can now iterate and clean up faster when requirements evolve or prove insufficient. The feedback loop has compressed from weeks to hours.
As coding environments evolve to wrap LLMs in better structure - automated metrics, enforced patterns, multi-model reviews - the quality will improve. We're not there yet, but the foundation is promising.
The real question isn't whether LLMs can generate quality code. It's whether we can provide them - and ourselves - with the discipline to do so consistently.
And, I have a final concern … if LLMs are based on history and have a tendency to stick with what they know, then how are we going to evolve the definition and use of things like UI libraries? Are we forever stuck with React unless we ask for something different? Or, are libraries an anachronism? Will LLMs and image or video models soon just generate the required image of a user interface with no underlying code?
Given its late entry into the game and the anchoring LLMs already have, I don't hold high hopes for the adoption of Lightview, but it was an interesting experiment. You can visit the project at [https://lightview.dev](https://lightview.dev).


