
Six major AI paradigm shifts in 2025: From RLVR training and Vibe Coding to Nano Banana

2025/12/22 17:24

Author: Andrej Karpathy

Compiled by: Tim, PANews

2025 was a year of rapid development and great uncertainty for large language models, and it produced fruitful results. Below are some "paradigm shifts" that I personally find noteworthy and somewhat surprising: changes that have altered the landscape and impressed me, at least conceptually.

1. Reinforcement Learning from Verifiable Rewards (RLVR)

In early 2025, the LLM production stack at essentially every AI lab roughly took the following form:

  • Pre-training (GPT-2/3, 2020);
  • Supervised fine-tuning (InstructGPT, 2022);
  • Reinforcement learning from human feedback (RLHF, 2022).

For a long time this was the stable, mature stack for training production-grade LLMs. In 2025, reinforcement learning from verifiable rewards (RLVR) became the major new addition to that stack. By training LLMs in environments whose rewards can be checked automatically (for example, math and programming problems), the models spontaneously develop strategies that resemble what we would call "reasoning": they learn to break a problem into intermediate computational steps, try multiple approaches, check their work, and backtrack (see the examples in the DeepSeek-R1 paper). In the earlier stack these strategies were hard to obtain, because the right reasoning paths and backtracking behaviors are not something you can easily demonstrate or label explicitly for an LLM; they have to be discovered through reward optimization.
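
To make the idea concrete, here is a minimal sketch of what an automatically verifiable reward might look like for a math-style task. It is purely illustrative: the helper names and the \boxed{...} answer convention are assumptions, and nothing here reflects any lab's actual training code.

```python
import re

def extract_final_answer(completion: str):
    """Pull the last \\boxed{...} answer out of a model's reasoning trace (hypothetical convention)."""
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    return matches[-1].strip() if matches else None

def verifiable_reward(completion: str, reference: str) -> float:
    """Return 1.0 if the final answer matches the reference exactly, else 0.0.
    The check is fully automatic; no human judge or learned reward model is involved."""
    answer = extract_final_answer(completion)
    return 1.0 if answer is not None and answer == reference.strip() else 0.0

# During RLVR, the policy samples long reasoning traces and is updated
# (e.g., with a policy-gradient objective) to make verified-correct traces more likely.
```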

Unlike supervised fine-tuning and RLHF (two stages that are relatively short and computationally cheap), RLVR involves long-running optimization against an objective, non-gameable reward function. It has proven able to deliver significant performance gains at a given cost, and it can productively absorb a great deal of compute that would otherwise have gone to pre-training. Much of the improvement in LLM capability in 2025 therefore came from the major labs absorbing the enormous compute demands of this new stage. Overall, model sizes stayed roughly the same while reinforcement learning training runs grew significantly longer. Another distinctive feature of the new stage is that it gives us an entirely new control dimension (with its own scaling law): model capability as a function of test-time compute, adjusted by generating longer reasoning trajectories, i.e., more "thinking time." OpenAI's o1 (released at the end of 2024) was the first demonstration of an RLVR-trained model, and the release of o3 (early 2025) was a clear turning point, delivering a visibly large leap in capability.
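
As a rough illustration of this test-time knob, the sketch below spends more inference compute in two ways: allowing longer reasoning traces and sampling several traces with a majority vote. The `generate` and `extract_answer` callables are hypothetical stand-ins, not any particular vendor's API.

```python
from collections import Counter

def solve_with_thinking_budget(prompt, generate, extract_answer,
                               budgets=(1_000, 4_000, 16_000), samples=8):
    """Observe how the answer distribution shifts as the 'thinking' budget grows.

    generate(prompt, max_thinking_tokens) -> a reasoning trace (string), assumed interface
    extract_answer(trace) -> the final answer, or None if no answer was produced
    """
    results = {}
    for budget in budgets:
        answers = [extract_answer(generate(prompt, max_thinking_tokens=budget))
                   for _ in range(samples)]
        votes = Counter(a for a in answers if a is not None)
        results[budget] = votes.most_common(1)[0] if votes else None
    return results  # accuracy on verifiable tasks typically improves as the budget grows
```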

2. Ghost-like Intelligence vs. Animal Intelligence: Jagged Capabilities

2025 marked the first time I (and, I believe, much of the industry) began to get an intuitive feel for the "shape" of LLM intelligence. We are not evolving and breeding animals; we are summoning ghosts. The entire LLM stack (the neural architecture, the training data, the training algorithms, and especially the optimization objectives) is fundamentally different, so it should not be surprising that we obtain entities occupying a very different region of intelligence space than biological intelligence, and it is misleading to reason about them in animal terms. In terms of the supervision signal, human neural networks were optimized for tribal survival in the jungle, while LLM neural networks are optimized to imitate human text, collect rewards on math problems, and win human approval in chat arenas. Wherever a domain is verifiable enough to support RLVR, LLM capability in that domain jumps suddenly, producing an interesting, jagged overall capability profile: the same model can be an erudite genius one moment and a confused elementary-school student the next, and may leak your data under pressure.

Human intelligence in blue, AI intelligence in red. I like this version of the meme (sorry, I can't find the original post on Twitter) because it points out that human intelligence also has its own unique, jagged profile.

Relatedly, in 2025 I developed a general indifference toward, and distrust of, benchmarks. The core issue is that benchmarks are themselves verifiable environments, which makes them highly susceptible to RLVR and to its weaker cousin, synthetic data generation. In the usual score-maximization process, LLM teams inevitably build training environments that sit close to the benchmark in embedding space and paper over that region of the jagged capability surface. "Training on the test set" has become the new normal.

So what if a model sweeps every benchmark but still falls short of general intelligence?

3. Cursor: A new layer for LLM applications

What impressed me most about Cursor (besides its rapid rise this year) is how compellingly it revealed a new "LLM application" layer, as people began talking about "Cursor for X." As I emphasized in my Y Combinator talk this year, the core of an LLM application like Cursor is integrating and orchestrating LLM calls for a specific vertical domain:

  • They handle the "context engineering";
  • Under the hood, they orchestrate multiple LLM calls into increasingly complex directed acyclic graphs, carefully balancing performance against cost;
  • They provide application-specific graphical interfaces that keep a human in the loop;
  • They expose an "autonomy slider" (see the sketch after this list).
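
As a sketch of the pattern just listed, imagine a tiny "Cursor for editing one file" app: a cheap model distills context, a stronger model proposes an edit, and the autonomy slider decides whether a human confirms it. `call_llm` and `ask_human` are hypothetical callables and the model names are placeholders, not any real vendor API.

```python
from dataclasses import dataclass

@dataclass
class AppConfig:
    autonomy: float = 0.5              # 0.0 = confirm everything, 1.0 = fully autonomous
    cheap_model: str = "small-fast-model"
    strong_model: str = "large-careful-model"

def edit_file(task: str, file_text: str, call_llm, ask_human, cfg: AppConfig) -> str:
    # Node 1 (context engineering): a cheap call distills the context relevant to the task.
    context = call_llm(cfg.cheap_model,
                       f"Summarize the parts of this file relevant to: {task}\n\n{file_text}")
    # Node 2: a stronger model proposes the edit using only the distilled context.
    proposal = call_llm(cfg.strong_model,
                        f"Task: {task}\nRelevant context:\n{context}\nReturn the full edited file.")
    # Node 3 (human in the loop): below the autonomy threshold, ask before applying.
    if cfg.autonomy < 0.8 and not ask_human(f"Apply this edit?\n{proposal}"):
        return file_text  # rejected: keep the original
    return proposal
```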

In 2025 there was extensive discussion about how much room this emerging application layer has. Will the LLM platforms eventually dominate all applications, or is there a large, durable space for LLM apps? My personal prediction is that the LLM platforms will converge toward producing "generalist university graduates," while LLM applications will be responsible for organizing and refining these graduates, supplying the private data, sensors, actuators, and feedback loops that turn them into "professional teams" deployable in specific vertical domains.

4. Claude Code: AI running locally

The emergence of Claude Code was the first convincing demonstration of the LLM agent form factor: tool use interleaved with reasoning in a loop, enabling more persistent and complex problem-solving. What impressed me even more is that Claude Code runs on the user's own computer, deeply integrated with the user's private environment, data, and context. I believe OpenAI's bet in this area is somewhat off: they focused their coding assistant and agent development on cloud deployment, i.e., containerized environments orchestrated through ChatGPT, rather than the local machine. Swarms of agents running in the cloud may well be the "ultimate form" of general artificial intelligence, but we are currently in a transitional phase of uneven capabilities and slower-than-expected progress, and in that regime, agents that sit directly on the developer's computer and collaborate closely with them in their actual working environment are the more logical path. Claude Code got this priority right and wrapped it in a concise, elegant, highly attractive command-line tool, reshaping how AI presents itself: no longer just a website you visit (the way you visit Google), but a little sprite or ghost "residing" on your computer. This is a genuinely new paradigm for interacting with AI.
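
Here is a minimal sketch of that loop: the model alternates between reasoning and tool calls against the local machine until it declares the task done. It only illustrates the pattern; `call_llm` is a hypothetical chat wrapper, and the tool set (read a file, run a shell command) is not Claude Code's actual implementation.

```python
import json
import pathlib
import subprocess

TOOLS = {
    "read_file": lambda path: pathlib.Path(path).read_text(),
    "run_shell": lambda cmd: subprocess.run(cmd, shell=True, capture_output=True, text=True).stdout,
}

def agent_loop(task: str, call_llm, max_steps: int = 20) -> str:
    """call_llm(history) is assumed to return a JSON string that is either
    {"type": "tool", "tool": ..., "argument": ...} or {"type": "final", "content": ...}."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_llm(history)
        history.append({"role": "assistant", "content": reply})
        action = json.loads(reply)
        if action["type"] == "final":
            return action["content"]
        observation = TOOLS[action["tool"]](action["argument"])   # act on the local environment
        history.append({"role": "tool", "content": observation})  # feed the result back in
    return "Stopped: step budget exhausted."
```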

5. Vibe Coding: an environment where everyone can program

In 2025, AI crossed a critical capability threshold: it became possible to build impressive programs from an English description alone, without necessarily understanding the underlying code. Amusingly, I coined the term "Vibe Coding" in a casual, shower-thought tweet, never imagining it would take on the life it has. In the Vibe Coding paradigm, programming is no longer strictly the preserve of highly trained professionals; it becomes something everyone can participate in. In that sense it is another instance of the phenomenon I described in my post "Power to the people: How LLMs flip the script on technology diffusion": in stark contrast to virtually every earlier technology, ordinary people benefit from LLMs more than professionals, businesses, and governments do. But Vibe Coding does not only open programming to ordinary people; it also lets professional developers write plenty of software that "would otherwise never have been written." While developing nanochat, I vibe-coded a custom, efficient BPE tokenizer in Rust without relying on existing libraries and without learning Rust in depth. This year I also vibe-coded quick prototypes of several projects simply to check whether an idea was feasible, and I have written entire one-off applications just to track down a specific bug, because code has suddenly become free, ephemeral, malleable, and disposable. Vibe Coding will reshape the software development ecosystem and profoundly change where the boundaries of the profession lie.
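
For a flavor of the kind of component that becomes cheap to vibe-code, here is a toy Python version of the core merge step a BPE tokenizer performs; nanochat's actual tokenizer is a separate, optimized Rust implementation and will differ.

```python
from collections import Counter

def bpe_merge_step(sequences: list[list[str]]) -> list[list[str]]:
    """Find the most frequent adjacent symbol pair and merge every occurrence of it."""
    pairs = Counter()
    for seq in sequences:
        pairs.update(zip(seq, seq[1:]))
    if not pairs:
        return sequences
    (a, b), _ = pairs.most_common(1)[0]
    merged = []
    for seq in sequences:
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and seq[i] == a and seq[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        merged.append(out)
    return merged

# Repeating this step builds the merge table that maps raw characters/bytes to tokens.
```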

6. Nano Banana: a graphical interface for LLMs

Google's Nano Banana (part of the Gemini family) represents one of the most disruptive paradigm shifts of 2025. In my view, LLMs are the next major computing paradigm after the computers of the 1970s and '80s, so for similar underlying reasons we should expect a similar arc of innovation to the one that played out across personal computing, microcontrollers, and even the internet. At the level of human-computer interaction in particular, today's "chat" with an LLM resembles typing commands into a terminal in the 1980s. Text is the most native data representation for computers (and LLMs), but it is not what humans prefer, especially on the receiving end: people find reading text slow and laborious, and would rather take in information visually and spatially, which is exactly why graphical user interfaces (GUIs) emerged in classical computing. Likewise, LLMs should communicate with us in the forms humans prefer: images, infographics, slides, whiteboards, animations, videos, web apps, and other media. Early steps in this direction already exist as "visual decorations" of text, such as emojis and Markdown rendering (headings, bold, lists, tables, and other typographic elements). But who will ultimately build the graphical interface for LLMs? Seen this way, Nano Banana is an early prototype of that future. Notably, its breakthrough is not just image generation per se, but the combined capability that emerges when text generation, image generation, and world knowledge are interwoven in the same model weights.
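
As a trivial example of the "visual decoration" layer mentioned above, the snippet below renders a made-up Markdown reply as HTML so headings, lists, and tables appear as typography rather than raw text; it uses the third-party `markdown` package and is only meant to gesture at the GUI direction.

```python
import markdown  # pip install markdown

reply = (
    "# Plan\n\n"
    "1. Collect the data\n"
    "2. Train the model\n\n"
    "**Note:** this reply is a toy example."
)

html = markdown.markdown(reply, extensions=["tables"])
print(html)  # e.g. <h1>Plan</h1> <ol>...</ol> <p><strong>Note:</strong> ...</p>
```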
