The cost model leverages SMT‑based solving (Z3) to achieve optimal decoding speed under CPU, I/O, and memory constraints.

How PowerInfer‑2 Turns Your Smartphone Into an AI Workstation

2025/11/04 03:56

Abstract and 1. Introduction

  2. Background and Motivation
  3. PowerInfer-2 Overview
  4. Neuron-Aware Runtime Inference
  5. Execution Plan Generation
  6. Implementation
  7. Evaluation
  8. Related Work
  9. Conclusion and References

5 Execution Plan Generation

Smartphones today differ widely in hardware, including CPU capability, I/O throughput, and DRAM size. Users deploying LLMs on these devices also have different objectives: some want a balance between generation speed and memory usage, while others want to maximize hardware utilization for higher speed. The models themselves vary in number of weights, structure, and sparsity level. To manage this complexity, PowerInfer-2 includes an offline planner that generates execution plans tailored to these requirements.

5.1 Execution Plan

5.2 Input Parameters

Table 2 also lists three categories of input parameters:

• Hardware: Parameters profiled from the hardware, such as CPU FLOPS, I/O throughput, and memory bandwidth.

• User: Parameters specified by the user, such as CPU constraints, a memory limit, and a lower bound on decoding speed.

• Model: Parameters about the model collected by an offline profiler, such as the size of the model, sparsity levels, and caching characteristics.
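The three categories above can be pictured as one typed input record for the planner. The sketch below is only an illustration: every class name, field name, unit, and example value is an assumption chosen for readability, not a symbol taken from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HardwareParams:
    """Values profiled from the device (names and units are assumptions)."""
    cpu_flops_per_core: float  # FLOP/s delivered by one CPU core
    io_bandwidth: float        # bytes/s read from flash storage
    memory_bandwidth: float    # bytes/s read from DRAM


@dataclass(frozen=True)
class UserParams:
    """Values specified by the user."""
    max_cpu_cores: int         # CPU constraint
    memory_limit: float        # bytes the application may occupy
    min_decode_speed: float    # lower bound on decoding speed, tokens/s


@dataclass(frozen=True)
class ModelParams:
    """Values collected by an offline profiler."""
    model_size: float          # bytes
    activation_rate: float     # fraction of FFN neurons activated per token
    cache_miss_rate: float     # fraction of activated neurons missing from cache


@dataclass(frozen=True)
class PlannerInput:
    hardware: HardwareParams
    user: UserParams
    model: ModelParams


# Made-up example values, for illustration only.
example = PlannerInput(
    hardware=HardwareParams(cpu_flops_per_core=2.5e9,
                            io_bandwidth=1.0e9,
                            memory_bandwidth=2.0e10),
    user=UserParams(max_cpu_cores=4, memory_limit=8.0e9, min_decode_speed=2.0),
    model=ModelParams(model_size=2.4e10, activation_rate=0.1, cache_miss_rate=0.2),
)
```

Freezing the dataclasses keeps the profiled inputs immutable while a planner explores candidate plans.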

5.3 Cost Model

After collecting the input parameters, the planner uses a cost model to generate the execution plan. The goal is to maximize the generation speed s (as defined by Equation 1) while adhering to user-specified constraints (Formulas 3-5). The decoding speed s is inversely proportional to the time taken to decode one token (Equation 1). Because computation and I/O operations are efficiently overlapped, that time is determined by the computation time for the token (Equation 2). With the objective function and the constraints defined, the resulting model can be solved by mature SMT solvers. Our implementation uses the Z3 solver [11] to solve the cost model.
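To make the shape of this optimization concrete, the sketch below maximizes decoding speed over a tiny plan space subject to a memory limit and a speed lower bound. It is not the paper's formulation: it replaces the SMT solver with plain exhaustive search, the plan consists of only two hypothetical knobs (core count and cache size), and the per-token time is taken as the larger of compute and I/O time as one simple way to express overlap. The linear cache model and all numbers are assumptions.

```python
from itertools import product


def decode_time(cores, cache_bytes, *, work_flop, flops_per_core,
                active_bytes, io_bandwidth):
    """Toy per-token time in seconds, assuming compute and I/O fully overlap."""
    compute = work_flop / (cores * flops_per_core)
    # Toy cache model (assumption): miss rate falls linearly as the cache grows.
    miss_rate = max(0.0, 1.0 - cache_bytes / active_bytes)
    io = active_bytes * miss_rate / io_bandwidth
    return max(compute, io)


def best_plan(max_cores, cache_options, memory_limit, min_speed, **cost_args):
    """Return (speed, cores, cache_bytes) with the highest speed, or None."""
    best = None
    for cores, cache in product(range(1, max_cores + 1), cache_options):
        if cache > memory_limit:
            continue  # violates the user's memory limit
        speed = 1.0 / decode_time(cores, cache, **cost_args)
        if speed < min_speed:
            continue  # violates the lower bound on decoding speed
        if best is None or speed > best[0]:
            best = (speed, cores, cache)
    return best


# Made-up workload and device numbers, for illustration only.
toy = dict(work_flop=2e9, flops_per_core=1e9,
           active_bytes=400_000_000, io_bandwidth=4e8)
plan = best_plan(
    max_cores=4,
    cache_options=[0, 100_000_000, 200_000_000, 300_000_000, 400_000_000],
    memory_limit=300_000_000,
    min_speed=1.0,
    **toy,
)
print(plan)
```

With these toy numbers the search selects four cores and a 200 MB cache at 2 tokens/s. Because candidates are visited in increasing cache size and only strictly faster plans replace the incumbent, ties resolve toward the smaller cache. An SMT solver plays the same role over a much larger space without enumerating every candidate.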

To compute the decoding time, we first model the computation time. Because we observed that memory operations are not a significant factor compared with computation, we exclude them from the computation time. Computation time (Equation 6) is primarily influenced by the attention blocks, predictors, and FFN blocks. It is calculated by dividing the computational workload of these components by the CPU FLOPS (defined in Equations 7-8). The FLOPS of the selected CPU cores are specified in Equation 9.
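The division described here can be sketched in a few lines. The function and argument names below are hypothetical, and the sketch assumes that the FFN workload already counts only the neurons activated for the token and that the throughput of the selected cores simply adds up.

```python
def compute_time(attn_flop, predictor_flop, ffn_flop, core_flops):
    """Seconds of CPU computation for one token.

    attn_flop, predictor_flop, ffn_flop: workload of each component in FLOP.
    core_flops: FLOP/s of each selected CPU core (assumed to be additive).
    """
    total_flops = sum(core_flops)
    if total_flops <= 0:
        raise ValueError("at least one core with positive FLOP/s is required")
    return (attn_flop + predictor_flop + ffn_flop) / total_flops


# Made-up numbers: 2 GFLOP of work on two cores of 2.5 GFLOP/s each.
t_comp = compute_time(1.0e9, 0.2e9, 0.8e9, [2.5e9, 2.5e9])
print(f"{t_comp:.3f} s per token")
```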

Table 2: Symbols used in execution planning.

As FFN block computation overlaps with neuron loading, the planner must also account for I/O transmission time. This is calculated by dividing the volume of neurons transferred from flash storage (Equation 10) by the I/O bandwidth. This transferred volume depends on both the activation rate and the cache miss rate.
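As a rough illustration of how the two rates shape the transferred volume, the sketch below multiplies them together and divides by the bandwidth. The names, the per-neuron byte size, and the assumption that the two rates combine as a simple product are illustrative and are not taken from Equation 10.

```python
def io_time(num_ffn_neurons, neuron_bytes, activation_rate,
            cache_miss_rate, io_bandwidth):
    """Seconds spent reading neurons from flash for one token."""
    for rate in (activation_rate, cache_miss_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must lie in [0, 1]")
    activated = num_ffn_neurons * activation_rate
    # Only activated neurons that miss the in-memory cache are read from flash.
    bytes_from_flash = activated * cache_miss_rate * neuron_bytes
    return bytes_from_flash / io_bandwidth


# Made-up numbers: 1M neurons of 8 KiB, 10% activated, 20% of those miss the cache.
t_io = io_time(1_000_000, 8192, 0.1, 0.2, 1.0e9)
print(f"{t_io:.5f} s per token")
```

In this model, halving either rate halves the I/O time, which is why a larger neuron cache or a sparser model both reduce pressure on flash bandwidth.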

Finally, the planner calculates the time to load neurons from memory, which depends on the weight sizes of the attention blocks, the predictors, and the neurons activated at runtime. The memory time is determined by dividing the total size of the weights read for one token by the memory bandwidth (Equation 11).
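A minimal sketch of this memory-time estimate follows. It assumes that attention and predictor weights are read in full for every token while only the activated share of FFN neurons is read; the function name, argument names, and numbers are hypothetical.

```python
def memory_time(attn_bytes, predictor_bytes, num_ffn_neurons, neuron_bytes,
                activation_rate, memory_bandwidth):
    """Seconds spent reading weights from DRAM for one token."""
    # Only the activated fraction of FFN neurons contributes to the read volume.
    activated_bytes = num_ffn_neurons * activation_rate * neuron_bytes
    total_bytes = attn_bytes + predictor_bytes + activated_bytes
    return total_bytes / memory_bandwidth


# Made-up numbers: 200 MB attention, 50 MB predictors, 1M neurons of 8 KiB
# with 10% activated, read at 20 GB/s.
t_mem = memory_time(2.0e8, 5.0e7, 1_000_000, 8192, 0.1, 2.0e10)
print(f"{t_mem:.5f} s per token")
```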

6 Implementation

PowerInfer-2 is built on top of PowerInfer [30], a state-of-the-art serving framework designed for sparsely-activated LLMs, and adds 12K lines of C++ code to it. These enhancements cover several key areas, including the polymorphic neuron engine, neuron cache, flexible neuron loading, and neuron-cluster-level I/O pipeline.

Since PowerInfer-2 depends on privileged system APIs (e.g., mlock, which locks pages in memory) that require root permission, we built it on the Android [5] platform. Although there is no need to alter the system kernel, a rooted Android system still gives us considerable flexibility in developing and debugging our system. Furthermore, because PowerInfer-2 requires no kernel modifications, it is easily portable to other operating systems, including the iOS [14] platform.

The current implementation of PowerInfer-2 supports a diverse array of LLMs with varying model sizes, including the Llama-2 family [27] (7B, 13B), TurboSparse-Mistral [31] (7B), and TurboSparse-Mixtral [31] (47B).

Table 3: Hardware specifications of smartphones we used in the evaluation. “DRAM” is the physical memory size. “Available” is the maximum memory size that can be occupied by an application.

:::info Authors:

(1) Zhenliang Xue, Co-first author from Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University;

(2) Yixin Song, Co-first author from Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University;

(3) Zeyu Mi, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University ([email protected]);

(4) Le Chen, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University;

(5) Yubin Xia, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University;

(6) Haibo Chen, Institute of Parallel and Distributed Systems (IPADS), Shanghai Jiao Tong University.

:::


:::info This paper is available on arXiv under the CC BY 4.0 license.

:::
