OpenAI Jalapeño: The First Proprietary AI Chip and Its Strategic Impact

Chris Jon Graf · AI Strategist & CEO · Published on 25 June 2026

In short

OpenAI has unveiled Jalapeño, its first proprietary ASIC for AI inference, developed with Broadcom and TSMC in just nine months from design to tape-out. According to Broadcom, the 'Intelligence Processor' aims to reduce inference costs per token by approximately 50 percent compared to current Nvidia GPUs, and engineering samples are already running GPT-5.3-Codex-Spark internally. Deployment begins in late 2026, with full ramp-up by mid-2028. For companies relying on AI APIs, this points to potentially lower operating costs and marks the start of a full-stack strategy that consolidates OpenAI's control over the entire AI infrastructure.

Jalapeño: OpenAI's Strategic Move Toward Chip Sovereignty

In June 2026, OpenAI unveiled its first proprietary chip: Jalapeño, an ASIC (Application-Specific Integrated Circuit) optimised exclusively for large language model (LLM) inference. Unlike general-purpose GPUs from Nvidia, Jalapeño is a specialised 'Intelligence Processor' developed in partnership with Broadcom (silicon design) and Celestica (board and rack integration). TSMC handles manufacturing.

The chip marks a strategic inflection point for OpenAI: moving away from dependence on external hardware providers toward a full-stack platform with control over models, software, and now the underlying hardware. This approach follows the precedent set by Apple (M-series chips), Google (TPU), Amazon (Trainium/Inferentia), and Meta (MTIA) – companies that leverage vertical integration to reduce costs and maximise performance.

Technical Specifications: 9 Months from Design to Tape-Out

OpenAI claims the nine-month development cycle from design to tape-out is the fastest ever achieved for a high-performance ASIC. According to Greg Brockman, President of OpenAI, the company's own AI models played a central role in accelerating chip design – a striking case of AI optimising its own infrastructure.

Engineering samples are already running GPT-5.3-Codex-Spark in internal tests
Performance per watt is 'substantially better than current state-of-the-art,' according to OpenAI
Likely eight HBM (High Bandwidth Memory) stacks, supplying the memory bandwidth on which per-token latency in LLM inference depends
Deployment starts late 2026, full ramp-up by mid-2028
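The emphasis on HBM can be illustrated with a widely used rule of thumb: at small batch sizes, generating each token means reading roughly all model weights from memory once, so single-stream decode speed is capped at memory bandwidth divided by model size in bytes. The minimal sketch below applies that rule. The per-stack bandwidth and the 70B-parameter model are hypothetical placeholders, since OpenAI has not published Jalapeño's memory specifications, and the function name is illustrative.

```python
def max_decode_tokens_per_second(bandwidth_gb_s: float, params_billion: float,
                                 bytes_per_param: float) -> float:
    """Rough upper bound on single-stream decode speed for a memory-bound LLM.

    Assumes each generated token reads every weight once from memory and
    ignores KV-cache traffic, batching and compute limits.
    """
    model_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes


# Hypothetical inputs for illustration only (not Jalapeño specifications).
HBM_STACKS = 8               # stack count mentioned in reports
GB_S_PER_STACK = 1_000.0     # assumed bandwidth per stack, GB/s
aggregate_bandwidth = HBM_STACKS * GB_S_PER_STACK

for bytes_per_param in (2.0, 1.0):   # e.g. 16-bit vs 8-bit weights
    bound = max_decode_tokens_per_second(aggregate_bandwidth, 70, bytes_per_param)
    print(f"70B model, {bytes_per_param:.0f} B/param: <= {bound:.0f} tokens/s per stream")
```

Because the bound scales linearly with bandwidth and inversely with bytes per parameter, adding HBM stacks or quantising weights both raise it; real systems are also limited by KV-cache traffic and compute, which this sketch ignores.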

Broadcom CEO Hock Tan confirmed to CNBC that a small prototype run will begin in late 2026, followed by ramp-up in 2027 and full capacity in the first half of 2028. Unconfirmed reports suggest Microsoft will take approximately 40 percent of the initial chip output, which would signal deep infrastructure alignment between the two companies.

50 Percent Cost Reduction: What Does It Mean for API Customers?

The critical metric for enterprise customers: Jalapeño aims to cut inference costs per token by approximately 50 percent compared to current Nvidia GPU clusters. This figure comes from Broadcom CEO Hock Tan and reflects expected gains in performance per watt and total cost of ownership (TCO). OpenAI itself uses more cautious language, speaking only of 'substantially better performance per watt.'
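Cost per token is essentially total cost of ownership divided by the number of tokens a system serves over its lifetime, so the 50 percent figure can be unpacked with simple arithmetic. The sketch below assumes a deliberately reduced TCO model (purchase price plus electricity, ignoring networking, facilities, staffing and utilisation). Every number in it, and the `Accelerator` type itself, is a hypothetical input chosen to show the mechanics, not a reported Jalapeño or Nvidia figure.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 365 * 24 * 3600
HOURS_PER_YEAR = 365 * 24


@dataclass
class Accelerator:
    name: str
    capex_usd: float          # purchase price per accelerator (hypothetical)
    power_kw: float           # average power draw (hypothetical)
    tokens_per_second: float  # sustained serving throughput (hypothetical)


def cost_per_million_tokens(acc: Accelerator, years: float, usd_per_kwh: float) -> float:
    """Simplified TCO (capex + lifetime energy) divided by tokens served, per million tokens."""
    energy_cost = acc.power_kw * HOURS_PER_YEAR * years * usd_per_kwh
    tco = acc.capex_usd + energy_cost
    tokens_served = acc.tokens_per_second * SECONDS_PER_YEAR * years
    return tco / tokens_served * 1e6


# Illustrative inputs only; they are not measurements of any real chip.
gpu = Accelerator("GPU baseline", capex_usd=40_000, power_kw=1.2, tokens_per_second=10_000)
asic = Accelerator("Inference ASIC", capex_usd=25_000, power_kw=0.8, tokens_per_second=12_000)

for acc in (gpu, asic):
    cost = cost_per_million_tokens(acc, years=4, usd_per_kwh=0.10)
    print(f"{acc.name}: ${cost:.4f} per million tokens")
```

Lower purchase price, lower power draw and higher throughput each reduce cost per token; which of these levers the chip actually delivers will depend on yield and scale.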

For companies that rely heavily on OpenAI models via the API, such as GPT-4, GPT-5, or future models, the implications are significant: cheaper inference could translate into lower API prices or more performance at the same price. This is particularly relevant for inference-intensive applications such as AI agents, autonomous workflows, or real-time assistants that query models continuously. Companies operating content pipelines with AI agents, as described in our article 'Which Tool Does an AI Agent Call First?', would benefit directly from falling operating costs.
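For API customers, the practical question is how per-token prices flow into monthly spend. The minimal sketch below estimates spend for an agent workload with fixed token counts per call. The call volume, token counts and per-million-token prices are hypothetical, not OpenAI's list prices, and the "halved" scenario assumes, without any confirmation, that savings are passed on to customers in full.

```python
def monthly_api_cost(calls_per_day: int, input_tokens: int, output_tokens: int,
                     usd_per_m_input: float, usd_per_m_output: float,
                     days: int = 30) -> float:
    """Estimate monthly API spend for a workload with fixed per-call token counts."""
    per_call = (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1e6
    return per_call * calls_per_day * days


# Hypothetical agent workload: 20,000 calls/day, 3,000 input + 500 output tokens each.
baseline = monthly_api_cost(20_000, 3_000, 500, usd_per_m_input=2.00, usd_per_m_output=8.00)
halved = monthly_api_cost(20_000, 3_000, 500, usd_per_m_input=1.00, usd_per_m_output=4.00)

print(f"Baseline: ${baseline:,.2f} / month")          # Baseline: $6,000.00 / month
print(f"If prices halved: ${halved:,.2f} / month")    # If prices halved: $3,000.00 / month
```

Because spend is linear in price, any per-token reduction that reaches the price list scales monthly costs for high-volume agent workloads by the same factor.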

Full-Stack Platform: From API to Vertical Integration

OpenAI explicitly positions Jalapeño as a building block of a 'full-stack' AI platform. With it, the company extends its control to every layer: model architecture, training, inference optimisation, the software layer and, now, the physical hardware. This vertical integration enables tighter feedback loops between hardware design and model development.

According to TechRadar, OpenAI is working on a multi-generation chip roadmap. After Jalapeño, chips with codenames like Serrano, Cayenne, or Habanero may follow – nomenclature reminiscent of escalating chilli pepper heat levels, likely symbolising increasing performance tiers. In parallel, OpenAI has announced a 10-gigawatt commitment with Microsoft and other partners through 2029 – a massive scaling of compute capacity.
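When compute capacity is announced in gigawatts, performance per watt translates directly into serving capacity: a watt is one joule per second, so tokens per second per watt equals tokens per joule, and fleet throughput is power multiplied by that efficiency. The sketch below works through this for a 10 GW envelope. The efficiency values (tokens per joule) and full utilisation are hypothetical assumptions, not published figures for Jalapeño or any GPU.

```python
GIGAWATT_IN_WATTS = 1e9
HOURS_PER_YEAR = 365 * 24


def fleet_tokens_per_second(power_gw: float, tokens_per_joule: float,
                            utilisation: float = 1.0) -> float:
    """Tokens/s a fleet can serve within a power budget.

    Performance per watt in (tokens/s) / W is numerically tokens per joule,
    so throughput = average power drawn (W) * efficiency (tokens/J).
    """
    return power_gw * GIGAWATT_IN_WATTS * utilisation * tokens_per_joule


def annual_energy_twh(power_gw: float, utilisation: float = 1.0) -> float:
    """Energy consumed over a 365-day year at the given average load, in TWh."""
    return power_gw * utilisation * HOURS_PER_YEAR / 1_000  # GWh -> TWh


# Hypothetical efficiencies, chosen only to show the proportionality.
baseline = fleet_tokens_per_second(10, tokens_per_joule=2.0)
improved = fleet_tokens_per_second(10, tokens_per_joule=3.0)
print(f"Capacity gain at equal power: {improved / baseline:.2f}x")      # Capacity gain at equal power: 1.50x
print(f"Energy at full load: {annual_energy_twh(10):.1f} TWh per year")  # Energy at full load: 87.6 TWh per year
```

Under a fixed power budget, an efficiency gain shows up as additional tokens served rather than as lower energy draw, which is why performance per watt matters so much at this scale.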

The Chip Race: OpenAI Against Google, Amazon, and Anthropic

OpenAI is not the first AI provider to develop proprietary chips. Google has operated TPUs (Tensor Processing Units) since 2016, Amazon offers Trainium for training and Inferentia for inference, and Meta develops MTIA (Meta Training and Inference Accelerator). According to Fortune, Anthropic – a direct OpenAI competitor – is also exploring proprietary chip designs.

Google TPU: Market leader in AI-specific ASICs, especially for proprietary models like Gemini
Amazon Trainium/Inferentia: Cost-effective alternative for AWS customers
Meta MTIA: Focus on internal workloads, no external commercialisation
Microsoft Maia: In development, closely aligned with OpenAI infrastructure
OpenAI Jalapeño: Inference-native, first generation of a multi-chip roadmap

Broadcom CEO Hock Tan compared Jalapeño to Nvidia's Blackwell chips and Google's TPUs in terms of speed and efficiency for LLM workloads. This is notable, as Blackwell chips (B100, B200) represent Nvidia's latest GPU generation and serve as the benchmark for inference performance.

Implications for AI Outsourcing and Enterprise Strategy

The Jalapeño announcement has direct consequences for companies that source AI externally or build internally:

First, the question of vendor lock-in shifts. Companies deeply invested in OpenAI's stack will benefit from vertical integration, but they also bind themselves more tightly to OpenAI's infrastructure decisions. For regulated sectors in Switzerland, such as financial services or healthcare, questions of data sovereignty and GDPR compliance remain central, as we have outlined in the context of the 'EU AI Act Omnibus 2026.'

Second, Jalapeño increases pressure on established chip vendors. Nvidia dominates the AI chip market with an estimated 80 percent share across training and inference. If OpenAI, Google, Amazon, and Meta deploy proprietary chips at scale, Nvidia's quasi-monopoly could erode. This could lead to price cuts on Nvidia hardware, or to greater differentiation through software layers such as CUDA and NIM.

Timeline and Next Steps

Late 2026: First prototype runs of Jalapeño begin with select partners
2027: Gradual production ramp-up, first production workloads run on Jalapeño
H1 2028: Full production capacity reached, OpenAI migrates significant inference load to Jalapeño
2028–2029: Second generation (Serrano?) and scaling to 10 gigawatts of compute capacity

For enterprises, this timeline means: the next 18 to 24 months remain a transition period during which OpenAI will continue to rely primarily on Nvidia hardware. From 2028 onward, cost-performance ratios should improve noticeably.
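How quickly savings appear depends on what share of inference actually runs on Jalapeño. A weighted average captures this: blended cost relative to an all-GPU baseline is (1 - s) + s·r, where s is the share of tokens served on the new chip and r is its relative cost per token. The sketch below assumes r = 0.5, matching the approximately 50 percent target, and sweeps illustrative values of s; OpenAI has not published actual migration shares, and the function name is hypothetical.

```python
def blended_cost_index(share_on_asic: float, asic_cost_ratio: float = 0.5) -> float:
    """Average cost per token relative to an all-GPU baseline (1.0).

    share_on_asic: fraction of inference tokens served on the new chip (0..1).
    asic_cost_ratio: the chip's cost per token relative to GPUs (0.5 = ~50% target).
    """
    if not 0.0 <= share_on_asic <= 1.0:
        raise ValueError("share_on_asic must be between 0 and 1")
    return (1.0 - share_on_asic) * 1.0 + share_on_asic * asic_cost_ratio


for share in (0.0, 0.25, 0.5, 0.75, 1.0):
    index = blended_cost_index(share)
    print(f"{share:>4.0%} on new chip -> cost index {index:.3f} ({1 - index:.1%} saving)")
```

Under these assumptions, migrating half the workload yields a 25 percent blended saving, and the full 50 percent appears only once essentially all inference runs on the new chip.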

Conclusion: Full-Stack Control as Competitive Advantage

OpenAI's Jalapeño chip is more than a technical innovation: it is a strategic signal. The company is transforming from a pure model developer into a vertically integrated AI platform that controls hardware, software, and models end-to-end. The nine-month development cycle suggests that AI models can themselves accelerate chip development, a self-reinforcing cycle that challenges established semiconductor paradigms.

For Swiss companies seeking to deploy AI strategically, Jalapeño concretely means that inference costs are likely to decline, AI agents and autonomous systems should become more economical, and market power is likely to shift further toward full-stack providers. The decision for or against OpenAI as a primary AI partner thus becomes a long-term infrastructure and cost decision, not merely a question of model quality.

Frequently asked questions

What is the OpenAI Jalapeño chip and what is it used for?
Jalapeño is OpenAI's first proprietary ASIC, specifically optimised for inference of large language models (LLMs). OpenAI calls it an 'Intelligence Processor', designed to execute inference workloads significantly more efficiently than current Nvidia GPUs. The chip was developed with Broadcom (silicon design) and Celestica (board and rack integration) and is manufactured by TSMC; engineering samples are already running GPT-5.3-Codex-Spark internally.

When will the Jalapeño chip be available and when will API costs drop?
First prototype runs begin in late 2026, followed by a gradual ramp-up in 2027 and full production capacity in the first half of 2028. Cost reductions in OpenAI APIs could become noticeable from 2027/2028 onward, depending on how quickly OpenAI migrates inference workloads to Jalapeño.

How much cheaper will AI inference become with Jalapeño compared to Nvidia GPUs?
According to Broadcom CEO Hock Tan, Jalapeño aims to reduce inference costs per token by approximately 50 percent. OpenAI itself speaks of 'substantially better performance per watt' compared to current GPUs. Actual cost savings will depend on yield, scale, and total cost of ownership (TCO).

What does full-stack AI platform mean in the context of OpenAI?
Full-stack means OpenAI controls every layer of the AI infrastructure: model architecture, training, inference optimisation, the software layer, and now hardware (chips). This vertical integration enables tighter feedback loops, better cost efficiency, and stronger differentiation versus providers reliant on external hardware.

What alternatives to Nvidia are other AI providers developing?
Google has used TPUs (Tensor Processing Units) since 2016, Amazon offers Trainium (training) and Inferentia (inference), Meta develops MTIA, and Microsoft is working on Maia chips. Anthropic is also exploring proprietary chip designs. These providers pursue strategies similar to OpenAI's: cost reduction and independence from Nvidia.
