← all stories other 1 sources · 13h ago

OpenAI Engineers Find Way to Cut Inference Costs by More Than Half

If the cost reduction can be generalized across all users, OpenAI could lower prices, expand free access, or absorb more agent workloads without purchasing additional chips, protecting profit margins in an industry racing to build data centers.

Reporting from 1 sources: GIGAZINE.

OpenAI Engineers Find Way to Cut Inference Costs by More Than Half

OpenAI engineers have reportedly discovered a method to reduce inference costs by over 50% through new optimization techniques. The breakthrough, reported by The Information, could significantly lower operating expenses for the company, which spent an estimated $5 billion on inference in the first half of 2025 alone.

In early June 2026, OpenAI engineers told colleagues they had found a way to cut inference costs by more than half through new optimization methods, according to The Information. The specific techniques are unclear, but one source said that when applied to ChatGPT guest users, the number of NVIDIA GPUs required for part of the processing dropped to about 200.

Inference costs account for the majority of operating expenses for AI companies operating at scale. Edward Zitron estimates OpenAI spent over $5 billion on inference in just the first half of 2025, significantly exceeding expected revenue. The cost reduction reportedly comes from improved utilization of existing servers rather than hardware contract changes.

AI Weekly speculates the methods may involve smarter batching, improved cache reuse, quantization, or routing simpler queries to cheaper models, but notes it is unclear whether the same method works for free or paid account users who are not guest users.

Synthesized by Yomimono from the 1 cited source below, including Japanese-language reporting where cited, then editorially reviewed before publishing.

Sources