Google has unveiled a preview of Gemini 2.5 Flash-Lite, a reasoning model optimized for cost and speed, and announced that two other Gemini models, Gemini 2.5 Pro and Gemini 2.5 Flash, are now generally available.
Google made the announcements June 17. Gemini 2.5 models are thinking models, capable of reasoning through their thoughts before responding, which results in enhanced performance and improved accuracy, Google said.
Gemini 2.5 Flash-Lite has the lowest cost and lowest latency in the Gemini 2.5 model family, Google said. Flash-Lite is a reasoning model that allows dynamic control of the thinking budget via an API parameter, but because Flash-Lite is optimized for low latency and low cost, thinking is turned off by default. This model is "great" for high-throughput tasks such as classification or summarization at scale, Google said. Built as an upgrade to the Gemini 1.5 Flash and 2.0 Flash models, Gemini 2.5 Flash-Lite offers better performance across most evals and a lower time to first token, while also achieving higher tokens-per-second decode, according to Google. Every Gemini 2.5 model supports control over the thinking budget, giving developers the ability to choose when and how much the model thinks before generating a response.
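
To illustrate what controlling the thinking budget through an API parameter can look like, here is a minimal sketch that only builds a request body for the Gemini API's REST `generateContent` endpoint, using the `generationConfig.thinkingConfig.thinkingBudget` field. The preview model ID below is an assumption and may have changed, and `build_request` is a hypothetical helper written for this example. Actually sending the request (an HTTP POST with an API key) is omitted.

```python
import json
from typing import Optional

# Assumption: the preview model ID in use around the June 17 announcement;
# check Google's current model list before relying on it.
MODEL_ID = "gemini-2.5-flash-lite-preview-06-17"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL_ID}:generateContent"
)


def build_request(prompt: str, thinking_budget: Optional[int] = None) -> dict:
    """Hypothetical helper: build a generateContent request body.

    When thinking_budget is None, no thinkingConfig is sent, so the model's
    default applies (for Flash-Lite, Google says thinking is off by default).
    """
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        # Opt in to a bounded amount of thinking, measured in tokens.
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


# High-throughput classification: keep the default (no thinking).
fast = build_request("Classify the sentiment: 'Great battery life.'")

# A harder request: allow the model up to 1,024 thinking tokens.
deliberate = build_request(
    "Summarize the trade-offs in this design document.", thinking_budget=1024
)

print(ENDPOINT)
print(json.dumps(deliberate, indent=2))
```

Leaving the field out for bulk tasks keeps Flash-Lite on its low-latency default, while setting a budget per request lets a caller spend extra reasoning only where it is likely to help.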
