Rate Limits
To prevent abuse, Gemini imposes rate limits on incoming requests as described in the Gemini API Agreement.
For public API entry points, we limit requests to 120 per minute, and recommend that you not exceed 1 request per second.
For private API entry points, we limit requests to 600 per minute, and recommend that you not exceed 5 requests per second.
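One way to stay under the recommended pace is to space requests out on the client side. The following is a minimal sketch, not part of the Gemini API: the Throttle class and its parameters are hypothetical names chosen for illustration. It assumes a single-threaded client and simply enforces a minimum interval between calls (0.2 seconds for the recommended 5 requests per second on private entry points).

```python
import time


class Throttle:
    """Hypothetical helper: spaces calls at least 1 / max_per_second seconds apart."""

    def __init__(self, max_per_second, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock          # injectable so the class can be tested without real waiting
        self._sleep = sleep
        self._next_allowed = None    # earliest time the next request may be sent

    def wait(self):
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            delay = self._next_allowed - now
            self._sleep(delay)
            now = self._next_allowed
        self._next_allowed = now + self.min_interval
        return delay
```

A client would call wait() immediately before each request, for example with Throttle(5) for private entry points or Throttle(1) for public ones.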
How are rate limits applied?
When requests are received at a rate exceeding the applicable limit of X requests per minute, we offer a "burst" allowance of five additional requests. These requests are queued, and their processing is delayed until the request rate falls below the defined rate.
When you exceed the rate limit for a group of endpoints, you will receive a 429 Too Many Requests HTTP status response until your request rate drops back under the required limit.
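A client can react to a 429 response by pausing and retrying. The sketch below illustrates one common approach, exponential backoff; it is an assumption about reasonable client behavior, not something the Gemini documentation prescribes. The send callable is hypothetical: it stands in for whatever function performs one request and returns a (status_code, body) pair.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call send() until it returns a status other than 429 or attempts run out.

    send is a hypothetical zero-argument callable returning (status_code, body).
    The delay doubles after each 429: base_delay, 2 * base_delay, 4 * base_delay, ...
    """
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        # Rate limited: wait before the next attempt, unless this was the last one.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    # Still rate limited after max_attempts; return the last 429 to the caller.
    return status, body
```

Backing off gives the request rate time to drop back under the limit, which is the condition under which 429 responses stop.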
Example: 600 requests per minute is ten requests per second, meaning one request every 0.1 second.
If you send 20 requests in close succession over two seconds, then you could expect:
- the first ten requests are processed
- the next five requests are queued
- the next five requests receive a 429 response, meaning the rate limit for this group of endpoints has been exceeded
- any further incoming requests immediately receive a 429 response
- after a short period of inactivity, the five queued requests are processed
- following that, incoming requests begin to be processed at the normal rate again
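The example above can be reproduced with a deliberately simplified model. This sketch only counts requests in one near-simultaneous burst and ignores actual timing, so it is an illustration of the listed outcome rather than a description of how the server is implemented; the function names and the fixed burst size of five are taken from the text or chosen for illustration.

```python
def requests_per_second(requests_per_minute):
    """Convert a per-minute limit to a per-second rate (600 per minute is 10 per second)."""
    return requests_per_minute / 60


def classify_burst(n_requests, requests_per_minute=600, burst=5):
    """Label each request in a near-simultaneous burst.

    Simplified model: the first requests up to the per-second rate are
    processed, the next `burst` requests are queued, and the rest get a 429.
    """
    allowed = int(requests_per_second(requests_per_minute))
    outcomes = []
    for i in range(n_requests):
        if i < allowed:
            outcomes.append("processed")
        elif i < allowed + burst:
            outcomes.append("queued")
        else:
            outcomes.append("429")
    return outcomes
```

Under this model, classify_burst(20) yields ten "processed", five "queued", and five "429" entries, matching the breakdown in the list above.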