Rate Limits and Throttling Guide: Optimizing API Usage


HireSquire · Updated May 22, 2026 · 12 min read

Complete reference for HireSquire rate limits, throttling behavior, and optimization patterns. Learn how to design robust integrations for high-volume screening.


🤖 Note for Autonomous Agents

If you are an AI agent reading this to tune your own polling logic, strictly honor the Retry-After header. When polling for screening completion, use exponential backoff that starts at 5 seconds and is capped at 30 seconds.
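
A minimal sketch of that polling schedule follows. It assumes a caller-supplied `fetch_status` callable that returns a status string, and the `"completed"` value it checks for is a hypothetical placeholder, not a documented HireSquire status. The delay doubles from 5 seconds and is capped at 30; `sleep` is injectable so the loop can be tested without actually waiting.

```python
import time
from typing import Callable, Iterator

def polling_delays(start: float = 5.0, cap: float = 30.0) -> Iterator[float]:
    """Yield exponentially growing delays: 5, 10, 20, 30, 30, ... seconds."""
    delay = start
    while True:
        yield min(delay, cap)
        delay *= 2

def poll_until_complete(
    fetch_status: Callable[[], str],
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until fetch_status() returns "completed" (a hypothetical status value)."""
    delays = polling_delays()
    for _ in range(max_attempts):
        status = fetch_status()
        if status == "completed":
            return status
        sleep(next(delays))  # back off before the next poll
    raise TimeoutError(f"job not completed after {max_attempts} polls")
```

If a poll comes back with HTTP 429, the Retry-After value should take precedence over the next scheduled delay.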

📊 Default Rate Limits

  • 60 requests/minute
  • 600 requests/hour
  • 14,400 requests/day
  • 50 concurrent jobs
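
These windows interact. At a sustained 60 requests/minute, you would use up the 600-request hourly allowance in 10 minutes, so a client-side limiter needs to track every window at once. The sketch below implements a sliding-window counter over several limits. The class name and its `clock` parameter are illustrative and not part of any HireSquire SDK.

```python
import time
from collections import deque
from typing import Callable, Dict

class MultiWindowLimiter:
    """Sliding-window limiter enforcing several {window_seconds: max_requests} limits at once."""

    def __init__(self, limits: Dict[float, int], clock: Callable[[], float] = time.monotonic):
        self.limits = limits
        self.clock = clock
        self.history: deque = deque()  # timestamps of sent requests, oldest first
        self.longest = max(limits)

    def delay_needed(self) -> float:
        """Seconds to wait before the next request fits under every limit (0.0 if it fits now)."""
        now = self.clock()
        # Forget requests older than the longest window
        while self.history and self.history[0] <= now - self.longest:
            self.history.popleft()
        wait = 0.0
        for window, max_requests in self.limits.items():
            in_window = [t for t in self.history if t > now - window]
            if len(in_window) >= max_requests:
                # This request must leave the window before another one fits
                blocking = in_window[-max_requests]
                wait = max(wait, blocking + window - now)
        return wait

    def record(self) -> None:
        """Call once for each request actually sent."""
        self.history.append(self.clock())
```

With the defaults above, you might construct `MultiWindowLimiter({60: 60, 3600: 600, 86400: 14400})`, sleep for `delay_needed()` seconds before each request, and call `record()` once the request is sent.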

Understanding Rate Limit Headers

Every API response includes headers with your current rate limit status:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1713024000
X-RateLimit-Retry-After: 42
Retry-After: 42

# When rate limited
HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Try again in 42 seconds.",
  "retry_after": 42,
  "limit": 60,
  "remaining": 0,
  "reset": 1713024000
}
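
A small helper can turn these headers into a wait time. This sketch uses Retry-After only on a 429 response. Otherwise, it waits for the window to reset once `X-RateLimit-Remaining` reaches zero. It assumes, as the example value suggests, that `X-RateLimit-Reset` is a Unix timestamp in seconds. The function name is illustrative. Lookups here are case-sensitive, whereas `requests` exposes `response.headers` as a case-insensitive mapping.

```python
import time
from typing import Mapping, Optional

def seconds_until_allowed(status: int, headers: Mapping[str, str],
                          now: Optional[float] = None) -> float:
    """Return how many seconds to wait before the next request (0.0 if none)."""
    current = time.time() if now is None else now
    # Assumption: X-RateLimit-Reset is a Unix timestamp in seconds
    reset = float(headers.get("X-RateLimit-Reset", current))
    if status == 429:
        # Rate limited: the explicit Retry-After value takes precedence
        if "Retry-After" in headers:
            return max(0.0, float(headers["Retry-After"]))
        return max(0.0, reset - current)
    # Not limited yet, but the quota is used up: wait for the window to reset
    if int(headers.get("X-RateLimit-Remaining", "1")) <= 0:
        return max(0.0, reset - current)
    return 0.0
```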

Rate Limit Tiers by Plan

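Plan       | Requests/Minute | Requests/Hour | Concurrent Jobs
---------- | --------------- | ------------- | ---------------
Free       | 20              | 200           | 10
Pro        | 60              | 600           | 50
Business   | 300             | 3,000         | 200
Enterprise | Custom          | Custom        | Unlimited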
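
For long-running jobs, the hourly figure sets the pace on every listed plan. The quick calculation below gives the minimum average spacing between requests that stays under both limits indefinitely. The dictionary simply restates the table, and Enterprise is omitted because its limits are custom.

```python
PLAN_LIMITS = {
    # plan: (requests per minute, requests per hour), from the tier table
    "Free": (20, 200),
    "Pro": (60, 600),
    "Business": (300, 3000),
}

def sustained_interval(per_minute: int, per_hour: int) -> float:
    """Minimum average seconds between requests that respects both limits indefinitely."""
    return max(60 / per_minute, 3600 / per_hour)

for plan, (per_minute, per_hour) in PLAN_LIMITS.items():
    print(f"{plan}: one request every {sustained_interval(per_minute, per_hour):.1f} s")
```

This prints 18.0 s for Free, 6.0 s for Pro, and 1.2 s for Business. Short bursts up to the per-minute limit should still be fine as long as the hourly total stays within budget.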

Best Practices for Rate Limit Handling

1. Implement Exponential Backoff

Always use exponential backoff when retrying rate-limited requests, and never retry sooner than the Retry-After value the server returns:

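import random
import requests
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

class RateLimitException(Exception):
    """Raised on HTTP 429; carries the server's Retry-After value in seconds."""
    def __init__(self, retry_after):
        super().__init__(f"Rate limited; retry after {retry_after}s")
        self.retry_after = retry_after

exponential = wait_exponential(multiplier=1, min=2, max=30)

def wait_retry_after_or_backoff(retry_state):
    # Wait at least the server's Retry-After, otherwise the exponential delay, plus jitter
    exc = retry_state.outcome.exception()
    return max(getattr(exc, "retry_after", 0), exponential(retry_state)) + random.uniform(0, 1)

@retry(
    stop=stop_after_attempt(5),
    wait=wait_retry_after_or_backoff,
    retry=retry_if_exception_type(RateLimitException),  # Only retry 429 responses
    reraise=True,  # Raise the final RateLimitException instead of tenacity.RetryError
)
def make_api_request(payload):
    response = requests.post("https://hiresquireai.com/api/v1/jobs", json=payload, headers={"Authorization": "Bearer TOKEN"})
    
    if response.status_code == 429:
        raise RateLimitException(int(response.headers.get("Retry-After", 5)))
    
    return response.json()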

2. Add Jitter to Retries

Prevent thundering herd problems by adding random jitter to retry delays:

✅ Good vs. Bad Retry Pattern

❌ Bad: Fixed delay

time.sleep(5)  # All clients retry at the same time

✅ Good: Jittered delay

time.sleep(5 + random.uniform(0, 2))  # Spread out retries
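
The jittered example above only randomizes a constant wait. A common refinement, often called "full jitter", picks a random delay between zero and the current exponential backoff ceiling. This is a general technique, not something HireSquire specifically requires. The base and cap below mirror the 2 to 30 second range used in the backoff example, and the function name is illustrative.

```python
import random
from typing import Optional

def full_jitter_delay(attempt: int, base: float = 2.0, cap: float = 30.0,
                      rng: Optional[random.Random] = None) -> float:
    """Random delay in [0, min(cap, base * 2**attempt)] for a 0-based retry attempt."""
    ceiling = min(cap, base * (2 ** attempt))
    source = rng if rng is not None else random  # module-level RNG by default
    return source.uniform(0, ceiling)
```

When the server sends Retry-After, wait for the larger of that value and the jittered delay.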

3. Batch Requests When Possible

Reduce API calls by batching multiple resumes into a single screening job:

# ❌ Bad: 1 request per resume
for resume in resumes:
    client.screen(title, description, [resume])  # 50 resumes = 50 requests

# ✅ Good: Batch up to 100 resumes per request
client.screen(title, description, resumes)    # 50 resumes = 1 request

# Maximum batch size: 100 resumes per job
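
When you have more than 100 resumes, split them into batches no larger than the documented maximum. Here is a minimal helper; the name `chunk_resumes` is illustrative.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")
MAX_BATCH_SIZE = 100  # documented maximum resumes per screening job

def chunk_resumes(resumes: Sequence[T], size: int = MAX_BATCH_SIZE) -> List[List[T]]:
    """Split resumes into consecutive batches of at most `size` items."""
    if not 1 <= size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch size must be between 1 and {MAX_BATCH_SIZE}")
    return [list(resumes[i:i + size]) for i in range(0, len(resumes), size)]
```

For example, 250 resumes become three requests (100, 100, and 50 resumes) instead of 250.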

High-Volume Screening Patterns

To process thousands of resumes efficiently, combine batching with a client-side rate limiter. This example uses aiohttp:

import asyncio
import time
from collections import deque

import aiohttp

class RateLimiter:
    def __init__(self, requests_per_minute=48):  # Leave 20% headroom (60 max)
        # Note: 48/minute sustained would exhaust the 600/hour limit in 12.5 minutes
        self.rate = requests_per_minute
        self.timestamps = deque()
        self.lock = asyncio.Lock()  # Serialize callers so waiters don't all wake at once

    async def wait(self):
        async with self.lock:
            while True:
                now = time.monotonic()
                # Remove timestamps older than 1 minute
                while self.timestamps and self.timestamps[0] <= now - 60:
                    self.timestamps.popleft()
                if len(self.timestamps) < self.rate:
                    break
                # Sleep until the oldest request leaves the window, then re-check
                await asyncio.sleep(60 - (now - self.timestamps[0]))
            self.timestamps.append(now)

async def process_batch(session, resumes, rate_limiter):
    await rate_limiter.wait()
    payload = {
        "title": "Senior Developer",
        "description": "...",
        "resumes": resumes
    }
    async with session.post("https://hiresquireai.com/api/v1/jobs", json=payload) as resp:
        return await resp.json()

async def main(all_resumes):
    # Process 1000 resumes in batches of 50
    batches = [all_resumes[i:i+50] for i in range(0, len(all_resumes), 50)]
    rate_limiter = RateLimiter(48)
    
    async with aiohttp.ClientSession(headers={"Authorization": "Bearer TOKEN"}) as session:
        results = await asyncio.gather(*[
            process_batch(session, batch, rate_limiter)
            for batch in batches
        ])
    return results

# Usage: results = asyncio.run(main(all_resumes))
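
`process_batch` above returns whatever JSON the server sends, including a 429 error body. One way to handle that is a small async retry wrapper. In this sketch, the request is abstracted as a hypothetical `send` coroutine that returns `(status, headers, body)`. That lets it wrap an aiohttp call or run in a test without a network. The default of 5 seconds when Retry-After is missing is an assumption carried over from the backoff example.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], Any]

async def send_with_retry(
    send: Callable[[], Awaitable[Response]],
    max_attempts: int = 5,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> Any:
    """Call `send` until it returns a non-429 status, waiting Retry-After plus jitter."""
    for attempt in range(max_attempts):
        status, headers, body = await send()
        if status != 429:
            return body
        if attempt == max_attempts - 1:
            break
        # Honor the server's Retry-After (default 5 s), plus up to 1 s of jitter
        delay = float(headers.get("Retry-After", 5))
        await sleep(delay + random.uniform(0, 1))
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

With aiohttp, `send` could post the payload and return `(resp.status, resp.headers, await resp.json())`.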

Monitoring and Alerting

Track these metrics to avoid rate limit issues:

  • Rate Limit Usage: percentage of the limit consumed. Alert at 80% to avoid throttling.
  • Retry Rate: percentage of requests being retried. Alert if it exceeds 5%.
  • Queue Depth: number of pending jobs waiting for rate limit clearance.
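
All three metrics can be computed from data the client already has: the rate limit headers, a count of retried requests, and the length of the local queue. The minimal sketch below uses the 80% and 5% thresholds above. The class name and alert strings are illustrative, and forwarding alerts to a monitoring system is left out.

```python
from dataclasses import dataclass
from typing import List, Mapping

@dataclass
class RateLimitMetrics:
    requests: int = 0
    retries: int = 0
    queue_depth: int = 0  # tracked for dashboards; the guide sets no threshold
    limit: int = 0
    remaining: int = 0

    def observe_headers(self, headers: Mapping[str, str]) -> None:
        """Update usage from X-RateLimit-Limit and X-RateLimit-Remaining."""
        self.limit = int(headers.get("X-RateLimit-Limit", self.limit))
        self.remaining = int(headers.get("X-RateLimit-Remaining", self.remaining))

    @property
    def usage(self) -> float:
        return 0.0 if self.limit == 0 else (self.limit - self.remaining) / self.limit

    @property
    def retry_rate(self) -> float:
        return 0.0 if self.requests == 0 else self.retries / self.requests

    def alerts(self) -> List[str]:
        found = []
        if self.usage >= 0.80:
            found.append(f"rate limit usage at {self.usage:.0%}")
        if self.retry_rate > 0.05:
            found.append(f"retry rate at {self.retry_rate:.1%}")
        return found
```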

Common Rate Limit Errors

⚠️ Rate Limit Thresholds to Monitor

Error Code                | Meaning                              | Solution
------------------------- | ------------------------------------ | ----------------------------------------------------------
429 rate_limit_exceeded   | Per-minute or per-hour limit reached | Retry after the Retry-After header value (in seconds)
402 spend_limit_exceeded  | Agent API Key spend limit reached    | Increase the limit in the dashboard or wait for the reset
403 limit_exceeded        | Monthly plan screening limit reached | Upgrade your plan or add overage pricing
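
The table implies that only a 429 is worth retrying automatically. A 402 or 403 needs a human, or a plan change, before further requests can succeed. The sketch below expresses that decision, keyed on the `error` field shown in the 429 body earlier. It assumes, without confirmation here, that 402 and 403 responses use the same JSON shape. The `Action` names are illustrative.

```python
from enum import Enum

class Action(Enum):
    RETRY_AFTER_DELAY = "retry_after_delay"
    STOP_AND_NOTIFY = "stop_and_notify"
    NOT_A_LIMIT_ERROR = "not_a_limit_error"

# (HTTP status, error code) -> action, following the table above
LIMIT_ERRORS = {
    (429, "rate_limit_exceeded"): Action.RETRY_AFTER_DELAY,  # minute/hour window
    (402, "spend_limit_exceeded"): Action.STOP_AND_NOTIFY,   # raise spend limit or wait
    (403, "limit_exceeded"): Action.STOP_AND_NOTIFY,         # monthly plan limit
}

def classify(status: int, body: dict) -> Action:
    """Decide how a client should react to an error response."""
    return LIMIT_ERRORS.get((status, body.get("error")), Action.NOT_A_LIMIT_ERROR)
```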

Webhook Considerations

Rate limits also apply to webhook delivery:

📤 Webhook Rate Limits

  • Maximum 100 webhooks per minute per endpoint
  • Failed deliveries are retried with exponential backoff
  • 3 consecutive failures → webhook disabled for 5 minutes
  • 10 consecutive failures → webhook automatically disabled
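
To show how the two failure thresholds interact, here is a small model of the policy listed above. It is illustrative only and does not reflect HireSquire's actual implementation. It assumes a successful delivery resets the consecutive-failure count, and it models the 5-minute pause only at the moment the count reaches 3, because the guide does not say whether the pause repeats.

```python
from dataclasses import dataclass

PAUSE_AFTER = 3          # consecutive failures before a 5-minute pause (from the guide)
DISABLE_AFTER = 10       # consecutive failures before the webhook is disabled (from the guide)
PAUSE_SECONDS = 5 * 60

@dataclass
class WebhookHealth:
    """Illustrative model of the documented failure policy."""
    consecutive_failures: int = 0
    disabled: bool = False

    def record(self, delivered: bool) -> str:
        if self.disabled:
            return "disabled"
        if delivered:
            self.consecutive_failures = 0  # assumption: a success resets the count
            return "active"
        self.consecutive_failures += 1
        if self.consecutive_failures >= DISABLE_AFTER:
            self.disabled = True
            return "disabled"
        if self.consecutive_failures == PAUSE_AFTER:
            return f"paused for {PAUSE_SECONDS} s"
        return "active"
```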

Advanced: Request Prioritization

For mixed workloads, implement priority queuing:

import asyncio
import itertools

class PriorityQueue:
    HIGH, NORMAL, LOW = 0, 1, 2  # Lower value is served first

    def __init__(self):
        # A single asyncio.PriorityQueue, so get() wakes for whichever item arrives first
        self._queue = asyncio.PriorityQueue()
        self._counter = itertools.count()  # Tie-breaker keeps FIFO order within a priority

    async def put(self, item, priority=NORMAL):
        await self._queue.put((priority, next(self._counter), item))

    async def get(self):
        # Process high priority first; waits if every priority level is empty
        _priority, _seq, item = await self._queue.get()
        return item

# Usage:
# - Time-sensitive screening: high priority
# - Batch background processing: low priority
# - Standard screening: normal priority
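
A worker that drains such a queue might look like the following sketch. It is self-contained: it uses `asyncio.PriorityQueue` directly with (priority, sequence, item) tuples and enforces a fixed minimum spacing between requests instead of a full rate limiter. The `send` coroutine is a hypothetical stand-in for the API call.

```python
import asyncio
import itertools
from typing import Any, Awaitable, Callable

HIGH, NORMAL, LOW = 0, 1, 2     # lower value is served first
_sequence = itertools.count()   # tie-breaker: FIFO within the same priority

async def enqueue(queue: asyncio.PriorityQueue, item: Any, priority: int = NORMAL) -> None:
    await queue.put((priority, next(_sequence), item))

async def worker(queue: asyncio.PriorityQueue, send: Callable[[Any], Awaitable[Any]],
                 min_interval: float = 1.0) -> None:
    """Process items highest-priority first, at most one request per `min_interval` seconds."""
    while True:
        _priority, _seq, item = await queue.get()
        try:
            await send(item)
        finally:
            queue.task_done()
        await asyncio.sleep(min_interval)
```

For production use, you could replace the fixed `min_interval` with a limiter like the `RateLimiter` in the high-volume example.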

Next Steps

Proper rate limit handling is essential for building robust, production-grade integrations. By honoring Retry-After, backing off with jitter, batching resumes, and pacing requests below your plan's limits, you can keep your hiring automation reliable even during peak hiring periods.


