Rate Limits and Throttling Guide: Optimizing API Usage

Complete reference for HireSquire rate limits, throttling behavior, and optimization patterns. Learn how to design robust integrations that handle high-volume screening efficiently.

🤖 Note for Autonomous Agents

If you are an AI agent reading this to tune your own polling logic, always honor the Retry-After header. When polling for screening completion, use exponential backoff that starts at 5 seconds and caps at 30 seconds.
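
The polling schedule described in this note can be sketched as a small generator. The name `polling_delays` and the doubling factor are assumptions made for illustration; the note itself only fixes the 5-second start and the 30-second cap.

```python
import itertools


def polling_delays(start=5.0, cap=30.0, factor=2.0):
    """Yield an endless sequence of polling delays in seconds.

    Starts at `start`, multiplies by `factor` after each poll, and never
    exceeds `cap`. The doubling factor is an assumption; the guide only
    specifies the 5-second start and the 30-second maximum.
    """
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First six delays a poller would wait between status checks
first_six = list(itertools.islice(polling_delays(), 6))
print(first_six)  # [5.0, 10.0, 20.0, 30.0, 30.0, 30.0]
```

A poller would sleep for each yielded value between status checks, and switch to the Retry-After value whenever a response supplies one.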

📊 Default Rate Limits

Limit	Value
Requests/minute	600
Requests/hour	14400
Requests/day	
Concurrent jobs	
Understanding Rate Limit Headers

Every API response includes headers with your current rate limit status:

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1713024000
X-RateLimit-Retry-After: 42
Retry-After: 42

# When rate limited
HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Try again in 42 seconds.",
  "retry_after": 42,
  "limit": 60,
  "remaining": 0,
  "reset": 1713024000
}
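
As a rough sketch of how a client might read these headers, the helper below turns them into a small status object and decides how long to pause. The names `RateLimitStatus`, `parse_rate_limit`, and `seconds_to_wait` are hypothetical, and falling back to the reset timestamp when Retry-After is absent is an assumption rather than documented behavior.

```python
import time
from dataclasses import dataclass


@dataclass
class RateLimitStatus:
    limit: int
    remaining: int
    reset_epoch: int   # Unix timestamp from X-RateLimit-Reset
    retry_after: int   # Seconds; 0 when the Retry-After header is absent


def parse_rate_limit(headers):
    """Build a RateLimitStatus from a mapping of response headers.

    A plain dict is case-sensitive; the `headers` attribute of a
    `requests` response is case-insensitive and also works here.
    """
    return RateLimitStatus(
        limit=int(headers["X-RateLimit-Limit"]),
        remaining=int(headers["X-RateLimit-Remaining"]),
        reset_epoch=int(headers["X-RateLimit-Reset"]),
        retry_after=int(headers.get("Retry-After", 0)),
    )


def seconds_to_wait(status, now=None):
    """Return how many seconds to pause before the next request."""
    if status.remaining > 0:
        return 0
    now = time.time() if now is None else now
    # Prefer the explicit Retry-After value; fall back to the reset timestamp
    return status.retry_after or max(0, status.reset_epoch - int(now))


example = {
    "X-RateLimit-Limit": "60",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1713024000",
    "Retry-After": "42",
}
print(seconds_to_wait(parse_rate_limit(example)))  # 42
```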

Rate Limit Tiers by Plan

Plan	Per Minute	Per Hour	Concurrent Jobs
Free	20	200	10
Pro	60	600	50
Business	300	3000	200
Enterprise	Custom	Custom	Unlimited
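
The per-minute and per-hour columns interact: in every listed tier the hourly limit equals ten minutes of traffic at the per-minute limit. The sketch below works through that arithmetic, assuming both limits are enforced at the same time (the table does not say how the windows are measured). The names `PLAN_LIMITS`, `sustained_per_minute`, and `minutes_until_hourly_cap` are illustrative.

```python
# Limits copied from the tier table above (Enterprise is custom, so omitted)
PLAN_LIMITS = {
    "Free": {"per_minute": 20, "per_hour": 200},
    "Pro": {"per_minute": 60, "per_hour": 600},
    "Business": {"per_minute": 300, "per_hour": 3000},
}


def sustained_per_minute(plan):
    """Highest request rate that can be held for a full hour."""
    limits = PLAN_LIMITS[plan]
    return min(limits["per_minute"], limits["per_hour"] / 60)


def minutes_until_hourly_cap(plan):
    """Minutes of traffic at the per-minute limit before the hourly limit is hit."""
    limits = PLAN_LIMITS[plan]
    return limits["per_hour"] / limits["per_minute"]


print(sustained_per_minute("Pro"))      # 10.0
print(minutes_until_hourly_cap("Pro"))  # 10.0
```

Under that assumption, a Pro client that needs steady throughput for an hour should pace itself at about 10 requests per minute rather than 60.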

Best Practices for Rate Limit Handling

1. Implement Exponential Backoff

Always use exponential backoff when retrying rate-limited requests:

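import random
import requests
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

class RateLimitException(Exception):
    """Raised on HTTP 429; carries the server's Retry-After value in seconds."""
    def __init__(self, retry_after):
        super().__init__(f"Rate limited; retry after {retry_after}s")
        self.retry_after = retry_after

def wait_for_rate_limit(retry_state):
    # Wait for the larger of the exponential backoff and the server's
    # Retry-After value, plus up to 1 second of jitter
    backoff = wait_exponential(multiplier=1, min=2, max=30)(retry_state)
    retry_after = retry_state.outcome.exception().retry_after
    return max(backoff, retry_after) + random.uniform(0, 1)

@retry(
    stop=stop_after_attempt(5),
    wait=wait_for_rate_limit,
    retry=retry_if_exception_type(RateLimitException),  # Only retry rate-limit errors
)
def make_api_request(payload):
    response = requests.post(
        "https://hiresquireai.com/api/v1/jobs",
        json=payload,
        headers={"Authorization": "Bearer TOKEN"},
        timeout=30,
    )

    if response.status_code == 429:
        # Let tenacity do the sleeping; sleeping here as well would double the delay
        raise RateLimitException(int(response.headers.get("Retry-After", 5)))

    response.raise_for_status()  # Surface other HTTP errors instead of parsing them
    return response.json()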

2. Add Jitter to Retries

Prevent thundering herd problems by adding random jitter to retry delays:

✅ Good vs Bad Retry Pattern

❌ Bad: Fixed delay

time.sleep(5)  # All clients retry at same time

✅ Good: Jittered delay

time.sleep(5 + random.uniform(0, 2))  # Spread out retries
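
Jitter combines naturally with exponential backoff. The sketch below is one way to compute such a delay. The function name `backoff_with_jitter` and its defaults (2-second base, 30-second cap, up to 2 seconds of jitter) are assumptions chosen to match the values used elsewhere in this guide.

```python
import random


def backoff_with_jitter(attempt, base=2.0, cap=30.0, jitter=2.0, rng=random):
    """Return the delay in seconds before retry number `attempt` (0-based).

    The exponential part doubles on every attempt and is capped at `cap`;
    a uniform random value in [0, jitter] is then added on top.
    """
    exponential = min(cap, base * (2 ** attempt))
    return exponential + rng.uniform(0, jitter)


rng = random.Random(7)  # Seeded only so the demo is repeatable
delays = [backoff_with_jitter(n, rng=rng) for n in range(5)]
# Exponential parts are 2, 4, 8, 16, 30; each delay adds up to 2 s of jitter
print([round(d, 2) for d in delays])
```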

3. Batch Requests When Possible

Reduce API calls by batching multiple resumes into a single screening job:

# ❌ Bad: 1 request per resume
for resume in resumes:
    client.screen(title, description, [resume])  # 50 resumes = 50 requests

# ✅ Good: Batch up to 100 resumes per request
client.screen(title, description, resumes)    # 50 resumes = 1 request

# Maximum batch size: 100 resumes per job
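
When a list holds more resumes than the 100-per-job maximum, it has to be split first. A minimal sketch of such a splitter follows. `chunk_resumes` and `MAX_BATCH_SIZE` are hypothetical names, and the resume strings are placeholder data.

```python
MAX_BATCH_SIZE = 100  # Maximum resumes per job, per the note above


def chunk_resumes(resumes, batch_size=MAX_BATCH_SIZE):
    """Split `resumes` into consecutive batches of at most `batch_size` items."""
    if not 1 <= batch_size <= MAX_BATCH_SIZE:
        raise ValueError(f"batch_size must be between 1 and {MAX_BATCH_SIZE}")
    return [resumes[i:i + batch_size] for i in range(0, len(resumes), batch_size)]


resumes = [f"resume-{n}" for n in range(250)]  # Placeholder data
batches = chunk_resumes(resumes)
print([len(b) for b in batches])  # [100, 100, 50]
```

Each batch can then be passed to a single screening call, so 250 resumes cost 3 requests instead of 250.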

High-Volume Screening Patterns

To process thousands of resumes efficiently, combine batching with a client-side rate limiter. The example below uses aiohttp:

import asyncio
import aiohttp
from collections import deque
import time

class RateLimiter:
    def __init__(self, requests_per_minute=48):  # Leave 20% headroom (60 max)
        self.rate = requests_per_minute
        self.timestamps = deque()
        self.lock = asyncio.Lock()  # Serialize waiters so concurrent tasks cannot overshoot
    
    async def wait(self):
        async with self.lock:
            while True:
                now = time.monotonic()
                
                # Remove timestamps older than 1 minute
                while self.timestamps and self.timestamps[0] <= now - 60:
                    self.timestamps.popleft()
                
                if len(self.timestamps) < self.rate:
                    break
                
                # Window is full: sleep until the oldest request leaves it, then re-check
                await asyncio.sleep(60 - (now - self.timestamps[0]))
            
            self.timestamps.append(time.monotonic())

async def process_batch(session, resumes, rate_limiter):
    await rate_limiter.wait()
    payload = {
        "title": "Senior Developer",
        "description": "...",
        "resumes": resumes
    }
    async with session.post("https://hiresquireai.com/api/v1/jobs", json=payload) as resp:
        resp.raise_for_status()  # Surface 429 and other HTTP errors instead of returning an error body
        return await resp.json()

async def main(all_resumes):
    # Process 1000 resumes in batches of 50
    batches = [all_resumes[i:i+50] for i in range(0, len(all_resumes), 50)]
    rate_limiter = RateLimiter(48)
    
    async with aiohttp.ClientSession(headers={"Authorization": "Bearer TOKEN"}) as session:
        results = await asyncio.gather(*[
            process_batch(session, batch, rate_limiter) 
            for batch in batches
        ])
    return results

Monitoring and Alerting

Track these metrics to avoid rate limit issues:

Metric	Description
Rate Limit Usage	Percentage of the limit consumed. Alert at 80% to avoid throttling.
Retry Rate	Percentage of requests being retried. Alert if above 5%.
Queue Depth	Pending jobs waiting for rate limit clearance.
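
The two thresholds above can be checked with a few lines of code. The sketch below is illustrative only: `UsageSnapshot` and `check_alerts` are hypothetical names, and queue depth is left out because this guide gives no threshold for it.

```python
from dataclasses import dataclass


@dataclass
class UsageSnapshot:
    limit: int             # e.g. the X-RateLimit-Limit header
    remaining: int         # e.g. the X-RateLimit-Remaining header
    total_requests: int    # Requests sent in the observation window
    retried_requests: int  # How many of those were retries


def check_alerts(snapshot, usage_threshold=0.80, retry_threshold=0.05):
    """Return a list of human-readable alerts for the given snapshot."""
    alerts = []
    usage = (snapshot.limit - snapshot.remaining) / snapshot.limit
    if usage >= usage_threshold:
        alerts.append(f"rate limit usage at {usage:.0%}")
    if snapshot.total_requests:
        retry_rate = snapshot.retried_requests / snapshot.total_requests
        if retry_rate > retry_threshold:
            alerts.append(f"retry rate at {retry_rate:.0%}")
    return alerts


busy = UsageSnapshot(limit=60, remaining=10, total_requests=200, retried_requests=12)
print(check_alerts(busy))  # ['rate limit usage at 83%', 'retry rate at 6%']
```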

Common Rate Limit Errors

⚠️ Rate Limit Thresholds to Monitor

Error Code	Meaning	Solution
429 - rate_limit_exceeded	Minute/hour limit reached	Retry after Retry-After header value
402 - spend_limit_exceeded	Agent API Key spend limit reached	Increase limit in dashboard or wait for reset
403 - limit_exceeded	Monthly plan screening limit reached	Upgrade plan or add overage pricing
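
A client can branch on these codes before deciding whether a retry makes sense. The sketch below assumes that the 402 and 403 responses carry an `error` field shaped like the 429 body shown earlier, which this guide does not confirm. The function name `classify_error` and the action labels are hypothetical.

```python
def classify_error(status_code, body):
    """Map an error response to an (action, retry_after_seconds) pair.

    Only the 429 case is retryable; the other two need a change in the
    dashboard or plan, so retrying them would not help.
    """
    error = body.get("error")
    if status_code == 429 and error == "rate_limit_exceeded":
        return ("retry", body.get("retry_after"))
    if status_code == 402 and error == "spend_limit_exceeded":
        return ("raise_spend_limit", None)
    if status_code == 403 and error == "limit_exceeded":
        return ("upgrade_plan", None)
    return ("fail", None)


rate_limited = {"error": "rate_limit_exceeded", "retry_after": 42}
print(classify_error(429, rate_limited))  # ('retry', 42)
```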

Webhook Considerations

Rate limits also apply to webhook delivery:

📤 Webhook Rate Limits

Maximum 100 webhooks per minute per endpoint
Failed deliveries are retried with exponential backoff
3 consecutive failures → webhook disabled for 5 minutes
10 consecutive failures → webhook automatically disabled

Advanced: Request Prioritization

For mixed workloads, implement priority queuing:

import asyncio
import itertools

HIGH, NORMAL, LOW = 0, 1, 2  # Lower number = served first

class PriorityQueue:
    def __init__(self):
        self._queue = asyncio.PriorityQueue()
        self._counter = itertools.count()  # Tie-breaker keeps FIFO order within a priority
    
    async def put(self, item, priority=NORMAL):
        await self._queue.put((priority, next(self._counter), item))
    
    async def get(self):
        # Waits on a single queue, so a high-priority item that arrives
        # while the consumer is idle is still served before lower priorities
        _, _, item = await self._queue.get()
        return item

# Usage:
# - Time-sensitive screening: high priority
# - Batch background processing: low priority
# - Standard screening: normal priority

Next Steps

REST API Reference - Complete endpoint documentation
Agent Integration Guide - Production reliability patterns
Webhook Documentation - Security and verification
Integration Comparison - Choose the right approach

Proper rate limit handling is essential for building robust, production-grade integrations. By following these patterns, you can ensure your hiring automation works reliably even during peak hiring periods.
