Rate Limits and Throttling Guide: Optimizing API Usage
Complete reference for HireSquire rate limits, throttling behavior, and optimization patterns. Learn how to design robust integrations that handle high-volume screening efficiently.
Note for Autonomous Agents
If you are an AI reading this to optimize your own polling logic, ensure you strictly adhere to the Retry-After headers. If you are polling for screening completion, an exponential backoff starting at 5 seconds and maxing out at 30 seconds is recommended.
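The polling advice above can be sketched as a delay schedule. This is a minimal illustration, not an official client: `polling_delays` is a hypothetical helper, and the doubling factor is an assumption, since the guide only specifies the 5-second start and the 30-second cap.

```python
from itertools import islice


def polling_delays(start=5.0, cap=30.0, factor=2.0):
    """Yield an endless backoff schedule: start, start*factor, ... capped at `cap`.

    The doubling factor is an assumption; the guide only fixes the 5 s start
    and the 30 s ceiling.
    """
    delay = start
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


# First five waits between status checks, in seconds
first_five = list(islice(polling_delays(), 5))
print(first_five)  # [5.0, 10.0, 20.0, 30.0, 30.0]
```

A poller would sleep for each yielded value between status checks and stop as soon as the screening job reports completion. If a response carries a Retry-After header, that value should take precedence over the schedule.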
Default Rate Limits
Understanding Rate Limit Headers
Every API response includes headers with your current rate limit status:
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 47
X-RateLimit-Reset: 1713024000
X-RateLimit-Retry-After: 42
Retry-After: 42
# When rate limited
HTTP/1.1 429 Too Many Requests
Content-Type: application/json

{
  "error": "rate_limit_exceeded",
  "message": "Rate limit exceeded. Try again in 42 seconds.",
  "retry_after": 42,
  "limit": 60,
  "remaining": 0,
  "reset": 1713024000
}
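As a rough sketch of how a client might act on these headers, the helper below turns a response into a wait time. `seconds_until_retry` is a hypothetical name, and the code assumes that `X-RateLimit-Reset` is a Unix timestamp in seconds (the sample value looks like one) and that header values arrive as strings in a plain mapping.

```python
import time


def seconds_until_retry(status_code, headers, now=None):
    """Return how long to wait before the next request, in seconds.

    Uses Retry-After on a 429. Otherwise waits only when
    X-RateLimit-Remaining has reached zero, until X-RateLimit-Reset.
    """
    now = time.time() if now is None else now
    if status_code == 429:
        # The 5 s fallback is an assumption for a missing header
        return float(headers.get("Retry-After", 5))
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    # Assumption: the reset value is an epoch timestamp in seconds
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    return max(0.0, reset_at - now)


sample = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1713024000"}
print(seconds_until_retry(200, sample, now=1713023958))  # 42.0
```

Checking the remaining quota before it hits zero lets a client pause proactively instead of waiting for a 429.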
Rate Limit Tiers by Plan
| Plan | Requests per Minute | Requests per Hour | Concurrent Jobs |
|---|---|---|---|
| Free | 20 | 200 | 10 |
| Pro | 60 | 600 | 50 |
| Business | 300 | 3000 | 200 |
| Enterprise | Custom | Custom | Unlimited |
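A quick calculation on the table above: each hourly limit is ten times the per-minute limit rather than sixty times, so a client running at the per-minute ceiling would use up its hourly quota in about 10 minutes. The sketch below works this out, assuming both limits are enforced together; `PLAN_LIMITS` and the function names are hypothetical, and whether the hourly window is fixed or rolling is not stated in this guide.

```python
# Limits copied from the table above: (requests per minute, requests per hour)
PLAN_LIMITS = {
    "free": (20, 200),
    "pro": (60, 600),
    "business": (300, 3000),
}


def sustained_per_minute(plan):
    """Highest steady request rate that stays within both limits."""
    per_minute, per_hour = PLAN_LIMITS[plan]
    return min(per_minute, per_hour / 60)


def minutes_until_hourly_cap(plan):
    """Minutes of sending at the per-minute ceiling before the hourly quota runs out."""
    per_minute, per_hour = PLAN_LIMITS[plan]
    return per_hour / per_minute


for plan in PLAN_LIMITS:
    print(plan, round(sustained_per_minute(plan), 2), minutes_until_hourly_cap(plan))
```

In practice this means the per-minute limit governs short bursts, while long-running jobs should be paced against the hourly figure.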
Best Practices for Rate Limit Handling
1. Implement Exponential Backoff
Always use exponential backoff when retrying rate-limited requests:
import random
import time

import requests
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential


class RateLimitException(Exception):
    """Raised when the API responds with HTTP 429."""


@retry(
    stop=stop_after_attempt(5),
    wait=wait_exponential(multiplier=1, min=2, max=30),
    retry=retry_if_exception_type(RateLimitException),
)
def make_api_request(payload):
    response = requests.post(
        "https://hiresquireai.com/api/v1/jobs",
        json=payload,
        headers={"Authorization": "Bearer TOKEN"},
    )
    if response.status_code == 429:
        # Wait out the server-specified delay; tenacity adds its backoff on top
        retry_after = int(response.headers.get("Retry-After", 5))
        time.sleep(retry_after + random.uniform(0, 1))  # Add jitter
        raise RateLimitException("Rate limited")
    return response.json()
2. Add Jitter to Retries
Prevent thundering herd problems by adding random jitter to retry delays:
Good vs Bad Retry Pattern
Bad: Fixed delay
time.sleep(5)  # All clients retry at the same time
Good: Jittered delay
time.sleep(5 + random.uniform(0, 2))  # Spread out retries
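Jitter and exponential backoff can also be combined in one delay function. The sketch below uses the "full jitter" variant, which draws the whole delay at random below a growing ceiling. This is a different technique from the additive jitter shown above, offered as an illustration only; `backoff_delay` is a hypothetical helper, and the 2-second base and 30-second cap are assumptions chosen to mirror the backoff settings used earlier in this guide.

```python
import random


def backoff_delay(attempt, base=2.0, cap=30.0, rng=random):
    """Return a 'full jitter' delay for the given retry attempt (0-based).

    The ceiling grows as base * 2**attempt up to `cap`; the actual delay is
    drawn uniformly from [0, ceiling] so clients do not retry in lockstep.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


rng = random.Random(0)  # seeded only to make the demo repeatable
delays = [backoff_delay(n, rng=rng) for n in range(6)]
ceilings = [min(30.0, 2.0 * 2 ** n) for n in range(6)]
print(ceilings)  # [2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

When a 429 response includes a Retry-After value, wait at least that long and treat the jittered delay as extra spacing on top of it.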
3. Batch Requests When Possible
Reduce API calls by batching multiple resumes into a single screening job:
# Bad: 1 request per resume
for resume in resumes:
    client.screen(title, description, [resume])  # 50 resumes = 50 requests

# Good: Batch up to 100 resumes per request
client.screen(title, description, resumes)  # 50 resumes = 1 request
# Maximum batch size: 100 resumes per job
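When a workload exceeds the 100-resume maximum, it has to be split before submission. A minimal sketch of such a splitter follows; `chunk_resumes` and `MAX_BATCH_SIZE` are hypothetical names, and the resumes are represented here as plain strings for illustration.

```python
MAX_BATCH_SIZE = 100  # documented maximum resumes per job


def chunk_resumes(resumes, size=MAX_BATCH_SIZE):
    """Split a list of resumes into consecutive batches of at most `size` items."""
    if not 1 <= size <= MAX_BATCH_SIZE:
        raise ValueError(f"size must be between 1 and {MAX_BATCH_SIZE}")
    return [resumes[i:i + size] for i in range(0, len(resumes), size)]


batches = chunk_resumes([f"resume_{n}.pdf" for n in range(250)])
print([len(b) for b in batches])  # [100, 100, 50]
```

Each resulting batch can then be submitted as one screening job, so 250 resumes cost 3 requests instead of 250.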
High-Volume Screening Patterns
To process thousands of resumes efficiently, combine batching with a client-side rate limiter. The following example uses aiohttp:
import asyncio
import time
from collections import deque

import aiohttp


class RateLimiter:
    """Sliding-window limiter shared by all concurrent tasks."""

    def __init__(self, requests_per_minute=48):  # Leave 20% headroom (60 max)
        self.rate = requests_per_minute
        self.timestamps = deque()
        self.lock = asyncio.Lock()  # Let one task at a time claim a slot

    async def wait(self):
        async with self.lock:
            while True:
                now = time.monotonic()
                # Remove timestamps older than 1 minute
                while self.timestamps and self.timestamps[0] <= now - 60:
                    self.timestamps.popleft()
                if len(self.timestamps) < self.rate:
                    break
                # Window is full: sleep until the oldest request expires
                await asyncio.sleep(60 - (now - self.timestamps[0]))
            self.timestamps.append(time.monotonic())


async def process_batch(session, resumes, rate_limiter):
    await rate_limiter.wait()
    payload = {
        "title": "Senior Developer",
        "description": "...",
        "resumes": resumes,
    }
    async with session.post("https://hiresquireai.com/api/v1/jobs", json=payload) as resp:
        return await resp.json()


async def main(all_resumes):
    # Split the resumes into batches of 50 (1000 resumes = 20 requests)
    batches = [all_resumes[i:i + 50] for i in range(0, len(all_resumes), 50)]
    rate_limiter = RateLimiter(48)
    async with aiohttp.ClientSession(headers={"Authorization": "Bearer TOKEN"}) as session:
        results = await asyncio.gather(*[
            process_batch(session, batch, rate_limiter)
            for batch in batches
        ])
    return results
Monitoring and Alerting
Track these metrics to avoid rate limit issues:
- Rate Limit Usage: percentage of the limit consumed. Alert at 80% to avoid throttling.
- Retry Rate: percentage of requests being retried. Alert if it exceeds 5%.
- Queue Depth: number of pending jobs waiting for rate limit clearance.
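The first two thresholds above can be checked with a few lines of arithmetic. The sketch below is illustrative only: `RateLimitStats` is a hypothetical container, and it assumes your client already counts requests and retries and records the latest `X-RateLimit-Limit` and `X-RateLimit-Remaining` values.

```python
from dataclasses import dataclass


@dataclass
class RateLimitStats:
    limit: int      # latest X-RateLimit-Limit value
    remaining: int  # latest X-RateLimit-Remaining value
    requests: int   # total requests sent in the observation window
    retries: int    # how many of those requests were retries

    @property
    def usage_pct(self):
        return 100.0 * (self.limit - self.remaining) / self.limit

    @property
    def retry_pct(self):
        return 100.0 * self.retries / self.requests if self.requests else 0.0

    def alerts(self):
        """Return the names of metrics past the 80% usage and 5% retry thresholds."""
        fired = []
        if self.usage_pct >= 80:
            fired.append("rate_limit_usage")
        if self.retry_pct > 5:
            fired.append("retry_rate")
        return fired


stats = RateLimitStats(limit=60, remaining=9, requests=200, retries=12)
print(stats.usage_pct, stats.retry_pct, stats.alerts())
```

With 51 of 60 requests used and 12 retries out of 200 requests, both alerts fire in this example.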
Common Rate Limit Errors
Rate Limit Thresholds to Monitor
| Error Code | Meaning | Solution |
|---|---|---|
| 429 - rate_limit_exceeded | Minute/hour limit reached | Retry after Retry-After header value |
| 402 - spend_limit_exceeded | Agent API Key spend limit reached | Increase limit in dashboard or wait for reset |
| 403 - limit_exceeded | Monthly plan screening limit reached | Upgrade plan or add overage pricing |
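Only the first row of the table is something a client can resolve by waiting a few seconds; the other two need a dashboard or plan change. One way to encode that distinction is a small lookup, sketched below. The action names and `classify_error` are hypothetical, and the code assumes that 402 and 403 responses carry the same JSON `error` field as the 429 example shown earlier, which this guide does not confirm.

```python
# (status code, error code) pairs from the table above, mapped to a client action
ACTIONS = {
    (429, "rate_limit_exceeded"): "retry_after_delay",
    (402, "spend_limit_exceeded"): "stop_and_raise_spend_limit",
    (403, "limit_exceeded"): "stop_and_upgrade_plan",
}


def classify_error(status_code, body):
    """Return the action for an error response, or 'fail' if it is not a limit error.

    Assumes 402 and 403 bodies carry the same "error" field as the 429 example.
    """
    return ACTIONS.get((status_code, body.get("error")), "fail")


print(classify_error(429, {"error": "rate_limit_exceeded", "retry_after": 42}))
print(classify_error(402, {"error": "spend_limit_exceeded"}))
```

Routing 402 and 403 responses away from the retry loop avoids spending the remaining rate limit on requests that cannot succeed until the account changes.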
Webhook Considerations
Rate limits also apply to webhook delivery:
Webhook Rate Limits
- Maximum 100 webhooks per minute per endpoint
- Failed deliveries are retried with exponential backoff
- 3 consecutive failures → webhook disabled for 5 minutes
- 10 consecutive failures → webhook automatically disabled
Advanced: Request Prioritization
For mixed workloads, implement priority queuing:
import asyncio


class PriorityQueue:
    def __init__(self):
        self.high_priority = asyncio.Queue()
        self.normal_priority = asyncio.Queue()
        self.low_priority = asyncio.Queue()

    async def get(self):
        while True:
            # Process high priority first, then normal, then low
            for queue in (self.high_priority, self.normal_priority, self.low_priority):
                if not queue.empty():
                    return queue.get_nowait()
            # Nothing queued yet: re-check shortly so a new high-priority item
            # is never stuck behind a blocking wait on the low-priority queue
            await asyncio.sleep(0.05)

# Usage:
# - Time-sensitive screening: high priority
# - Batch background processing: low priority
# - Standard screening: normal priority
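One alternative to three separate queues is a single `asyncio.PriorityQueue`, which wakes a waiting consumer as soon as a job of any priority arrives. The sketch below is an illustration under assumptions: `PriorityJobQueue`, the `HIGH`/`NORMAL`/`LOW` constants, and the job strings are all hypothetical names, not part of the HireSquire API.

```python
import asyncio
import itertools

HIGH, NORMAL, LOW = 0, 1, 2  # lower number is served first


class PriorityJobQueue:
    """Single queue ordered by (priority, arrival order)."""

    def __init__(self):
        self._queue = asyncio.PriorityQueue()
        self._arrival = itertools.count()  # tie-breaker keeps FIFO order within a level

    async def put(self, job, priority=NORMAL):
        await self._queue.put((priority, next(self._arrival), job))

    async def get(self):
        _priority, _order, job = await self._queue.get()
        return job


async def demo():
    queue = PriorityJobQueue()
    await queue.put("nightly backfill", LOW)
    await queue.put("standard screening", NORMAL)
    await queue.put("urgent screening", HIGH)
    return [await queue.get() for _ in range(3)]


order = asyncio.run(demo())
print(order)  # ['urgent screening', 'standard screening', 'nightly backfill']
```

The arrival counter also prevents Python from comparing the job objects themselves when two entries share a priority, so jobs can be dictionaries or other types that do not support ordering.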
Next Steps
- REST API Reference - Complete endpoint documentation
- Agent Integration Guide - Production reliability patterns
- Webhook Documentation - Security and verification
- Integration Comparison - Choose the right approach
Proper rate limit handling is essential for building robust, production-grade integrations. By following these patterns, you can ensure your hiring automation works reliably even during peak hiring periods.