Optimizing API Usage: Deduplication, Batch Endpoints, and Request Efficiency

When you are building a production pipeline on top of the Sorsa API - scraping followers, collecting tweets, enriching profiles, verifying campaign actions - a well-architected integration can use 10x fewer requests than a naive one for the same output. This guide covers the patterns and techniques that minimize wasted API calls while maximizing the data you extract from each request.

Principle 1: Every Tweet Response Already Contains User Data

This is the single most important efficiency insight for working with the Sorsa API. Every endpoint that returns tweets - /search-tweets, /user-tweets, /list-tweets, /comments, /quotes, /mentions - embeds the complete author profile inside each tweet object.
{
  "tweets": [
    {
      "id": "2029914600217473314",
      "full_text": "Great thread on API design patterns...",
      "likes_count": 142,
      "user": {
        "id": "1422280682240450563",
        "username": "dev_sarah",
        "display_name": "Sarah Chen",
        "description": "Staff engineer @stripe. APIs, distributed systems.",
        "followers_count": 12400,
        "followings_count": 890,
        "tweets_count": 4521,
        "verified": true,
        "location": "San Francisco",
        "created_at": "Mon Feb 01 09:15:22 +0000 2021"
      }
    }
  ]
}
The user object inside each tweet contains the same data that a dedicated /info call would return: ID, username, display name, bio, follower count, following count, tweet count, verified status, location, creation date, profile image, and more.

What this means in practice: if you search for tweets about a topic and want to build a list of users discussing it, you do not need to make separate /info calls for each author. The user data is already in the response. Extract it directly:
# Collect unique users from a tweet search - zero extra API calls
seen_ids = set()
unique_users = []

for tweet in search_results:
    user = tweet["user"]
    if user["id"] not in seen_ids:
        seen_ids.add(user["id"])
        unique_users.append(user)

print(f"Found {len(unique_users)} unique users from {len(search_results)} tweets")
This one pattern can eliminate hundreds or thousands of unnecessary /info requests in a typical workflow.

Principle 2: Use Batch Endpoints When They Exist

Sorsa provides batch variants for the most common lookup operations. Using them instead of looping through the single-item endpoint reduces your request count dramatically.

/info-batch Instead of Looping /info

If you need profile data for multiple accounts, do not call /info in a loop. Use /info-batch to fetch them all at once:
# Inefficient: 10 accounts = 10 requests
for handle in handles:
    resp = requests.get(f"https://api.sorsa.io/v3/info?username={handle}", ...)

# Efficient: 10 accounts = 1 request
resp = requests.get(
    "https://api.sorsa.io/v3/info-batch",
    headers={"ApiKey": API_KEY},
    params={"usernames": ["NASA", "SpaceX", "Tesla", "OpenAI", "stripe",
                           "shopify", "vercel", "github", "notion", "linear"]},
)
profiles = resp.json().get("users", [])
Savings: 10 accounts = 1 request instead of 10. The reduction scales linearly.
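If your account list is longer than the batch endpoint accepts in one call, split it into fixed-size chunks and make one request per chunk. A minimal sketch, assuming a hypothetical `BATCH_LIMIT` of 100 (check the /info-batch reference for the actual cap):

```python
# Hypothetical chunking helper for batch lookups. BATCH_LIMIT is an
# assumption; confirm the real per-request cap in the /info-batch docs.
BATCH_LIMIT = 100

def chunked(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 250 handles -> 3 batch requests instead of 250 single /info requests
handles = [f"account_{n}" for n in range(250)]
batches = list(chunked(handles, BATCH_LIMIT))
print(len(batches))  # 3
```

Each batch would then be passed as the `usernames` parameter of one /info-batch request.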

/tweet-info-bulk Instead of Looping /tweet-info

If you have a list of tweet IDs (from an archive, a mention export, or a list of bookmarked links) and need their current engagement metrics and author data, use the bulk endpoint to hydrate up to 100 tweets in a single request:
# Inefficient: 100 tweets = 100 requests
for link in tweet_links:
    resp = requests.post("https://api.sorsa.io/v3/tweet-info", json={"tweet_link": link}, ...)

# Efficient: 100 tweets = 1 request
resp = requests.post(
    "https://api.sorsa.io/v3/tweet-info-bulk",
    headers={"ApiKey": API_KEY, "Content-Type": "application/json"},
    json={
        "tweet_links": [
            "https://x.com/user/status/111111",
            "https://x.com/user/status/222222",
            # ... up to 100 links
        ]
    },
)
tweets = resp.json().get("tweets", [])
Savings: 100 tweets = 1 request instead of 100. This is a 99% reduction.

Principle 3: Use /list-tweets Instead of Multiple /user-tweets Calls

If you are monitoring or collecting recent tweets from multiple accounts, do not poll each one individually. Add them to an X List and make a single /list-tweets request that returns the combined recent activity from all members.
# Inefficient: monitoring 30 accounts = 30 requests per poll cycle
for handle in competitor_handles:
    resp = requests.post("https://api.sorsa.io/v3/user-tweets", json={"link": f"https://x.com/{handle}"}, ...)

# Efficient: 1 request covers all 30 accounts
resp = requests.get(
    f"https://api.sorsa.io/v3/list-tweets?list_id={LIST_ID}",
    headers={"ApiKey": API_KEY},
)
Savings: 30 accounts per poll cycle = 1 request instead of 30. Over a month of 10-second polling, that is 259,200 requests instead of 7,776,000. This is covered in detail in the Real-Time Monitoring Guide.
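Because consecutive polls of /list-tweets return overlapping windows of the list's timeline, deduplicate by tweet ID before processing. A sketch, assuming the `{"tweets": [...]}` response shape shown earlier in this guide:

```python
# Sketch: keep only tweets not seen in earlier /list-tweets poll cycles.
# The response shape ({"tweets": [{"id": ...}]}) follows the examples above.
seen_tweet_ids = set()

def new_tweets_only(response_json, seen=seen_tweet_ids):
    """Return only the tweets whose IDs have not appeared in prior polls."""
    fresh = []
    for tweet in response_json.get("tweets", []):
        if tweet["id"] not in seen:
            seen.add(tweet["id"])
            fresh.append(tweet)
    return fresh

# First poll returns two tweets; a second, overlapping poll adds one new one.
poll_1 = {"tweets": [{"id": "1"}, {"id": "2"}]}
poll_2 = {"tweets": [{"id": "2"}, {"id": "3"}]}
print(len(new_tweets_only(poll_1)), len(new_tweets_only(poll_2)))  # 2 1
```

This keeps downstream processing (alerts, upserts) from firing twice on the same tweet.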

Principle 4: Use /info as Your Universal Resolver

The /info endpoint accepts a username, user ID, or profile link - any format. It returns the full profile object including the permanent User ID. This makes it the most flexible endpoint for normalizing mixed inputs. If you receive account references in different formats and need to resolve all of them to full profiles, use /info once per account instead of calling a conversion endpoint (/username-to-id, /link-to-id) followed by a separate /info call:
# Inefficient: 2 requests per account
uid = username_to_id("stripe")          # request 1
profile = get_info(user_id=uid)         # request 2

# Efficient: 1 request per account
profile = requests.get(
    "https://api.sorsa.io/v3/info",
    headers={"ApiKey": API_KEY},
    params={"username": "stripe"},       # accepts username, user_id, or user_link
).json()
# profile already contains the user ID, plus everything else
When to use conversion endpoints separately: Only when you specifically need just the ID or just the handle, and do not need the full profile. For example, converting a list of 1,000 handles to IDs for database storage - if you already have the profile data from a prior step, use /username-to-id directly instead of fetching the full profile again.
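When your inputs arrive in mixed formats, a small classifier can pick the right /info query parameter before the single request. The parameter names below (`username`, `user_id`, `user_link`) mirror the comment in the snippet above; treat them as assumptions to confirm against the /info reference:

```python
# Sketch: classify a mixed account reference before one /info call.
# Param names (username / user_id / user_link) are assumed from the
# example above; verify against the /info endpoint reference.
def classify_reference(ref: str) -> dict:
    """Map a handle, numeric ID, or profile link to an /info query param."""
    if ref.startswith("http://") or ref.startswith("https://"):
        return {"user_link": ref}
    if ref.isdigit():
        return {"user_id": ref}
    return {"username": ref.lstrip("@")}

print(classify_reference("https://x.com/stripe"))   # {'user_link': ...}
print(classify_reference("1422280682240450563"))    # {'user_id': ...}
print(classify_reference("@stripe"))                # {'username': 'stripe'}
```

The returned dict can be passed directly as `params` to the /info request, so every input format costs exactly one call.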

Principle 5: Deduplicate in Your Database Layer

When you collect data from multiple sources - search results, follower lists, mention feeds, timeline scrapes - the same user will appear many times. Use the permanent User ID as your deduplication key and update existing records rather than inserting duplicates.
import sqlite3

def upsert_user(db, user):
    """Insert or update a user record keyed by permanent User ID."""
    db.execute("""
        INSERT INTO users (user_id, username, display_name, description,
                          followers_count, tweets_count, verified, updated_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, datetime('now'))
        ON CONFLICT(user_id) DO UPDATE SET
            username = excluded.username,
            display_name = excluded.display_name,
            description = excluded.description,
            followers_count = excluded.followers_count,
            tweets_count = excluded.tweets_count,
            verified = excluded.verified,
            updated_at = datetime('now')
    """, (
        user["id"], user["username"], user.get("display_name", ""),
        user.get("description", ""), user.get("followers_count", 0),
        user.get("tweets_count", 0), user.get("verified", False),
    ))
    db.commit()


# Every time you encounter a user in any API response, upsert:
for tweet in search_results:
    upsert_user(db, tweet["user"])
    # The user's profile data stays fresh without separate /info calls
This means your user table is continuously updated with the latest profile data every time that user appears in any API response - search results, mentions, follower lists, comments, quotes. You never need to schedule a separate “profile refresh” job.

Principle 6: Avoid Re-fetching Data You Already Have

When building multi-step pipelines, pass data forward between steps instead of re-fetching it.

Example: audience geography analysis. The workflow is: (1) fetch followers, (2) look up country via /about for each follower. After step 1, you already have the full profile object for every follower (bio, follower count, verified status). Do not call /info again in step 2 - you only need /about for the country data that is not in the standard profile.

Example: campaign verification. When verifying follow + retweet + comment for a participant, the /check-comment response includes the full tweet object of the comment (when commented: true). If you need to analyze the comment text for quality, extract it from the verification response - do not make a separate /search-tweets or /comments call to find it again.

Example: building a user list from search results. If you searched for tweets and collected 500 unique users from the embedded user objects, and now want to find which of them have 10K+ followers, filter the data you already collected. Do not call /info for each of the 500 users.
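The last example - filtering already-collected users by follower count - costs zero additional requests. A minimal sketch with illustrative sample data:

```python
# Sketch: filter users already collected from search-result user objects
# instead of re-fetching each one with /info. Sample data is illustrative.
collected_users = [
    {"id": "1", "username": "dev_sarah", "followers_count": 12400},
    {"id": "2", "username": "small_acct", "followers_count": 310},
    {"id": "3", "username": "big_acct", "followers_count": 88000},
]

influential = [u for u in collected_users if u["followers_count"] >= 10_000]
print([u["username"] for u in influential])  # ['dev_sarah', 'big_acct']
```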

Quick Reference: Choosing the Right Endpoint

| You have… | You need… | Use this | Not this |
| --- | --- | --- | --- |
| A list of handles | Full profiles for all of them | GET /info-batch (one request) | /info in a loop |
| A list of tweet URLs | Full tweet data + author profiles | POST /tweet-info-bulk (up to 100/request) | /tweet-info in a loop |
| 30 accounts to monitor | Recent tweets from all of them | GET /list-tweets (one request) | /user-tweets x 30 |
| A handle | The full profile (ID + bio + counts + everything) | GET /info | /username-to-id then /info |
| Tweet search results | Author profiles | Extract tweet["user"] from response | /info for each author |
| Follower list | Profile data per follower | Already in /followers response | /info for each follower |

Estimating Your Request Budget

When planning a project, estimate your total request count before you start:
| Task | Efficient approach | Requests needed |
| --- | --- | --- |
| Profile data for 50 accounts | /info-batch | ~1 |
| 10,000 followers of an account | /followers with pagination (200/page) | 50 |
| Country data for those 10,000 followers | /about for each | 10,000 |
| 100 tweets hydrated with current metrics | /tweet-info-bulk | 1 |
| Monitor 50 accounts every 10 seconds for a day | /list-tweets | 8,640 |
| Verify 5 tasks for 1,000 campaign participants | 5 checks per user | 5,000 |
The total for a typical competitive intelligence + campaign verification workflow might be: 50 (followers) + 10,000 (geography) + 1 (bulk tweets) + 8,640 (monitoring) + 5,000 (campaign) = ~23,700 requests. At 20 req/s, that is about 20 minutes of execution time.
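That arithmetic is easy to script so you can sanity-check a plan before running it. A small sketch using the figures from the table above and the 20 req/s rate quoted in this guide:

```python
# Sketch: estimate total request count and wall-clock time for a pipeline.
# Figures match the worked example above; 20 req/s is the quoted rate.
tasks = {
    "followers (10,000 @ 200/page)": 50,
    "geography (/about per follower)": 10_000,
    "bulk tweet hydration": 1,
    "list monitoring (1 day @ 10s polls)": 8_640,
    "campaign checks (5 x 1,000)": 5_000,
}

total = sum(tasks.values())
minutes = total / 20 / 60  # at 20 requests per second
print(f"{total} requests, about {minutes:.0f} minutes")  # 23691 requests, about 20 minutes
```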

Next Steps