Optimizing API Usage: Deduplication, Batch Endpoints, and Request Efficiency

When you are building a production pipeline on top of the Sorsa API - scraping followers, collecting tweets, enriching profiles, verifying campaign actions - a well-architected integration can use 10x fewer requests than a naive one for the same output. This guide covers the patterns and techniques that minimize wasted API calls while maximizing the data you extract from each request.

Principle 1: Every Tweet Response Already Contains User Data

This is the single most important efficiency insight for working with the Sorsa API. Every endpoint that returns tweets - /search-tweets, /user-tweets, /list-tweets, /comments, /quotes, /mentions - embeds the complete author profile inside each tweet object.
{
  "tweets": [
    {
      "id": "2029914600217473314",
      "full_text": "Great thread on API design patterns...",
      "likes_count": 142,
      "user": {
        "id": "1422280682240450563",
        "username": "dev_sarah",
        "display_name": "Sarah Chen",
        "description": "Staff engineer @stripe. APIs, distributed systems.",
        "followers_count": 12400,
        "followings_count": 890,
        "tweets_count": 4521,
        "verified": true,
        "location": "San Francisco",
        "created_at": "Mon Feb 01 09:15:22 +0000 2021"
      }
    }
  ]
}
The user object inside each tweet contains the same data that a dedicated /info call would return: ID, username, display name, bio, follower count, following count, tweet count, verified status, location, creation date, profile image, and more.

What this means in practice: if you search for tweets about a topic and want to build a list of users discussing it, you do not need to make separate /info calls for each author. The user data is already in the response. Extract it directly:
# Collect unique users from a tweet search - zero extra API calls
seen_ids = set()
unique_users = []

for tweet in search_results:
    user = tweet["user"]
    if user["id"] not in seen_ids:
        seen_ids.add(user["id"])
        unique_users.append(user)

print(f"Found {len(unique_users)} unique users from {len(search_results)} tweets")
This one pattern can eliminate hundreds or thousands of unnecessary /info requests in a typical workflow.

Principle 2: Use Batch Endpoints When They Exist

Sorsa provides batch variants for the most common lookup operations. Using them instead of looping through the single-item endpoint reduces your request count dramatically.

/info-batch Instead of Looping /info

If you need profile data for multiple accounts, do not call /info in a loop. Use /info-batch to fetch them all at once:
# Inefficient: 10 accounts = 10 requests
for handle in handles:
    resp = requests.get(f"https://api.sorsa.io/v3/info?username={handle}", ...)

# Efficient: 10 accounts = 1 request
resp = requests.get(
    "https://api.sorsa.io/v3/info-batch",
    headers={"ApiKey": API_KEY},
    params={"usernames": ["NASA", "SpaceX", "Tesla", "OpenAI", "stripe",
                           "shopify", "vercel", "github", "notion", "linear"]},
)
profiles = resp.json().get("users", [])
Savings: 10 accounts = 1 request instead of 10. The reduction scales linearly.
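If your account list is longer than the batch endpoint accepts in one call, split it into fixed-size chunks and make one request per chunk. A minimal sketch, assuming a hypothetical `BATCH_LIMIT` of 100 (check the /info-batch reference for the actual cap):

```python
# Hypothetical chunking helper for batch lookups. BATCH_LIMIT is an
# assumption; confirm the real per-request cap in the /info-batch docs.
BATCH_LIMIT = 100

def chunked(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

# 250 handles -> 3 batch requests instead of 250 single /info requests
handles = [f"account_{n}" for n in range(250)]
batches = list(chunked(handles, BATCH_LIMIT))
print(len(batches))  # 3
```

Each batch would then be passed as the `usernames` parameter of one /info-batch request.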

/tweet-info-bulk Instead of Looping /tweet-info

If you have a list of tweet IDs (from an archive, a mention export, or a list of bookmarked links) and need their current engagement metrics and author data, use the bulk endpoint to hydrate up to 100 tweets in a single request:
# Inefficient: 100 tweets = 100 requests
for link in tweet_links:
    resp = requests.post("https://api.sorsa.io/v3/tweet-info", json={"tweet_link": link}, ...)

# Efficient: 100 tweets = 1 request
resp = requests.post(
    "https://api.sorsa.io/v3/tweet-info-bulk",
    headers={"ApiKey": API_KEY, "Content-Type": "application/json"},
    json={
        "tweet_links": [
            "https://x.com/user/status/111111",
            "https://x.com/user/status/222222",
            # ... up to 100 links
        ]
    },
)
tweets = resp.json().get("tweets", [])
Savings: 100 tweets = 1 request instead of 100. This is a 99% reduction.

Principle 3: Use /list-tweets Instead of Multiple /user-tweets Calls

If you are monitoring or collecting recent tweets from multiple accounts, do not poll each one individually. Add them to an X List and make a single /list-tweets request that returns the combined recent activity from all members.
# Inefficient: monitoring 30 accounts = 30 requests per poll cycle
for handle in competitor_handles:
    resp = requests.post("https://api.sorsa.io/v3/user-tweets", json={"link": f"https://x.com/{handle}"}, ...)

# Efficient: 1 request covers all 30 accounts
resp = requests.get(
    f"https://api.sorsa.io/v3/list-tweets?list_id={LIST_ID}",
    headers={"ApiKey": API_KEY},
)
Savings: 30 accounts per poll cycle = 1 request instead of 30. Over a month of 10-second polling, that is 259,200 requests instead of 7,776,000. This is covered in detail in the Real-Time Monitoring Guide.
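Because consecutive polls of /list-tweets return overlapping windows of the list's timeline, deduplicate by tweet ID before processing. A sketch, assuming the `{"tweets": [...]}` response shape shown earlier in this guide:

```python
# Sketch: keep only tweets not seen in earlier /list-tweets poll cycles.
# The response shape ({"tweets": [{"id": ...}]}) follows the examples above.
seen_tweet_ids = set()

def new_tweets_only(response_json, seen=seen_tweet_ids):
    """Return only the tweets whose IDs have not appeared in prior polls."""
    fresh = []
    for tweet in response_json.get("tweets", []):
        if tweet["id"] not in seen:
            seen.add(tweet["id"])
            fresh.append(tweet)
    return fresh

# First poll returns two tweets; a second, overlapping poll adds one new one.
poll_1 = {"tweets": [{"id": "1"}, {"id": "2"}]}
poll_2 = {"tweets": [{"id": "2"}, {"id": "3"}]}
print(len(new_tweets_only(poll_1)), len(new_tweets_only(poll_2)))  # 2 1
```

This keeps downstream processing (alerts, upserts) from firing twice on the same tweet.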

Principle 4: Use /info as Your Universal Resolver

The /info endpoint accepts a username, user ID, or profile link - any format. It returns the full profile object including the permanent User ID. This makes it the most flexible endpoint for normalizing mixed inputs. If you receive account references in different formats and need to resolve all of them to full profiles, use /info once per account instead of calling a conversion endpoint (/username-to-id, /link-to-id) followed by a separate /info call:
# Inefficient: 2 requests per account
uid = username_to_id("stripe")          # request 1
profile = get_info(user_id=uid)         # request 2

# Efficient: 1 request per account
profile = requests.get(
    "https://api.sorsa.io/v3/info",
    headers={"ApiKey": API_KEY},
    params={"username": "stripe"},       # accepts username, user_id, or user_link
).json()
# profile already contains the user ID, plus everything else
When to use conversion endpoints separately: Only when you specifically need just the ID or just the handle, and do not need the full profile. For example, converting a list of 1,000 handles to IDs for database storage - if you already have the profile data from a prior step, use /username-to-id directly instead of fetching the full profile again.
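When your inputs arrive in mixed formats, a small classifier can pick the right /info query parameter before the single request. The parameter names below (`username`, `user_id`, `user_link`) mirror the comment in the snippet above; treat them as assumptions to confirm against the /info reference:

```python
# Sketch: classify a mixed account reference before one /info call.
# Param names (username / user_id / user_link) are assumed from the
# example above; verify against the /info endpoint reference.
def classify_reference(ref: str) -> dict:
    """Map a handle, numeric ID, or profile link to an /info query param."""
    if ref.startswith("http://") or ref.startswith("https://"):
        return {"user_link": ref}
    if ref.isdigit():
        return {"user_id": ref}
    return {"username": ref.lstrip("@")}

print(classify_reference("https://x.com/stripe"))   # {'user_link': ...}
print(classify_reference("1422280682240450563"))    # {'user_id': ...}
print(classify_reference("@stripe"))                # {'username': 'stripe'}
```

The returned dict can be passed directly as `params` to the /info request, so every input format costs exactly one call.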

Principle 5: Deduplicate in Your Database Layer

When you collect data from multiple sources - search results, follower lists, mention feeds, timeline scrapes - the same user will appear many times. Use the permanent User ID as your deduplication key and update existing records rather than inserting duplicates.
import sqlite3

def upsert_user(db, user):
    """Insert or update a user record keyed by permanent User ID."""
    db.execute("""
        INSERT INTO users (user_id, username, display_name, description,
                          followers_count, tweets_count, verified, updated_at)
        VALUES (?, ?, ?, ?, ?, ?, ?, datetime('now'))
        ON CONFLICT(user_id) DO UPDATE SET
            username = excluded.username,
            display_name = excluded.display_name,
            description = excluded.description,
            followers_count = excluded.followers_count,
            tweets_count = excluded.tweets_count,
            verified = excluded.verified,
            updated_at = datetime('now')
    """, (
        user["id"], user["username"], user.get("display_name", ""),
        user.get("description", ""), user.get("followers_count", 0),
        user.get("tweets_count", 0), user.get("verified", False),
    ))
    db.commit()


# Every time you encounter a user in any API response, upsert:
for tweet in search_results:
    upsert_user(db, tweet["user"])
    # The user's profile data stays fresh without separate /info calls
This means your user table is continuously updated with the latest profile data every time that user appears in any API response - search results, mentions, follower lists, comments, quotes. You never need to schedule a separate “profile refresh” job.

Principle 6: Avoid Re-fetching Data You Already Have

When building multi-step pipelines, pass data forward between steps instead of re-fetching it.

Example: audience geography analysis. The workflow is: (1) fetch followers, (2) look up country via /about for each follower. After step 1, you already have the full profile object for every follower (bio, follower count, verified status). Do not call /info again in step 2 - you only need /about for the country data that is not in the standard profile.

Example: campaign verification. When verifying follow + retweet + comment for a participant, the /check-comment response includes the full tweet object of the comment (when commented: true). If you need to analyze the comment text for quality, extract it from the verification response - do not make a separate /search-tweets or /comments call to find it again.

Example: building a user list from search results. If you searched for tweets and collected 500 unique users from the embedded user objects, and now want to find which of them have 10K+ followers, filter the data you already collected. Do not call /info for each of the 500 users.
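The last example - filtering already-collected users by follower count - costs zero additional requests. A minimal sketch with illustrative sample data:

```python
# Sketch: filter users already collected from search-result user objects
# instead of re-fetching each one with /info. Sample data is illustrative.
collected_users = [
    {"id": "1", "username": "dev_sarah", "followers_count": 12400},
    {"id": "2", "username": "small_acct", "followers_count": 310},
    {"id": "3", "username": "big_acct", "followers_count": 88000},
]

influential = [u for u in collected_users if u["followers_count"] >= 10_000]
print([u["username"] for u in influential])  # ['dev_sarah', 'big_acct']
```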

Quick Reference: Choosing the Right Endpoint

| You have… | You need… | Use this | Not this |
| --- | --- | --- | --- |
| A list of handles | Full profiles for all of them | GET /info-batch (one request) | /info in a loop |
| A list of tweet URLs | Full tweet data + author profiles | POST /tweet-info-bulk (up to 100/request) | /tweet-info in a loop |
| 30 accounts to monitor | Recent tweets from all of them | GET /list-tweets (one request) | /user-tweets x 30 |
| A handle | The full profile (ID + bio + counts + everything) | GET /info | /username-to-id then /info |
| Tweet search results | Author profiles | Extract tweet["user"] from response | /info for each author |
| Follower list | Profile data per follower | Already in /followers response | /info for each follower |

Estimating Your Request Budget

When planning a project, estimate your total request count before you start:
| Task | Efficient approach | Requests needed |
| --- | --- | --- |
| Profile data for 50 accounts | /info-batch | ~1 |
| 10,000 followers of an account | /followers with pagination (200/page) | 50 |
| Country data for those 10,000 followers | /about for each | 10,000 |
| 100 tweets hydrated with current metrics | /tweet-info-bulk | 1 |
| Monitor 50 accounts every 10 seconds for a day | /list-tweets | 8,640 |
| Verify 5 tasks for 1,000 campaign participants | 5 checks per user | 5,000 |
The total for a typical competitive intelligence + campaign verification workflow might be: 50 (followers) + 10,000 (geography) + 1 (bulk tweets) + 8,640 (monitoring) + 5,000 (campaign) = ~23,700 requests. At 20 req/s, that is about 20 minutes of execution time.
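That arithmetic is easy to script so you can sanity-check a plan before running it. A small sketch using the figures from the table above and the 20 req/s rate quoted in this guide:

```python
# Sketch: estimate total request count and wall-clock time for a pipeline.
# Figures match the worked example above; 20 req/s is the quoted rate.
tasks = {
    "followers (10,000 @ 200/page)": 50,
    "geography (/about per follower)": 10_000,
    "bulk tweet hydration": 1,
    "list monitoring (1 day @ 10s polls)": 8_640,
    "campaign checks (5 x 1,000)": 5_000,
}

total = sum(tasks.values())
minutes = total / 20 / 60  # at 20 requests per second
print(f"{total} requests, about {minutes:.0f} minutes")  # 23691 requests, about 20 minutes
```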

Next Steps