Historical Data

Sorsa API provides full-archive access to public X (formerly Twitter) data back to March 2006. Historical retrieval uses the same endpoints, authentication, and pagination as recent data. There is no separate “full-archive” tier, no enterprise contract, and no time-window restriction on search. This page covers the two endpoints used for historical work, what the platform exposes versus what it doesn’t, and the patterns that hold up at scale.

Note: For a fuller walkthrough with a method comparison table, a CSV export pipeline, and additional code examples, see Historical Twitter Data: How to Search Old Tweets via API on the blog.

Endpoints

Endpoint	Use case	Pagination	Page size
`POST /v3/search-tweets`	Keyword-based archive search with date and engagement filters	`next_cursor`	~20 tweets
`POST /v3/user-tweets`	A specific account’s complete posting history	`next_cursor`	~20 tweets

Both accept the X search operator set, including since:, until:, from:, to:, min_faves:, min_retweets:, lang:, and filter: directives.
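
As an illustration of how these operators combine into one query string, here is a minimal sketch. The helper name `build_query` and its parameters are hypothetical and not part of the Sorsa API; only the operator syntax comes from the list above.

```python
def build_query(terms, since=None, until=None, lang=None,
                min_faves=None, min_retweets=None, exclude_retweets=False):
    """Compose a search string from optional operator values (hypothetical helper)."""
    parts = [terms]
    if since:
        parts.append(f"since:{since}")
    if until:
        parts.append(f"until:{until}")
    if lang:
        parts.append(f"lang:{lang}")
    if min_faves is not None:
        parts.append(f"min_faves:{min_faves}")
    if min_retweets is not None:
        parts.append(f"min_retweets:{min_retweets}")
    if exclude_retweets:
        parts.append("-filter:retweets")
    # Operators are space-separated, exactly as they would be typed by hand.
    return " ".join(parts)


query = build_query('"climate change"', since="2015-06-01",
                    until="2015-12-31", lang="en", min_faves=10)
print(query)
```

The resulting string can be passed as the `query` value in the request body.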

Keyword Archive Search

Use /search-tweets when you need every tweet matching a query within a date window, across all users. Pass the date-bounded query in the JSON body:

import requests, time

API_KEY = "YOUR_API_KEY"
URL = "https://api.sorsa.io/v3/search-tweets"

def search_archive(query, max_pages=50):
    all_tweets, next_cursor = [], None
    for _ in range(max_pages):
        body = {"query": query, "order": "latest"}
        if next_cursor:
            body["next_cursor"] = next_cursor

        resp = requests.post(
            URL,
            headers={"ApiKey": API_KEY, "Content-Type": "application/json"},
            json=body,
        )
        resp.raise_for_status()
        data = resp.json()

        all_tweets.extend(data.get("tweets", []))
        next_cursor = data.get("next_cursor")
        if not next_cursor:
            break
        time.sleep(0.1)
    return all_tweets


tweets = search_archive('"climate change" since:2015-06-01 until:2015-12-31 lang:en min_faves:10')

order accepts "latest" (chronological, default for time-bounded queries) or "popular" (engagement-ranked, better for content research).

Full Account Timeline

Use /user-tweets when you want the complete posting history of one account, without a 3,200-tweet cap. Results arrive newest first.

resp = requests.post(
    "https://api.sorsa.io/v3/user-tweets",
    headers={"ApiKey": API_KEY, "Content-Type": "application/json"},
    json={"link": "https://x.com/naval"},
)

Paginate with next_cursor until it returns null. The endpoint walks the timeline in reverse chronological order.
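
The pagination loop for a full timeline might look like the sketch below. It assumes that /user-tweets accepts `next_cursor` in the JSON body and returns a `tweets` array, mirroring the search example above; neither detail is confirmed on this page. The function name `collect_user_tweets` and the injectable `post` parameter are hypothetical, added so the loop can be exercised without network access.

```python
USER_TWEETS_URL = "https://api.sorsa.io/v3/user-tweets"


def _default_post(url, headers, json):
    # Imported lazily so the loop can be tested with a stub transport.
    import requests
    return requests.post(url, headers=headers, json=json, timeout=30)


def collect_user_tweets(profile_link, api_key, max_pages=500, post=_default_post):
    """Walk one account's timeline, newest first, until the cursor runs out."""
    tweets, cursor = [], None
    headers = {"ApiKey": api_key, "Content-Type": "application/json"}
    for _ in range(max_pages):
        body = {"link": profile_link}
        if cursor:
            # Assumption: the cursor goes in the body, as in /search-tweets.
            body["next_cursor"] = cursor
        resp = post(USER_TWEETS_URL, headers=headers, json=body)
        resp.raise_for_status()
        data = resp.json()
        # Assumption: the response key is "tweets", as in the search example.
        tweets.extend(data.get("tweets", []))
        cursor = data.get("next_cursor")
        if not cursor:
            break
    return tweets
```

A call such as `collect_user_tweets("https://x.com/naval", API_KEY)` would use the real transport. `max_pages` is only a safety cap and should be raised for very active accounts.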

What You Can Retrieve

Every historical tweet returns with the same field set as a recent one:

Full text (no truncation, no URL replacement)
All six engagement metrics: likes_count, retweet_count, reply_count, quote_count, view_count, bookmark_count
Embedded user object with the author’s full profile
entities array with media URLs (photos, videos, GIFs) and link previews
Conversation metadata: conversation_id_str, in_reply_to_tweet_id, is_reply, is_quote_status
Language tag (lang)

See Response Format for the complete field reference.
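
For export or analysis it is often convenient to flatten each tweet into a single row. The sketch below uses only the field names listed above and assumes they sit at the top level of each tweet object, which should be checked against Response Format. The `flatten_tweet` helper and the sample values are hypothetical.

```python
METRIC_FIELDS = ["likes_count", "retweet_count", "reply_count",
                 "quote_count", "view_count", "bookmark_count"]
CONTEXT_FIELDS = ["conversation_id_str", "in_reply_to_tweet_id",
                  "is_reply", "is_quote_status", "lang"]


def flatten_tweet(tweet):
    """Pick the documented metric and conversation fields into one flat dict."""
    # Missing fields become None rather than raising, so partial objects survive.
    return {name: tweet.get(name) for name in METRIC_FIELDS + CONTEXT_FIELDS}


# Made-up values, for illustration only.
sample = {"likes_count": 12, "retweet_count": 3, "lang": "en", "is_reply": False}
row = flatten_tweet(sample)
print(row["likes_count"], row["lang"], row["view_count"])
```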

Platform-Level Limits

These are X-side restrictions, not Sorsa-specific. No public API can work around them.

Deleted tweets are removed from X’s search index and cannot be retrieved.
Protected accounts are excluded from all public search and timeline results.
Profile snapshots are not historical. A tweet from 2014 returns the author’s current bio, username, and follower count, not the 2014 values.
Engagement metrics are not snapshots. Like, retweet, and view counts reflect current totals, not the counts as they stood on a specific past date. If you need point-in-time engagement, ingest tweets in real time via Real-Time Monitoring and store the metrics yourself.
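
Storing the metrics yourself could be as simple as an append-only snapshot table. The sketch below uses SQLite from the Python standard library. The table layout and function names are hypothetical, and the tweet ID is passed in explicitly because this page does not name the ID field.

```python
import sqlite3
from datetime import datetime, timezone

METRICS = ("likes_count", "retweet_count", "reply_count",
           "quote_count", "view_count", "bookmark_count")


def open_store(path=":memory:"):
    """Create the snapshot table if needed and return the connection."""
    conn = sqlite3.connect(path)
    columns = ", ".join(f"{name} INTEGER" for name in METRICS)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS metric_snapshots ("
        f"tweet_id TEXT NOT NULL, captured_at TEXT NOT NULL, {columns}, "
        f"PRIMARY KEY (tweet_id, captured_at))"
    )
    return conn


def record_snapshot(conn, tweet_id, tweet, captured_at=None):
    """Append the tweet's current metrics, stamped with a UTC ISO-8601 time."""
    captured_at = captured_at or datetime.now(timezone.utc).isoformat()
    placeholders = ", ".join("?" for _ in range(len(METRICS) + 2))
    conn.execute(
        f"INSERT INTO metric_snapshots (tweet_id, captured_at, {', '.join(METRICS)}) "
        f"VALUES ({placeholders})",
        [tweet_id, captured_at, *(tweet.get(name) for name in METRICS)],
    )
    conn.commit()


def metrics_as_of(conn, tweet_id, moment):
    """Return the latest snapshot taken at or before `moment`, or None."""
    row = conn.execute(
        f"SELECT {', '.join(METRICS)} FROM metric_snapshots "
        "WHERE tweet_id = ? AND captured_at <= ? "
        "ORDER BY captured_at DESC LIMIT 1",
        (tweet_id, moment),
    ).fetchone()
    return dict(zip(METRICS, row)) if row else None
```

Timestamps are compared as strings, so every snapshot needs the same ISO-8601 format and timezone for the ordering to hold.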

Best Practices

Chunk Large Date Ranges

A single query across a multi-year window has no clean retry path and no per-period auditability. Split by month for year-scale collections, by week for volatile event windows.

def monthly_chunks(year):
    out = []
    for month in range(1, 13):
        since = f"{year}-{month:02d}-01"
        nm = month + 1 if month < 12 else 1
        ny = year if month < 12 else year + 1
        until = f"{ny}-{nm:02d}-01"
        out.append((since, until))
    return out

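# Accumulate across months; reassigning `tweets` would keep only December.
all_tweets = []
for since, until in monthly_chunks(2020):
    all_tweets.extend(
        search_archive(f'bitcoin since:{since} until:{until} lang:en min_faves:50')
    )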
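
For volatile event windows, a weekly splitter with per-chunk retries and counts might look like the sketch below. The names `weekly_chunks` and `collect_by_chunk` are hypothetical. `fetch` stands for any function that takes a query string and returns a list of tweets, such as `search_archive` above. Like the monthly example, it assumes `until:` is exclusive, so adjacent chunks do not overlap.

```python
import time
from datetime import date, timedelta


def weekly_chunks(start, end):
    """Yield (since, until) ISO date pairs covering [start, end) in 7-day steps."""
    cursor = start
    while cursor < end:
        nxt = min(cursor + timedelta(days=7), end)
        yield cursor.isoformat(), nxt.isoformat()
        cursor = nxt


def collect_by_chunk(base_query, chunks, fetch, retries=3, backoff=1.0):
    """Run one query per chunk, retrying failures, and keep a per-chunk count."""
    results, audit = [], {}
    for since, until in chunks:
        query = f"{base_query} since:{since} until:{until}"
        for attempt in range(1, retries + 1):
            try:
                tweets = fetch(query)
                break
            except Exception:
                # Broad catch for brevity; narrow it to transport errors in practice.
                if attempt == retries:
                    raise
                time.sleep(backoff * attempt)
        results.extend(tweets)
        audit[(since, until)] = len(tweets)  # per-period audit trail
    return results, audit
```

A failed week is retried without repeating the others, and the `audit` mapping shows which periods returned suspiciously few or zero tweets.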

Filter Retweet Noise

Historical popular searches return waves of native retweets that bury original content. Add -filter:nativeretweets for sentiment, opinion, or content-pattern research. Use -filter:retweets to also exclude legacy RT @user: retweets.

Pair Engagement and Date Filters

Combining since: / until: with min_faves: or min_retweets: cuts noise and request volume sharply. Example:

"product launch" since:2022-03-01 until:2022-03-31 min_faves:100 -filter:retweets lang:en

Split Global Topics by Language

For worldwide events, separate queries per lang: give cleaner per-locale datasets than mixing languages.
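
One way to organize per-language collection is to key the results by language code, as in this minimal sketch. Both helper names are hypothetical, and `fetch` again stands for any query-to-tweets function such as `search_archive`.

```python
def per_language_queries(base_query, languages):
    """Map each language code to the base query with a lang: operator appended."""
    return {lang: f"{base_query} lang:{lang}" for lang in languages}


def collect_per_language(base_query, languages, fetch):
    """Run one query per language and keep the datasets separate."""
    queries = per_language_queries(base_query, languages)
    return {lang: fetch(query) for lang, query in queries.items()}


queries = per_language_queries("earthquake since:2023-02-06 until:2023-02-13",
                               ["en", "tr", "ar"])
print(queries["tr"])
```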

Paginate Until the Cursor Is Empty

Terminate only when next_cursor is null, empty, or absent. Don’t stop early on small page sizes. Full pattern in Pagination.
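
The termination rule can be isolated in a small generator, sketched below. `iter_pages` and `fetch_page` are hypothetical names: `fetch_page(cursor)` is assumed to return the parsed JSON for one page. The repeated-cursor guard is an extra precaution, not documented API behavior.

```python
def iter_pages(fetch_page, max_pages=10_000):
    """Yield each page's tweets until next_cursor is null, empty, or absent."""
    cursor, seen = None, set()
    for _ in range(max_pages):
        data = fetch_page(cursor)
        # Short or empty pages are yielded like any other; they do not end the walk.
        yield data.get("tweets", [])
        cursor = data.get("next_cursor")
        if not cursor:  # covers None, "", and a missing key
            return
        if cursor in seen:
            raise RuntimeError("next_cursor repeated; aborting to avoid a loop")
        seen.add(cursor)
```

Flattening is then one line: `tweets = [t for page in iter_pages(fetch_page) for t in page]`.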

Related

Search Tweets: endpoint reference for /search-tweets
Search Operators: full operator dictionary
Pagination: cursor-based pagination details
Real-Time Monitoring: pair with historical backfill for forward-looking ingestion
Track Mentions: historical mention tracking for any handle
Optimizing API Usage: reduce request count on large collections
