Historical Data

Sorsa API provides full-archive access to public X (formerly Twitter) data back to March 2006. Historical retrieval uses the same endpoints, authentication, and pagination as recent data. There is no separate “full-archive” tier, no enterprise contract, and no time-window restriction on search. This page covers the two endpoints used for historical work, what the platform exposes versus what it doesn’t, and the patterns that hold up at scale.
Note: For a fuller walkthrough with a method comparison table, a CSV export pipeline, and additional code examples, see Historical Twitter Data: How to Search Old Tweets via API on the blog.

Endpoints

| Endpoint | Use case | Pagination | Page size |
|---|---|---|---|
| POST /v3/search-tweets | Keyword-based archive search with date and engagement filters | next_cursor | ~20 tweets |
| POST /v3/user-tweets | A specific account’s complete posting history | next_cursor | ~20 tweets |
Both accept the X search operator set, including since:, until:, from:, to:, min_faves:, min_retweets:, lang:, and filter: directives.
Use /search-tweets when you need every tweet matching a query within a date window, across all users. Pass the date-bounded query in the JSON body:
import requests, time

API_KEY = "YOUR_API_KEY"
URL = "https://api.sorsa.io/v3/search-tweets"

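def search_archive(query, max_pages=50):
    """Collect up to max_pages pages of tweets matching an X search query."""
    all_tweets, next_cursor = [], None
    for _ in range(max_pages):
        body = {"query": query, "order": "latest"}
        if next_cursor:
            # Resume from where the previous page ended
            body["next_cursor"] = next_cursor

        resp = requests.post(
            URL,
            headers={"ApiKey": API_KEY, "Content-Type": "application/json"},
            json=body,
            timeout=30,  # avoid hanging indefinitely on a stalled connection
        )
        resp.raise_for_status()
        data = resp.json()

        all_tweets.extend(data.get("tweets", []))
        next_cursor = data.get("next_cursor")
        if not next_cursor:
            break  # null, empty, or absent cursor means this was the last page
        time.sleep(0.1)  # brief pause between requests
    return all_tweets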


tweets = search_archive('"climate change" since:2015-06-01 until:2015-12-31 lang:en min_faves:10')
order accepts "latest" (chronological, default for time-bounded queries) or "popular" (engagement-ranked, better for content research).

Full Account Timeline

Use /user-tweets when you want one account’s complete posting history, back to its oldest tweet, without a 3,200-tweet cap.
resp = requests.post(
    "https://api.sorsa.io/v3/user-tweets",
    headers={"ApiKey": API_KEY, "Content-Type": "application/json"},
    json={"link": "https://x.com/naval"},
)
Paginate with next_cursor until it returns null. The endpoint walks the timeline in reverse chronological order.
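The full loop for a timeline might look like the sketch below. It assumes /user-tweets accepts next_cursor in the JSON body and returns tweets under a "tweets" key, as /search-tweets does in the earlier example; this page does not confirm either detail for this endpoint. `fetch_timeline` and its parameters are illustrative names, and `session` is any object with a requests-style `post` method, such as `requests.Session()`.

```python
import time

USER_TWEETS_URL = "https://api.sorsa.io/v3/user-tweets"


def fetch_timeline(session, api_key, profile_link, max_pages=500, pause=0.1):
    """Walk one account's timeline page by page until the cursor runs out.

    Assumption (not confirmed for this endpoint): the request body accepts
    "next_cursor" and the response carries "tweets" and "next_cursor".
    """
    tweets, cursor = [], None
    for _ in range(max_pages):
        body = {"link": profile_link}
        if cursor:
            body["next_cursor"] = cursor  # continue from the previous page
        resp = session.post(
            USER_TWEETS_URL,
            headers={"ApiKey": api_key, "Content-Type": "application/json"},
            json=body,
            timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()
        tweets.extend(data.get("tweets", []))
        cursor = data.get("next_cursor")
        if not cursor:
            break  # null, empty, or absent cursor: end of the timeline
        time.sleep(pause)
    return tweets
```

Passing the session in, rather than calling `requests.post` directly, reuses one connection across pages and lets the loop be exercised against a stub in tests. `max_pages` is only a safety cap; raise it for very active accounts.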

What You Can Retrieve

Every historical tweet returns with the same field set as a recent one:
  • Full text (no truncation, no URL replacement)
  • All six engagement metrics: likes_count, retweet_count, reply_count, quote_count, view_count, bookmark_count
  • Embedded user object with the author’s full profile
  • entities array with media URLs (photos, videos, GIFs) and link previews
  • Conversation metadata: conversation_id_str, in_reply_to_tweet_id, is_reply, is_quote_status
  • Language tag (lang)
See Response Format for the complete field reference.

Platform-Level Limits

These are X-side restrictions, not Sorsa-specific. No public API can work around them.
  • Deleted tweets are removed from X’s search index and cannot be retrieved.
  • Protected accounts are excluded from all public search and timeline results.
  • Profile snapshots are not historical. A tweet from 2014 returns the author’s current bio, username, and follower count, not the 2014 values.
  • Engagement metrics are not snapshots. Like, retweet, and view counts reflect current totals, not the counts as they stood on a specific past date. If you need point-in-time engagement, ingest tweets in real time via Real-Time Monitoring and store the metrics yourself.
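Because metrics are only ever current totals, a point-in-time series has to be built on your side. Below is a minimal sketch of the storage step using SQLite from the Python standard library. The table layout, the helper names, and the `id` key used as the tweet identifier are assumptions made for illustration; the six metric names are the ones listed under What You Can Retrieve.

```python
import sqlite3
from datetime import datetime, timezone

METRICS = ("likes_count", "retweet_count", "reply_count",
           "quote_count", "view_count", "bookmark_count")


def open_store(path=":memory:"):
    """Open (or create) a snapshot table keyed by tweet and capture time."""
    conn = sqlite3.connect(path)
    cols = ", ".join(f"{m} INTEGER" for m in METRICS)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS metric_snapshots ("
        f"tweet_id TEXT NOT NULL, captured_at TEXT NOT NULL, {cols}, "
        "PRIMARY KEY (tweet_id, captured_at))"
    )
    return conn


def record_snapshot(conn, tweet, captured_at=None):
    """Store one tweet's metrics as they stand at capture time.

    tweet["id"] is an assumed key name for the tweet identifier.
    """
    captured_at = captured_at or datetime.now(timezone.utc).isoformat()
    values = [str(tweet["id"]), captured_at] + [tweet.get(m) for m in METRICS]
    placeholders = ", ".join("?" for _ in values)
    conn.execute(
        f"INSERT OR REPLACE INTO metric_snapshots VALUES ({placeholders})",
        values,
    )
    conn.commit()
```

Calling `record_snapshot` each time a tweet is re-fetched yields one row per capture, so engagement on any past date can be read back from your own table.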

Best Practices

Chunk Large Date Ranges

A single query across a multi-year window has no clean retry path and no per-period auditability. Split by month for year-scale collections, by week for volatile event windows.
def monthly_chunks(year):
    out = []
    for month in range(1, 13):
        since = f"{year}-{month:02d}-01"
        nm = month + 1 if month < 12 else 1
        ny = year if month < 12 else year + 1
        until = f"{ny}-{nm:02d}-01"
        out.append((since, until))
    return out

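# Keep each month's results separate so a failed month can be retried on its own
tweets_by_month = {}
for since, until in monthly_chunks(2020):
    tweets_by_month[since] = search_archive(
        f'bitcoin since:{since} until:{until} lang:en min_faves:50'
    )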
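For the weekly case, a similar helper can be sketched with the standard `datetime` module. `weekly_chunks` is an illustrative name; the last chunk is clipped to the end date so no window extends past the requested range.

```python
from datetime import date, timedelta


def weekly_chunks(start, end):
    """Split [start, end) into consecutive (since, until) pairs of at most 7 days."""
    out = []
    cur = start
    while cur < end:
        nxt = min(cur + timedelta(days=7), end)  # clip the final window
        out.append((cur.isoformat(), nxt.isoformat()))
        cur = nxt
    return out
```

The returned pairs are ISO date strings, so they drop into since: and until: the same way the monthly pairs do.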

Filter Retweet Noise

Historical popular searches return waves of native retweets that bury original content. Add -filter:nativeretweets for sentiment, opinion, or content-pattern research. Use -filter:retweets to also exclude legacy RT @user: retweets.

Pair Engagement and Date Filters

Combining since: / until: with min_faves: or min_retweets: cuts noise and request volume sharply. Example:
"product launch" since:2022-03-01 until:2022-03-31 min_faves:100 -filter:retweets lang:en

Split Global Topics by Language

For worldwide events, separate queries per lang: give cleaner per-locale datasets than mixing languages.
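Writing these queries by hand gets error-prone once date chunks, engagement thresholds, and languages multiply. The sketch below assembles them programmatically, assuming operators are simply space-joined as in the examples on this page. `build_query` and `queries_by_language` are illustrative helper names, not part of the API.

```python
def build_query(terms, since=None, until=None, min_faves=None,
                min_retweets=None, lang=None, exclude_retweets=False):
    """Join search terms and optional operators into one query string."""
    parts = [terms]
    if since:
        parts.append(f"since:{since}")
    if until:
        parts.append(f"until:{until}")
    if min_faves is not None:
        parts.append(f"min_faves:{min_faves}")
    if min_retweets is not None:
        parts.append(f"min_retweets:{min_retweets}")
    if exclude_retweets:
        parts.append("-filter:retweets")
    if lang:
        parts.append(f"lang:{lang}")
    return " ".join(parts)


def queries_by_language(terms, languages, **filters):
    """Return one query per language code, keyed by that code."""
    return {lang: build_query(terms, lang=lang, **filters) for lang in languages}
```

Each query in the returned dict can then be run and stored separately, which keeps the per-locale datasets apart from the start.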

Paginate Until the Cursor Is Empty

Terminate only when next_cursor is null, empty, or absent. Don’t stop early on small page sizes. Full pattern in Pagination.
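The termination rule can be isolated in a small generator, sketched here under the assumption that each parsed response is a dict with "tweets" and "next_cursor" keys, as in the search example earlier on this page. `iter_tweets` and `fetch_page` are illustrative names; `fetch_page(cursor)` stands for whatever function performs one request and returns the parsed JSON.

```python
def iter_tweets(fetch_page, max_pages=1000):
    """Yield tweets page by page until the cursor is null, empty, or absent.

    A short or even empty page does not end the loop; only the cursor does.
    max_pages is a safety cap against runaway loops.
    """
    cursor = None
    for _ in range(max_pages):
        page = fetch_page(cursor)
        yield from page.get("tweets", [])
        cursor = page.get("next_cursor")
        if not cursor:  # None, "", or key missing
            return
```

Because the check is on the cursor alone, a page with few or no tweets in the middle of a result set is passed over rather than mistaken for the end.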