Sorsa API delivers X (Twitter) data in a standardized JSON format. Our schema is optimized for LLMs, data indexing, and programmatic consumption. We prioritize data density, ensuring that related entities (like Authors and Quoted Tweets) are nested within a single response to minimize additional API calls.

🏗 Data Consistency & Types

  • Snowflake IDs as Strings: All Twitter IDs (id, conversation_id_str, etc.) are returned as strings. This prevents precision loss in environments like JavaScript, where numbers are 64-bit floats and integers above Number.MAX_SAFE_INTEGER (2^53 - 1) cannot be represented exactly. Snowflake IDs routinely exceed that limit.
  • Time Formats: Timestamps use ISO 8601 (e.g., 2026-03-06T12:00:00Z), which standard date libraries and database parsers accept directly and which stays readable when passed to an LLM as text.
  • Booleans: Status flags (e.g., verified, is_reply) are strict JSON booleans (true/false), not strings or 0/1, so they can be used directly in conditionals.
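The three conventions above can be exercised in a few lines of Python. This is a minimal sketch, assuming a hypothetical User Object payload whose field names come from this page and whose values are invented: it keeps the ID as a string, parses the ISO 8601 timestamp, and reads the boolean as-is. The last lines show why string IDs matter: pushing a Snowflake ID through a 64-bit float (the same type as a JavaScript Number) changes its low digits.

```python
import json
from datetime import datetime, timezone

# Hypothetical payload: field names are from the docs, values are invented.
RAW = '{"id": "1234567890123456789", "created_at": "2026-03-06T12:00:00Z", "verified": false}'


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-03-06T12:00:00Z."""
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset for older interpreters.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    return datetime.fromisoformat(value)


user = json.loads(RAW)
user_id: str = user["id"]             # keep as a string for storage, keys, and joins
created = parse_timestamp(user["created_at"])
is_verified: bool = user["verified"]  # already a JSON boolean, no string comparison needed

# A JavaScript Number is an IEEE 754 double, the same type as Python's float.
lossy_id = int(float(user_id))
print(user_id, lossy_id, lossy_id == int(user_id))
print(created.isoformat(), is_verified)
```

Converting to int only at the point where arithmetic is needed (Python ints are arbitrary precision) avoids the float round-trip entirely.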

👤 The Standard User Object

The User Object is the core of Sorsa's user data extraction. It is returned by profile lookups and embedded in every tweet result as the tweet's user field.
| Field | Type | Description |
| --- | --- | --- |
| id | string | Permanent numeric X user ID (Snowflake), returned as a string. |
| username | string | Unique handle (screen name): letters, digits, and underscores. |
| display_name | string | Public-facing profile name. |
| description | string | Account bio. |
| followers_count | integer | Total number of followers. |
| verified | boolean | Whether the account is verified (Blue, Gold, or Gray checkmark). |
| can_dm | boolean | Whether the account accepts direct messages. |
| created_at | string | Account registration timestamp (ISO 8601). |
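If you want a typed view of the fields in the table above, a frozen dataclass works well. This is a sketch, not an official Sorsa client: the class name and the from_dict helper are hypothetical, it assumes every documented key is present (a missing key raises KeyError), and it silently ignores any extra keys the payload may carry.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class User:
    """Typed view of the documented User Object fields (hypothetical helper)."""

    id: str               # Snowflake ID, kept as a string
    username: str
    display_name: str
    description: str
    followers_count: int
    verified: bool
    can_dm: bool
    created_at: str       # ISO 8601 timestamp string

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "User":
        """Build a User from a decoded JSON dict, picking only documented keys."""
        return cls(
            id=data["id"],
            username=data["username"],
            display_name=data["display_name"],
            description=data["description"],
            followers_count=data["followers_count"],
            verified=data["verified"],
            can_dm=data["can_dm"],
            created_at=data["created_at"],
        )


# Invented sample; "extra_field" stands in for any undocumented key.
sample = {
    "id": "1111111111",
    "username": "example_user",
    "display_name": "Example User",
    "description": "Just an example.",
    "followers_count": 1200,
    "verified": True,
    "can_dm": False,
    "created_at": "2020-01-15T08:30:00Z",
    "extra_field": "ignored",
}
author = User.from_dict(sample)
print(author.username, author.followers_count, author.verified)
```

Keeping created_at as a string here defers timestamp parsing to the code that actually needs a datetime.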

🐦 The Tweet Object Schema

Tweet Objects provide a snapshot of a post's content and engagement. For sentiment analysis or engagement tracking, these fields supply the raw data.

Engagement Metrics

  • likes_count, retweet_count, reply_count, quote_count: Real-time interaction totals.
  • view_count, bookmark_count: How many times the post was viewed and bookmarked.
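As an example of using these counts, the sketch below computes interactions per view. This is one common definition of engagement rate, chosen for illustration and not a metric Sorsa defines; it assumes the counts are integers, treats a missing interaction count as 0, and returns None when views are missing or zero.

```python
from typing import Any, Mapping, Optional

# Field names as listed above.
INTERACTION_FIELDS = ("likes_count", "retweet_count", "reply_count", "quote_count")


def engagement_rate(tweet: Mapping[str, Any]) -> Optional[float]:
    """Return interactions / views, or None when there are no views to divide by."""
    views = tweet.get("view_count") or 0
    if views <= 0:
        return None
    interactions = sum(tweet.get(field) or 0 for field in INTERACTION_FIELDS)
    return interactions / views


# Invented numbers for illustration.
sample_tweet = {
    "likes_count": 120,
    "retweet_count": 30,
    "reply_count": 25,
    "quote_count": 5,
    "view_count": 9000,
    "bookmark_count": 40,
}
print(f"{engagement_rate(sample_tweet):.2%}")
```

bookmark_count is left out of the numerator here; whether to count bookmarks as engagement is a judgment call for your analysis.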

Nested Relationships (Recursive Logic)

To optimize for RAG (Retrieval-Augmented Generation) and data scraping, Sorsa nests related content:
  • user: The full User Object of the post author.
  • retweeted_status: Contains the full original Tweet Object if the post is a retweet.
  • quoted_status: Contains the full original Tweet Object if the post is a quote.
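Because retweeted_status and quoted_status hold full Tweet Objects, they can themselves contain further nested posts. A small recursive walk is enough to visit every level. This Python sketch assumes a nested key is either absent or null when it does not apply, and uses an invented payload that carries only the fields the example needs.

```python
from typing import Any, Dict, Iterator, List

Tweet = Dict[str, Any]


def iter_nested(tweet: Tweet) -> Iterator[Tweet]:
    """Yield a tweet, then any embedded retweeted/quoted originals, depth-first."""
    yield tweet
    for key in ("retweeted_status", "quoted_status"):
        child = tweet.get(key)
        if child:  # skipped when the key is missing or null
            yield from iter_nested(child)


def authors(tweet: Tweet) -> List[str]:
    """Usernames of every author in the nested structure, in first-seen order."""
    seen: List[str] = []
    for item in iter_nested(tweet):
        name = item["user"]["username"]
        if name not in seen:
            seen.append(name)
    return seen


# Invented example: a retweet of a post that quotes another post.
retweet = {
    "id": "3",
    "user": {"username": "alice"},
    "retweeted_status": {
        "id": "2",
        "user": {"username": "bob"},
        "quoted_status": {"id": "1", "user": {"username": "carol"}},
    },
}
print([t["id"] for t in iter_nested(retweet)], authors(retweet))
```

Since everything arrives in one response, this walk needs no extra API calls, which is the point of nesting for RAG pipelines.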

Media & Entities

The entities array lists the visual and interactive content attached to a post. Each entry contains:
  • type: photo, video, or animated_gif.
  • link: Direct URL to the high-resolution media asset.
  • preview: URL of an optimized thumbnail image.
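A typical first step with the entities array is to group full-resolution links by type and collect thumbnails separately. This Python sketch uses invented placeholder URLs and assumes, for the sake of the example, that preview may be missing on some entries.

```python
from collections import defaultdict
from typing import Any, Dict, List

Entity = Dict[str, Any]


def media_links(entities: List[Entity]) -> Dict[str, List[str]]:
    """Group high-resolution media URLs by entity type (photo, video, animated_gif)."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        grouped[entity["type"]].append(entity["link"])
    return dict(grouped)


def thumbnails(entities: List[Entity]) -> List[str]:
    """Collect preview URLs, skipping entries that have none."""
    return [entity["preview"] for entity in entities if entity.get("preview")]


# Invented entities; example.com URLs are placeholders.
entities = [
    {"type": "photo", "link": "https://example.com/a.jpg", "preview": "https://example.com/a_small.jpg"},
    {"type": "video", "link": "https://example.com/b.mp4", "preview": "https://example.com/b_thumb.jpg"},
    {"type": "photo", "link": "https://example.com/c.jpg"},
]
print(media_links(entities))
print(thumbnails(entities))
```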

🔄 Cursor-Based Pagination

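For high-volume Twitter scraping, Sorsa paginates with a next_cursor token. Cursors are more reliable than numeric offsets for dynamic feeds: when new posts arrive between requests, an offset shifts and can repeat or skip items, while a cursor continues from where the previous page ended.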
  1. Initial Request: Send your query.
  2. Response: Receive data + next_cursor string.
  3. Iteration: Include the next_cursor in your subsequent request body/params.
  4. Termination: A null or missing next_cursor indicates the end of the data stream.
More details are in the dedicated pagination documentation.

⏭ Next Steps