🏗 Data Consistency & Types
- Snowflake IDs as Strings: All Twitter IDs (`id`, `conversation_id_str`, etc.) are returned as strings. This prevents precision loss in environments like JavaScript, where 64-bit integers can exceed the `Number.MAX_SAFE_INTEGER` limit (2^53 − 1).
- Time Formats: Timestamps use the ISO 8601 format (e.g., `2026-03-06T12:00:00Z`), which standard database parsers accept directly and which stays unambiguous when passed to an LLM as context.
- Booleans: Status flags (e.g., `verified`, `is_reply`) use strict boolean values, so they can be used directly in conditionals without string or integer coercion.
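
To make these conventions concrete, here is a minimal sketch that reads a payload shaped like the fields above. The sample values and the `parse_timestamp` helper are illustrative assumptions, not part of the Sorsa API.

```python
from datetime import datetime

# Hypothetical payload fragment; the values are made up for illustration.
sample = {
    "id": "1897654321098765432",
    "created_at": "2026-03-06T12:00:00Z",
    "is_reply": False,
}

# Converting the string ID to a Python int is lossless...
tweet_id = int(sample["id"])

# ...but passing it through a 64-bit float (the representation behind
# JavaScript's Number) rounds it, which is why IDs arrive as strings.
rounded = int(float(sample["id"]))


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2026-03-06T12:00:00Z."""
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so normalise it to an explicit offset for older versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


created = parse_timestamp(sample["created_at"])

print("ID survives float round trip:", tweet_id == rounded)
print("Parsed timestamp:", created.isoformat())
if not sample["is_reply"]:  # strict boolean, no coercion needed
    print("Top-level post")
```

Keeping the ID as a string (or as an arbitrary-precision integer) end to end avoids the rounding shown by the float round trip.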
👤 The Standard User Object
The User Object is the core of our User Data Extraction. It is returned by profile lookups and embedded within every tweet result.

| Field | Type | SEO/LLM Context |
|---|---|---|
| `id` | string | Permanent numeric X User ID (Snowflake). |
| `username` | string | Unique alphanumeric handle (Screen Name). |
| `display_name` | string | Public-facing profile name. |
| `description` | string | Account Bio/Description. |
| `followers_count` | integer | Total audience size. |
| `verified` | boolean | Account verification status (Blue/Gold/Gray). |
| `can_dm` | boolean | Direct Message availability status. |
| `created_at` | string | Account registration timestamp. |
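
The table above maps naturally onto a typed record. The following is a sketch only: the `User` dataclass, the `user_from_dict` helper, and the sample values are hypothetical names introduced for illustration, and the code assumes every listed field is present in the decoded JSON.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class User:
    """Typed view of the User Object fields listed in the table."""
    id: str               # Snowflake ID, kept as a string
    username: str
    display_name: str
    description: str
    followers_count: int
    verified: bool
    can_dm: bool
    created_at: str       # ISO 8601 timestamp, left unparsed here


def user_from_dict(raw: Dict[str, Any]) -> User:
    """Build a User from a decoded JSON object, ignoring unknown keys.

    Raises KeyError if one of the documented fields is missing.
    """
    return User(**{f.name: raw[f.name] for f in fields(User)})


# Made-up example payload, including one key the dataclass does not know.
raw_user = {
    "id": "1234567890123456789",
    "username": "example_handle",
    "display_name": "Example Account",
    "description": "Illustrative bio text.",
    "followers_count": 4200,
    "verified": True,
    "can_dm": False,
    "created_at": "2020-01-15T08:30:00Z",
    "some_future_field": "ignored",
}

user = user_from_dict(raw_user)
print(user.username, user.followers_count)
```

Ignoring unknown keys keeps the parser tolerant of fields added later, while a missing documented field fails loudly instead of producing a half-filled record.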
🐦 The Tweet Object Schema
Our Tweet Objects provide a comprehensive snapshot of engagement and content. For Twitter Sentiment Analysis or Engagement Tracking, these fields provide the necessary raw data.

Engagement Metrics
- `likes_count`, `retweet_count`, `reply_count`, `quote_count`: Real-time interaction totals.
- `view_count` & `bookmark_count`: Deep-reach analytics.
Nested Relationships (Recursive Logic)
To optimize for RAG (Retrieval-Augmented Generation) and data scraping, Sorsa nests related content:
- `user`: The full User Object of the post author.
- `retweeted_status`: Contains the full original Tweet Object if the post is a retweet.
- `quoted_status`: Contains the full original Tweet Object if the post is a quote.
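
Because nested tweets are themselves full Tweet Objects, they can be walked recursively. The sketch below assumes that `retweeted_status` and `quoted_status` are either absent or `null` when they do not apply. The `iter_tweets` helper and the sample payload are hypothetical.

```python
from typing import Any, Dict, Iterator

Tweet = Dict[str, Any]


def iter_tweets(tweet: Tweet) -> Iterator[Tweet]:
    """Yield a tweet, then every tweet nested inside it, depth first."""
    yield tweet
    for key in ("retweeted_status", "quoted_status"):
        nested = tweet.get(key)
        if nested:  # skips both missing keys and null values
            yield from iter_tweets(nested)


# Made-up payload: alice retweets bob, whose post quotes carol.
sample_tweet = {
    "id": "3",
    "user": {"username": "alice"},
    "retweeted_status": {
        "id": "2",
        "user": {"username": "bob"},
        "quoted_status": {
            "id": "1",
            "user": {"username": "carol"},
            "quoted_status": None,
        },
    },
}

authors = [t["user"]["username"] for t in iter_tweets(sample_tweet)]
print(authors)
```

Flattening the chain this way is convenient when each tweet should become its own document in a retrieval index.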
Media & Entities
The `entities` array categorizes all visual and interactive content:
- `type`: The media kind (`photo`, `video`, `animated_gif`).
- `link`: Direct URL to the high-resolution media asset.
- `preview`: URL for optimized thumbnail images.
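
As an illustration of consuming these fields, the sketch below groups media URLs by `type`. It assumes each item in the array is an object carrying the `type`, `link`, and `preview` keys described above. The `media_links_by_type` helper and the example URLs are hypothetical, and the function takes the array directly because its exact location in the response is not specified here.

```python
from collections import defaultdict
from typing import Any, Dict, List


def media_links_by_type(entities: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group full-resolution media links by their declared type."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for item in entities:
        grouped[item["type"]].append(item["link"])
    return dict(grouped)


# Made-up entities array for illustration.
sample_entities = [
    {"type": "photo", "link": "https://example.com/a.jpg",
     "preview": "https://example.com/a_thumb.jpg"},
    {"type": "video", "link": "https://example.com/b.mp4",
     "preview": "https://example.com/b_thumb.jpg"},
    {"type": "photo", "link": "https://example.com/c.jpg",
     "preview": "https://example.com/c_thumb.jpg"},
]

links = media_links_by_type(sample_entities)
print(links)
```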
🔄 Cursor-Based Pagination
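For high-volume Twitter scraping, Sorsa uses `next_cursor`-based pagination. This is more reliable than traditional offsets for dynamic feeds, where newly arriving posts shift offset positions between requests and can cause duplicated or skipped items.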
- Initial Request: Send your query.
- Response: Receive data + `next_cursor` string.
- Iteration: Include the `next_cursor` in your subsequent request body/params.
- Termination: A `null` or missing `next_cursor` indicates the end of the data stream.
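
The loop described by these steps can be sketched as follows. To keep the example runnable without network access, the HTTP call is abstracted behind a `fetch_page` callable. That callable, the `data` key holding each page's results, and the `max_pages` safety limit are assumptions for illustration, not confirmed parts of the API.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]
FetchPage = Callable[[Optional[str]], Page]


def iter_items(fetch_page: FetchPage, max_pages: int = 1000) -> Iterator[Any]:
    """Yield items from every page until next_cursor is null or missing."""
    cursor: Optional[str] = None  # the initial request carries no cursor
    for _ in range(max_pages):    # guard against an endless cursor loop
        page = fetch_page(cursor)
        yield from page.get("data", [])
        cursor = page.get("next_cursor")
        if not cursor:            # null or missing: end of the stream
            return


# Stand-in for real requests: three canned pages keyed by cursor.
_FAKE_PAGES: Dict[Optional[str], Page] = {
    None: {"data": [1, 2], "next_cursor": "abc"},
    "abc": {"data": [3], "next_cursor": "def"},
    "def": {"data": [4, 5], "next_cursor": None},
}


def fake_fetch(cursor: Optional[str]) -> Page:
    return _FAKE_PAGES[cursor]


items = list(iter_items(fake_fetch))
print(items)
```

In real use, `fetch_page` would wrap the HTTP request and pass the cursor in the request body or query parameters, as described in the Iteration step.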
⏭ Next Steps
- Common Errors & Troubleshooting — A technical dictionary of error messages and fixes.
- API Reference — View the full OpenAPI/Swagger specification.