Polling vs. Waiting for Webhooks: Handling GET RESULTS
When a job is running, you may need to fetch results before it’s fully finished.
This guide explains how to handle polling for results and when to rely on webhooks. You’ll learn when to poll, how often, and how to avoid common pitfalls like infinite loops.
TL;DR
Polling | Waiting for Webhooks |
---|---|
⚡ Get results ASAP | 🕰️ Get results once the job finishes |
⏱️ Request every 30 seconds | 🚫 No polling—wait for webhook |
🚫 Stop when webhook is received | 📡 Webhook sends a real-time update |
🕰️ Used when users expect fast feedback | 📦 Used when users can wait for full results |
🔄 Needs a stop condition | ✅ No need for stop conditions |
❌ Can cause API overload | ⚡ Uses system resources efficiently |
What is Polling?
Polling means sending repeated Get a Run’s Results requests to the server to check for job results until the job is completed or a webhook notification is received. 📡
Polling is useful when you want to access job results before the job is fully finished.
When Should You Poll?
Polling is useful in these scenarios:
- ⚡ User needs immediate feedback: Users want to see results as soon as possible (before the job finishes).
- 📡 If webhooks are failing: Use polling as a fallback method when webhooks are unavailable.
- 🛠️ Local development: Polling avoids the need for externally accessible endpoints when testing workflows.
- 🏗 Unreliable webhook delivery: Firewalls or connectivity issues may prevent webhook notifications from being received.
When Should You Wait for Webhooks?
Webhooks are better in these cases:
- 🕰️ Users can wait for full results: If the workflow doesn’t require immediate results, webhooks reduce API overhead.
- 🔔 Real-time updates: The system automatically pushes updates, making it ideal for automated workflows.
- 🚀 Lower risk of hitting rate limits: Unlike polling, webhooks don’t generate repeated API calls, so you’re less likely to be throttled.
- 📦 When you want to reduce API calls: Because webhooks only trigger when a job is completed, they avoid unnecessary calls to Get a Run’s Results, saving API usage and system resources.
How it works: A webhook is triggered when the job status changes (e.g., created, success, failed). When this happens, you get a real-time notification.
To fetch the job’s progress, use Get a Run: `row_count` represents the number of outputs. You can calculate the progress as % progress = (#output / #input) * 100.
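The progress formula above can be sketched in a few lines. This assumes you already know how many inputs you submitted; `input_count` and the function name are illustrative, and only `row_count` comes from the Get a Run response.

```python
def progress_percent(row_count: int, input_count: int) -> float:
    """Return job progress as (#output / #input) * 100."""
    if input_count <= 0:
        # Guard against division by zero when no inputs were submitted.
        return 0.0
    return row_count / input_count * 100


# 25 outputs so far out of 200 submitted inputs.
print(progress_percent(row_count=25, input_count=200))  # 12.5
```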
How to Poll Efficiently
If you decide to poll, follow these best practices:
- Start polling as soon as the job is created.
- ⏱️ Frequency: Request every 30 seconds if the job involves large inputs.
Follow these steps:
- Start the Job: Launch a workflow using Captain Data’s API.
- Retrieve the Job ID: Extract the `job_uid` from the API response.
- Monitor Job Status: Periodically check the job’s status.
- Fetch Results: Once the status is marked as `finished`, retrieve the data.
Here’s a basic polling loop in Python to check a job’s status and retrieve results:
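A minimal sketch of such a loop, under stated assumptions: `get_run()` and `get_run_results()` are placeholders for your own wrappers around Get a Run and Get a Run’s Results, and the response is assumed to be a dict with a `status` key. The function name, parameters, and response shape are illustrative, not the documented API.

```python
import time
from typing import Any, Callable, Dict, Tuple

TERMINAL_STATUSES = {"finished", "failed"}


def poll_run(
    get_run: Callable[[], Dict[str, Any]],
    get_run_results: Callable[[], Any],
    interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[str, Any]:
    """Poll until the Run is finished or failed, or raise after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    results = None
    while True:
        status = get_run()["status"]  # assumed response shape
        if status in ("running", "finished"):
            # Early results are fetched while running, full results once finished.
            results = get_run_results()
        if status in TERMINAL_STATUSES:
            return status, results
        if time.monotonic() >= deadline:
            # Stop condition: never loop forever if the Run does not end.
            raise TimeoutError(f"Run still '{status}' after {timeout} seconds")
        sleep(interval)


# Stubbed demo: no network calls and no waiting.
_statuses = iter(["created", "running", "finished"])
final_status, rows = poll_run(
    get_run=lambda: {"status": next(_statuses)},
    get_run_results=lambda: [{"example": "row"}],
    interval=0,
    sleep=lambda seconds: None,
)
print(final_status, len(rows))  # finished 1
```

In a real integration the two callables would wrap authenticated HTTP requests; passing them in as arguments keeps the loop easy to test without touching the API.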
Ensure your polling logic has a clear stop condition (like a maximum timeout) to avoid infinite loops.
To track the progress of a Run, use the following:
- Use Get a Run to check the Run’s status, here `get_run()`.
- Check the Run’s status and act on it:
  - Use Get a Run’s Results to retrieve the results once the status is “running”, here `get_run_results()`.
  - 🚫 Stop polling as soon as the Run’s status is finished or failed, since further polling is unnecessary.
You can also take the `warning` status into account, depending on your implementation.
Polling Scenarios in Action
Scenario 1: Polling for Early Results
Use Case: You want to access job results before the job finishes.
- Start polling `https://api.captaindata.co/v3/jobs/:job_uid/results` as soon as the job starts.
- Frequency: Send requests every 30 seconds for large jobs.
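As an illustration, the results URL for this scenario could be built as follows. The helper name is hypothetical, and the authentication headers are left to the caller because they are not specified here; check the API Reference for the exact header names.

```python
from typing import Dict
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "https://api.captaindata.co/v3"


def results_request(job_uid: str, headers: Dict[str, str]) -> Request:
    """Build (but do not send) a GET request for a job's results."""
    # Substitute the :job_uid path parameter, escaping any unsafe characters.
    url = f"{BASE_URL}/jobs/{quote(job_uid, safe='')}/results"
    return Request(url, headers=headers, method="GET")


req = results_request("abc-123", headers={"Authorization": "<your credentials>"})
print(req.full_url)  # https://api.captaindata.co/v3/jobs/abc-123/results
```

The request can then be sent with `urllib.request.urlopen(req, timeout=30)` on each polling tick.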
Scenario 2: Waiting for Full Results
Use Case: The user is okay waiting until all results are ready.
- Don’t poll.
- Rely on the webhook to notify you when the job is completed.
- When the webhook triggers, make a single `GET JOB RESULTS` request to get the full job results.
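A hedged sketch of this flow: the handler ignores intermediate notifications and fetches results exactly once when the job has completed. The payload keys (`status`, `job_uid`) and the set of completed statuses are assumptions made for illustration; `fetch_results` stands in for your own call to the results endpoint.

```python
from typing import Any, Callable, Dict, Optional

# Assumed status values that signal a completed job.
COMPLETED_STATUSES = {"success", "finished"}


def handle_webhook(
    payload: Dict[str, Any],
    fetch_results: Callable[[str], Any],
) -> Optional[Any]:
    """Fetch full results once, and only when the webhook reports completion."""
    if payload.get("status") not in COMPLETED_STATUSES:
        # Intermediate events (e.g. "created") need no API call.
        return None
    return fetch_results(payload["job_uid"])


# Stubbed usage: the lambda replaces a real results request.
print(handle_webhook({"status": "created", "job_uid": "j1"}, lambda uid: ["row"]))  # None
print(handle_webhook({"status": "success", "job_uid": "j1"}, lambda uid: ["row"]))  # ['row']
```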
If Polling Both Results & Job Status
- 📡 Check the job status only once every 5 to 10 results requests to avoid excess API calls.
- 📦 Once the job is complete, stop polling immediately.
- ❌ Do not retry polling if you receive an “All Inputs Failed” error. This error indicates that the inputs are invalid (e.g., a LinkedIn profile that doesn’t exist).
Why not retry? “All Inputs Failed” means the system has determined that no valid inputs exist. Retrying the job will produce the same result, so polling won’t help.
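One way to encode this rule is a small check that your polling or retry logic consults before trying again. How the error reaches your code (message text, field name) is an assumption here; only the “All Inputs Failed” wording comes from this guide.

```python
# Errors that a retry cannot fix; extend this tuple for your own integration.
NON_RETRYABLE_ERRORS = ("All Inputs Failed",)


def is_retryable(error_message: str) -> bool:
    """Return False when the error is known to be permanent."""
    normalized = error_message.casefold()
    return not any(marker.casefold() in normalized for marker in NON_RETRYABLE_ERRORS)


print(is_retryable("All Inputs Failed"))  # False
print(is_retryable("Connection reset by peer"))  # True
```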
Common Pitfalls to Avoid
1️⃣ Avoid Infinite Polling Loops
- 🔄 Always set a stop condition to exit the polling loop if no webhook is received.
- 🕰️ Use a timeout or maximum retry count to ensure the loop ends.
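Both stop conditions can be combined in one small helper, sketched below. The generator and its parameter names are illustrative; it simply stops yielding once either the attempt budget or the time budget is spent.

```python
import time
from typing import Callable, Iterator


def bounded_attempts(
    max_attempts: int,
    timeout: float,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[int]:
    """Yield attempt numbers until max_attempts or timeout (seconds) is reached."""
    deadline = clock() + timeout
    for attempt in range(1, max_attempts + 1):
        if clock() >= deadline:
            return  # time budget exhausted
        yield attempt


for attempt in bounded_attempts(max_attempts=3, timeout=60):
    # Replace this line with one polling request followed by a sleep.
    print(f"polling attempt {attempt}")
```

Because the loop is driven by the generator, it always terminates, even if no webhook ever arrives.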
2️⃣ Don’t Poll Without a Reason
- 📡 If a webhook is available, use it instead of polling.
- 💡 Polling should be a fallback, not your default strategy.
3️⃣ Don’t Retry on ‘All Inputs Failed’
- ⚠️ If you receive “All Inputs Failed” (e.g., due to invalid profiles), it’s better to stop polling.
- 🚫 Retrying won’t help because the system has already identified the issue (like a missing profile).
For more details on how to implement GET JOB RESULTS or Webhook Handling, check out our API Reference.
Polling as an Alternative to Webhooks
When integrating Captain Data into your workflows, retrieving job results is a crucial step. While webhooks are typically the preferred method for automation, certain situations make polling a more practical and reliable alternative—especially when working in local environments or dealing with specific technical constraints.
Challenges with Webhooks
Webhooks are designed to send data automatically when a job’s status changes, but they come with specific requirements and potential challenges:
- Network Accessibility: Webhooks require an externally accessible endpoint, which can be difficult to configure in local development.
- Error Handling: Issues like `404 Not Found` or `Bad Gateway` errors can arise due to incorrect configurations or unstable connections.
- Setup Complexity: Running webhooks locally often requires additional tools, such as ngrok, to expose local endpoints to the internet.
When Polling is a Good Alternative
Polling provides a flexible alternative by allowing you to actively check for job updates rather than waiting for webhooks to push data. This approach can be particularly useful in the following scenarios:
- Local Development: Polling avoids the need to configure external endpoints, making it easier to test workflows.
- Unreliable Webhook Delivery: If webhook delivery is inconsistent due to firewall rules or connectivity issues, polling provides a controlled way to retrieve data.
- Tighter Process Control: You can define when and how often to check job statuses, reducing dependency on external event triggers.
Polling in Local Development
Polling can be particularly useful in local development because:
- No External Endpoint Required: You don’t need to expose a local server to the internet.
- Straightforward Implementation: Everything runs in a single script without additional dependencies.
- Better Debugging Control: You can inspect and control API calls directly.
That said, webhooks remain the recommended approach for production environments, as they provide real-time updates and reduce unnecessary API calls. If you’re working locally but still want to test webhooks, you can use tools like ngrok to tunnel your local server.
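For local webhook testing, a receiver can be as small as the sketch below, built on Python’s standard `http.server`. The port, the JSON payload format, and the 204 response are assumptions chosen for illustration, not requirements of the API.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received_payloads = []  # notifications collected for inspection while debugging


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body or b"{}")
        except json.JSONDecodeError:
            self.send_response(400)  # reject bodies that are not valid JSON
            self.end_headers()
            return
        received_payloads.append(payload)
        self.send_response(204)  # acknowledge with an empty response
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(port: int = 8000) -> HTTPServer:
    """Create a receiver bound to localhost; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), WebhookHandler)
```

Running `make_server(8000).serve_forever()` starts the receiver, and `ngrok http 8000` then gives it a public URL that can be registered as the webhook endpoint.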
Conclusion
Both polling and webhooks have their place in workflow automation. Polling is a practical alternative in local development and scenarios where webhooks pose connectivity challenges, while webhooks are ideal for handling high-volume asynchronous workflows in production.
For teams integrating Captain Data at scale, webhooks ensure efficient, event-driven updates. However, if you’re troubleshooting, working in a local environment, or need more control over job execution, polling can be a reliable method to retrieve results on demand.
If you have questions, contact us at support@captaindata.co.