
Version: Next

ApifyScheduler

A Scrapy scheduler that uses the Apify RequestQueue to manage requests.

This scheduler requires the asyncio Twisted reactor to be installed.
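A minimal sketch of the Scrapy project settings that select the asyncio reactor and this scheduler. `TWISTED_REACTOR` and `SCHEDULER` are standard Scrapy settings, and the reactor path is the one Scrapy documents for asyncio support. The import path for `ApifyScheduler` is an assumption; verify it against the SDK version you have installed.

```python
# settings.py (sketch): enable the asyncio Twisted reactor and select ApifyScheduler.

# Standard Scrapy setting that picks the asyncio-based Twisted reactor,
# which this scheduler requires.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Standard Scrapy setting naming the scheduler class as a dotted import path.
# Assumption: the module path below; check it against your installed apify SDK.
SCHEDULER = "apify.scrapy.scheduler.ApifyScheduler"
```

Projects generated by recent Scrapy versions usually already contain the `TWISTED_REACTOR` line, so often only `SCHEDULER` needs to be added.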

Methods

__init__

__init__(async_thread_timeout): None

Parameters
- optional async_thread_timeout: timedelta = timedelta(seconds=60)
  The maximum time each coroutine run on the background event loop may take before timing out.
Returns None
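The `async_thread_timeout` parameter bounds coroutines that run on a background event loop. The sketch below shows that general pattern with the standard library only: a loop running in its own thread, with each submitted coroutine awaited under a `timedelta` timeout. `AsyncThreadSketch` and `run_coro` are hypothetical names, not the SDK's internals, and shutdown is left out to keep the example short.

```python
import asyncio
import concurrent.futures
import threading
from datetime import timedelta
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


class AsyncThreadSketch:
    """Hypothetical helper: runs coroutines on an event loop owned by a background thread."""

    def __init__(self, timeout: timedelta = timedelta(seconds=60)) -> None:
        self._timeout = timeout
        self._loop = asyncio.new_event_loop()
        # The loop runs in a daemon thread so it never blocks interpreter exit.
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run_coro(self, coro: Coroutine[Any, Any, T]) -> T:
        """Run `coro` on the background loop and block until it finishes or times out."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        try:
            return future.result(timeout=self._timeout.total_seconds())
        except concurrent.futures.TimeoutError:
            # Cancel the still-running task on the loop before propagating the timeout.
            future.cancel()
            raise
```

In this sketch, a coroutine that is still running when the timeout expires is cancelled, and the calling (synchronous) thread receives `concurrent.futures.TimeoutError`.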

close

close(reason): None

Close the scheduler.

Shut down the event loop and its thread gracefully.
Parameters
- reason: str
  The reason for closing the spider.
Returns None
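Gracefully shutting down an event loop that lives in another thread takes a few ordered steps: cancel outstanding tasks, stop the loop from within the loop, join the thread, and only then close the loop. The sketch below shows one way to do this with the standard library. The helper names are hypothetical, and this is not claimed to be how `close` is implemented.

```python
import asyncio
import threading


def start_loop_thread() -> tuple[asyncio.AbstractEventLoop, threading.Thread]:
    """Hypothetical helper: start an event loop in a background daemon thread."""
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=loop.run_forever, daemon=True)
    thread.start()
    return loop, thread


async def _cancel_pending_tasks() -> None:
    """Cancel every other task on the running loop and wait for them to finish."""
    current = asyncio.current_task()
    pending = [task for task in asyncio.all_tasks() if task is not current]
    for task in pending:
        task.cancel()
    # Give cancelled tasks a chance to handle CancelledError.
    await asyncio.gather(*pending, return_exceptions=True)


def shutdown_loop_thread(
    loop: asyncio.AbstractEventLoop, thread: threading.Thread, timeout: float = 5.0
) -> None:
    """Cancel pending work, stop the loop, join its thread, then close the loop."""
    asyncio.run_coroutine_threadsafe(_cancel_pending_tasks(), loop).result(timeout=timeout)
    # loop.stop() is not thread-safe, so schedule it on the loop itself.
    loop.call_soon_threadsafe(loop.stop)
    thread.join(timeout=timeout)
    if not thread.is_alive():
        # Closing is only allowed once run_forever() has returned.
        loop.close()
```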

enqueue_request

enqueue_request(request): bool

Add a request to the scheduler.

This can be called either from a spider or from a downloader middleware (e.g. redirect, retry, ...).
Parameters
- request: Request
  The request to add to the scheduler.
Returns bool
True if the request was successfully enqueued, False otherwise.
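The boolean return value lets the caller know whether the request was accepted. This page does not say when `enqueue_request` returns False, so the in-memory sketch below assumes one common reason, rejecting duplicates, with a hypothetical method-plus-URL key. `FakeRequest` and `InMemorySchedulerSketch` are illustrative stand-ins, not `scrapy.Request` or the Apify RequestQueue.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request with only the fields this sketch needs."""

    url: str
    method: str = "GET"


class InMemorySchedulerSketch:
    """Hypothetical in-memory scheduler illustrating the enqueue_request contract."""

    def __init__(self) -> None:
        self._queue: deque[FakeRequest] = deque()
        self._seen: set[tuple[str, str]] = set()

    def enqueue_request(self, request: FakeRequest) -> bool:
        # Assumed dedup key: method + URL. A real queue may normalise URLs or use other keys.
        key = (request.method, request.url)
        if key in self._seen:
            return False  # Not enqueued: treated as a duplicate.
        self._seen.add(key)
        self._queue.append(request)
        return True
```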

from_crawler

from_crawler(crawler): ApifyScheduler

Create the scheduler, reading the async-thread timeout from the Scrapy settings.

The APIFY_ASYNC_THREAD_TIMEOUT_SECS setting (in seconds) caps how long each coroutine run on the background event loop may take before timing out; it defaults to 60 seconds.
Parameters
- crawler: Crawler
Returns ApifyScheduler
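The setting lookup described above amounts to "read APIFY_ASYNC_THREAD_TIMEOUT_SECS, fall back to 60, convert to a timedelta". The sketch below expresses that with a plain mapping standing in for `crawler.settings`. The helper name is hypothetical, and treating the value as an integer is an assumption (Scrapy's `Settings.getint` would behave similarly).

```python
from datetime import timedelta
from typing import Any, Mapping

DEFAULT_ASYNC_THREAD_TIMEOUT_SECS = 60


def async_thread_timeout_from_settings(settings: Mapping[str, Any]) -> timedelta:
    """Hypothetical helper mirroring the documented from_crawler behaviour."""
    # Assumption: the value is an integer number of seconds, like Settings.getint.
    secs = int(settings.get("APIFY_ASYNC_THREAD_TIMEOUT_SECS", DEFAULT_ASYNC_THREAD_TIMEOUT_SECS))
    return timedelta(seconds=secs)
```

To raise the limit in a project, you would set for example `APIFY_ASYNC_THREAD_TIMEOUT_SECS = 120` in `settings.py`.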

has_pending_requests

has_pending_requests(): bool

Check if the scheduler has any pending requests.
Returns bool
True if the scheduler has any pending requests, False otherwise.

next_request

next_request(): Request | None

Fetch the next request from the scheduler.
Returns Request | None
The next request, or None if there are no more requests.
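`has_pending_requests` and `next_request` work as a pair: one reports whether work remains, the other hands out the next item or None. The FIFO sketch below illustrates that contract with plain strings standing in for requests. `FifoSchedulerSketch` and `drain` are hypothetical; Scrapy's engine makes these calls itself, and this loop only shows how the two methods fit together.

```python
from collections import deque
from typing import Optional


class FifoSchedulerSketch:
    """Hypothetical FIFO stand-in for the has_pending_requests/next_request contract."""

    def __init__(self, urls: list[str]) -> None:
        self._pending: deque[str] = deque(urls)

    def has_pending_requests(self) -> bool:
        return len(self._pending) > 0

    def next_request(self) -> Optional[str]:
        # Return None rather than raising when nothing is left, as documented above.
        return self._pending.popleft() if self._pending else None


def drain(scheduler: FifoSchedulerSketch) -> list[str]:
    """Simplified consumer loop that pulls requests until the scheduler is empty."""
    fetched: list[str] = []
    while scheduler.has_pending_requests():
        request = scheduler.next_request()
        if request is None:
            break
        fetched.append(request)
    return fetched
```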

open

open(spider): Deferred[None] | None

Open the scheduler.
Parameters
- spider: Spider
  The spider that the scheduler is associated with.
Returns Deferred[None] | None
