Version: Next

ApifyScheduler

A Scrapy scheduler that uses the Apify RequestQueue to manage requests.

This scheduler requires the asyncio Twisted reactor to be installed.
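A minimal settings sketch of what that requirement might look like in a Scrapy project. `TWISTED_REACTOR` and `SCHEDULER` are standard Scrapy settings; the import path `apify.scrapy.scheduler.ApifyScheduler` is an assumption not stated on this page, so check it against the installed SDK.

```python
# settings.py (sketch): plain module-level constants, as Scrapy expects.

# Standard Scrapy setting that installs the asyncio-based Twisted reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Assumed import path for the scheduler; verify against your installed SDK.
SCHEDULER = "apify.scrapy.scheduler.ApifyScheduler"
```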

Index

Methods

__init__

  • __init__(async_thread_timeout): None
  • Parameters

    • optional async_thread_timeout: timedelta = timedelta(seconds=60)

    Returns None

close

  • close(reason): None
  • Close the scheduler.

    Shut down the event loop and its thread gracefully.


    Parameters

    • reason: str

      The reason for closing the spider.

    Returns None
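
The general pattern of running coroutines on a background event loop with a timeout, then shutting the loop and its thread down, can be sketched with the standard library alone. `BackgroundLoop` and its methods are hypothetical names for illustration; this is not the scheduler's actual implementation.

```python
import asyncio
import concurrent.futures
import threading
from datetime import timedelta


class BackgroundLoop:
    """Hypothetical helper: runs coroutines on an event loop in a daemon thread."""

    def __init__(self, timeout: timedelta = timedelta(seconds=60)) -> None:
        self._timeout = timeout
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def run(self, coro):
        """Block the calling thread until the coroutine finishes or times out."""
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        try:
            return future.result(timeout=self._timeout.total_seconds())
        except concurrent.futures.TimeoutError:
            future.cancel()  # ask the loop to cancel the still-running task
            raise

    def close(self) -> None:
        """Stop the loop, wait for the thread to exit, then release the loop."""
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join(timeout=self._timeout.total_seconds())
        self._loop.close()


async def add(a: int, b: int) -> int:
    await asyncio.sleep(0)  # yield to the loop once, like real async I/O would
    return a + b


bg = BackgroundLoop(timeout=timedelta(seconds=5))
result = bg.run(add(2, 3))
bg.close()
```

Stopping the loop through `call_soon_threadsafe` matters here because the loop runs in another thread; calling `stop()` directly from the caller's thread is not thread-safe.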

enqueue_request

  • enqueue_request(request): bool
  • Add a request to the scheduler.

    This can be called from either a spider or a downloader middleware (e.g. redirect, retry).


    Parameters

    • request: Request

      The request to add to the scheduler.

    Returns bool

    True if the request was successfully enqueued, False otherwise.

from_crawler

  • from_crawler(crawler): ApifyScheduler
  • Create the scheduler, reading the async-thread timeout from the Scrapy settings.

    The APIFY_ASYNC_THREAD_TIMEOUT_SECS setting (in seconds) caps how long each coroutine run on the background event loop may take before timing out; it defaults to 60 seconds.


    Parameters

    • crawler: Crawler

    Returns ApifyScheduler
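
As a rough illustration of how a seconds-valued setting could be turned into the `timedelta` that `__init__` accepts, here is a sketch that uses a plain dict in place of Scrapy's settings object. `timeout_from_settings` is a hypothetical helper, not part of the SDK.

```python
from datetime import timedelta

DEFAULT_TIMEOUT_SECS = 60  # default stated in the reference above


def timeout_from_settings(settings: dict) -> timedelta:
    """Hypothetical helper: read the timeout in seconds and wrap it in a timedelta."""
    secs = settings.get("APIFY_ASYNC_THREAD_TIMEOUT_SECS", DEFAULT_TIMEOUT_SECS)
    return timedelta(seconds=float(secs))


# No setting present: fall back to the documented default.
default_timeout = timeout_from_settings({})

# Setting present: the value is interpreted as seconds.
custom_timeout = timeout_from_settings({"APIFY_ASYNC_THREAD_TIMEOUT_SECS": 120})
```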

has_pending_requests

  • has_pending_requests(): bool
  • Check if the scheduler has any pending requests.


    Returns bool

    True if the scheduler has any pending requests, False otherwise.

next_request

  • next_request(): Request | None
  • Fetch the next request from the scheduler.


    Returns Request | None

    The next request, or None if there are no more requests.
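
The contract shared by `enqueue_request`, `has_pending_requests`, and `next_request` can be sketched with an in-memory stand-in. `InMemoryScheduler` is hypothetical: it uses a `deque` and plain URL strings instead of the Apify RequestQueue and Scrapy `Request` objects, and treating duplicates as the reason for a `False` return is an assumption made only for illustration.

```python
from collections import deque
from typing import Optional


class InMemoryScheduler:
    """Hypothetical stand-in that mirrors the three queue-facing methods."""

    def __init__(self) -> None:
        self._queue = deque()
        self._seen = set()

    def enqueue_request(self, request: str) -> bool:
        # Assumption for this sketch: a duplicate counts as "not enqueued".
        if request in self._seen:
            return False
        self._seen.add(request)
        self._queue.append(request)
        return True

    def has_pending_requests(self) -> bool:
        return bool(self._queue)

    def next_request(self) -> Optional[str]:
        # Return None when the queue is drained, as the reference describes.
        return self._queue.popleft() if self._queue else None


scheduler = InMemoryScheduler()
first = scheduler.enqueue_request("https://example.com/a")
duplicate = scheduler.enqueue_request("https://example.com/a")
pending_before = scheduler.has_pending_requests()
fetched = scheduler.next_request()
drained = scheduler.next_request()
```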

open

  • open(spider): Deferred[None] | None
  • Open the scheduler.


    Parameters

    • spider: Spider

      The spider that the scheduler is associated with.

    Returns Deferred[None] | None