
ApifyCacheStorage

A Scrapy cache storage that uses the Apify KeyValueStore to store responses.

It can be set as the storage backend for Scrapy's built-in HttpCacheMiddleware, which caches responses to requests. See the HttpCacheMiddleware settings (prefixed with HTTPCACHE_) in the Scrapy documentation for more information. Requires Twisted's asyncio reactor to be installed.
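A minimal settings sketch for enabling the HTTP cache with this storage. HTTPCACHE_ENABLED, HTTPCACHE_STORAGE, HTTPCACHE_EXPIRATION_SECS and TWISTED_REACTOR are standard Scrapy settings. The import path apify.scrapy.extensions.ApifyCacheStorage is an assumption here; check it against the SDK version you use. The expiration value is only an example.

```python
# settings.py: a sketch, not a complete project configuration.
HTTPCACHE_ENABLED = True

# Assumption: the class is exposed under this dotted path in your Apify SDK version.
HTTPCACHE_STORAGE = "apify.scrapy.extensions.ApifyCacheStorage"

# Entries older than this many seconds count as expired; 0 means they never expire.
HTTPCACHE_EXPIRATION_SECS = 7200

# ApifyCacheStorage needs Twisted's asyncio-based reactor.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```

With expiration set to 0, the cleanup sweep in close_spider has nothing to delete, because no entry is ever considered expired.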

Index

Methods

__init__

  • __init__(settings): None
  • Parameters

    • settings: BaseSettings

    Returns None

close_spider

  • close_spider(_, current_time): None
  • Close the cache storage for a spider.

    Runs a best-effort cleanup sweep that deletes expired entries when expiration is enabled, then shuts down the background event-loop thread. The thread is always closed, even if the sweep fails.


    Parameters

    • _: Spider

      The spider being closed. Part of Scrapy's storage interface, unused here.

    • optional current_time: int | None = None

      Unix time in seconds used as the current time when deciding which entries have expired. Defaults to the current time.

    Returns None
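The close_spider behaviour (a best-effort expiry sweep followed by an unconditional thread shutdown) can be sketched with try/except/finally. This is a hypothetical illustration, not the SDK's code: entries is a plain mapping from key to a stored-at Unix timestamp standing in for the KeyValueStore, close_thread is a caller-supplied callback, and the expiry test follows Scrapy's convention that an expiration of 0 disables expiry.

```python
from __future__ import annotations

import logging
import time
from collections.abc import Callable, MutableMapping

logger = logging.getLogger(__name__)


def is_expired(stored_at: float, current_time: float, expiration_secs: int) -> bool:
    # Scrapy's convention: expiration_secs <= 0 means entries never expire.
    return 0 < expiration_secs < current_time - stored_at


def sweep_then_close(
    entries: MutableMapping[str, float],  # hypothetical: key -> stored-at Unix time
    expiration_secs: int,
    close_thread: Callable[[], None],  # hypothetical shutdown callback
    current_time: float | None = None,
) -> None:
    """Delete expired entries (best effort), then always close the background thread."""
    now = time.time() if current_time is None else current_time
    try:
        if expiration_secs > 0:
            # Collect keys first so the mapping is not mutated while iterating it.
            expired = [k for k, stored_at in entries.items() if is_expired(stored_at, now, expiration_secs)]
            for key in expired:
                del entries[key]
    except Exception:
        # A failed sweep only leaves stale entries behind; log it and carry on.
        logger.warning("Expired-entry sweep failed", exc_info=True)
    finally:
        close_thread()
```

Passing current_time explicitly, as close_spider allows, makes the sweep deterministic in tests.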

open_spider

  • open_spider(spider): None
  • Open the cache storage for a spider.

    Starts the background event-loop thread and opens the spider's key-value store. If opening the store fails, the freshly started thread is closed so it is not leaked.


    Parameters

    • spider: Spider

      The spider the cache storage is being opened for.

    Returns None
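One way to get the pattern open_spider describes (an asyncio loop on a background thread, torn down again if opening the store fails) is asyncio.run_coroutine_threadsafe. This is a sketch under assumptions: BackgroundLoop and open_storage are hypothetical names, and open_store stands in for whatever coroutine actually opens the key-value store.

```python
from __future__ import annotations

import asyncio
import threading
from collections.abc import Callable, Coroutine
from typing import Any


class BackgroundLoop:
    """Hypothetical helper: an asyncio event loop running in a daemon thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)

    def start(self) -> None:
        self.thread.start()

    def run(self, coro: Coroutine[Any, Any, Any], timeout: float | None = None) -> Any:
        # Schedule the coroutine on the background loop and block until it completes.
        return asyncio.run_coroutine_threadsafe(coro, self.loop).result(timeout)

    def close(self) -> None:
        # Stop the loop from its own thread, wait for the thread, then release the loop.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()


def open_storage(open_store: Callable[[], Coroutine[Any, Any, Any]]) -> tuple[BackgroundLoop, Any]:
    bg = BackgroundLoop()
    bg.start()
    try:
        store = bg.run(open_store())
    except Exception:
        bg.close()  # do not leak the freshly started thread
        raise
    return bg, store
```

Because run blocks on the future's result, synchronous Scrapy callbacks can drive async storage calls without running an event loop on their own thread.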

retrieve_response

  • retrieve_response(_, request, current_time): Response | None
  • Retrieve a cached response for a request.

    A malformed, legacy, or expired cache entry is treated as a miss, so Scrapy re-fetches the request and re-stores it in the current format.


    Parameters

    • _: Spider

      The spider making the request. Part of Scrapy's storage interface, unused here.

    • request: Request

      The request to look up in the cache.

    • optional current_time: int | None = None

      Unix time in seconds used as the current time when checking whether the entry has expired. Defaults to the current time.

    Returns Response | None

    The cached response on a hit, or None on a miss, an expired entry, or an unreadable entry.
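The miss-on-anything-unexpected rule in retrieve_response can be sketched as a decoder that returns None instead of raising. The entry layout here is entirely hypothetical (a JSON object with a "format" version and a "stored_at" timestamp); the real storage format is not documented on this page.

```python
from __future__ import annotations

import json
import time
from typing import Any

CURRENT_FORMAT = 2  # hypothetical format version marker


def decode_entry(
    raw: bytes | None, expiration_secs: int, current_time: float | None = None
) -> dict[str, Any] | None:
    """Return the decoded entry on a hit, or None for any kind of miss."""
    if raw is None:
        return None  # nothing stored under this key
    try:
        entry = json.loads(raw)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return None  # malformed entry
    if not isinstance(entry, dict) or entry.get("format") != CURRENT_FORMAT:
        return None  # legacy or unrecognized layout
    stored_at = entry.get("stored_at")
    if not isinstance(stored_at, (int, float)):
        return None
    now = time.time() if current_time is None else current_time
    if 0 < expiration_secs < now - stored_at:
        return None  # expired
    return entry
```

Returning None rather than raising lets Scrapy fall through to a normal download, after which store_response writes the entry back in the current format.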

store_response

  • store_response(_, request, response): None
  • Store a response in the cache storage.


    Parameters

    • _: Spider

      The spider that produced the response. Part of Scrapy's storage interface, unused here.

    • request: Request

      The request the response belongs to. Its fingerprint is used as the cache key.

    • response: Response

      The response to store in the cache.

    Returns None
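Keying the cache by request fingerprint means equivalent requests map to the same stored entry. In Scrapy 2.7 and later, scrapy.utils.request.fingerprint(request) returns the fingerprint as bytes. The sketch below is a simplified, hypothetical stand-in that avoids depending on Scrapy: it hashes only the method, URL and body, and skips the URL canonicalization Scrapy performs. Encoding the digest as hex is an assumption chosen because it yields a short, store-safe key.

```python
import hashlib
import json


def cache_key(method: str, url: str, body: bytes = b"") -> str:
    """Hypothetical fingerprint stand-in, rendered as a hex string for use as a store key."""
    # Serialize the parts as a JSON list so field boundaries are unambiguous.
    payload = json.dumps([method.upper(), url, body.hex()]).encode()
    return hashlib.sha1(payload).hexdigest()
```

SHA-1 produces 20 bytes, so every key is a 40-character hex string regardless of URL length.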