Apify SDK

Request

Represents a URL to be crawled, optionally including HTTP method, headers, payload and other metadata. The Request object also stores information about errors that occurred during processing of the request.

Each Request instance has the uniqueKey property, which can be either specified manually in the constructor or generated automatically from the URL. Two requests with the same uniqueKey are considered as pointing to the same web resource. This behavior applies to all Apify SDK classes, such as RequestList, RequestQueue or PuppeteerCrawler.

Example use:

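const Apify = require('apify');

// Create a request for a URL with custom HTTP headers.
const request = new Apify.Request({
    url: 'http://www.example.com',
    headers: { Accept: 'application/json' },
});

// ...

// Attach custom data and record a processing error.
request.userData.foo = 'bar';
request.pushErrorMessage(new Error('Request failed!'));

// ...

// Read the custom data back later, e.g. on a retry.
const foo = request.userData.foo;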
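
Because uniqueKey decides whether two requests point to the same resource, a collection that deduplicates requests can be pictured as a map keyed by uniqueKey. The following is a minimal sketch in plain JavaScript, not the SDK's implementation. The DedupingQueue class and the plain request objects are hypothetical stand-ins used only for illustration.

```javascript
// Hypothetical stand-in for a request collection that deduplicates by uniqueKey.
// This is NOT how RequestQueue or RequestList are implemented internally.
class DedupingQueue {
    constructor() {
        this.requestsByKey = new Map();
    }

    // Adds the request unless one with the same uniqueKey is already present.
    // Returns true if the request was added, false if it was a duplicate.
    addRequest(request) {
        if (this.requestsByKey.has(request.uniqueKey)) return false;
        this.requestsByKey.set(request.uniqueKey, request);
        return true;
    }

    get size() {
        return this.requestsByKey.size;
    }
}

const queue = new DedupingQueue();
const addedFirst = queue.addRequest({ url: 'http://www.example.com/a', uniqueKey: 'page-a' });
// Different URL, but the same manually assigned uniqueKey: treated as a duplicate.
const addedSecond = queue.addRequest({ url: 'http://www.example.com/a?ref=home', uniqueKey: 'page-a' });
const addedThird = queue.addRequest({ url: 'http://www.example.com/b', uniqueKey: 'page-b' });

console.log(addedFirst, addedSecond, addedThird, queue.size);
```

Only the first and third calls add a request, since the second one reuses an existing uniqueKey.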

Properties

Param (Type)

id (String)

Request ID.

url (String)

URL of the web page to crawl.

loadedUrl (String)

The URL that was actually loaded after redirects, if any. HTTP redirects are guaranteed to be included.

When using PuppeteerCrawler, meta tag and JavaScript redirects may or may not be included, depending on their nature. In general, redirects that happen immediately will most likely be included, but delayed redirects will not.

uniqueKey (String)

A unique key identifying the request. Two requests with the same uniqueKey are considered as pointing to the same URL.

method (String)

HTTP method, e.g. GET or POST.

payload (String | Buffer)

HTTP request payload, e.g. for POST requests.

noRetry (Boolean)

If true, the request will not be automatically retried on error.

retryCount (Number)

The number of times the crawling of the request has been retried on error.

errorMessages (Array)

An array of error messages from request processing.

headers (Object)

Object with HTTP headers. Each key is a header name and each value is that header's value.

userData (Object)

Custom user data assigned to the request.

handledAt (Date)

The time when the request was processed, or null if the request has not been crawled yet.
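
To show how these properties might be read together, here is a small sketch that summarizes the state of a request. The describeRequest helper and the sample object are hypothetical. The sample is a plain object that only mimics the property names listed above, so the snippet runs without the SDK installed.

```javascript
// Hypothetical helper: summarizes a request using the documented property names.
// `request` can be any object exposing the same properties as a Request instance.
function describeRequest(request) {
    // Guard against a missing array so the helper also works on sparse objects.
    const errors = request.errorMessages || [];
    return {
        handled: request.handledAt !== null && request.handledAt !== undefined,
        // Literal string comparison; a redirect is only one possible reason for a difference.
        loadedUrlDiffers: Boolean(request.loadedUrl) && request.loadedUrl !== request.url,
        finalUrl: request.loadedUrl || request.url,
        retries: request.retryCount,
        lastError: errors.length > 0 ? errors[errors.length - 1] : null,
    };
}

// Sample data for illustration only.
const sample = {
    url: 'http://www.example.com/old',
    loadedUrl: 'http://www.example.com/new',
    retryCount: 1,
    errorMessages: ['Request failed!'],
    handledAt: null,
};

const summary = describeRequest(sample);
console.log(summary);
```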

  • Request
    • new Request(options)
    • .pushErrorMessage(errorOrMessage, [options])

new Request(options)

Param (Type, Default)

options (object)

All Request parameters are passed via an options object with the following keys:

options.url (String)

URL of the web page to crawl. It must be a non-empty string.

[options.uniqueKey] (String)

A unique key identifying the request. Two requests with the same uniqueKey are considered as pointing to the same URL.

If uniqueKey is not provided, it is automatically generated by normalizing the URL. For example, the URL HTTP://www.EXAMPLE.com/something/ will produce the uniqueKey http://www.example.com/something.

The keepUrlFragment option determines whether the URL hash fragment is included in the uniqueKey.

The useExtendedUniqueKey option determines whether the method and payload are included in the uniqueKey, producing a uniqueKey in the following format: METHOD(payloadHash):normalizedUrl. This is useful when requests point to the same URL but use different methods and payloads, for example form submissions.

Pass an arbitrary non-empty text value to the uniqueKey property to override the default behavior and specify which URLs shall be considered equal.
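
The automatic uniqueKey generation described above can be approximated with the standard URL parser. This is a simplified sketch, not the SDK's actual normalization: the sketchUniqueKey function is hypothetical, it ignores credentials, and the real algorithm may apply further rules (for instance to query parameters) that are not reproduced here.

```javascript
// Simplified approximation of deriving a uniqueKey from a URL.
// Assumption: only the behaviors named in the text are modeled, i.e.
// lowercasing of scheme and host, trailing-slash removal and fragment handling.
function sketchUniqueKey(url, { keepUrlFragment = false } = {}) {
    const parsed = new URL(url); // the URL parser lowercases the scheme and host
    // Drop trailing slashes from the path, as in the documented example.
    const path = parsed.pathname.replace(/\/+$/, '');
    // Keep the hash fragment only when explicitly requested.
    const fragment = keepUrlFragment ? parsed.hash : '';
    return `${parsed.origin}${path}${parsed.search}${fragment}`;
}

const keyA = sketchUniqueKey('HTTP://www.EXAMPLE.com/something/');
const keyB = sketchUniqueKey('http://www.example.com#foo');
const keyC = sketchUniqueKey('http://www.example.com#foo', { keepUrlFragment: true });

console.log(keyA, keyB, keyC);
```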

[options.method] (String, default: 'GET')

[options.payload] (String | Buffer)

HTTP request payload, e.g. for POST requests.

[options.headers] (Object, default: {})

HTTP headers in the following format:

  {
      Accept: 'text/html',
      'Content-Type': 'application/json'
  }

[options.userData] (Object, default: {})

Custom user data assigned to the request. Use this to save any request-related data to the request's scope, keeping it accessible on retries, failures etc.

[options.keepUrlFragment] (Boolean, default: false)

If false, the hash part of a URL is removed when computing the uniqueKey property. For example, this causes the http://www.example.com#foo and http://www.example.com#bar URLs to have the same uniqueKey of http://www.example.com, and thus the URLs are considered equal. Note that this option only has an effect if uniqueKey is not set.

[options.useExtendedUniqueKey] (Boolean, default: false)

If true, the uniqueKey is computed not only from the URL, but also from the method and payload properties. This is useful when making requests to the same URL that are differentiated by method or payload, such as form submit navigations in browsers.
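
To make the METHOD(payloadHash):normalizedUrl format concrete, the sketch below builds such a key with Node's crypto module. The sketchExtendedUniqueKey function is hypothetical, and the hash choice (SHA-256, base64 digest cut to 8 characters) is an assumption made for illustration; the SDK's actual hash function and length may differ. The URL is expected to be normalized by the caller already.

```javascript
const crypto = require('crypto');

// Builds a key in the documented METHOD(payloadHash):normalizedUrl format.
// Assumption: SHA-256 with a truncated base64 digest is an illustrative choice only.
function sketchExtendedUniqueKey({ method = 'GET', payload = '', normalizedUrl }) {
    const payloadHash = crypto
        .createHash('sha256')
        .update(payload) // accepts a String or a Buffer
        .digest('base64')
        .slice(0, 8);
    return `${method.toUpperCase()}(${payloadHash}):${normalizedUrl}`;
}

// Two form submissions to the same URL with different payloads.
const loginKey = sketchExtendedUniqueKey({
    method: 'POST',
    payload: 'user=alice',
    normalizedUrl: 'http://www.example.com/form',
});
const otherKey = sketchExtendedUniqueKey({
    method: 'POST',
    payload: 'user=bob',
    normalizedUrl: 'http://www.example.com/form',
});

console.log(loginKey !== otherKey); // the payload hash differs, so the keys differ
```

With plain URL-based keys, both submissions would collapse into a single request. Including the method and a payload hash keeps them distinct.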

request.pushErrorMessage(errorOrMessage, [options])

Stores information about an error that occurred during processing of this request.

You should always use Error instances when throwing errors in JavaScript.

Nevertheless, third-party libraries may not always throw an Error instance. To improve the debugging experience in such cases, the function inspects the type of the passed argument and attempts to extract as much information as possible, because simply throwing a bad-type error would make debugging rather difficult.

Param (Type, Default)

errorOrMessage (Error | String)

Error object or error message to be stored in the request.

[options] (Object)

[options.omitStack] (Boolean, default: false)

If true, only the error message is pushed, without the stack trace.
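
The kind of type inspection described above could look roughly like the following. This is a sketch under assumptions, not the SDK's code: the toErrorMessage helper, its fallback wording and the exact order of checks are all hypothetical.

```javascript
// Hypothetical helper that turns an arbitrary thrown value into a message string.
// It mirrors the behavior described in the text, not the SDK's actual implementation.
function toErrorMessage(errorOrMessage, { omitStack = false } = {}) {
    if (errorOrMessage instanceof Error) {
        // Prefer the stack trace unless the caller asked to omit it.
        return omitStack || !errorOrMessage.stack
            ? errorOrMessage.message
            : errorOrMessage.stack;
    }
    if (typeof errorOrMessage === 'string') return errorOrMessage;
    // Objects that merely look like errors, e.g. thrown by third-party libraries.
    if (errorOrMessage && typeof errorOrMessage.message === 'string') {
        return errorOrMessage.message;
    }
    // Last resort: stringify whatever was thrown so that some information survives.
    return `Non-error value: ${String(errorOrMessage)}`;
}

const errorMessages = [];
errorMessages.push(toErrorMessage(new Error('Request failed!'), { omitStack: true }));
errorMessages.push(toErrorMessage('Timed out'));
errorMessages.push(toErrorMessage({ message: 'Plain object error' }));
errorMessages.push(toErrorMessage(404));

console.log(errorMessages);
```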

Copyright © 2019 Apify Technologies s.r.o.