Class AsyncFetcher.Builder

  • Enclosing class:
    AsyncFetcher

    public static final class AsyncFetcher.Builder
    extends Object
    A builder for the AsyncFetcher class.
    • Method Detail

      • enableSocksProxy

        public AsyncFetcher.Builder enableSocksProxy()
        Enables SOCKS protocol for proxies (socks://). Experimental.
        Returns:
        this
      • register

        public AsyncFetcher.Builder register​(@NotNull
                                             @NotNull Callback callback)
        Registers a callback that is called when a page has been fetched.

        Please note that blocking callbacks will significantly reduce the rate at which requests are processed. Run I/O-blocking callback work on your own executor.

        Parameters:
        callback - the callback to register.
        Returns:
        this
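        The note about blocking callbacks can be illustrated with a minimal sketch. PageCallback below is a hypothetical stand-in, not the library's Callback interface, whose methods are not shown on this page. The wrapper hands the slow work to an executor so that the calling thread returns immediately.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class OffloadingCallbackSketch {

    /** Hypothetical stand-in for the library's callback type; the real signature may differ. */
    interface PageCallback {
        void onFetched(String url, String body);
    }

    /** Wraps a slow callback so that it runs on the given executor instead of the caller's thread. */
    static PageCallback offload(PageCallback slow, ExecutorService executor) {
        return (url, body) -> executor.execute(() -> slow.onFetched(url, body));
    }

    /** Feeds n fake pages through the wrapped callback and returns how many were handled. */
    static int demo(int n) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        AtomicInteger handled = new AtomicInteger();
        PageCallback wrapped = offload((url, body) -> handled.incrementAndGet(), pool);
        for (int i = 0; i < n; i++) {
            wrapped.onFetched("http://example.com/" + i, "body");
        }
        // Already submitted tasks still run after shutdown(); wait for them to finish.
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return handled.get();
    }
}
```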
      • disableCookies

        public AsyncFetcher.Builder disableCookies()
        Disables cookie storage.
        Returns:
        this
      • setFileManager

        public AsyncFetcher.Builder setFileManager​(FileManager fileManager)
        Sets the FileManager to be used. Defaults to none.

        If fileManager is set, all items fetched will be saved to storage.

        Parameters:
        fileManager - file manager to be used.
        Returns:
        this
      • setHeaders

        public AsyncFetcher.Builder setHeaders​(@NotNull
                                               @NotNull Map<String,​String> headers)
        Sets the headers to be used when fetching items. Defaults to none.
        Parameters:
        headers - a map of headers to be used.
        Returns:
        this
      • setNumIoThreads

        public AsyncFetcher.Builder setNumIoThreads​(int numIoThreads)
        Sets the number of httpclient dispatcher threads.
        Parameters:
        numIoThreads - number of threads.
        Returns:
        this
      • setMaxConnections

        public AsyncFetcher.Builder setMaxConnections​(int maxConnections)
        Sets the maximum number of connections allowed at any one time.
        Parameters:
        maxConnections - the max allowable connections.
        Returns:
        this
      • setMaxRouteConnections

        public AsyncFetcher.Builder setMaxRouteConnections​(int maxRouteConnections)
        Sets the maximum number of connections allowed at any one time for a particular route (host).
        Parameters:
        maxRouteConnections - the max allowable connections per route.
        Returns:
        this
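        How a total limit and a per-route limit interact can be sketched as follows. This is an illustration built on semaphores, not the fetcher's actual connection pool; the class and method names are hypothetical. A lease succeeds only when both the route and the overall pool have capacity.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Semaphore;

final class ConnectionLimitSketch {
    private final Semaphore total;
    private final int maxPerRoute;
    private final Map<String, Semaphore> perRoute = new ConcurrentHashMap<>();

    ConnectionLimitSketch(int maxTotal, int maxPerRoute) {
        this.total = new Semaphore(maxTotal);
        this.maxPerRoute = maxPerRoute;
    }

    /** Tries to lease a connection for the host; fails if either limit is exhausted. */
    boolean tryLease(String host) {
        Semaphore route = perRoute.computeIfAbsent(host, h -> new Semaphore(maxPerRoute));
        if (!route.tryAcquire()) {
            return false; // per-route limit reached
        }
        if (!total.tryAcquire()) {
            route.release(); // give the route permit back, the overall limit is reached
            return false;
        }
        return true;
    }

    /** Returns a previously leased connection for the host. */
    void release(String host) {
        perRoute.get(host).release();
        total.release();
    }
}
```

        With a total of 3 and a per-route limit of 2, a third lease for the same host is refused even though the pool still has one free slot.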
      • setProxyProvider

        public AsyncFetcher.Builder setProxyProvider​(ProxyProvider proxyProvider)
        Sets the ProxyProvider to be used. Defaults to none.
        Parameters:
        proxyProvider - proxy provider to be used.
        Returns:
        this
      • setSslContext

        public AsyncFetcher.Builder setSslContext​(SSLContext sslContext)
        Sets the SSL context used for encrypted connections.
        Parameters:
        sslContext - SSLContext to be used.
        Returns:
        this
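        An SSLContext to pass in can be created with the standard javax.net.ssl API. The sketch below builds a context backed by the JVM's default key and trust stores; the helper class name is illustrative only.

```java
import java.security.GeneralSecurityException;
import javax.net.ssl.SSLContext;

final class SslContextSketch {

    /** Creates a TLS context that uses the JVM's default key managers and trust managers. */
    static SSLContext defaultTlsContext() {
        try {
            SSLContext context = SSLContext.getInstance("TLS");
            // null arguments select the default key managers, trust managers and secure random
            context.init(null, null, null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("TLS is not available", e);
        }
    }
}
```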
      • setStopCodes

        public AsyncFetcher.Builder setStopCodes​(@NotNull
                                                 @javax.validation.constraints.NotNull int... codes)
        Sets a list of stop codes that will interrupt crawling.
        Parameters:
        codes - A list of stop codes.
        Returns:
        this
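        The varargs parameter accepts any number of codes. The sketch below shows one way such a set could be checked, assuming the stop codes are HTTP status codes; that assumption and the class name are not taken from this page.

```java
import java.util.Arrays;

final class StopCodeSketch {
    private final int[] stopCodes;

    /** Stores a sorted copy of the given codes (assumed here to be HTTP status codes). */
    StopCodeSketch(int... codes) {
        this.stopCodes = codes.clone();
        Arrays.sort(this.stopCodes);
    }

    /** Returns true when the status code is one of the configured stop codes. */
    boolean shouldStop(int statusCode) {
        return Arrays.binarySearch(stopCodes, statusCode) >= 0;
    }
}
```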
      • setThreadFactory

        public AsyncFetcher.Builder setThreadFactory​(@NotNull
                                                     @NotNull ThreadFactory threadFactory)
        Sets the thread factory that creates the httpclient dispatcher threads.
        Parameters:
        threadFactory - an instance of ThreadFactory.
        Returns:
        this
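        A ThreadFactory is a standard java.util.concurrent interface. As a minimal sketch, the factory below gives threads a recognizable name and marks them as daemon threads; the class name and the naming scheme are illustrative choices.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

final class NamedThreadFactorySketch implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger counter = new AtomicInteger();

    NamedThreadFactorySketch(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable task) {
        // Names look like "<prefix>-1", "<prefix>-2", ... which helps when reading thread dumps.
        Thread thread = new Thread(task, prefix + "-" + counter.incrementAndGet());
        // Daemon threads do not keep the JVM alive on their own.
        thread.setDaemon(true);
        return thread;
    }
}
```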
      • setUserAgent

        public AsyncFetcher.Builder setUserAgent​(@NotNull
                                                 @NotNull UserAgent userAgent)
        Sets the UserAgent to be used. If not set, a default is chosen.
        Parameters:
        userAgent - user agent generator to be used.
        Returns:
        this
      • setValidator

        public AsyncFetcher.Builder setValidator​(@NotNull
                                                 @NotNull Validator validator)
        Sets the Validator to be used. Defaults to StatusOkValidator and EmptyContentValidator.

        This will validate the fetched page and retry if the page is not consistent with the specification set by the validator.

        Parameters:
        validator - validator to be used.
        Returns:
        this
      • setValidator

        public AsyncFetcher.Builder setValidator​(@NotNull
                                                 @NotNull Validator... validators)
        Sets multiple validators to be used. Defaults to StatusOkValidator and EmptyContentValidator.

        This will validate the fetched page and retry if the page is not consistent with the specification set by the validators.

        Parameters:
        validators - validators to be used.
        Returns:
        this
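        The idea of combining several validators can be sketched as below. PageValidator is a hypothetical stand-in for the library's Validator type, and treating a page as valid only when every validator accepts it is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.List;

final class ValidatorChainSketch {

    /** Hypothetical stand-in for the library's validator type; the real signature may differ. */
    interface PageValidator {
        boolean isValid(int statusCode, String body);
    }

    /** Accepts only responses with status 200. */
    static final PageValidator STATUS_OK = (status, body) -> status == 200;

    /** Accepts only responses with a non-empty body. */
    static final PageValidator NOT_EMPTY = (status, body) -> body != null && !body.isEmpty();

    /** Combines validators; the page is valid only if every one of them accepts it. */
    static PageValidator allOf(PageValidator... validators) {
        List<PageValidator> chain = Arrays.asList(validators);
        return (status, body) -> chain.stream().allMatch(v -> v.isValid(status, body));
    }
}
```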
      • setRedirectStrategy

        public AsyncFetcher.Builder setRedirectStrategy​(org.apache.http.client.RedirectStrategy redirectStrategy)
        Sets the redirection strategy for a response received by the fetcher.
        Parameters:
        redirectStrategy - redirection strategy to be used.
        Returns:
        this
      • setValidatorRouter

        public AsyncFetcher.Builder setValidatorRouter​(ValidatorRouter router)
        Sets the ValidatorRouter to be used. Defaults to none. Validator rules set through setValidator are always applied.
        Parameters:
        router - the ValidatorRouter to be used.
        Returns:
        this
      • setConnectionRequestTimeout

        public AsyncFetcher.Builder setConnectionRequestTimeout​(int connectionRequestTimeout)
        The timeout in milliseconds used when requesting a connection from the connection manager. A timeout value of zero is interpreted as an infinite timeout.
        Parameters:
        connectionRequestTimeout - timeout.
        Returns:
        this
      • setConnectTimeout

        public AsyncFetcher.Builder setConnectTimeout​(int connectTimeout)
        Determines the timeout in milliseconds until a connection is established. A timeout value of zero is interpreted as an infinite timeout.
        Parameters:
        connectTimeout - timeout.
        Returns:
        this
      • setSocketTimeout

        public AsyncFetcher.Builder setSocketTimeout​(int socketTimeout)
        Defines the socket timeout (SO_TIMEOUT) in milliseconds, which is the timeout for waiting for data or, put differently, the maximum period of inactivity between two consecutive data packets.
        Parameters:
        socketTimeout - timeout.
        Returns:
        this
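        The difference between the connect timeout and the socket timeout can be seen with plain java.net sockets. This sketch does not use the fetcher or its HTTP client; it connects to a local server that never sends data, so the connection is established in time but the read gives up after the socket timeout.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SocketTimeoutSketch {

    /**
     * Connects to a silent local server and reports whether the read
     * ended with a SocketTimeoutException.
     */
    static boolean readTimesOut(int connectTimeoutMs, int socketTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // Connect timeout: upper bound on establishing the TCP connection.
            client.connect(new InetSocketAddress(loopback, server.getLocalPort()), connectTimeoutMs);
            // Socket timeout (SO_TIMEOUT): upper bound on silence while waiting for data.
            client.setSoTimeout(socketTimeoutMs);
            try {
                client.getInputStream().read(); // the server never writes, so this blocks
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```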
      • disableCompression

        public AsyncFetcher.Builder disableCompression()
        Disables requesting compressed pages and decompressing pages after they are fetched. Compression is enabled by default.
        Returns:
        this
      • build

        public AsyncFetcher build()
        Builds the fetcher with the options specified.
        Returns:
        an instance of AsyncFetcher.