Class Crawler.Builder

  • Enclosing class:
    Crawler

    public static final class Crawler.Builder
    extends java.lang.Object
    A builder for the Crawler class.
    • Method Detail

      • setName

        public Crawler.Builder setName(@NotNull java.lang.String name)
        Sets the name of the crawler thread.
        Parameters:
        name - the name of the crawler thread
        Returns:
        this
      • setFetcher

        public Crawler.Builder setFetcher(@NotNull Fetcher fetcher)
        Sets the Fetcher to be used. If not set, a default is chosen.
        Parameters:
        fetcher - the Fetcher to be used.
        Returns:
        this
      • setParallelism

        public Crawler.Builder setParallelism(int parallelism)
        Sets the parallelism level. Defaults to the system thread count.
        Parameters:
        parallelism - the parallelism level.
        Returns:
        this
      • setWorkerManager

        public Crawler.Builder setWorkerManager(@NotNull WorkerManager workerManager)
        Sets the WorkerManager to be used. If not set, a default is chosen.
        Parameters:
        workerManager - the WorkerManager to be used.
        Returns:
        this
      • setScheduler

        @Deprecated
        public Crawler.Builder setScheduler(@NotNull java.util.concurrent.BlockingQueue<Job> jobQueue)
        Deprecated.
        Sets the job queue to be used. If not set, a default is chosen. This method is deprecated; use setJobQueue instead.
        Parameters:
        jobQueue - the job queue to be used.
        Returns:
        this
      • setJobQueue

        public Crawler.Builder setJobQueue(@NotNull java.util.concurrent.BlockingQueue<Job> jobQueue)
        Sets the job queue to be used. If not set, a default is chosen.
        Parameters:
        jobQueue - the job queue to be used.
        Returns:
        this
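        Any java.util.concurrent.BlockingQueue implementation matches this parameter type. The following is a minimal sketch, assuming a stand-in Job class (the real Job type belongs to this library and is not reproduced here) with hypothetical url and priority fields used only to show ordering:

```java
import java.util.Comparator;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;

final class JobQueueSketch {
    // Stand-in for the library's Job type; both fields are hypothetical.
    static final class Job {
        final String url;
        final int priority;

        Job(String url, int priority) {
            this.url = url;
            this.priority = priority;
        }
    }

    // Builds a queue that hands out jobs with lower priority values first.
    static BlockingQueue<Job> priorityQueue() {
        Comparator<Job> byPriority = Comparator.comparingInt(job -> job.priority);
        return new PriorityBlockingQueue<>(11, byPriority);
    }

    public static void main(String[] args) {
        BlockingQueue<Job> queue = priorityQueue();
        queue.offer(new Job("https://example.com/b", 2));
        queue.offer(new Job("https://example.com/a", 1));
        // The job with the smallest priority value is returned first.
        System.out.println(queue.poll().url);
    }
}
```

        A queue built this way could then be passed to setJobQueue in place of the default, provided its element type is the library's Job.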
      • setHandlerRouter

        public Crawler.Builder setHandlerRouter(HandlerRouter router)
        Sets the HandlerRouter to be used. Defaults to none.
        Parameters:
        router - handler router to be used.
        Returns:
        this
      • setMaxConnections

        public Crawler.Builder setMaxConnections(int maxConnections)
        Sets the maximum number of concurrent connections allowed out of the client.
        Parameters:
        maxConnections - maximum number of concurrent connections.
        Returns:
        this
      • setMaxTries

        public Crawler.Builder setMaxTries(int maxTries)
        Sets the number of times to retry a request. This number excludes the first try. Defaults to 50.
        Parameters:
        maxTries - the maximum number of retries.
        Returns:
        this
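        Because the count excludes the first try, a request can be attempted up to maxTries + 1 times in total. The following is a minimal sketch of that arithmetic, using a hypothetical retry loop rather than the crawler's actual implementation:

```java
import java.util.function.BooleanSupplier;

final class RetryCountSketch {
    // Runs the attempt once, then retries up to maxTries more times.
    // Returns the total number of attempts made.
    static int attempts(int maxTries, BooleanSupplier attempt) {
        int made = 0;
        for (int tryNo = 0; tryNo <= maxTries; tryNo++) {
            made++;
            if (attempt.getAsBoolean()) {
                break; // stop as soon as an attempt succeeds
            }
        }
        return made;
    }

    public static void main(String[] args) {
        // A request that always fails uses the first try plus every retry.
        System.out.println(attempts(50, () -> false)); // prints 51
    }
}
```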
      • setPropRetainProxy

        public Crawler.Builder setPropRetainProxy(double propRetainProxy)
        Sets the proportion of max tries during which a request's specified proxy, if any, is used. The value should be between 0 and 1 inclusive. Defaults to 0.05.

        This only takes effect when a specific proxy is set for the request. Beyond this threshold, the specified proxy is overridden.

        Parameters:
        propRetainProxy - threshold proportion, between 0 and 1 inclusive.
        Returns:
        this
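        As a rough illustration of the threshold, assume it is the product of maxTries and propRetainProxy; how the crawler rounds or compares this value is not stated here. With the documented defaults (50 and 0.05), that product is 2.5 tries. The class and method names below are hypothetical:

```java
final class ProxyThresholdSketch {
    // Assumed formula (not confirmed by this page): threshold = maxTries * propRetainProxy.
    static double threshold(int maxTries, double propRetainProxy) {
        if (propRetainProxy < 0.0 || propRetainProxy > 1.0) {
            throw new IllegalArgumentException(
                    "propRetainProxy must be between 0 and 1 inclusive");
        }
        return maxTries * propRetainProxy;
    }

    public static void main(String[] args) {
        // Documented defaults: maxTries = 50, propRetainProxy = 0.05.
        System.out.println(threshold(50, 0.05));
    }
}
```

        Under this assumption, a value of 0 would drop the specified proxy immediately and a value of 1 would keep it for every try.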
      • setSleepScheduler

        public Crawler.Builder setSleepScheduler(SleepScheduler sleepScheduler)
        Sets the SleepScheduler to be used. If not set, a default is chosen.
        Parameters:
        sleepScheduler - the SleepScheduler to be used.
        Returns:
        this
      • setSession

        public Crawler.Builder setSession(Session session)
        Sets the Session to be used. If not set, defaults to Session.EMPTY_SESSION.
        Parameters:
        session - the Session in which variables are defined
        Returns:
        this
      • build

        public Crawler build()
        Builds the crawler with the options specified.
        Returns:
        an instance of Crawler
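
        Each setter returns this, so calls can be chained before build(). The sketch below uses a stripped-down stand-in class (SketchCrawler, hypothetical) that mirrors only a few of the setters documented above, so that it runs without the library. How a real Crawler.Builder instance is obtained is not shown on this page, so the constructor call is an assumption, as are the default name and the reading of "system thread count" as the available processor count:

```java
import java.util.Objects;

final class SketchCrawler {
    final String name;
    final int parallelism;
    final int maxTries;

    private SketchCrawler(Builder builder) {
        this.name = builder.name;
        this.parallelism = builder.parallelism;
        this.maxTries = builder.maxTries;
    }

    // Stand-in builder; only the setter names and the maxTries default come from the page.
    static final class Builder {
        private String name = "crawler"; // hypothetical default
        private int parallelism = Runtime.getRuntime().availableProcessors(); // assumed default
        private int maxTries = 50; // documented default

        Builder setName(String name) {
            this.name = Objects.requireNonNull(name, "name");
            return this;
        }

        Builder setParallelism(int parallelism) {
            this.parallelism = parallelism;
            return this;
        }

        Builder setMaxTries(int maxTries) {
            this.maxTries = maxTries;
            return this;
        }

        SketchCrawler build() {
            return new SketchCrawler(this);
        }
    }

    public static void main(String[] args) {
        SketchCrawler crawler = new SketchCrawler.Builder()
                .setName("news-crawler")
                .setParallelism(4)
                .setMaxTries(10)
                .build();
        System.out.println(crawler.name + " " + crawler.parallelism + " " + crawler.maxTries);
    }
}
```

        Options that are never set keep their defaults, which is why the builder can be used with as few or as many setter calls as needed.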