Package ai.preferred.venom
Class Crawler.Builder
- java.lang.Object
  - ai.preferred.venom.Crawler.Builder
- Enclosing class: Crawler
public static final class Crawler.Builder extends java.lang.Object
A builder for the Crawler class.
Method Summary
| Modifier and Type | Method | Description |
| --- | --- | --- |
| Crawler | build() | Builds the crawler with the options specified. |
| Crawler.Builder | setFetcher(@NotNull Fetcher fetcher) | Sets the Fetcher to be used; if not set, a default will be chosen. |
| Crawler.Builder | setHandlerRouter(HandlerRouter router) | Sets the HandlerRouter to be used. |
| Crawler.Builder | setJobQueue(@NotNull java.util.concurrent.BlockingQueue<Job> jobQueue) | Sets the JobQueue to be used; if not set, a default will be chosen. |
| Crawler.Builder | setMaxConnections(int maxConnections) | Sets the number of concurrent connections allowed out of the client. |
| Crawler.Builder | setMaxTries(int maxTries) | Sets the number of times to retry a request. |
| Crawler.Builder | setName(@NotNull java.lang.String name) | Sets the name for the crawler thread. |
| Crawler.Builder | setParallelism(int parallelism) | Sets the parallelism level. |
| Crawler.Builder | setPropRetainProxy(double propRetainProxy) | Sets the proportion of max tries for which a specified proxy, if any, will be used. |
| Crawler.Builder | setScheduler(@NotNull java.util.concurrent.BlockingQueue<Job> jobQueue) | Deprecated. |
| Crawler.Builder | setSession(Session session) | Sets the Session to be used; if not set, defaults to Session.EMPTY_SESSION. |
| Crawler.Builder | setSleepScheduler(SleepScheduler sleepScheduler) | Sets the SleepScheduler to be used; if not set, a default will be chosen. |
| Crawler.Builder | setWorkerManager(@NotNull WorkerManager workerManager) | Sets the WorkerManager to be used; if not set, a default will be chosen. |
Method Detail
-
setName
public Crawler.Builder setName(@NotNull java.lang.String name)
Sets the name for the crawler thread.
- Parameters:
  name - name for the crawler thread
- Returns:
  this
-
setFetcher
public Crawler.Builder setFetcher(@NotNull Fetcher fetcher)
Sets the Fetcher to be used; if not set, a default will be chosen.
- Parameters:
  fetcher - fetcher to be used
- Returns:
  this
-
setParallelism
public Crawler.Builder setParallelism(int parallelism)
Sets the parallelism level. Defaults to system thread count.
- Parameters:
  parallelism - the parallelism level
- Returns:
  this
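
The description above says the parallelism level defaults to the system thread count. A minimal sketch of how such a fallback could be computed in plain Java follows. It assumes "system thread count" means the processor count the JVM reports, which this page does not confirm, and `ParallelismDefault` and `resolveParallelism` are hypothetical names that are not part of Venom.

```java
class ParallelismDefault {

  // Hypothetical helper: returns the requested level, or falls back to the
  // processor count reported by the JVM when no positive level is given.
  static int resolveParallelism(final int requested) {
    if (requested > 0) {
      return requested;
    }
    // Assumption: "system thread count" corresponds to availableProcessors().
    return Runtime.getRuntime().availableProcessors();
  }

  public static void main(final String[] args) {
    System.out.println("explicit: " + resolveParallelism(4));
    System.out.println("fallback: " + resolveParallelism(0));
  }
}
```

The fallback value varies by machine, so a crawl that must behave the same everywhere may be better served by passing an explicit level to setParallelism.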
-
setWorkerManager
public Crawler.Builder setWorkerManager(@NotNull WorkerManager workerManager)
Sets the WorkerManager to be used; if not set, a default will be chosen.
- Parameters:
  workerManager - worker manager to be used
- Returns:
  this
-
setScheduler
@Deprecated public Crawler.Builder setScheduler(@NotNull java.util.concurrent.BlockingQueue<Job> jobQueue)
Deprecated. Use setJobQueue instead.
Sets the JobQueue to be used; if not set, a default will be chosen.
- Parameters:
  jobQueue - job queue to be used
- Returns:
  this
-
setJobQueue
public Crawler.Builder setJobQueue(@NotNull java.util.concurrent.BlockingQueue<Job> jobQueue)
Sets the JobQueue to be used; if not set, a default will be chosen.
- Parameters:
  jobQueue - job queue to be used
- Returns:
  this
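
Since the parameter type is java.util.concurrent.BlockingQueue, any implementation of that interface matches the signature. The sketch below builds a priority-ordered queue using only JDK classes. `SketchJob` is a hypothetical stand-in for Venom's Job type, and its `priority` field is an assumption made for illustration, not something this page documents.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;

class JobQueueSketch {

  // Hypothetical stand-in for Venom's Job; only the fields this sketch needs.
  static final class SketchJob {
    final String url;
    final int priority; // assumption: lower number is served first

    SketchJob(final String url, final int priority) {
      this.url = url;
      this.priority = priority;
    }
  }

  // Builds a blocking queue that hands out the lowest priority number first.
  static BlockingQueue<SketchJob> newPriorityQueue() {
    return new PriorityBlockingQueue<>(
        11, Comparator.comparingInt((SketchJob job) -> job.priority));
  }

  // Polls until the queue is empty and records the order the jobs came out in.
  static List<String> drainOrder(final BlockingQueue<SketchJob> queue) {
    final List<String> urls = new ArrayList<>();
    SketchJob job;
    while ((job = queue.poll()) != null) {
      urls.add(job.url);
    }
    return urls;
  }

  public static void main(final String[] args) {
    final BlockingQueue<SketchJob> queue = newPriorityQueue();
    queue.add(new SketchJob("https://example.com/later", 5));
    queue.add(new SketchJob("https://example.com/first", 1));
    queue.add(new SketchJob("https://example.com/next", 3));
    System.out.println(drainOrder(queue));
  }
}
```

With the real library, the queue would be typed as BlockingQueue<Job> and passed to setJobQueue. How Venom's Job is ordered is not described here, so the comparator is illustrative only.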
-
setHandlerRouter
public Crawler.Builder setHandlerRouter(HandlerRouter router)
Sets the HandlerRouter to be used. Defaults to none.
- Parameters:
  router - handler router to be used
- Returns:
  this
-
setMaxConnections
public Crawler.Builder setMaxConnections(int maxConnections)
Sets the number of concurrent connections allowed out of the client.
- Parameters:
  maxConnections - maximum number of concurrent connections
- Returns:
  this
-
setMaxTries
public Crawler.Builder setMaxTries(int maxTries)
Sets the number of times to retry a request. This number excludes the first try. Defaults to 50.
- Parameters:
  maxTries - maximum number of retries
- Returns:
  this
-
setPropRetainProxy
public Crawler.Builder setPropRetainProxy(double propRetainProxy)
Sets the proportion of max tries for which a specified proxy, if any, will be used. The value should be between 0 and 1 inclusive. Defaults to 0.05. This only takes effect when a specific proxy is set for the request; beyond this threshold, that proxy is overridden.
- Parameters:
  propRetainProxy - threshold proportion, between 0 and 1 inclusive
- Returns:
  this
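
The two settings above interact, so a worked calculation may help. The sketch below assumes the threshold is simply the product of maxTries and propRetainProxy. This page does not say how the library rounds that value or whether the comparison is strict, so treat the result as an estimate. `RetryBudget` and its methods are hypothetical names, not Venom API.

```java
class RetryBudget {

  // maxTries counts retries only, so the first try is added on top.
  static int totalAttempts(final int maxTries) {
    return maxTries + 1;
  }

  // Assumption: the threshold is the plain product of the two settings.
  static double proxyRetainThreshold(final int maxTries, final double propRetainProxy) {
    if (propRetainProxy < 0.0 || propRetainProxy > 1.0) {
      throw new IllegalArgumentException("propRetainProxy must be between 0 and 1 inclusive");
    }
    return maxTries * propRetainProxy;
  }

  public static void main(final String[] args) {
    final int maxTries = 50;             // documented default
    final double propRetainProxy = 0.05; // documented default
    System.out.println("attempts:  " + totalAttempts(maxTries));
    System.out.println("threshold: " + proxyRetainThreshold(maxTries, propRetainProxy));
  }
}
```

Under that assumption, the documented defaults allow up to 51 attempts in total, and a request-specific proxy is retained for roughly the first 2.5 tries before it may be overridden.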
-
setSleepScheduler
public Crawler.Builder setSleepScheduler(SleepScheduler sleepScheduler)
Sets the SleepScheduler to be used; if not set, a default will be chosen.
- Parameters:
  sleepScheduler - sleep scheduler to be used
- Returns:
  this
-
setSession
public Crawler.Builder setSession(Session session)
Sets the Session to be used; if not set, defaults to Session.EMPTY_SESSION.
- Parameters:
  session - session in which variables are defined
- Returns:
  this
-
build
public Crawler build()
Builds the crawler with the options specified.
- Returns:
  an instance of Crawler
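
Every setter on this page returns this, so calls can be chained and finished with build(). The stand-in below is not Venom code. It mirrors three of the documented setters so the chaining can run without the library on the classpath, and `BuilderChainSketch`, `Settings` and the placeholder name default are hypothetical.

```java
import java.util.Objects;

class BuilderChainSketch {

  // Hypothetical stand-in for the built object; not Venom's Crawler.
  static final class Settings {
    final String name;
    final int maxTries;
    final double propRetainProxy;

    Settings(final String name, final int maxTries, final double propRetainProxy) {
      this.name = name;
      this.maxTries = maxTries;
      this.propRetainProxy = propRetainProxy;
    }
  }

  // Hypothetical stand-in mirroring three of the documented setters.
  static final class Builder {
    private String name = "sketch";        // placeholder, not Venom's default
    private int maxTries = 50;             // documented default
    private double propRetainProxy = 0.05; // documented default

    Builder setName(final String name) {
      this.name = Objects.requireNonNull(name, "name");
      return this; // returning this is what makes chaining possible
    }

    Builder setMaxTries(final int maxTries) {
      this.maxTries = maxTries;
      return this;
    }

    Builder setPropRetainProxy(final double propRetainProxy) {
      this.propRetainProxy = propRetainProxy;
      return this;
    }

    Settings build() {
      return new Settings(name, maxTries, propRetainProxy);
    }
  }

  public static void main(final String[] args) {
    // Options that are not set keep their defaults.
    final Settings settings = new Builder()
        .setName("news-crawler")
        .setMaxTries(10)
        .build();
    System.out.println(settings.name + " " + settings.maxTries + " " + settings.propRetainProxy);
  }
}
```

The same chaining shape applies to Crawler.Builder, with build() returning a Crawler instead of the stand-in Settings object.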