Browser
A client able to retrieve and parse HTML pages from the web and from local resources.
An implementation of Browser can fetch pages via HTTP GET or POST requests, parse the downloaded page and return a net.ruippeixotog.scalascraper.model.Document instance, which can be queried via the scraper DSL or using its own methods.
Different net.ruippeixotog.scalascraper.browser.Browser implementations can embed pages with different runtime behavior. For example, some browsers may limit themselves to parsing the HTML content of the page without executing any scripts, while others may run JavaScript and produce Document instances with dynamic content. Read the documentation of each implementation for more information on the semantics of its Document and net.ruippeixotog.scalascraper.model.Element implementations.
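As a rough illustration of the two query styles mentioned above, the sketch below parses an inline HTML string and reads it once through the scraper DSL and once through a method of Document. It assumes the JsoupBrowser implementation and the parseString, text and title names from the scala-scraper library, none of which are spelled out on this page; the HTML content is made up for the example.

```scala
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

// Assumption: JsoupBrowser is the implementation in use; it parses HTML
// without executing scripts.
val browser = JsoupBrowser()

// Hypothetical sample page, parsed directly from a string.
val doc = browser.parseString(
  "<html><head><title>Demo</title></head><body><h1>Hello</h1></body></html>")

// Query through the scraper DSL: extract the text of the first <h1>.
val heading: String = doc >> text("h1")

// Query through a method of Document itself.
val title: String = doc.title
```

Because no request is made, this form is convenient for trying out selectors before pointing the browser at a real URL.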
Type members
Value members
Abstract methods
Returns the current set of cookies stored in this browser for a given URL.
- Value parameters:
- url
the URL whose stored cookies are to be returned
- Returns:
a mapping of cookie names to their respective values.
Retrieves and parses a web page using a GET request.
- Value parameters:
- url
the URL of the page to retrieve
- Returns:
a Document containing the retrieved web page.
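A minimal sketch of a GET retrieval, assuming the method is named get as in the scala-scraper library (the signature is not shown on this page). pageTitle is a hypothetical helper introduced only for illustration, and the URL in the usage comment is a placeholder.

```scala
import net.ruippeixotog.scalascraper.browser.{Browser, JsoupBrowser}

// Hypothetical helper: fetches `url` with a GET request and returns the
// title of the parsed page. Works with any Browser implementation.
def pageTitle(browser: Browser, url: String): String =
  browser.get(url).title

// Usage (performs a real network request):
// pageTitle(JsoupBrowser(), "https://example.com/")
```

Taking the Browser as a parameter keeps the helper independent of the implementation, so a script-executing browser could be swapped in without changing the caller.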
Parses a local HTML file with a specified charset.
- Value parameters:
- charset
the charset of the file
- file
the HTML file to parse
- Returns:
a Document containing the parsed web page.
Parses an input stream with its content in a specified charset. The provided input stream is always closed before this method returns or throws an exception.
- Value parameters:
- charset
the charset of the input stream content
- inputStream
the input stream to parse
- Returns:
a Document containing the parsed web page.
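The following sketch shows why the charset argument matters when parsing a stream. It assumes the method is named parseInputStream and that JsoupBrowser is the implementation, as in the scala-scraper library; the page content is invented for the example.

```scala
import java.io.ByteArrayInputStream
import net.ruippeixotog.scalascraper.browser.JsoupBrowser
import net.ruippeixotog.scalascraper.dsl.DSL._
import net.ruippeixotog.scalascraper.dsl.DSL.Extract._

val browser = JsoupBrowser()

// Sample content encoded as ISO-8859-1; decoding it as UTF-8 would garble "é".
val bytes = "<html><body><p>café</p></body></html>".getBytes("ISO-8859-1")
val stream = new ByteArrayInputStream(bytes)

// The browser closes `stream` before returning, so no explicit close() is needed.
val doc = browser.parseInputStream(stream, "ISO-8859-1")

val paragraph: String = doc >> text("p")
```

Since the stream is always closed by the call, it should not be reused afterwards; open a fresh stream for each parse.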
Parses an HTML string.
- Value parameters:
- html
the HTML string to parse
- Returns:
a Document containing the parsed web page.
Submits a form via a POST request and parses the resulting page.
- Value parameters:
- form
a map containing the form fields to submit with their respective values
- url
the URL of the page to retrieve
- Returns:
a Document containing the resulting web page.
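A hedged sketch of a form submission followed by a cookie lookup, assuming the methods are named post and cookies as in the scala-scraper library. submitAndListCookies is a hypothetical helper, and the URL and field names in the usage comment are placeholders rather than a real endpoint.

```scala
import net.ruippeixotog.scalascraper.browser.{Browser, JsoupBrowser}
import net.ruippeixotog.scalascraper.model.Document

// Hypothetical helper: submits `fields` as a form via POST, then reads back
// the cookies the browser holds for the same URL.
def submitAndListCookies(
    browser: Browser,
    url: String,
    fields: Map[String, String]): (Document, Map[String, String]) = {
  val result = browser.post(url, fields) // parsed response page
  val stored = browser.cookies(url)      // cookie name -> cookie value
  (result, stored)
}

// Usage (performs a real network request; URL and fields are placeholders):
// submitAndListCookies(JsoupBrowser(), "https://example.com/login",
//   Map("user" -> "alice", "pass" -> "secret"))
```

Whether cookies set by the response are visible immediately, and how they are scoped by URL, depends on the Browser implementation, so its documentation should be checked.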
Concrete methods
Parses a local HTML file encoded in UTF-8.
- Value parameters:
- file
the HTML file to parse
- Returns:
a Document containing the parsed web page.
Parses a local HTML file with a specified charset.
- Value parameters:
- charset
the charset of the file
- path
the path in the local filesystem where the HTML file is located
- Returns:
a Document containing the parsed web page.
Parses a local HTML file encoded in UTF-8.
- Value parameters:
- path
the path in the local filesystem where the HTML file is located
- Returns:
a Document containing the parsed web page.
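The file-parsing variants described on this page can be compared side by side. The sketch below assumes they are all overloads named parseFile, as in the scala-scraper library, and uses a temporary file with made-up content so that it does not depend on any existing path.

```scala
import java.io.File
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import net.ruippeixotog.scalascraper.browser.JsoupBrowser

val browser = JsoupBrowser()

// Write a small UTF-8 page to a temporary file (hypothetical sample content).
val file: File = File.createTempFile("sample", ".html")
file.deleteOnExit()
Files.write(
  file.toPath,
  "<html><head><title>Local page</title></head><body></body></html>"
    .getBytes(StandardCharsets.UTF_8))

val fromFile        = browser.parseFile(file)                  // File, UTF-8
val fromFileCharset = browser.parseFile(file, "UTF-8")         // File, explicit charset
val fromPath        = browser.parseFile(file.getPath)          // path, UTF-8
val fromPathCharset = browser.parseFile(file.getPath, "UTF-8") // path, explicit charset
```

All four calls read the same file here, so they should yield equivalent documents; the explicit-charset forms are only needed when the file is not encoded in UTF-8.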