A Browser implementation based on HtmlUnit, a GUI-less browser for Java
programs. HtmlUnitBrowser thoroughly simulates a web browser: besides parsing and
modelling the HTML content of pages, it executes their JavaScript code. It supports
several compatibility modes, allowing it to emulate browsers such as Internet Explorer.
Both the net.ruippeixotog.scalascraper.model.Document and the net.ruippeixotog.scalascraper.model.Element
instances obtained from HtmlUnitBrowser can be mutated in the background. JavaScript code
can change the attributes and content of elements at any time, and those changes are
reflected both in queries to the Document and in previously stored references to Element
instances. The Document instance always represents the current page in the browser's
"window". This means that the Document's location value can change, together with its root
element, in the event of client-side page refreshes or redirections. Element instances,
however, belong to a fixed DOM tree, and they stop being meaningful as soon as they are
removed from the DOM or a client-side page reload occurs.
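The following is a minimal sketch of the behaviour described above, not a definitive usage pattern. It assumes the class lives in the package net.ruippeixotog.scalascraper.browser, that both constructor parameters have defaults, and that the Browser trait exposes a get method; the URL is a placeholder.

```scala
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object LiveDocumentSketch extends App {
  // Assumption: browserType and proxy both have default values.
  val browser = new HtmlUnitBrowser()

  // Placeholder URL; any page that runs JavaScript would do.
  val doc = browser.get("https://example.com")

  // A stored Element reference belongs to the DOM tree that was
  // current when the query ran.
  val heading = doc.root.select("h1").headOption

  // The Document tracks the browser window, so location and root may
  // differ between calls if the page redirects or refreshes itself.
  println(doc.location)

  // If the page has been reloaded in the meantime, this reference may
  // no longer describe anything visible in the current page.
  println(heading.map(_.text))
}
```

Because stored Element references can go stale, it is usually safer to query the Document again right before reading a value than to hold on to elements across page changes.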
- Value parameters:
- browserType
the browser type and version to simulate
- proxy
an optional proxy configuration to use
- Companion:
- object
Value members
Inherited methods
Parses a local HTML file encoded in UTF-8.
- Value parameters:
- path
the path in the local filesystem where the HTML file is located
- Returns:
a Document containing the parsed web page.
- Inherited from:
- Browser
Parses a local HTML file with a specified charset.
- Value parameters:
- charset
the charset of the file
- path
the path in the local filesystem where the HTML file is located
- Returns:
a Document containing the parsed web page.
- Inherited from:
- Browser
Parses a local HTML file encoded in UTF-8.
- Value parameters:
- file
the HTML file to parse
- Returns:
a Document containing the parsed web page.
- Inherited from:
- Browser
Parses a resource with a specified charset.
- Value parameters:
- charset
the charset of the resource
- name
the name of the resource to parse
- Returns:
a Document containing the parsed web page.
- Inherited from:
- Browser
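The method names are not shown in this listing. As a hedged sketch, assuming the four inherited methods above are the parseFile and parseResource overloads of the Browser trait, they could be called as follows. All paths, the resource name and the charset value are hypothetical.

```scala
import java.io.File
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object ParseLocalSketch extends App {
  // Assumption: both constructor parameters have default values.
  val browser = new HtmlUnitBrowser()

  // Local file by path, read as UTF-8 (hypothetical path).
  val fromPath = browser.parseFile("pages/index.html")

  // Local file by path with an explicit charset (hypothetical path).
  val fromLegacyPath = browser.parseFile("pages/legacy.html", "ISO-8859-1")

  // Local file given as a java.io.File, read as UTF-8.
  val fromFile = browser.parseFile(new File("pages/index.html"))

  // Classpath resource with an explicit charset (hypothetical name).
  val fromResource = browser.parseResource("/fixtures/index.html", "UTF-8")

  // Each call returns a Document for the parsed page.
  println(fromPath.title)
}
```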