HtmlUnitBrowser

class HtmlUnitBrowser(browserType: BrowserVersion, proxy: Option[ProxyConfig]) extends Browser

A Browser implementation based on HtmlUnit, a GUI-less browser for Java programs. HtmlUnitBrowser thoroughly simulates a web browser: besides parsing and modelling the HTML content of pages, it executes their JavaScript code. It supports several compatibility modes, allowing it to emulate browsers such as Internet Explorer.

Both the net.ruippeixotog.scalascraper.model.Document and the net.ruippeixotog.scalascraper.model.Element instances obtained from HtmlUnitBrowser can be mutated in the background. JavaScript code can change the attributes and content of elements at any time, and those changes are reflected both in queries to the Document and in previously stored references to Elements. The Document instance always represents the current page in the browser's "window", so its location value can change, together with its root element, when a client-side page refresh or redirection occurs. Element instances, however, belong to a fixed DOM tree and stop being meaningful as soon as they are removed from the DOM or a client-side page reload occurs.
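
The behaviour described above can be sketched as follows. This is a minimal illustration, not part of the library: the object name, the helper `readTwice` and the sample page are hypothetical, and the sketch assumes the constructor can be called with default arguments. Whether and when the inline script runs depends on HtmlUnit, so no output is claimed for the scripted page.

```scala
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object LiveElementSketch {
  // Hypothetical page: an inline script rewrites the paragraph after 200 ms.
  val scriptedHtml: String =
    """<html><body>
      |<p id="status">loading</p>
      |<script>
      |  setTimeout(function () {
      |    document.getElementById('status').textContent = 'ready';
      |  }, 200);
      |</script>
      |</body></html>""".stripMargin

  /** Reads the text of `#status` twice through the same stored Element,
    * waiting `waitMs` milliseconds between the two reads.
    */
  def readTwice(html: String, waitMs: Long): (String, String) = {
    // Assumption: the constructor parameters have default values.
    val browser = new HtmlUnitBrowser()
    try {
      val doc = browser.parseString(html)
      val status = doc.root.select("#status").head // stored Element reference
      val before = status.text
      Thread.sleep(waitMs)
      val after = status.text // same reference, read again
      (before, after)
    } finally browser.closeAll()
  }

  def main(args: Array[String]): Unit = {
    // If the script has run in the meantime, the two values may differ.
    println(readTwice(scriptedHtml, 1000))
  }
}
```

Because a stored Element is a live view of the DOM, code that needs a stable snapshot should copy the values it cares about (for example `status.text`) at the moment it reads them.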

Value parameters:
browserType

the browser type and version to simulate

proxy

an optional proxy configuration to use

Companion:
object

Supertypes:
trait Browser
class Object
trait Matchable
class Any

Type members

Value members

Concrete methods

def clearCookies(): Unit
def closeAll(): Unit

Closes all windows opened in this browser.

def cookies(url: String): Map[String, String]
def exec(req: WebRequest): HtmlUnitDocument
def get(url: String): HtmlUnitDocument
def parseFile(file: File, charset: String): HtmlUnitDocument
def parseInputStream(inputStream: InputStream, charset: String): HtmlUnitDocument
def parseString(html: String): HtmlUnitDocument
def post(url: String, form: Map[String, String]): HtmlUnitDocument
def userAgent: String
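
A short usage sketch of the methods listed above, offered as an illustration only. The object name, the helper `titleAndCookies`, the URLs and the form fields are placeholders, and the sketch assumes the constructor can be called with default arguments. Running `main` requires network access.

```scala
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object BrowserSessionSketch {
  /** Loads `url` and returns the page title together with the cookies
    * the browser currently holds for that URL.
    */
  def titleAndCookies(browser: HtmlUnitBrowser, url: String): (String, Map[String, String]) = {
    val doc = browser.get(url)
    (doc.title, browser.cookies(url))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: the constructor parameters have default values.
    val browser = new HtmlUnitBrowser()
    try {
      println(s"User agent: ${browser.userAgent}")

      // Placeholder URL; substitute a page you are allowed to fetch.
      val (title, cookies) = titleAndCookies(browser, "https://example.com/")
      println(s"Title: $title, cookies: $cookies")

      // Hypothetical form endpoint and fields.
      val result = browser.post("https://example.com/login", Map("user" -> "alice"))
      println(s"Now at: ${result.location}")

      browser.clearCookies() // drop the cookies accumulated so far
    } finally browser.closeAll() // release every window opened by this browser
  }
}
```

Wrapping the session in `try`/`finally` ensures `closeAll()` runs even when a request throws.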

Inherited methods

def parseFile(path: String): DocumentType

Parses a local HTML file encoded in UTF-8.

Value parameters:
path

the path in the local filesystem where the HTML file is located

Returns:

a Document containing the parsed web page.

Inherited from:
Browser

def parseFile(path: String, charset: String): DocumentType

Parses a local HTML file with a specified charset.

Value parameters:
charset

the charset of the file

path

the path in the local filesystem where the HTML file is located

Returns:

a Document containing the parsed web page.

Inherited from:
Browser

def parseFile(file: File): DocumentType

Parses a local HTML file encoded in UTF-8.

Value parameters:
file

the HTML file to parse

Returns:

a Document containing the parsed web page.

Inherited from:
Browser

def parseResource(name: String, charset: String): DocumentType

Parses a resource with a specified charset.

Value parameters:
charset

the charset of the resource

name

the name of the resource to parse

Returns:

a Document containing the parsed web page.

Inherited from:
Browser
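
The inherited `parseFile` overloads can be exercised without network access, as in the sketch below. The object name, the helper `parseAllWays` and the temporary file are hypothetical, and the sketch assumes the constructor can be called with default arguments. `parseResource` is used the same way, but needs a resource on the classpath and is therefore left out.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object ParseFileSketch {
  /** Writes `html` to a temporary file, parses it through three of the
    * parseFile overloads and returns the title seen by each.
    */
  def parseAllWays(html: String): Seq[String] = {
    val path = Files.createTempFile("sketch", ".html")
    Files.write(path, html.getBytes(StandardCharsets.UTF_8))
    // Assumption: the constructor parameters have default values.
    val browser = new HtmlUnitBrowser()
    try {
      val byPath = browser.parseFile(path.toString)                    // UTF-8 implied
      val byPathAndCharset = browser.parseFile(path.toString, "UTF-8") // explicit charset
      val byFile = browser.parseFile(path.toFile)                      // java.io.File, UTF-8 implied
      Seq(byPath, byPathAndCharset, byFile).map(_.title)
    } finally {
      browser.closeAll()
      Files.deleteIfExists(path)
    }
  }

  def main(args: Array[String]): Unit =
    println(parseAllWays("<html><head><title>Sketch</title></head><body></body></html>"))
}
```

The titles are extracted before `closeAll()` runs, since a Document tracks a browser window and may not be usable once that window is closed.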

Concrete fields

lazy val underlying: WebClient
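
The `underlying` field exposes the HtmlUnit `WebClient`, so settings that HtmlUnitBrowser does not wrap can be changed directly on it. The sketch below is one possible use, not a recommended configuration: the object and helper names are hypothetical, and it assumes the constructor can be called with default arguments. It relies only on `WebClient.getOptions` and the `WebClientOptions` setters from HtmlUnit.

```scala
import net.ruippeixotog.scalascraper.browser.HtmlUnitBrowser

object UnderlyingClientSketch {
  /** Builds a browser that skips JavaScript and CSS processing,
    * which can be useful for pages that only need static parsing.
    */
  def staticBrowser(): HtmlUnitBrowser = {
    // Assumption: the constructor parameters have default values.
    val browser = new HtmlUnitBrowser()
    val options = browser.underlying.getOptions // HtmlUnit WebClientOptions
    options.setJavaScriptEnabled(false)
    options.setCssEnabled(false)
    browser
  }
}
```

Since `underlying` is a lazy val, the `WebClient` is created on first access. Changing options right after constructing the browser, as here, means they are in place before any page is loaded.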