Class HTMLParser
- All Implemented Interfaces:
org.apache.jmeter.protocol.http.parser.LinkExtractorParser
public abstract class HTMLParser extends BaseParser
HTMLParser subclasses can parse HTML content to obtain URLs.
-
-
Field Summary
Fields
Modifier and Type | Field | Description
public final static String | PARSER_CLASSNAME |
public final static String | DEFAULT_PARSER |
-
Method Summary
Modifier and Type | Method | Description
Iterator<URL> | getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, String encoding) | Get the URLs for all the resources that a browser would automatically download after downloading the HTML content: images, stylesheets, JavaScript files, applets, etc.
abstract Iterator<URL> | getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, URLCollection coll, String encoding) | Get the URLs for all the resources that a browser would automatically download after downloading the HTML content: images, stylesheets, JavaScript files, applets, etc.
Iterator<URL> | getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, Collection<URLString> coll, String encoding) | Get the URLs for all the resources that a browser would automatically download after downloading the HTML content: images, stylesheets, JavaScript files, applets, etc.
-
Method Detail
-
getEmbeddedResourceURLs
Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, String encoding)
Get the URLs for all the resources that a browser would automatically download after downloading the HTML content: images, stylesheets, JavaScript files, applets, etc.
URLs should not appear twice in the returned iterator.
Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.
- Parameters:
userAgent - User Agent
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
encoding - Charset
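The contract above (resolve each reference against the base URL, return each URL only once, and keep malformed references visible to the caller) can be illustrated with a minimal, self-contained sketch. It does not use JMeter classes: EmbeddedUrlSketch, its malformed list, the url helper, and the regular expression are all hypothetical names introduced for illustration, and only double-quoted src attributes are considered. Duplicates are detected by the URL's string form, on the assumption that calling URL.equals() is best avoided because it may perform a host name lookup.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of JMeter.
class EmbeddedUrlSketch {
    // Assumption: only double-quoted src attributes are looked at.
    private static final Pattern SRC =
            Pattern.compile("\\bsrc\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // References that could not be turned into a URL, kept as raw strings.
    final List<String> malformed = new ArrayList<>();

    Iterator<URL> embeddedUrls(byte[] html, URL baseUrl, String encoding) {
        String text = new String(html, Charset.forName(encoding));
        // Keyed by string form: insertion order is kept, duplicates are dropped.
        Map<String, URL> unique = new LinkedHashMap<>();
        Matcher m = SRC.matcher(text);
        while (m.find()) {
            String ref = m.group(1);
            try {
                URL resolved = baseUrl.toURI().resolve(ref).toURL();
                unique.putIfAbsent(resolved.toExternalForm(), resolved);
            } catch (URISyntaxException | IllegalArgumentException | MalformedURLException e) {
                malformed.add(ref); // report the raw reference instead of failing
            }
        }
        return unique.values().iterator();
    }

    // Convenience for callers: wraps the checked exception.
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        EmbeddedUrlSketch sketch = new EmbeddedUrlSketch();
        byte[] html = "<img src=\"a.png\"><img src=\"a.png\"><img src=\"bad url.png\">"
                .getBytes(StandardCharsets.UTF_8);
        Iterator<URL> it = sketch.embeddedUrls(html, url("http://example.com/dir/page.html"), "UTF-8");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
        System.out.println("malformed: " + sketch.malformed);
    }
}
```

A real parser would also have to handle stylesheets, scripts, applets, and other tags, and would signal overall parse failures with an exception rather than a side list.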
-
getEmbeddedResourceURLs
abstract Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, URLCollection coll, String encoding)
Get the URLs for all the resources that a browser would automatically download after downloading the HTML content: images, stylesheets, JavaScript files, applets, etc.
All URLs should be added to the Collection.
Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.
N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.
- Parameters:
userAgent - User Agent
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
coll - URLCollection
encoding - Charset
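The relationship between a convenience overload and an abstract, collection-taking method can be sketched as follows. This is one plausible arrangement, not JMeter's actual code: OverloadSketchParser, LinkHrefParser, the url helper, and the regular expression are hypothetical, and plain String stands in for URLString so the example stays self-contained. It mirrors the note above in that the collection holds string-like entries while the iterator yields URL objects.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical base class mirroring the overload layout; not JMeter's HTMLParser.
abstract class OverloadSketchParser {
    // Convenience overload: supplies an ordered, duplicate-free collection and delegates.
    Iterator<URL> embeddedUrls(byte[] html, URL baseUrl, String encoding) {
        return embeddedUrls(html, baseUrl, new LinkedHashSet<String>(), encoding);
    }

    // Core method: subclasses add every URL they find to coll.
    abstract Iterator<URL> embeddedUrls(byte[] html, URL baseUrl,
                                        Collection<String> coll, String encoding);

    // Convenience for callers: wraps the checked exception.
    static URL url(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }
}

// Hypothetical subclass: only looks at double-quoted href attributes of <link> tags.
class LinkHrefParser extends OverloadSketchParser {
    private static final Pattern LINK_HREF =
            Pattern.compile("<link\\b[^>]*\\bhref\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    @Override
    Iterator<URL> embeddedUrls(byte[] html, URL baseUrl, Collection<String> coll, String encoding) {
        URI base = URI.create(baseUrl.toString());
        Matcher m = LINK_HREF.matcher(new String(html, Charset.forName(encoding)));
        while (m.find()) {
            try {
                coll.add(base.resolve(m.group(1)).toString()); // collection stores strings
            } catch (IllegalArgumentException e) {
                // malformed reference: skipped in this sketch
            }
        }
        List<URL> urls = new ArrayList<>();
        for (String s : coll) {
            try {
                urls.add(URI.create(s).toURL()); // iterator yields URL objects
            } catch (MalformedURLException | IllegalArgumentException e) {
                // entries that are not valid URLs are skipped in this sketch
            }
        }
        return urls.iterator();
    }

    public static void main(String[] args) {
        byte[] html = "<link rel=\"stylesheet\" href=\"style.css\"><link href=\"style.css\">"
                .getBytes(StandardCharsets.UTF_8);
        Iterator<URL> it = new LinkHrefParser()
                .embeddedUrls(html, url("http://example.com/dir/page.html"), "UTF-8");
        while (it.hasNext()) {
            System.out.println(it.next());
        }
    }
}
```

With this layout, whether duplicates are suppressed depends on the collection the caller passes in: a set drops them, while a list would keep them.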
-
getEmbeddedResourceURLs
Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, Collection<URLString> coll, String encoding)
Get the URLs for all the resources that a browser would automatically download after downloading the HTML content: images, stylesheets, JavaScript files, applets, etc.
N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.
- Parameters:
userAgent - User Agent
html - HTML code
baseUrl - Base URL from which the HTML code was obtained
coll - Collection; will contain URLString objects, not URLs
encoding - Charset