Class HTMLParser

  • All Implemented Interfaces:
    org.apache.jmeter.protocol.http.parser.LinkExtractorParser

    
    public abstract class HTMLParser
    extends BaseParser
                        

    HTMLParser subclasses can parse HTML content to obtain URLs.

    • Method Summary

      Modifier and Type Method Description
      Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, String encoding) Get the URLs of all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
      abstract Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, URLCollection coll, String encoding) Get the URLs of all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
      Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, Collection<URLString> coll, String encoding) Get the URLs of all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.
      • Methods inherited from class org.apache.jmeter.protocol.http.parser.BaseParser

        getParser, isReusable
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • getEmbeddedResourceURLs

         Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, String encoding)

        Get the URLs of all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.

        URLs should not appear twice in the returned iterator.

        Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.

        Parameters:
        userAgent - User Agent
        html - HTML code
        baseUrl - Base URL from which the HTML code was obtained
        encoding - Charset
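
        The contract above (decode the bytes with the given charset, resolve each reference against the base URL, return each URL only once) can be illustrated with a small standalone sketch. This is not JMeter's implementation: the class EmbeddedResourceSketch, its methods, and the naive src="..." pattern are hypothetical names chosen for illustration, and a real parser would handle far more tags and attributes.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of JMeter.
final class EmbeddedResourceSketch {

    // Deliberately naive assumption: only double-quoted src="..." attributes count.
    private static final Pattern SRC =
            Pattern.compile("src\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    static Iterator<URL> embeddedResourceUrls(byte[] html, URL baseUrl, String encoding)
            throws URISyntaxException, MalformedURLException {
        // Decode the raw bytes using the caller-supplied charset name.
        String text = new String(html, Charset.forName(encoding));
        URI base = baseUrl.toURI();

        // Track the string form of each URL so that duplicates are skipped
        // without relying on URL.equals, which may resolve host names.
        Set<String> seen = new HashSet<>();
        List<URL> unique = new ArrayList<>();

        Matcher m = SRC.matcher(text);
        while (m.find()) {
            // Resolve relative references against the page's base URL.
            URL resolved = base.resolve(m.group(1)).toURL();
            if (seen.add(resolved.toExternalForm())) {
                unique.add(resolved);
            }
        }
        return unique.iterator();
    }

    // Small demo: three references, two of them identical.
    static List<String> demo() {
        String page = "<img src=\"img/logo.png\">"
                + "<script src=\"/js/app.js\"></script>"
                + "<img src=\"img/logo.png\">";
        List<String> out = new ArrayList<>();
        try {
            URL base = URI.create("http://example.com/a/index.html").toURL();
            Iterator<URL> it = embeddedResourceUrls(
                    page.getBytes(StandardCharsets.UTF_8), base, "UTF-8");
            while (it.hasNext()) {
                out.add(it.next().toExternalForm());
            }
        } catch (URISyntaxException | MalformedURLException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

        In the demo, the repeated img/logo.png reference is resolved to the same absolute URL both times, so it appears only once in the iterator.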
      • getEmbeddedResourceURLs

         abstract Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, URLCollection coll, String encoding)

        Get the URLs of all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.

        All URLs should be added to the Collection.

        Malformed URLs can be reported to the caller by having the Iterator return the corresponding URL String. Overall problems parsing the HTML should be reported by throwing an HTMLParseException.

        N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.

        Parameters:
        userAgent - User Agent
        html - HTML code
        baseUrl - Base URL from which the HTML code was obtained
        coll - URLCollection
        encoding - Charset
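
        The split between a collection of string-based wrapper objects and an iterator that yields URLs can be sketched as follows. The names UrlKeySketch, UrlKey, and asUrlIterator are hypothetical stand-ins, not JMeter's URLString or URLCollection classes, and returning null for an entry that cannot be converted is an assumption made only for this sketch.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration only; not part of JMeter.
final class UrlKeySketch {

    // Stand-in wrapper that compares URLs purely by their text.
    static final class UrlKey {
        private final String text;

        UrlKey(String text) {
            this.text = text;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof UrlKey && ((UrlKey) o).text.equals(text);
        }

        @Override
        public int hashCode() {
            return text.hashCode();
        }

        @Override
        public String toString() {
            return text;
        }
    }

    // Iterator view: the collection stores wrappers, the caller receives URLs.
    static Iterator<URL> asUrlIterator(Collection<UrlKey> coll) {
        final Iterator<UrlKey> keys = coll.iterator();
        return new Iterator<URL>() {
            @Override
            public boolean hasNext() {
                return keys.hasNext();
            }

            @Override
            public URL next() {
                String text = keys.next().toString();
                try {
                    return URI.create(text).toURL();
                } catch (MalformedURLException | IllegalArgumentException e) {
                    // Sketch-only choice: signal an unusable entry with null.
                    return null;
                }
            }
        };
    }

    // Small demo: a set-backed collection drops the repeated entry.
    static List<String> demo() {
        Collection<UrlKey> coll = new LinkedHashSet<>();
        coll.add(new UrlKey("http://example.com/a.css"));
        coll.add(new UrlKey("http://example.com/a.css"));
        coll.add(new UrlKey("http://example.com/b.js"));

        List<String> out = new ArrayList<>();
        Iterator<URL> it = asUrlIterator(coll);
        while (it.hasNext()) {
            out.add(it.next().toExternalForm());
        }
        return out;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

        Comparing by text keeps equality checks cheap and predictable, whereas java.net.URL.equals may need to resolve host names; the choice of backing collection (a set here) decides whether repeated entries are kept.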
      • getEmbeddedResourceURLs

         Iterator<URL> getEmbeddedResourceURLs(String userAgent, byte[] html, URL baseUrl, Collection<URLString> coll, String encoding)

        Get the URLs of all the resources that a browser would automatically download after downloading the HTML content, that is: images, stylesheets, JavaScript files, applets, etc.

        N.B. The Iterator returns URLs, but the Collection will contain objects of class URLString.

        Parameters:
        userAgent - User Agent
        html - HTML code
        baseUrl - Base URL from which the HTML code was obtained
        coll - Collection - will contain URLString objects, not URLs
        encoding - Charset