Class URLGenerator

java.lang.Object
org.graphstream.stream.SourceBase
org.graphstream.algorithm.generator.BaseGenerator
org.graphstream.algorithm.generator.URLGenerator
All Implemented Interfaces:
Generator, org.graphstream.stream.Source
Direct Known Subclasses:
WikipediaGenerator

public class URLGenerator
extends BaseGenerator
Generate a graph by crawling the web. The generator starts from a set of given URLs and extracts the links found on those pages. Each URL is a node, and an edge connects two URLs when one links to the other. Links are extracted from the "href" attribute of HTML elements.
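To make the link-extraction idea concrete, here is a minimal, standard-library-only sketch of pulling "href" values out of an HTML string. The class name HrefSketch and the regular expression are illustrative assumptions; this is not the parser the generator itself uses, and a regex is only a rough approximation of real HTML parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GraphStream.
class HrefSketch {
    // Matches href="..." or href='...', ignoring case; group 2 is the link target.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Returns the href values in the order they appear in the HTML text.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://a.example/\">A</a> <link HREF='http://b.example/x'>";
        // Each extracted target would become a node, linked to the page it was found on.
        System.out.println(extractLinks(html));
    }
}
```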
  • Nested Class Summary

    Nested Classes 
    Modifier and Type Class Description
    static class  URLGenerator.Mode  
    static interface  URLGenerator.URLFilter
    Defines a URL filter.

    Nested classes/interfaces inherited from class org.graphstream.stream.SourceBase

    org.graphstream.stream.SourceBase.ElementType
  • Constructor Summary

    Constructors 
    Constructor Description
    URLGenerator​(String... startFrom)  
  • Method Summary

    Modifier and Type Method Description
    void acceptOnlyMatchingURL​(String regex)
    Filter URLs, keeping only those that match the given regex.
    void addHostFilter​(String... hosts)
    Filter URLs according to their host.
    void addURL​(String url)
    Add a URL to process.
    void begin()
    Begin the graph generation.
    void declineMatchingURL​(String regex)
    Filter URLs, discarding those that match the given regex.
    void enableProgression​(boolean on)  
    boolean nextEvents()
    Perform the next step in generating the graph.
    void setDepthLimit​(int depthLimit)
    Set the maximum number of steps before stopping.
    void setDirected​(boolean on)
    Create directed edges.
    void setEdgeWeightAttribute​(String attribute)
    Set the attribute key used to store the weight of edges.
    void setMode​(URLGenerator.Mode mode)
    Set how URLs are converted to node ids.
    void setNodeWeightAttribute​(String attribute)
    Set the attribute key used to store the weight of nodes.
    void setThreadCount​(int count)
    Set the number of threads used to parse URLs.

    Methods inherited from class org.graphstream.stream.SourceBase

    addAttributeSink, addElementSink, addSink, attributeSinks, clearAttributeSinks, clearElementSinks, clearSinks, elementSinks, removeAttributeSink, removeElementSink, removeSink, sendAttributeChangedEvent, sendAttributeChangedEvent, sendEdgeAdded, sendEdgeAdded, sendEdgeAttributeAdded, sendEdgeAttributeAdded, sendEdgeAttributeChanged, sendEdgeAttributeChanged, sendEdgeAttributeRemoved, sendEdgeAttributeRemoved, sendEdgeRemoved, sendEdgeRemoved, sendGraphAttributeAdded, sendGraphAttributeAdded, sendGraphAttributeChanged, sendGraphAttributeChanged, sendGraphAttributeRemoved, sendGraphAttributeRemoved, sendGraphCleared, sendGraphCleared, sendNodeAdded, sendNodeAdded, sendNodeAttributeAdded, sendNodeAttributeAdded, sendNodeAttributeChanged, sendNodeAttributeChanged, sendNodeAttributeRemoved, sendNodeAttributeRemoved, sendNodeRemoved, sendNodeRemoved, sendStepBegins, sendStepBegins

    Methods inherited from class java.lang.Object

    equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

    Methods inherited from interface org.graphstream.stream.Source

    addAttributeSink, addElementSink, addSink, clearAttributeSinks, clearElementSinks, clearSinks, removeAttributeSink, removeElementSink, removeSink
  • Constructor Details

  • Method Details

    • begin

      public void begin()
      Description copied from interface: Generator
      Begin the graph generation. This is usually where the generator is initialized. After calling this method, call the Generator.nextEvents() method to add elements to the graph.
    • nextEvents

      public boolean nextEvents()
      Description copied from interface: Generator
      Perform the next step in generating the graph. While this method returns true, there are still more elements to add to the graph. Note that some generators never return false here, since they can generate graphs of arbitrary size. For such generators, simply stop calling this method when enough elements have been generated. A single call to this method can produce an undetermined number of nodes and edges. Checking the node count while generating is advisable to avoid an unexpectedly large graph.
      Returns:
      true while there are elements to add to the graph.
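      The advice to stop calling nextEvents() once enough has been generated can be sketched as a bounded loop. The StepSource interface below is a local stand-in that mirrors only the begin() and nextEvents() methods described here; it is not the real Generator interface, and the step cap is an assumption (a real program would more likely test the node count of the receiving graph).

```java
// Hypothetical sketch; StepSource stands in for a generator.
class BoundedGenerationSketch {
    interface StepSource {
        void begin();
        boolean nextEvents();
    }

    // Calls nextEvents() until it returns false or maxSteps calls have returned true.
    // Returns the number of calls that returned true.
    static int run(StepSource source, int maxSteps) {
        source.begin();
        int steps = 0;
        while (steps < maxSteps && source.nextEvents()) {
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        // A toy source that never reports completion, like an unbounded generator.
        StepSource endless = new StepSource() {
            public void begin() { }
            public boolean nextEvents() { return true; }
        };
        System.out.println(run(endless, 5)); // prints 5
    }
}
```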
    • addURL

      public void addURL​(String url)
      Add a URL to process.
      Parameters:
      url - the URL to add
    • setDirected

      public void setDirected​(boolean on)
      Create directed edges.
      Parameters:
      on - true to create directed edges
    • setNodeWeightAttribute

      public void setNodeWeightAttribute​(String attribute)
      Set the attribute key used to store the weight of nodes. Whenever a node is reached, its weight is increased by one.
      Parameters:
      attribute - attribute key of the weight of nodes
    • setEdgeWeightAttribute

      public void setEdgeWeightAttribute​(String attribute)
      Set the attribute key used to store the weight of edges. Whenever an edge is reached, its weight is increased by one.
      Parameters:
      attribute - attribute key of the weight of edges
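      The weight bookkeeping described for nodes and edges amounts to a per-element counter stored under a configurable key. The sketch below models that with plain maps; WeightSketch and its methods are hypothetical names, and it assumes weights start at zero so that the first visit yields 1, which the text above does not specify.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of "increase the weight attribute by one on each visit".
class WeightSketch {
    private final String weightKey;
    // element id (node or edge) -> attribute key -> value
    private final Map<String, Map<String, Integer>> attributes = new HashMap<>();

    WeightSketch(String weightKey) {
        this.weightKey = weightKey;
    }

    // Records one visit of the element and returns its updated weight.
    int reach(String elementId) {
        Map<String, Integer> attrs =
                attributes.computeIfAbsent(elementId, id -> new HashMap<>());
        return attrs.merge(weightKey, 1, Integer::sum);
    }

    public static void main(String[] args) {
        WeightSketch nodes = new WeightSketch("weight");
        nodes.reach("http://a.example/");
        System.out.println(nodes.reach("http://a.example/")); // prints 2
    }
}
```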
    • setMode

      public void setMode​(URLGenerator.Mode mode)
      Set how URLs are converted to node ids. With Mode.FULL, the id is the raw URL. With Mode.PATH, the query part of the URL is removed, so http://host/path?what=xxx becomes http://host/path. With Mode.HOST, the URL is reduced to its host, so http://host/path becomes http://host.
      Parameters:
      mode - the mode specifying how URLs are converted to node ids
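      The three conversions can be reproduced with java.net.URI. This is a sketch of the behaviour described above, not the generator's own code: the local Mode enum only borrows the constant names from URLGenerator.Mode, and toNodeId is a hypothetical helper.

```java
import java.net.URI;

// Hypothetical illustration of the FULL / PATH / HOST conversions.
class NodeIdSketch {
    // Local stand-in for URLGenerator.Mode.
    enum Mode { HOST, PATH, FULL }

    static String toNodeId(String url, Mode mode) {
        URI uri = URI.create(url);
        switch (mode) {
            case HOST:
                // Keep scheme and host only.
                return uri.getScheme() + "://" + uri.getHost();
            case PATH:
                // Keep everything up to the path; the query is dropped.
                return uri.getScheme() + "://" + uri.getRawAuthority() + uri.getRawPath();
            default:
                // FULL: the raw URL is the id.
                return url;
        }
    }

    public static void main(String[] args) {
        System.out.println(toNodeId("http://host/path?what=xxx", Mode.FULL)); // http://host/path?what=xxx
        System.out.println(toNodeId("http://host/path?what=xxx", Mode.PATH)); // http://host/path
        System.out.println(toNodeId("http://host/path", Mode.HOST));          // http://host
    }
}
```

      A coarser mode merges more pages into one node, which yields a smaller graph.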
    • setThreadCount

      public void setThreadCount​(int count)
      Set the number of threads used to parse URLs. Threads are created during the nextEvents() step. By the time that step ends, all worker threads have stopped.
      Parameters:
      count - number of threads
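      One common way to get "workers live only for the duration of a step" is a fixed-size pool that is created, drained, and shut down inside a single call. The sketch below shows that pattern under stated assumptions: processStep is a hypothetical method, and the per-URL work is a placeholder (the string length) rather than real fetching or parsing. It is not taken from the generator's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of per-step worker threads.
class StepThreadsSketch {
    // Processes one step's URLs with threadCount workers; returns once every task is done.
    static List<Integer> processStep(List<String> urls, int threadCount) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> url.length()); // placeholder for "fetch and parse this URL"
            }
            List<Integer> results = new ArrayList<>();
            // invokeAll blocks until all tasks have completed, in task order.
            for (Future<Integer> done : pool.invokeAll(tasks)) {
                results.add(done.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a worker failed", e);
        } finally {
            // No tasks remain at this point; shutdown lets the idle workers exit.
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(processStep(List.of("http://a.example/", "http://b.example/x"), 2));
    }
}
```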
    • setDepthLimit

      public void setDepthLimit​(int depthLimit)
      Set the maximum number of steps before the generator stops. If the value is 0 or less, the limit is disabled.
      Parameters:
      depthLimit - maximum number of steps before stopping
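      The effect of a step limit, including the "0 or less disables it" rule, can be seen on a small in-memory link table. The sketch assumes that one step expands every URL discovered in the previous step (a breadth-first reading of "steps"); the documentation above does not say exactly what a step covers, and DepthLimitSketch.crawl is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical breadth-first crawl over an in-memory link table.
class DepthLimitSketch {
    // links maps a URL to the URLs it points to; returns every URL reached.
    static Set<String> crawl(Map<String, List<String>> links, String start, int depthLimit) {
        Set<String> seen = new LinkedHashSet<>();
        seen.add(start);
        List<String> frontier = List.of(start);
        int step = 0;
        // A limit of 0 or less means "no limit": continue until nothing new is found.
        while (!frontier.isEmpty() && (depthLimit <= 0 || step < depthLimit)) {
            List<String> next = new ArrayList<>();
            for (String url : frontier) {
                for (String target : links.getOrDefault(url, List.of())) {
                    if (seen.add(target)) {
                        next.add(target);
                    }
                }
            }
            frontier = next;
            step++;
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, List<String>> links =
                Map.of("a", List.of("b"), "b", List.of("c"), "c", List.of("d"));
        System.out.println(crawl(links, "a", 1)); // [a, b]
        System.out.println(crawl(links, "a", 0)); // [a, b, c, d]
    }
}
```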
    • enableProgression

      public void enableProgression​(boolean on)
    • acceptOnlyMatchingURL

      public void acceptOnlyMatchingURL​(String regex)
      Filter URLs by regular expression. URLs that do not match this regex are discarded.
      Parameters:
      regex - regular expression used to filter URLs
    • declineMatchingURL

      public void declineMatchingURL​(String regex)
      Filter URLs by regular expression. URLs that match this regex are discarded.
      Parameters:
      regex - regular expression used to filter URLs
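      The accept and decline filters can be pictured as predicates that a URL must all satisfy. The sketch below assumes whole-string matching (as with String.matches) and that filters combine with logical AND; neither detail is stated above, and UrlFilterSketch with its method names is hypothetical rather than the library's URLFilter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical model of accept/decline regex filters.
class UrlFilterSketch {
    private final List<Predicate<String>> filters = new ArrayList<>();

    // Keep only URLs that match the regex as a whole.
    void acceptOnlyMatching(String regex) {
        Pattern p = Pattern.compile(regex);
        filters.add(url -> p.matcher(url).matches());
    }

    // Drop URLs that match the regex as a whole.
    void declineMatching(String regex) {
        Pattern p = Pattern.compile(regex);
        filters.add(url -> !p.matcher(url).matches());
    }

    // A URL is kept only if every registered filter accepts it.
    boolean accepts(String url) {
        for (Predicate<String> filter : filters) {
            if (!filter.test(url)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        UrlFilterSketch f = new UrlFilterSketch();
        f.acceptOnlyMatching("https?://example\\.org/.*");
        f.declineMatching(".*\\.pdf");
        System.out.println(f.accepts("http://example.org/index.html")); // true
        System.out.println(f.accepts("http://example.org/paper.pdf"));  // false
    }
}
```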
    • addHostFilter

      public void addHostFilter​(String... hosts)
      Filter URLs according to their host. Note that calling this method several times may cause all URLs to be discarded. All hosts should be given in a single call.
      Parameters:
      hosts - list of accepted hosts
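      A plausible reading of the warning about repeated calls is that each call installs an independent filter and a URL must pass all of them, so two calls with different hosts leave nothing acceptable. The sketch below models that assumption only; HostFilterSketch is hypothetical and does not reproduce the library's implementation.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model: one host set per call, and every set must contain the URL's host.
class HostFilterSketch {
    private final List<Set<String>> hostFilters = new ArrayList<>();

    void addHostFilter(String... hosts) {
        hostFilters.add(new HashSet<>(Arrays.asList(hosts)));
    }

    boolean accepts(String url) {
        String host = URI.create(url).getHost();
        for (Set<String> allowed : hostFilters) {
            if (!allowed.contains(host)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        HostFilterSketch single = new HostFilterSketch();
        single.addHostFilter("a.example", "b.example"); // one call, both hosts
        System.out.println(single.accepts("http://b.example/y")); // true

        HostFilterSketch split = new HostFilterSketch();
        split.addHostFilter("a.example");
        split.addHostFilter("b.example"); // second call adds a second, conflicting filter
        System.out.println(split.accepts("http://b.example/y")); // false
    }
}
```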