org.graphstream.algorithm.generator
Class URLGenerator

java.lang.Object
  extended by org.graphstream.stream.SourceBase
      extended by org.graphstream.algorithm.generator.BaseGenerator
          extended by org.graphstream.algorithm.generator.URLGenerator
All Implemented Interfaces:
Generator, org.graphstream.stream.Source

public class URLGenerator
extends BaseGenerator

Generate a graph from the web. The generator starts from a set of given URLs and extracts the links found on those pages. Each URL is a node, and there is an edge between two URLs when one page links to the other. Links are extracted from the "href" attribute of HTML elements.
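
The link extraction described above can be illustrated with a minimal sketch. It is not the generator's own parser: the class name HrefExtractor and the regular expression are assumptions made for illustration, and a regex only approximates real HTML parsing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GraphStream.
final class HrefExtractor {
    // Matches href="..." or href='...' case-insensitively; group 2 is the link target.
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Returns the href values found in the given HTML, in document order. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://a.example/x\">x</a> <link HREF='http://b.example/'>";
        // Each extracted target would become a node linked to the page it was found on.
        System.out.println(extractLinks(html));
    }
}
```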


Nested Class Summary
static class URLGenerator.Mode
           
static interface URLGenerator.URLFilter
          Defines a URL filter.
 
Nested classes/interfaces inherited from class org.graphstream.stream.SourceBase
org.graphstream.stream.SourceBase.ElementType
 
Constructor Summary
URLGenerator(String... startFrom)
           
 
Method Summary
 void acceptOnlyMatchingURL(String regex)
          Filter URLs, discarding those that do not match the regex.
 void addHostFilter(String... hosts)
          Filter URLs according to their host.
 void addURL(String url)
          Add a URL to process.
 void begin()
          Begin the graph generation.
 void declineMatchingURL(String regex)
          Filter URLs, discarding those that match the regex.
 void enableProgression(boolean on)
           
 boolean nextEvents()
          Perform the next step in generating the graph.
 void setDirected(boolean on)
          Create directed edges.
 void setEdgeWeightAttribute(String attribute)
          Set the attribute key used to store the weight of edges.
 void setMode(URLGenerator.Mode mode)
          Set the way URLs are converted to node ids.
 void setNodeWeightAttribute(String attribute)
          Set the attribute key used to store the weight of nodes.
 void setThreadCount(int count)
          Set the number of threads used to parse URLs.
 
Methods inherited from class org.graphstream.algorithm.generator.BaseGenerator
addEdgeAttribute, addEdgeLabels, addNodeAttribute, addNodeLabels, end, isUsingInternalGraph, removeEdgeAttribute, removeNodeAttribute, setDirectedEdges, setEdgeAttributesRange, setNodeAttributesRange, setRandomSeed, setUseInternalGraph
 
Methods inherited from class org.graphstream.stream.SourceBase
addAttributeSink, addElementSink, addSink, attributeSinks, clearAttributeSinks, clearElementSinks, clearSinks, elementSinks, removeAttributeSink, removeElementSink, removeSink, sendAttributeChangedEvent, sendAttributeChangedEvent, sendEdgeAdded, sendEdgeAdded, sendEdgeAttributeAdded, sendEdgeAttributeAdded, sendEdgeAttributeChanged, sendEdgeAttributeChanged, sendEdgeAttributeRemoved, sendEdgeAttributeRemoved, sendEdgeRemoved, sendEdgeRemoved, sendGraphAttributeAdded, sendGraphAttributeAdded, sendGraphAttributeChanged, sendGraphAttributeChanged, sendGraphAttributeRemoved, sendGraphAttributeRemoved, sendGraphCleared, sendGraphCleared, sendNodeAdded, sendNodeAdded, sendNodeAttributeAdded, sendNodeAttributeAdded, sendNodeAttributeChanged, sendNodeAttributeChanged, sendNodeAttributeRemoved, sendNodeAttributeRemoved, sendNodeRemoved, sendNodeRemoved, sendStepBegins, sendStepBegins
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface org.graphstream.stream.Source
addAttributeSink, addElementSink, addSink, clearAttributeSinks, clearElementSinks, clearSinks, removeAttributeSink, removeElementSink, removeSink
 

Constructor Detail

URLGenerator

public URLGenerator(String... startFrom)
Method Detail

begin

public void begin()
Description copied from interface: Generator
Begin the graph generation. This usually is the place for initialization of the generator. After calling this method, call the Generator.nextEvents() method to add elements to the graph.


nextEvents

public boolean nextEvents()
Description copied from interface: Generator
Perform the next step in generating the graph. While this method returns true, there are still more elements to add to the graph. Note that some generators never return false here, since they can generate graphs of arbitrary size. For such generators, simply stop calling this method when enough elements have been generated. A single call can produce an undetermined number of nodes and edges, so checking the node count during generation is advisable to avoid an unexpectedly large graph.

Returns:
true while there are elements to add to the graph.
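
The advice above about bounding the generation can be sketched as a driving loop. To stay runnable without the library, the sketch uses a hypothetical StepGenerator interface that only mirrors begin(), nextEvents() and end(), and a hypothetical node counter; with the real classes the count would presumably come from the graph receiving the events.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical sketch, not part of GraphStream.
final class BoundedGeneration {
    /** Stand-in for the begin/nextEvents/end life cycle of a generator. */
    interface StepGenerator {
        void begin();
        boolean nextEvents();
        void end();
    }

    /**
     * Calls nextEvents() until it returns false or the node budget is reached.
     * Returns the number of calls made to nextEvents().
     */
    static int run(StepGenerator gen, IntSupplier nodeCount, int maxNodes) {
        int steps = 0;
        gen.begin();
        try {
            while (nodeCount.getAsInt() < maxNodes) {
                steps++;
                if (!gen.nextEvents()) {
                    break; // the generator has nothing more to add
                }
            }
        } finally {
            gen.end();
        }
        return steps;
    }

    public static void main(String[] args) {
        AtomicInteger nodes = new AtomicInteger();
        // A generator that never returns false and adds three nodes per step.
        StepGenerator endless = new StepGenerator() {
            public void begin() { nodes.set(1); }
            public boolean nextEvents() { nodes.addAndGet(3); return true; }
            public void end() { }
        };
        System.out.println(run(endless, nodes::get, 10) + " steps, " + nodes.get() + " nodes");
    }
}
```

Because one step can add several nodes, the budget is checked before each call, so the final count can exceed the limit by at most one step's worth of nodes.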

addURL

public void addURL(String url)
Add a URL to process.

Parameters:
url - the URL to add

setDirected

public void setDirected(boolean on)
Create directed edges.

Parameters:
on - true to create directed edges

setNodeWeightAttribute

public void setNodeWeightAttribute(String attribute)
Set the attribute key used to store the weight of nodes. Whenever a node is reached, its weight is increased by one.

Parameters:
attribute - attribute key for the weight of nodes

setEdgeWeightAttribute

public void setEdgeWeightAttribute(String attribute)
Set the attribute key used to store the weight of edges. Whenever an edge is reached, its weight is increased by one.

Parameters:
attribute - attribute key for the weight of edges
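
The weight bookkeeping described for nodes and edges can be pictured with plain counters. This is only a sketch: the class VisitWeights is hypothetical, and it assumes that a node or edge is "reached" each time a link to it is found, which the documentation does not spell out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not part of GraphStream.
final class VisitWeights {
    private final Map<String, Integer> nodeWeight = new HashMap<>();
    private final Map<String, Integer> edgeWeight = new HashMap<>();

    /** Records a link from one page to another, adding one to the target node and to the edge. */
    void reach(String from, String to) {
        nodeWeight.merge(to, 1, Integer::sum);
        edgeWeight.merge(edgeKey(from, to), 1, Integer::sum);
    }

    int nodeWeight(String id) {
        return nodeWeight.getOrDefault(id, 0);
    }

    int edgeWeight(String from, String to) {
        return edgeWeight.getOrDefault(edgeKey(from, to), 0);
    }

    // URLs contain no raw spaces, so a space is a safe separator for the key.
    private static String edgeKey(String from, String to) {
        return from + " " + to;
    }

    public static void main(String[] args) {
        VisitWeights w = new VisitWeights();
        w.reach("http://a.example/", "http://b.example/");
        w.reach("http://a.example/", "http://b.example/");
        System.out.println(w.edgeWeight("http://a.example/", "http://b.example/"));
    }
}
```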

setMode

public void setMode(URLGenerator.Mode mode)
Set the way URLs are converted to node ids. With Mode.FULL, the id is the raw URL. With Mode.PATH, the query part of the URL is dropped, so http://host/path?what=xxx becomes http://host/path. With Mode.HOST, only the host is kept, so http://host/path becomes http://host.

Parameters:
mode - mode specifying how URLs are converted to node ids
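
The three conversions can be reproduced with java.net.URI, as a rough sketch. The class NodeIds and its nested Mode enum are hypothetical stand-ins for URLGenerator.Mode, and keeping the full authority (including any port) in the id is an assumption.

```java
import java.net.URI;

// Hypothetical sketch, not part of GraphStream.
final class NodeIds {
    /** Stand-in for the three modes described above. */
    enum Mode { FULL, PATH, HOST }

    /** Converts a URL to a node id according to the given mode. */
    static String toNodeId(String url, Mode mode) {
        URI u = URI.create(url);
        switch (mode) {
            case PATH:
                // Drop the query and fragment, keep scheme, authority and path.
                return u.getScheme() + "://" + u.getAuthority() + u.getRawPath();
            case HOST:
                // Keep only scheme and authority.
                return u.getScheme() + "://" + u.getAuthority();
            default:
                // FULL: the id is the raw URL.
                return url;
        }
    }

    public static void main(String[] args) {
        System.out.println(toNodeId("http://host/path?what=xxx", Mode.PATH));
        System.out.println(toNodeId("http://host/path", Mode.HOST));
    }
}
```

Coarser modes merge many pages into one node, which keeps the graph smaller at the cost of detail.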

setThreadCount

public void setThreadCount(int count)
Set the number of threads used to parse URLs. Threads are created in the nextEvents() step. When that step ends, all worker threads have stopped.

Parameters:
count - number of threads
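
The threading behaviour described above, workers created for a step and all stopped when it ends, resembles the following pattern. It is a sketch under assumptions: the class ParallelStep is hypothetical, a fixed thread pool is one possible way to do this, and recording the URL stands in for fetching and parsing the page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not part of GraphStream.
final class ParallelStep {
    /** Handles every URL with threadCount workers and returns once all workers have stopped. */
    static List<String> parseAll(List<String> urls, int threadCount) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        List<String> parsed = Collections.synchronizedList(new ArrayList<>());
        for (String url : urls) {
            // Stand-in for downloading the page and extracting its links.
            pool.execute(() -> parsed.add(url));
        }
        pool.shutdown(); // accept no new work, let queued tasks finish
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> urls = Arrays.asList("http://a.example/", "http://b.example/", "http://c.example/");
        System.out.println(parseAll(urls, 2).size());
    }
}
```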

enableProgression

public void enableProgression(boolean on)

acceptOnlyMatchingURL

public void acceptOnlyMatchingURL(String regex)
Filter URLs by regex. URLs that do not match this regex are discarded.

Parameters:
regex - regular expression that a URL must match to be kept

declineMatchingURL

public void declineMatchingURL(String regex)
Filter URLs by regex. URLs that match this regex are discarded.

Parameters:
regex - regular expression that causes a matching URL to be discarded
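
The two regex filters, acceptOnlyMatchingURL and declineMatchingURL, are mirror images of each other, which a small sketch makes concrete. The class RegexUrlFilters is hypothetical, and requiring the regex to match the whole URL rather than a part of it is an assumption.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not part of GraphStream.
final class RegexUrlFilters {
    /** Returns true if the URL is kept by an accept-only filter. */
    static boolean keptByAcceptOnly(String regex, String url) {
        return Pattern.matches(regex, url);
    }

    /** Returns true if the URL is kept by a decline filter. */
    static boolean keptByDecline(String regex, String url) {
        return !Pattern.matches(regex, url);
    }

    public static void main(String[] args) {
        // Keep only pages of one site.
        System.out.println(keptByAcceptOnly("https?://example\\.org/.*", "http://example.org/a"));
        // Discard links to images.
        System.out.println(keptByDecline(".*\\.(png|jpg)", "http://example.org/logo.png"));
    }
}
```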

addHostFilter

public void addHostFilter(String... hosts)
Filter URLs according to their host. Note that calling this method several times may cause all URLs to be discarded. All hosts should be given in a single call.

Parameters:
hosts - list of accepted hosts
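
One plausible reading of the warning above is that each call registers a separate filter and a URL must pass all of them. The sketch below models that assumption with a hypothetical HostFilters class. It is not the library's implementation.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch, not part of GraphStream.
final class HostFilters {
    private final List<Predicate<String>> filters = new ArrayList<>();

    /** Registers one filter that accepts only URLs whose host is among the given hosts. */
    void addHostFilter(String... hosts) {
        Set<String> accepted = new HashSet<>(Arrays.asList(hosts));
        filters.add(url -> accepted.contains(URI.create(url).getHost()));
    }

    /** A URL is kept only if every registered filter accepts it. */
    boolean accepts(String url) {
        return filters.stream().allMatch(f -> f.test(url));
    }

    public static void main(String[] args) {
        HostFilters single = new HostFilters();
        single.addHostFilter("a.example", "b.example");
        System.out.println(single.accepts("http://a.example/x")); // one call: kept

        HostFilters split = new HostFilters();
        split.addHostFilter("a.example");
        split.addHostFilter("b.example");
        System.out.println(split.accepts("http://a.example/x")); // two calls: discarded
    }
}
```

Under this assumption, two calls with different hosts can never both be satisfied by one URL, which is why passing every host in a single call is the safe choice.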


Copyright © 2012. All Rights Reserved.