Class URLGenerator
java.lang.Object
org.graphstream.stream.SourceBase
org.graphstream.algorithm.generator.BaseGenerator
org.graphstream.algorithm.generator.URLGenerator
- All Implemented Interfaces:
Generator, org.graphstream.stream.Source
- Direct Known Subclasses:
WikipediaGenerator
public class URLGenerator extends BaseGenerator
Generates a graph from the web. The generator starts from a set of seed URLs
and extracts the links found on those pages. Each URL is a node, and an edge
connects two URLs when one links to the other. Links are extracted from the
"href" attribute of HTML elements.
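The description above can be illustrated with a minimal sketch that pulls "href" values out of a page and resolves them against the page URL, giving one (source, target) pair per link. This is not the parser URLGenerator uses internally: the class name HrefExtractionSketch, the regular expression, and the sample URLs are assumptions made for illustration, and only double-quoted attribute values are handled.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of GraphStream.
class HrefExtractionSketch {
    // Matches href="..." with double quotes only (a simplification).
    private static final Pattern HREF =
            Pattern.compile("href\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Returns the absolute targets of all href attributes found in the page. */
    static List<String> extractLinks(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // Relative links are resolved against the URL of the page itself.
            links.add(base.resolve(m.group(1)).toString());
        }
        return links;
    }

    public static void main(String[] args) {
        String page = "http://example.org/index.html";
        String html = "<a href=\"/about\">About</a> <a href=\"http://other.org/x\">X</a>";
        // Each pair (page, target) would become an edge between two URL nodes.
        for (String target : extractLinks(page, html)) {
            System.out.println(page + " -> " + target);
        }
    }
}
```

A real crawler would normally rely on an HTML parser rather than a regular expression; the regex keeps the sketch free of dependencies.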
-
Nested Class Summary
Nested Classes
Modifier and Type  Class                   Description
static class       URLGenerator.Mode
static interface   URLGenerator.URLFilter  Defines a URL filter.
-
Constructor Summary
Constructors
Constructor                        Description
URLGenerator(String... startFrom)
-
Method Summary
Modifier and Type  Method                                    Description
void               acceptOnlyMatchingURL(String regex)       Can be used to filter URLs.
void               addHostFilter(String... hosts)            Can be used to filter URLs according to the host.
void               addURL(String url)                        Add a URL to process.
void               begin()                                   Begin the graph generation.
void               declineMatchingURL(String regex)          Can be used to filter URLs.
void               enableProgression(boolean on)
boolean            nextEvents()                              Perform the next step in generating the graph.
void               setDepthLimit(int depthLimit)             Set the maximum number of steps before stopping.
void               setDirected(boolean on)                   Create directed edges.
void               setEdgeWeightAttribute(String attribute)  Set the attribute key used to store the weight of edges.
void               setMode(URLGenerator.Mode mode)           Set how URLs are converted to node ids.
void               setNodeWeightAttribute(String attribute)  Set the attribute key used to store the weight of nodes.
void               setThreadCount(int count)                 Set the number of threads used to parse URLs.
Methods inherited from class org.graphstream.algorithm.generator.BaseGenerator
addEdgeAttribute, addEdgeAttribute, addEdgeAttribute, addEdgeLabels, addNodeAttribute, addNodeAttribute, addNodeAttribute, addNodeLabels, end, isUsingInternalGraph, removeEdgeAttribute, removeNodeAttribute, setDirectedEdges, setRandomSeed, setUseInternalGraph
Methods inherited from class org.graphstream.stream.SourceBase
addAttributeSink, addElementSink, addSink, attributeSinks, clearAttributeSinks, clearElementSinks, clearSinks, elementSinks, removeAttributeSink, removeElementSink, removeSink, sendAttributeChangedEvent, sendAttributeChangedEvent, sendEdgeAdded, sendEdgeAdded, sendEdgeAttributeAdded, sendEdgeAttributeAdded, sendEdgeAttributeChanged, sendEdgeAttributeChanged, sendEdgeAttributeRemoved, sendEdgeAttributeRemoved, sendEdgeRemoved, sendEdgeRemoved, sendGraphAttributeAdded, sendGraphAttributeAdded, sendGraphAttributeChanged, sendGraphAttributeChanged, sendGraphAttributeRemoved, sendGraphAttributeRemoved, sendGraphCleared, sendGraphCleared, sendNodeAdded, sendNodeAdded, sendNodeAttributeAdded, sendNodeAttributeAdded, sendNodeAttributeChanged, sendNodeAttributeChanged, sendNodeAttributeRemoved, sendNodeAttributeRemoved, sendNodeRemoved, sendNodeRemoved, sendStepBegins, sendStepBegins
-
Constructor Details
-
Method Details
-
begin
public void begin()
Description copied from interface: Generator
Begin the graph generation. This is usually the place to initialize the generator. After calling this method, call the Generator.nextEvents() method to add elements to the graph.
-
nextEvents
public boolean nextEvents()
Description copied from interface: Generator
Perform the next step in generating the graph. While this method returns true, there are still more elements to add to the graph. Beware that some generators never return false, since they can generate graphs of arbitrary size. For such generators, simply stop calling this method once enough elements have been generated. A single call can produce an undetermined number of nodes and edges, so checking the node count while generating is advisable to avoid an unexpectedly large graph.
- Returns:
- true while there are elements to add to the graph.
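The advice about stopping a generator that never returns false can be sketched as a loop guarded by a node budget. The sketch is self-contained, so it uses a hypothetical StepGenerator interface that only mirrors the begin(), nextEvents() and end() calls listed on this page. The EndlessGenerator class and the maxNodes parameter are likewise assumptions, and with a real generator the node count would be read from the graph receiving the events.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch; StepGenerator only mirrors the generator lifecycle calls.
class BoundedGenerationSketch {
    interface StepGenerator {
        void begin();
        boolean nextEvents();
        void end();
    }

    /** Toy generator that adds three "nodes" per step and never reports completion. */
    static class EndlessGenerator implements StepGenerator {
        int nodes = 0;
        boolean ended = false;
        public void begin() { nodes = 0; }
        public boolean nextEvents() { nodes += 3; return true; }
        public void end() { ended = true; }
    }

    /** Runs steps until the generator is exhausted or the node budget is reached. */
    static int run(StepGenerator gen, IntSupplier nodeCount, int maxNodes) {
        int steps = 0;
        gen.begin();
        try {
            // The budget is checked before each step; one step may still overshoot it.
            while (nodeCount.getAsInt() < maxNodes && gen.nextEvents()) {
                steps++;
            }
        } finally {
            gen.end();
        }
        return steps;
    }

    public static void main(String[] args) {
        EndlessGenerator gen = new EndlessGenerator();
        int steps = run(gen, () -> gen.nodes, 10);
        System.out.println("steps=" + steps + ", nodes=" + gen.nodes);
    }
}
```

Because a single step can add several nodes, the final count may exceed the budget by up to one step's worth of elements.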
-
addURL
Add a URL to process.
- Parameters:
url
- a new URL
-
setDirected
public void setDirected(boolean on)
Create directed edges.
- Parameters:
on
- true to create directed edges
-
setNodeWeightAttribute
Set the attribute key used to store the weight of nodes. Whenever a node is reached, its weight is increased by one.
- Parameters:
attribute
- attribute key of the weight of nodes
-
setEdgeWeightAttribute
Set the attribute key used to store the weight of edges. Whenever an edge is reached, its weight is increased by one.
- Parameters:
attribute
- attribute key of the weight of edges
-
setMode
Set how URLs are converted to node ids. With Mode.FULL, the id is the raw URL. With Mode.PATH, the query part of the URL is dropped, so http://host/path?what=xxx becomes http://host/path. With Mode.HOST, the URL is reduced to its host, so http://host/path becomes http://host.
- Parameters:
mode
- mode specifying how URLs are converted to node ids
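The three conversions can be reproduced with java.net.URI, as in the sketch below. It is an illustration of the documented behaviour, not the library's code: the class UrlModeSketch, its local Mode enum (which only copies the constant names FULL, PATH and HOST), and the method toNodeId are hypothetical.

```java
import java.net.URI;

// Hypothetical sketch of the URL-to-node-id conversion described above.
class UrlModeSketch {
    // Local stand-in that mirrors the constant names of URLGenerator.Mode.
    enum Mode { FULL, PATH, HOST }

    static String toNodeId(String url, Mode mode) {
        URI uri = URI.create(url);
        switch (mode) {
            case PATH:
                // Keep scheme, authority and path; drop the query string.
                return uri.getScheme() + "://" + uri.getRawAuthority() + uri.getRawPath();
            case HOST:
                // Keep only the scheme and the host name.
                return uri.getScheme() + "://" + uri.getHost();
            default:
                // FULL: the raw URL is the node id.
                return url;
        }
    }

    public static void main(String[] args) {
        String url = "http://host/path?what=xxx";
        for (Mode mode : Mode.values()) {
            System.out.println(mode + ": " + toNodeId(url, mode));
        }
    }
}
```

Coarser modes merge many pages into one node, which is presumably where the node and edge weight attributes become useful.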
-
setThreadCount
public void setThreadCount(int count)
Set the number of threads used to parse URLs. Threads are created in the nextEvents() step. When that step returns, all worker threads have stopped.
- Parameters:
count
- number of threads
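The lifecycle described here, where workers are created inside a step and are gone by the time it returns, can be sketched with a fixed-size thread pool that is started and shut down within a single method. Whether URLGenerator uses an ExecutorService is not stated on this page. The class ParallelStepSketch, the method processStep, and the placeholder "parsed:" work are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a step that parses its URLs on a bounded number of threads.
class ParallelStepSketch {
    /** Processes all URLs of one step on threadCount workers and waits for them. */
    static List<String> processStep(List<String> urls, int threadCount) {
        ExecutorService pool = Executors.newFixedThreadPool(threadCount);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String url : urls) {
                tasks.add(() -> "parsed:" + url); // placeholder for download + link extraction
            }
            List<String> results = new ArrayList<>();
            // invokeAll blocks until every task has completed, in submission order.
            for (Future<String> done : pool.invokeAll(tasks)) {
                results.add(done.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // no worker outlives the step
        }
    }

    public static void main(String[] args) {
        System.out.println(processStep(List.of("http://a.org", "http://b.org"), 2));
    }
}
```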
-
setDepthLimit
public void setDepthLimit(int depthLimit)
Set the maximum number of steps before stopping. If 0 or less, the limit is disabled.
- Parameters:
depthLimit
- maximum number of steps before stopping
-
enableProgression
public void enableProgression(boolean on)
-
acceptOnlyMatchingURL
Can be used to filter URLs. URLs not matching this regex are discarded.
- Parameters:
regex
- regex used to filter URLs
-
declineMatchingURL
Can be used to filter URLs. URLs matching this regex are discarded.
- Parameters:
regex
- regex used to filter URLs
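The accept and decline filters can be pictured as predicates that every candidate URL must pass, as in the sketch below. Two points are assumptions rather than documented behaviour: that "matching" means the regex matches the whole URL (as String.matches does), and that all registered filters are combined with a logical AND. The class RegexUrlFilterSketch and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch of accept/decline URL filtering by regular expression.
class RegexUrlFilterSketch {
    private final List<Predicate<String>> filters = new ArrayList<>();

    /** URLs that do not match the regex are discarded. */
    void acceptOnlyMatching(String regex) {
        Pattern p = Pattern.compile(regex);
        filters.add(url -> p.matcher(url).matches()); // assumption: whole-URL match
    }

    /** URLs that match the regex are discarded. */
    void declineMatching(String regex) {
        Pattern p = Pattern.compile(regex);
        filters.add(url -> !p.matcher(url).matches());
    }

    /** A URL is kept only if every registered filter accepts it. */
    boolean accept(String url) {
        for (Predicate<String> filter : filters) {
            if (!filter.test(url)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RegexUrlFilterSketch f = new RegexUrlFilterSketch();
        f.acceptOnlyMatching("^https?://.*");
        f.declineMatching(".*\\.(png|jpg)$");
        System.out.println(f.accept("http://example.org/a.html"));
        System.out.println(f.accept("http://example.org/logo.png"));
    }
}
```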
-
addHostFilter
Can be used to filter URLs according to the host. Note that several calls to this method may cause all URLs to be discarded. All hosts should be given in a single call.
- Parameters:
hosts
- list of accepted hosts
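One plausible reading of the warning is that each call registers an independent filter and a URL must pass all of them, so two calls with different hosts can never both accept the same URL. The sketch below models that reading. It is an assumption about the behaviour, not the library's code, and the class HostFilterSketch and the example hosts are hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch: every addHostFilter call adds one filter, and all must pass.
class HostFilterSketch {
    private final List<Predicate<String>> filters = new ArrayList<>();

    void addHostFilter(String... hosts) {
        Set<String> accepted = new HashSet<>(Arrays.asList(hosts));
        filters.add(url -> accepted.contains(URI.create(url).getHost()));
    }

    boolean accept(String url) {
        for (Predicate<String> filter : filters) {
            if (!filter.test(url)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        HostFilterSketch single = new HostFilterSketch();
        single.addHostFilter("a.org", "b.org"); // one call, two hosts

        HostFilterSketch split = new HostFilterSketch();
        split.addHostFilter("a.org"); // two calls, one host each
        split.addHostFilter("b.org");

        System.out.println(single.accept("http://a.org/x"));
        System.out.println(split.accept("http://a.org/x"));
    }
}
```

Under this model the single-call filter keeps URLs from either host, while the two-call variant rejects both, which is consistent with the recommendation to pass all hosts at once.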
-