of the article
stores the pure text of the article, stripped of HTML, formatting, etc.; just raw text with paragraphs separated by newlines. This is probably what you want to use.
description field in HTML source
field in the HTML source
of this article if found in the meta data
of this article we're parsing
holds the top Element we think is a candidate for the main body of the article
holds the top Image object that we think represents this article
holds a set of tags that may have been in the article; these are not meta keywords
holds a list of any movies we found on the page, such as YouTube or Vimeo embeds
stores the final URL that we're going to try to fetch content against; this is expanded if any escaped fragments were found in the starting URL
stores the MD5 hash of the url to use for various identification tasks
stores the RAW HTML straight from the network connection
the JSoup Document object
the original JSoup document, built from the original HTML with no cleaning applied to it
the publish date of the article, which is sometimes useful to know
A property bucket for consumers of goose to store custom data extractions. This is populated by an implementation of goose.extractors.AdditionalDataExtractor, which is executed before document cleansing within goose.CrawlingActor#crawl.
Facebook Open Graph data found in the article's meta tags
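
The two URL-derived fields described above (the final URL with escaped fragments expanded, and the MD5 hash of the URL) can be illustrated with a minimal sketch. This is not goose's own code: the class name `UrlFieldsSketch` and both helper methods are hypothetical, and the sketch assumes that "escaped fragment" refers to rewriting a `#!` fragment into the `?_escaped_fragment_=` query form, and that the hash is a hex-encoded digest of the URL string. URLs that already carry a query string are not handled here.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class UrlFieldsSketch {

    // Hypothetical helper (assumption, not goose's API): rewrites a "#!"
    // fragment into the "?_escaped_fragment_=" query form. URLs without
    // "#!" are returned unchanged.
    static String expandEscapedFragment(String url) {
        return url.contains("#!") ? url.replace("#!", "?_escaped_fragment_=") : url;
    }

    // Hypothetical helper (assumption, not goose's API): returns the MD5
    // digest of the URL as a 32-character lowercase hex string.
    static String md5Hex(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            // BigInteger drops leading zero bytes, so pad back to 32 digits.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    public static void main(String[] args) {
        String finalUrl = expandEscapedFragment("http://example.com/#!/story/1");
        System.out.println(finalUrl);          // the URL that would be fetched
        System.out.println(md5Hex(finalUrl));  // a stable identifier for it
    }
}
```

A hash like this gives a fixed-length key for the article that is convenient for file names or cache lookups; MD5 is adequate for identification but should not be relied on for security purposes.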
An article