Class FindHTMLHrefHandler
- java.lang.Object
  - org.apache.manifoldcf.crawler.connectors.webcrawler.FindHandler
    - org.apache.manifoldcf.crawler.connectors.webcrawler.FindHTMLHrefHandler
- All Implemented Interfaces:
IDiscoveredLinkHandler, IHTMLHandler, IMetaTagHandler
public class FindHTMLHrefHandler extends FindHandler implements IHTMLHandler
This class is the handler for HTML parsing during state transitions.
-
Field Summary
Fields
Modifier and Type                    Field                   Description
protected java.util.regex.Pattern    preferredLinkPattern    -
Fields inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.FindHandler
parentURI, targetURI
-
Constructor Summary
Constructors
Constructor
FindHTMLHrefHandler(java.lang.String parentURI, java.util.regex.Pattern preferredLinkPattern)
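The summary does not say how preferredLinkPattern is applied. Purely as an illustration, the sketch below assumes it acts as a filter, so that the first discovered link the pattern matches is kept as the target. PreferredLinkSketch and noteLink are hypothetical stand-ins and are not part of ManifoldCF; only java.util.regex.Pattern is real API.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in; not the ManifoldCF implementation.
class PreferredLinkSketch {
    private final Pattern preferredLinkPattern;
    private String targetURI = null;

    PreferredLinkSketch(Pattern preferredLinkPattern) {
        this.preferredLinkPattern = preferredLinkPattern;
    }

    // Assumption: the first link that the pattern matches becomes the target;
    // later matches and non-matching links are ignored.
    void noteLink(String url) {
        if (targetURI == null && preferredLinkPattern.matcher(url).find()) {
            targetURI = url;
        }
    }

    String getTargetURI() {
        return targetURI;
    }

    public static void main(String[] args) {
        PreferredLinkSketch handler = new PreferredLinkSketch(Pattern.compile("/login"));
        handler.noteLink("http://example.com/about");
        handler.noteLink("http://example.com/login?next=home");
        System.out.println(handler.getTargetURI());
    }
}
```

Using find() rather than matches() lets the pattern describe a fragment of the URL; whether the real handler does the same is not stated in this page.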
-
Method Summary
Modifier and Type    Method                                        Description
void                 applyOverrides(LoginParameters lp)            Apply overrides
void                 finishUp()                                    Done with the document.
void                 noteAHREF(java.lang.String rawURL)            Note discovered href
void                 noteBASEHREF(java.lang.String rawURL)         Note discovered base
void                 noteDiscoveredBase(java.lang.String rawURL)   Inform the world of a new base HREF.
void                 noteDiscoveredLink(java.lang.String rawURL)   Override noteDiscoveredLink
void                 noteFormEnd()                                 Note the end of a form
void                 noteFormInput(java.util.Map inputAttributes)  Note an input tag
void                 noteFormStart(java.util.Map formAttributes)   Note the start of a form
void                 noteFRAMESRC(java.lang.String rawURL)         Note discovered FRAME SRC
void                 noteIMGSRC(java.lang.String rawURL)           Note discovered IMG SRC
void                 noteLINKHREF(java.lang.String rawURL)         Note discovered href
void                 noteMetaTag(java.util.Map metaAttributes)     Note a meta tag
void                 noteTextCharacter(char textCharacter)         Note a character of text.
Methods inherited from class org.apache.manifoldcf.crawler.connectors.webcrawler.FindHandler
getTargetURI
-
Method Detail
-
applyOverrides
public void applyOverrides(LoginParameters lp) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Apply overrides
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteTextCharacter
public void noteTextCharacter(char textCharacter) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a character of text. Structured this way to keep overhead low for handlers that don't use text.
- Specified by:
noteTextCharacter in interface IHTMLHandler
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
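The remark about overhead can be illustrated with a per-character callback: a handler that does not need text supplies an empty method and allocates nothing, while a handler that does need text chooses its own buffering. The sketch below is a self-contained stand-in; TextCallbackSketch, TextSink, and feed are hypothetical names, not ManifoldCF types.

```java
// Hypothetical stand-in for a parser that reports text one character at a time.
class TextCallbackSketch {
    interface TextSink {
        void noteTextCharacter(char textCharacter);
    }

    // A handler that ignores text: no buffer is ever created on its behalf.
    static final TextSink IGNORE = c -> { };

    // Pushes each character of the text to the sink, as a parser might.
    static void feed(String text, TextSink sink) {
        for (int i = 0; i < text.length(); i++) {
            sink.noteTextCharacter(text.charAt(i));
        }
    }

    public static void main(String[] args) {
        feed("Sign in", IGNORE);                  // costs only empty calls

        StringBuilder collected = new StringBuilder();
        feed("Sign in", c -> collected.append(c)); // this handler opts in to buffering
        System.out.println(collected);
    }
}
```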
-
noteMetaTag
public void noteMetaTag(java.util.Map metaAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note a meta tag
- Specified by:
noteMetaTag in interface IMetaTagHandler
- Parameters:
metaAttributes - are the attributes that belong to the tag.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormStart
public void noteFormStart(java.util.Map formAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the start of a form
- Specified by:
noteFormStart in interface IHTMLHandler
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormInput
public void noteFormInput(java.util.Map inputAttributes) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note an input tag
- Specified by:
noteFormInput in interface IHTMLHandler
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFormEnd
public void noteFormEnd() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note the end of a form
- Specified by:
noteFormEnd in interface IHTMLHandler
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
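The three form callbacks (noteFormStart, noteFormInput, noteFormEnd) arrive as a bracketed sequence, so a handler typically has to carry state between them. The following is a minimal sketch of that pattern with a hypothetical stand-in class, FormEventsSketch. The attribute keys "action" and "name" and the summary format are assumptions made for the example, not documented ManifoldCF behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in that mirrors the three form callbacks.
class FormEventsSketch {
    private Map<String, String> currentForm = null;          // attributes of the open form, if any
    private final List<String> inputNames = new ArrayList<>();
    final List<String> summaries = new ArrayList<>();        // one entry per completed form

    void noteFormStart(Map<String, String> formAttributes) {
        currentForm = formAttributes;
        inputNames.clear();
    }

    void noteFormInput(Map<String, String> inputAttributes) {
        if (currentForm == null) {
            return;                                           // input outside any form: ignored here
        }
        String name = inputAttributes.get("name");           // assumed attribute key
        if (name != null) {
            inputNames.add(name);
        }
    }

    void noteFormEnd() {
        if (currentForm == null) {
            return;
        }
        summaries.add(currentForm.get("action") + " " + inputNames);
        currentForm = null;
    }

    public static void main(String[] args) {
        FormEventsSketch handler = new FormEventsSketch();
        handler.noteFormStart(Map.of("action", "/login"));
        handler.noteFormInput(Map.of("name", "user"));
        handler.noteFormInput(Map.of("name", "pass"));
        handler.noteFormEnd();
        System.out.println(handler.summaries);
    }
}
```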
-
noteDiscoveredBase
public void noteDiscoveredBase(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Description copied from interface: IDiscoveredLinkHandler
Inform the world of a new base HREF.
- Specified by:
noteDiscoveredBase in interface IDiscoveredLinkHandler
- Overrides:
noteDiscoveredBase in class FindHandler
- Parameters:
rawURL - is the new base HREF, in raw form. This may be relative, malformed, etc.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
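Because the base HREF may itself be relative or malformed, a handler plausibly resolves it against the current base first and only then uses it for later links. The sketch below shows that idea with java.net.URI. BaseHrefSketch is a hypothetical stand-in, and keeping the old base when the new one is unusable is an assumption rather than documented behavior.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; not the ManifoldCF implementation.
class BaseHrefSketch {
    private URI base;                                  // starts as the parent document URI
    final List<String> links = new ArrayList<>();      // resolved links, in discovery order

    BaseHrefSketch(String parentURI) {
        this.base = URI.create(parentURI);
    }

    void noteDiscoveredBase(String rawURL) {
        try {
            URI candidate = base.resolve(new URI(rawURL));
            if (candidate.isAbsolute()) {
                base = candidate;                      // later links resolve against the new base
            }
        } catch (URISyntaxException e) {
            // Assumption: a malformed base is ignored and the previous base is kept.
        }
    }

    void noteDiscoveredLink(String rawURL) {
        try {
            links.add(base.resolve(new URI(rawURL)).toString());
        } catch (URISyntaxException e) {
            // Malformed links are skipped in this sketch.
        }
    }

    public static void main(String[] args) {
        BaseHrefSketch handler = new BaseHrefSketch("http://example.com/a/page.html");
        handler.noteDiscoveredLink("x.html");          // resolved against the parent URI
        handler.noteDiscoveredBase("/b/");             // relative base, resolved first
        handler.noteDiscoveredLink("x.html");          // resolved against the new base
        System.out.println(handler.links);
    }
}
```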
-
noteDiscoveredLink
public void noteDiscoveredLink(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Override noteDiscoveredLink
- Specified by:
noteDiscoveredLink in interface IDiscoveredLinkHandler
- Overrides:
noteDiscoveredLink in class FindHandler
- Parameters:
rawURL - is the raw discovered URL. This may be relative, malformed, or otherwise unsuitable for use until final form is achieved.
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
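As a rough picture of what bringing a raw URL into "final form" can involve, the sketch below resolves a raw value against a parent URI with java.net.URI and rejects input that cannot be parsed. RawUrlSketch and its resolve method are hypothetical helpers. How the real handler treats malformed URLs is not stated here, so returning null is an assumption.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper; not part of ManifoldCF.
class RawUrlSketch {
    // Returns rawURL as an absolute, normalized URI string resolved against
    // parentURI, or null when rawURL cannot be parsed or stays relative.
    static String resolve(String parentURI, String rawURL) {
        try {
            URI resolved = new URI(parentURI).resolve(new URI(rawURL.trim())).normalize();
            return resolved.isAbsolute() ? resolved.toString() : null;
        } catch (URISyntaxException e) {
            return null; // assumption: malformed input is dropped, not propagated
        }
    }

    public static void main(String[] args) {
        String parent = "http://example.com/docs/index.html";
        System.out.println(resolve(parent, "../login.html"));              // relative link
        System.out.println(resolve(parent, "http://other.example.org/a")); // already absolute
        System.out.println(resolve(parent, "bad url with spaces"));        // malformed
    }
}
```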
-
noteBASEHREF
public void noteBASEHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered base
- Specified by:
noteBASEHREF in interface IHTMLHandler
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteAHREF
public void noteAHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered href
- Specified by:
noteAHREF in interface IHTMLHandler
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteLINKHREF
public void noteLINKHREF(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered href
- Specified by:
noteLINKHREF in interface IHTMLHandler
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteIMGSRC
public void noteIMGSRC(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered IMG SRC
- Specified by:
noteIMGSRC in interface IHTMLHandler
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
noteFRAMESRC
public void noteFRAMESRC(java.lang.String rawURL) throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Note discovered FRAME SRC
- Specified by:
noteFRAMESRC in interface IHTMLHandler
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException
-
finishUp
public void finishUp() throws org.apache.manifoldcf.core.interfaces.ManifoldCFException
Description copied from interface: IHTMLHandler
Done with the document.
- Specified by:
finishUp in interface IHTMLHandler
- Throws:
org.apache.manifoldcf.core.interfaces.ManifoldCFException