Class BaseRepositoryConnector
java.lang.Object
    org.apache.manifoldcf.core.connector.BaseConnector
        org.apache.manifoldcf.crawler.connectors.BaseRepositoryConnector
All Implemented Interfaces:
    IConnector, IRepositoryConnector
public abstract class BaseRepositoryConnector extends BaseConnector implements IRepositoryConnector
This base class describes an instance of a connection between a repository and ManifoldCF's standard "pull" ingestion agent.

Each connector instance is used in only one thread at a time. Connection pooling for these objects is performed by the factory that instantiates repository connectors from symbolic names and configuration parameters, and instances are pooled by those parameters. That is, a pooled connector handle is reused only if all the connection parameters for the handle match.

Implementers should provide a default constructor with this signature: xxx();

Connectors are either configured or not. If configured, they persist in a pool and are reused multiple times. Certain methods of a connector may be called before the connector is configured; these are essentially all the methods that permit inspection of the connector's capabilities.

The purpose of the repository connector is to allow documents to be fetched from the repository. Each repository connector describes a set of documents that are known only to that connector, and therefore establishes a space of document identifiers. Each connector will only ever be asked to deal with identifiers that originated, in some way, from that connector.

Documents are fetched using processDocuments(), which then decides how to dispose of each document using the methods available through the provided IProcessActivity object.
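The pooling rule described above (a handle is reused only when every connection parameter matches) can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: ConnectorPoolSketch, Handle, grab() and release() are illustration names, not ManifoldCF's factory API, and the real pool may be organized quite differently.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of "pooled by parameters"; not ManifoldCF code.
final class ConnectorPoolSketch {
    /** Hypothetical stand-in for a configured connector instance. */
    static final class Handle {
        final String className;
        final Map<String, String> params;

        Handle(String className, Map<String, String> params) {
            this.className = className;
            this.params = new HashMap<>(params); // defensive copy
        }
    }

    // Idle handles, keyed by connector class name plus the full parameter map.
    private final Map<List<Object>, Deque<Handle>> idle = new HashMap<>();

    private static List<Object> key(String className, Map<String, String> params) {
        // List and Map equality are element-wise, so two keys match only if
        // the class name and every parameter match.
        return Arrays.asList(className, new HashMap<>(params));
    }

    /** Hands out an idle handle with identical parameters, or creates a new one. */
    Handle grab(String className, Map<String, String> params) {
        Deque<Handle> free = idle.get(key(className, params));
        if (free != null && !free.isEmpty()) {
            // Removing the handle from the pool keeps it exclusive to one caller.
            return free.pop();
        }
        return new Handle(className, params);
    }

    /** Returns a handle to the pool so a later grab() with the same parameters can reuse it. */
    void release(Handle handle) {
        idle.computeIfAbsent(key(handle.className, handle.params), k -> new ArrayDeque<>())
            .push(handle);
    }

    public static void main(String[] args) {
        ConnectorPoolSketch pool = new ConnectorPoolSketch();
        Map<String, String> params = new HashMap<>();
        params.put("server", "a.example.com");

        Handle first = pool.grab("MyConnector", params);
        pool.release(first);
        Handle second = pool.grab("MyConnector", params);
        System.out.println(first == second); // prints "true"
    }
}
```

Because a grabbed handle is removed from the idle set until it is released, the sketch also reflects the one-thread-at-a-time guarantee.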
Field Summary
Modifier and Type          Field     Description
static java.lang.String    _rcsid
Fields inherited from class org.apache.manifoldcf.core.connector.BaseConnector
currentContext, params
-
Fields inherited from interface org.apache.manifoldcf.crawler.interfaces.IRepositoryConnector
GLOBAL_DENY_TOKEN, JOBMODE_CONTINUOUS, JOBMODE_ONCEONLY, MODEL_ADD, MODEL_ADD_CHANGE, MODEL_ADD_CHANGE_DELETE, MODEL_ALL, MODEL_CHAINED_ADD, MODEL_CHAINED_ADD_CHANGE, MODEL_CHAINED_ADD_CHANGE_DELETE, MODEL_PARTIAL
-
-
Constructor Summary
Constructor                  Description
BaseRepositoryConnector()
-
Method Summary
java.lang.String addSeedDocuments(ISeedingActivity activities, Specification spec, java.lang.String lastSeedVersion, long seedTime, int jobMode)
    Queue "seed" documents.
java.lang.String[] getActivitiesList()
    Return the list of activities that this connector supports (i.e. writes into the log).
java.lang.String[] getBinNames(java.lang.String documentIdentifier)
    Get the bin name strings for a document identifier.
int getConnectorModel()
    Tell the world what model this connector uses for getDocumentIdentifiers().
java.lang.String getFormCheckJavascriptMethodName(int connectionSequenceNumber)
    Obtain the name of the form check javascript method to call.
java.lang.String getFormPresaveCheckJavascriptMethodName(int connectionSequenceNumber)
    Obtain the name of the form presave check javascript method to call.
int getMaxDocumentRequest()
    Get the maximum number of documents to amalgamate together into one batch, for this connector.
java.lang.String[] getRelationshipTypes()
    Return the list of relationship types that this connector recognizes.
void outputSpecificationBody(IHTTPOutput out, java.util.Locale locale, Specification ds, int connectionSequenceNumber, int actualSequenceNumber, java.lang.String tabName)
    Output the specification body section.
void outputSpecificationHeader(IHTTPOutput out, java.util.Locale locale, Specification ds, int connectionSequenceNumber, java.util.List<java.lang.String> tabsArray)
    Output the specification header section.
void processDocuments(java.lang.String[] documentIdentifiers, IExistingVersions statuses, Specification spec, IProcessActivity activities, int jobMode, boolean usesDefaultAuthority)
    Process a set of documents.
java.lang.String processSpecificationPost(IPostParameters variableContext, java.util.Locale locale, Specification ds, int connectionSequenceNumber)
    Process a specification post.
boolean requestInfo(Configuration output, java.lang.String command)
    Request arbitrary connector information.
void viewSpecification(IHTTPOutput out, java.util.Locale locale, Specification ds, int connectionSequenceNumber)
    View specification.
Methods inherited from class org.apache.manifoldcf.core.connector.BaseConnector
check, clearThreadContext, connect, deinstall, disconnect, getConfiguration, install, isConnected, outputConfigurationBody, outputConfigurationBody, outputConfigurationHeader, outputConfigurationHeader, outputConfigurationHeader, pack, packFixedList, packList, packList, poll, processConfigurationPost, processConfigurationPost, setThreadContext, unpack, unpackFixedList, unpackList, viewConfiguration, viewConfiguration
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.manifoldcf.core.interfaces.IConnector
check, clearThreadContext, connect, deinstall, disconnect, getConfiguration, install, isConnected, outputConfigurationBody, outputConfigurationHeader, poll, processConfigurationPost, setThreadContext, viewConfiguration
-
Field Detail
-
_rcsid
public static final java.lang.String _rcsid
See Also:
    Constant Field Values
-
-
Method Detail
-
getConnectorModel
public int getConnectorModel()
Tell the world what model this connector uses for getDocumentIdentifiers(). This must return one of the MODEL_ constants inherited from IRepositoryConnector.
Specified by:
    getConnectorModel in interface IRepositoryConnector
Returns:
    the model type value.
-
getActivitiesList
public java.lang.String[] getActivitiesList()
Return the list of activities that this connector supports (i.e. writes into the log).
Specified by:
    getActivitiesList in interface IRepositoryConnector
Returns:
    the list.
-
getRelationshipTypes
public java.lang.String[] getRelationshipTypes()
Return the list of relationship types that this connector recognizes.
Specified by:
    getRelationshipTypes in interface IRepositoryConnector
Returns:
    the list.
-
getBinNames
public java.lang.String[] getBinNames(java.lang.String documentIdentifier)
Get the bin name strings for a document identifier. The bin name describes the queue to which the document will be assigned for throttling purposes. Throttling controls the rate at which items in a given queue are fetched; it does not say anything about the overall fetch rate, which may operate on multiple queues or bins. For example, if you implement a web crawler, a good choice of bin name would be the server name, since that is likely to correspond to a real resource that will need real throttle protection.
Specified by:
    getBinNames in interface IRepositoryConnector
Parameters:
    documentIdentifier - is the document identifier.
Returns:
    the set of bin names. If an empty array is returned, it is equivalent to there being no request rate throttling available for this identifier.
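As a rough illustration of the web crawler suggestion above, the sketch below derives a single bin name from the server name in a URL. It is a standalone helper, not an actual override: BinNameSketch and binNamesFor() are hypothetical names, and it assumes that document identifiers are absolute URLs, which nothing in this class requires.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper showing what a getBinNames() override for a
// web-style connector might compute; not part of ManifoldCF.
final class BinNameSketch {
    /**
     * Assumption: documentIdentifier is an absolute URL. The server name
     * becomes the only bin, so all documents on one server share a throttle.
     */
    static String[] binNamesFor(String documentIdentifier) {
        try {
            String host = new URI(documentIdentifier).getHost();
            if (host == null) {
                // No server name available; an empty array means that no
                // request rate throttling applies to this identifier.
                return new String[0];
            }
            // Normalize case so "Example.com" and "example.com" share a bin.
            return new String[] { host.toLowerCase(Locale.ROOT) };
        } catch (URISyntaxException e) {
            // Not parseable as a URI: fall back to "no throttling".
            return new String[0];
        }
    }

    public static void main(String[] args) {
        for (String bin : binNamesFor("http://Example.com/docs/index.html")) {
            System.out.println(bin); // prints "example.com"
        }
    }
}
```

Whether an unparseable identifier should be left unthrottled or given a catch-all bin is a design choice; the sketch takes the first option only to keep it short.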
-
requestInfo
public boolean requestInfo(Configuration output, java.lang.String command) throws ManifoldCFException
Request arbitrary connector information. This method is called directly from the API in order to allow API users to perform any one of several connector-specific queries.
Specified by:
    requestInfo in interface IRepositoryConnector
Parameters:
    output - is the response object, to be filled in by this method.
    command - is the command, which is taken directly from the API request.
Returns:
    true if the resource is found, false if not. In either case, output may be filled in.
Throws:
    ManifoldCFException
-
addSeedDocuments
public java.lang.String addSeedDocuments(ISeedingActivity activities, Specification spec, java.lang.String lastSeedVersion, long seedTime, int jobMode) throws ManifoldCFException, ServiceInterruption
Queue "seed" documents. Seed documents are the starting places for crawling activity. Documents are seeded when this method calls the appropriate methods in the passed-in ISeedingActivity object.

This method can choose to find only the repository changes that happened during the specified time interval. The framework interprets the seeds recorded by this method according to what getConnectorModel() returns. It is not a big problem if the connector creates more seeds than are strictly necessary; it is merely a question of overall work required. It is therefore always acceptable for this method to seed more documents rather than fewer.

The end time and seeding version string passed to this method may be interpreted for greatest efficiency. For continuous crawling jobs, this method is called once when the job starts and then at periodic intervals as the job executes.

When a job's specification is changed, the framework automatically resets the seeding version string to null. The seeding version string may also be set to null on each job run, depending on the connector model returned by getConnectorModel().

The connector will be connected before this method can be called.
Specified by:
    addSeedDocuments in interface IRepositoryConnector
Parameters:
    activities - is the interface this method should use to perform whatever framework actions are desired.
    spec - is a document specification (that comes from the job).
    lastSeedVersion - is the last seeding version string for this job, or null if the job has no previous seeding version string.
    seedTime - is the end of the time range of documents to consider, exclusive.
    jobMode - is an integer describing how the job is being run, whether continuous or once-only.
Returns:
    an updated seeding version string, to be stored with the job.
Throws:
    ManifoldCFException
    ServiceInterruption
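One possible way to use the seeding version string is to store the previous seedTime in it, so that each call only considers documents changed since the last call. The sketch below shows that idea with stand-in types: SeedingSketch, SeedSink and the in-memory map of modification times are all hypothetical, and encoding the version string as a decimal timestamp is an assumption of this example, not something the framework prescribes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of incremental seeding; not ManifoldCF code.
final class SeedingSketch {
    /** Hypothetical stand-in for the ISeedingActivity parameter. */
    interface SeedSink {
        void addSeedDocument(String documentIdentifier);
    }

    /** Hypothetical repository: document identifier to last-modified time (ms). */
    private final Map<String, Long> modifiedTimes;

    SeedingSketch(Map<String, Long> modifiedTimes) {
        this.modifiedTimes = modifiedTimes;
    }

    /**
     * Seeds every document modified in [start, seedTime), where start comes
     * from lastSeedVersion (assumed to hold the previous seedTime) or is 0
     * when lastSeedVersion is null, for example after a specification change.
     */
    String addSeedDocuments(SeedSink activities, String lastSeedVersion, long seedTime) {
        long startTime = (lastSeedVersion == null) ? 0L : Long.parseLong(lastSeedVersion);
        for (Map.Entry<String, Long> entry : modifiedTimes.entrySet()) {
            long modified = entry.getValue();
            if (modified >= startTime && modified < seedTime) {
                activities.addSeedDocument(entry.getKey());
            }
        }
        // The returned string is stored with the job and passed back next time.
        return Long.toString(seedTime);
    }

    public static void main(String[] args) {
        Map<String, Long> repository = new LinkedHashMap<>();
        repository.put("doc-a", 100L);
        repository.put("doc-b", 200L);
        repository.put("doc-c", 300L);
        SeedingSketch connector = new SeedingSketch(repository);

        List<String> seeded = new ArrayList<>();
        String version = connector.addSeedDocuments(seeded::add, null, 250L);
        System.out.println(seeded + " version=" + version);

        seeded.clear();
        version = connector.addSeedDocuments(seeded::add, version, 400L);
        System.out.println(seeded + " version=" + version);
    }
}
```

A null version string makes the sketch reseed everything, which is consistent with the note that seeding more documents than necessary costs extra work but is not an error.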
-
processDocuments
public void processDocuments(java.lang.String[] documentIdentifiers, IExistingVersions statuses, Specification spec, IProcessActivity activities, int jobMode, boolean usesDefaultAuthority) throws ManifoldCFException, ServiceInterruption
Process a set of documents. This is the method that should cause each document to be fetched and processed, and the results added to the queue of documents for the current job, entered into the incremental ingestion manager, or both. The document specification allows this class to filter what is done based on the job. The connector will be connected before this method can be called.
Specified by:
    processDocuments in interface IRepositoryConnector
Parameters:
    documentIdentifiers - is the set of document identifiers to process.
    statuses - are the currently-stored document versions for each document in the set of document identifiers passed in above.
    activities - is the interface this method should use to queue up new document references and ingest documents.
    jobMode - is an integer describing how the job is being run, whether continuous or once-only.
    usesDefaultAuthority - will be true only if the authority in use for these documents is the default one.
Throws:
    ManifoldCFException
    ServiceInterruption
-
getMaxDocumentRequest
public int getMaxDocumentRequest()
Get the maximum number of documents to amalgamate together into one batch, for this connector.
Specified by:
    getMaxDocumentRequest in interface IRepositoryConnector
Returns:
    the maximum number. 0 indicates "unlimited".
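To make the meaning of the return value concrete, the sketch below shows how a caller could split a set of document identifiers into batches, treating 0 as "unlimited". It is only an illustration of the contract: BatchSketch and batches() are hypothetical names, and this is not the framework's actual batching code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the getMaxDocumentRequest() contract.
final class BatchSketch {
    /**
     * Splits identifiers into batches of at most maxDocumentRequest entries.
     * A value of 0 (or less, in this sketch) means one batch with everything.
     */
    static List<String[]> batches(String[] documentIdentifiers, int maxDocumentRequest) {
        List<String[]> result = new ArrayList<>();
        if (documentIdentifiers.length == 0) {
            return result;
        }
        int size = (maxDocumentRequest <= 0) ? documentIdentifiers.length : maxDocumentRequest;
        for (int start = 0; start < documentIdentifiers.length; start += size) {
            int end = Math.min(start + size, documentIdentifiers.length);
            result.add(Arrays.copyOfRange(documentIdentifiers, start, end));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] ids = { "a", "b", "c", "d", "e" };
        System.out.println(batches(ids, 2).size()); // prints "3"
        System.out.println(batches(ids, 0).size()); // prints "1"
    }
}
```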
-
getFormCheckJavascriptMethodName
public java.lang.String getFormCheckJavascriptMethodName(int connectionSequenceNumber)
Obtain the name of the form check javascript method to call.
Specified by:
    getFormCheckJavascriptMethodName in interface IRepositoryConnector
Parameters:
    connectionSequenceNumber - is the unique number of this connection within the job.
Returns:
    the name of the form check javascript method.
-
getFormPresaveCheckJavascriptMethodName
public java.lang.String getFormPresaveCheckJavascriptMethodName(int connectionSequenceNumber)
Obtain the name of the form presave check javascript method to call.
Specified by:
    getFormPresaveCheckJavascriptMethodName in interface IRepositoryConnector
Parameters:
    connectionSequenceNumber - is the unique number of this connection within the job.
Returns:
    the name of the form presave check javascript method.
-
outputSpecificationHeader
public void outputSpecificationHeader(IHTTPOutput out, java.util.Locale locale, Specification ds, int connectionSequenceNumber, java.util.List<java.lang.String> tabsArray) throws ManifoldCFException, java.io.IOException
Output the specification header section. This method is called in the head section of a job page which has selected a repository connection of the current type. Its purpose is to add the required tabs to the list, and to output any javascript methods that might be needed by the job editing HTML. The connector will be connected before this method can be called.
Specified by:
    outputSpecificationHeader in interface IRepositoryConnector
Parameters:
    out - is the output to which any HTML should be sent.
    locale - is the locale the output is preferred to be in.
    ds - is the current document specification for this job.
    connectionSequenceNumber - is the unique number of this connection within the job.
    tabsArray - is the list of tab names. Add to this list any tab names that are specific to the connector.
Throws:
    ManifoldCFException
    java.io.IOException
-
outputSpecificationBody
public void outputSpecificationBody(IHTTPOutput out, java.util.Locale locale, Specification ds, int connectionSequenceNumber, int actualSequenceNumber, java.lang.String tabName) throws ManifoldCFException, java.io.IOException
Output the specification body section. This method is called in the body section of a job page which has selected a repository connection of the current type. Its purpose is to present the required form elements for editing. The coder can presume that the HTML that is output from this configuration will be within appropriate <html>, <body>, and <form> tags. The name of the form is always "editjob". The connector will be connected before this method can be called.
Specified by:
    outputSpecificationBody in interface IRepositoryConnector
Parameters:
    out - is the output to which any HTML should be sent.
    locale - is the locale the output is preferred to be in.
    ds - is the current document specification for this job.
    connectionSequenceNumber - is the unique number of this connection within the job.
    actualSequenceNumber - is the connection within the job that has currently been selected.
    tabName - is the current tab name. (actualSequenceNumber, tabName) form a unique tuple within the job.
Throws:
    ManifoldCFException
    java.io.IOException
-
processSpecificationPost
public java.lang.String processSpecificationPost(IPostParameters variableContext, java.util.Locale locale, Specification ds, int connectionSequenceNumber) throws ManifoldCFException
Process a specification post. This method is called at the start of a job's edit or view page, whenever there is a possibility that form data for a connection has been posted. Its purpose is to gather form information and modify the document specification accordingly. The name of the posted form is always "editjob". The connector will be connected before this method can be called.
Specified by:
    processSpecificationPost in interface IRepositoryConnector
Parameters:
    variableContext - contains the post data, including binary file-upload information.
    locale - is the locale the output is preferred to be in.
    ds - is the current document specification for this job.
    connectionSequenceNumber - is the unique number of this connection within the job.
Returns:
    null if all is well, or a string error message if there is an error that should prevent saving of the job (and cause a redirection to an error page).
Throws:
    ManifoldCFException
-
viewSpecification
public void viewSpecification(IHTTPOutput out, java.util.Locale locale, Specification ds, int connectionSequenceNumber) throws ManifoldCFException, java.io.IOException
View specification. This method is called in the body section of a job's view page. Its purpose is to present the document specification information to the user. The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags. The connector will be connected before this method can be called.
Specified by:
    viewSpecification in interface IRepositoryConnector
Parameters:
    out - is the output to which any HTML should be sent.
    locale - is the locale the output is preferred to be in.
    ds - is the current document specification for this job.
    connectionSequenceNumber - is the unique number of this connection within the job.
Throws:
    ManifoldCFException
    java.io.IOException