Class BaseRepositoryConnector

  • All Implemented Interfaces:
    IConnector, IRepositoryConnector

    public abstract class BaseRepositoryConnector
    extends BaseConnector
    implements IRepositoryConnector
    This base class describes an instance of a connection between a repository and ManifoldCF's standard "pull" ingestion agent. Each instance of this interface is used in only one thread at a time.
    Connection pooling for these objects is performed by the factory that instantiates repository connectors from symbolic names and configuration parameters, and connectors are pooled by those parameters. That is, a pooled connector handle is used only if all the connection parameters for the handle match. Implementers of this interface should provide a default constructor which has this signature: xxx();
    Connectors are either configured or not. If configured, they will persist in a pool and be reused multiple times. Certain methods of a connector may be called before the connector is configured. This includes basically all methods that permit inspection of the connector's capabilities. The complete list is:
    The purpose of the repository connector is to allow documents to be fetched from the repository. Each repository connector describes a set of documents that are known only to that connector. It therefore establishes a space of document identifiers. Each connector will only ever be asked to deal with identifiers that have in some way originated from the connector.
    Documents are fetched using processDocuments(), which then decides how to dispose of each document using the methods available through the provided IProcessActivity object.
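    The pooling rule described above (a pooled handle is reused only when every connection parameter matches) can be illustrated with a small sketch. The classes SketchConnector and ConnectorPoolSketch and their methods are hypothetical stand-ins invented for illustration; they are not ManifoldCF's actual factory or pool API.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a configured connector; not a ManifoldCF type.
class SketchConnector {
    final String className;
    final Map<String, String> params;

    SketchConnector(String className, Map<String, String> params) {
        this.className = className;
        this.params = params;
    }
}

// Hypothetical pool: a handle is reused only when the class name and
// every configuration parameter match exactly.
class ConnectorPoolSketch {
    private final Map<String, Deque<SketchConnector>> idle = new HashMap<>();

    // Build the pool key from the class name plus the sorted parameters.
    private static String key(String className, Map<String, String> params) {
        return className + "|" + new TreeMap<>(params);
    }

    // Return an idle handle with identical parameters, or create a new one.
    SketchConnector grab(String className, Map<String, String> params) {
        Deque<SketchConnector> q = idle.get(key(className, params));
        if (q != null && !q.isEmpty()) {
            return q.pop();
        }
        return new SketchConnector(className, new TreeMap<>(params));
    }

    // Put a handle back so a later grab() with the same parameters can reuse it.
    void release(SketchConnector c) {
        idle.computeIfAbsent(key(c.className, c.params), k -> new ArrayDeque<>()).push(c);
    }
}
```

    Keying the pool on the full parameter set means a handle configured for one server can never be handed to a caller that asked for another.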
    • Constructor Detail

      • BaseRepositoryConnector

        public BaseRepositoryConnector()
    • Method Detail

      • getConnectorModel

        public int getConnectorModel()
        Tell the world what model this connector uses for addSeedDocuments(). This must return one of the model values defined by IRepositoryConnector.
        Specified by:
        getConnectorModel in interface IRepositoryConnector
        Returns:
        the model type value.
      • getActivitiesList

        public java.lang.String[] getActivitiesList()
        Return the list of activities that this connector supports (i.e. writes into the log).
        Specified by:
        getActivitiesList in interface IRepositoryConnector
        Returns:
        the list.
      • getRelationshipTypes

        public java.lang.String[] getRelationshipTypes()
        Return the list of relationship types that this connector recognizes.
        Specified by:
        getRelationshipTypes in interface IRepositoryConnector
        Returns:
        the list.
      • getBinNames

        public java.lang.String[] getBinNames(java.lang.String documentIdentifier)
        Get the bin name strings for a document identifier. The bin name describes the queue to which the document will be assigned for throttling purposes. Throttling controls the rate at which items in a given queue are fetched; it does not say anything about the overall fetch rate, which may operate on multiple queues or bins. For example, if you implement a web crawler, a good choice of bin name would be the server name, since that is likely to correspond to a real resource that will need real throttle protection.
        Specified by:
        getBinNames in interface IRepositoryConnector
        Parameters:
        documentIdentifier - is the document identifier.
        Returns:
        the set of bin names. If an empty array is returned, it is equivalent to there being no request rate throttling available for this identifier.
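        The web-crawler example above (using the server name as the bin name) might look like the following sketch. BinNameSketch and binNamesFor are hypothetical names; a real connector would place this logic in its getBinNames() override, and it assumes document identifiers are URLs.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical helper, assuming document identifiers are URLs.
class BinNameSketch {
    // Returns the lower-cased host name as the single bin, or an empty
    // array (no throttling) when no host can be extracted.
    static String[] binNamesFor(String documentIdentifier) {
        try {
            String host = new URI(documentIdentifier).getHost();
            if (host == null) {
                return new String[0];
            }
            return new String[] { host.toLowerCase(Locale.ROOT) };
        } catch (URISyntaxException e) {
            // Not a parseable URI: fall back to "no bins".
            return new String[0];
        }
    }
}
```

        Lower-casing the host keeps documents from the same server in one bin regardless of how the URL was written.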
      • requestInfo

        public boolean requestInfo(Configuration output,
                                   java.lang.String command)
                            throws ManifoldCFException
        Request arbitrary connector information. This method is called directly from the API in order to allow API users to perform any one of several connector-specific queries.
        Specified by:
        requestInfo in interface IRepositoryConnector
        Parameters:
        output - is the response object, to be filled in by this method.
        command - is the command, which is taken directly from the API request.
        Returns:
        true if the resource is found, false if not. In either case, output may be filled in.
        Throws:
        ManifoldCFException
      • addSeedDocuments

        public java.lang.String addSeedDocuments(ISeedingActivity activities,
                                                 Specification spec,
                                                 java.lang.String lastSeedVersion,
                                                 long seedTime,
                                                 int jobMode)
                                          throws ManifoldCFException,
                                                 ServiceInterruption
        Queue "seed" documents. Seed documents are the starting places for crawling activity. Documents are seeded when this method calls appropriate methods in the passed-in ISeedingActivity object. This method can choose to find only the repository changes that happened during the specified time interval. The seeds recorded by this method will be viewed by the framework based on what the getConnectorModel() method returns. It is not a big problem if the connector chooses to create more seeds than are strictly necessary; it is merely a question of overall work required. The end time and seeding version string passed to this method may be interpreted for greatest efficiency. For continuous crawling jobs, this method will be called once, when the job starts, and at various periodic intervals as the job executes. When a job's specification is changed, the framework automatically resets the seeding version string to null. The seeding version string may also be set to null on each job run, depending on the connector model returned by getConnectorModel(). Note that it is always acceptable for this method to seed more documents rather than fewer. The connector will be connected before this method can be called.
        Specified by:
        addSeedDocuments in interface IRepositoryConnector
        Parameters:
        activities - is the interface this method should use to perform whatever framework actions are desired.
        spec - is a document specification (that comes from the job).
        lastSeedVersion - is the last seeding version string for this job, or null if the job has no previous seeding version string.
        seedTime - is the end of the time range of documents to consider, exclusive.
        jobMode - is an integer describing how the job is being run, whether continuous or once-only.
        Returns:
        an updated seeding version string, to be stored with the job.
        Throws:
        ManifoldCFException
        ServiceInterruption
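        One way to use the seeding version string and the exclusive end time is sketched below: the previous end time is stored as the version string and becomes the start of the next interval. SeedSinkSketch and SeedingSketch are hypothetical stand-ins (not the real ISeedingActivity or a real connector), and the in-memory map of modification times is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the seeding callback; not the real ISeedingActivity.
interface SeedSinkSketch {
    void addSeedDocument(String documentIdentifier);
}

class SeedingSketch {
    // Hypothetical repository contents: document identifier -> last-modified time (ms).
    final Map<String, Long> modified = new LinkedHashMap<>();

    // Seeds every document changed in [start, seedTime), where start is parsed
    // from lastSeedVersion; a null version means "seed everything before seedTime".
    String addSeedDocuments(SeedSinkSketch activities, String lastSeedVersion, long seedTime) {
        long start = (lastSeedVersion == null) ? Long.MIN_VALUE : Long.parseLong(lastSeedVersion);
        for (Map.Entry<String, Long> e : modified.entrySet()) {
            long t = e.getValue();
            if (t >= start && t < seedTime) {
                activities.addSeedDocument(e.getKey());
            }
        }
        // The returned string is handed back as lastSeedVersion on the next call.
        return Long.toString(seedTime);
    }
}
```

        Because seedTime is exclusive and the next call starts at the same value, consecutive intervals neither overlap nor leave a gap.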
      • processDocuments

        public void processDocuments(java.lang.String[] documentIdentifiers,
                                     IExistingVersions statuses,
                                     Specification spec,
                                     IProcessActivity activities,
                                     int jobMode,
                                     boolean usesDefaultAuthority)
                              throws ManifoldCFException,
                                     ServiceInterruption
        Process a set of documents. This method should fetch and process each document, then add the results to the queue of documents for the current job, enter them into the incremental ingestion manager, or both. The document specification allows this class to filter what is done based on the job. The connector will be connected before this method can be called.
        Specified by:
        processDocuments in interface IRepositoryConnector
        Parameters:
        documentIdentifiers - is the set of document identifiers to process.
        statuses - are the currently-stored document versions for each document in the set of document identifiers passed in above.
        spec - is a document specification (that comes from the job).
        activities - is the interface this method should use to queue up new document references and ingest documents.
        jobMode - is an integer describing how the job is being run, whether continuous or once-only.
        usesDefaultAuthority - will be true only if the authority in use for these documents is the default one.
        Throws:
        ManifoldCFException
        ServiceInterruption
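        The role of the stored versions can be illustrated with a sketch that skips unchanged documents, re-ingests changed ones, and records documents that no longer exist. ProcessSketch is hypothetical: plain maps and lists stand in for IExistingVersions, IProcessActivity, and the repository, and the version-string comparison is an assumed strategy, not the framework's prescribed one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; maps and lists replace the framework objects.
class ProcessSketch {
    // Hypothetical repository: document identifier -> current version string.
    final Map<String, String> repository = new HashMap<>();
    final List<String> ingested = new ArrayList<>();
    final List<String> removed = new ArrayList<>();

    void processDocuments(String[] documentIdentifiers, Map<String, String> storedVersions) {
        for (String id : documentIdentifiers) {
            String current = repository.get(id);
            if (current == null) {
                // The document is gone from the repository.
                removed.add(id);
            } else if (!current.equals(storedVersions.get(id))) {
                // New or changed since the stored version: ingest it again.
                ingested.add(id);
            }
            // Otherwise the version is unchanged and nothing needs to be done.
        }
    }
}
```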
      • getMaxDocumentRequest

        public int getMaxDocumentRequest()
        Get the maximum number of documents to amalgamate together into one batch, for this connector.
        Specified by:
        getMaxDocumentRequest in interface IRepositoryConnector
        Returns:
        the maximum number. 0 indicates "unlimited".
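        How a caller might honor this limit, including the "0 means unlimited" convention, is sketched below. BatchSketch and batches are hypothetical names; the sketch does not reproduce how the framework itself groups documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that splits identifiers into batches of at most max items.
class BatchSketch {
    static List<String[]> batches(String[] ids, int max) {
        List<String[]> out = new ArrayList<>();
        if (ids.length == 0) {
            return out;
        }
        // 0 means "unlimited" (negative values are treated the same way here),
        // so everything goes into a single batch.
        int size = (max <= 0) ? ids.length : max;
        for (int i = 0; i < ids.length; i += size) {
            out.add(Arrays.copyOfRange(ids, i, Math.min(ids.length, i + size)));
        }
        return out;
    }
}
```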
      • getFormCheckJavascriptMethodName

        public java.lang.String getFormCheckJavascriptMethodName(int connectionSequenceNumber)
        Obtain the name of the form check javascript method to call.
        Specified by:
        getFormCheckJavascriptMethodName in interface IRepositoryConnector
        Parameters:
        connectionSequenceNumber - is the unique number of this connection within the job.
        Returns:
        the name of the form check javascript method.
      • getFormPresaveCheckJavascriptMethodName

        public java.lang.String getFormPresaveCheckJavascriptMethodName(int connectionSequenceNumber)
        Obtain the name of the form presave check javascript method to call.
        Specified by:
        getFormPresaveCheckJavascriptMethodName in interface IRepositoryConnector
        Parameters:
        connectionSequenceNumber - is the unique number of this connection within the job.
        Returns:
        the name of the form presave check javascript method.
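        Because several connections can appear on the same job page, both method names need to be unique per connection. A minimal sketch follows, assuming a naming scheme that prefixes the connection sequence number; the exact strings are an assumption for illustration, and JsNameSketch is a hypothetical class.

```java
// Hypothetical naming helper; the "s<number>_" prefix is an assumed convention.
class JsNameSketch {
    static String formCheckName(int connectionSequenceNumber) {
        return "s" + connectionSequenceNumber + "_checkSpecification";
    }

    static String formPresaveCheckName(int connectionSequenceNumber) {
        return "s" + connectionSequenceNumber + "_checkSpecificationForSave";
    }
}
```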
      • outputSpecificationHeader

        public void outputSpecificationHeader(IHTTPOutput out,
                                              java.util.Locale locale,
                                              Specification ds,
                                              int connectionSequenceNumber,
                                              java.util.List<java.lang.String> tabsArray)
                                       throws ManifoldCFException,
                                              java.io.IOException
        Output the specification header section. This method is called in the head section of a job page which has selected a repository connection of the current type. Its purpose is to add the required tabs to the list, and to output any javascript methods that might be needed by the job editing HTML. The connector will be connected before this method can be called.
        Specified by:
        outputSpecificationHeader in interface IRepositoryConnector
        Parameters:
        out - is the output to which any HTML should be sent.
        locale - is the locale the output is preferred to be in.
        ds - is the current document specification for this job.
        connectionSequenceNumber - is the unique number of this connection within the job.
        tabsArray - is the list of tab names. Add to this list any tab names that are specific to the connector.
        Throws:
        ManifoldCFException
        java.io.IOException
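        The two jobs of this method (adding tab names and emitting javascript) can be sketched as follows. SpecHeaderSketch is hypothetical: a StringBuilder stands in for IHTTPOutput, and the tab name "Paths" and the generated function name are invented for illustration.

```java
import java.util.List;

// Hypothetical sketch; StringBuilder replaces the real IHTTPOutput.
class SpecHeaderSketch {
    static void outputSpecificationHeader(StringBuilder out,
                                          int connectionSequenceNumber,
                                          List<String> tabsArray) {
        // Register a connector-specific tab (the name is an assumption).
        tabsArray.add("Paths");
        // Emit a form-check function whose name is unique to this connection.
        String fn = "s" + connectionSequenceNumber + "_checkSpecification";
        out.append("<script type=\"text/javascript\">\n")
           .append("function ").append(fn).append("() { return true; }\n")
           .append("</script>\n");
    }
}
```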
      • outputSpecificationBody

        public void outputSpecificationBody(IHTTPOutput out,
                                            java.util.Locale locale,
                                            Specification ds,
                                            int connectionSequenceNumber,
                                            int actualSequenceNumber,
                                            java.lang.String tabName)
                                     throws ManifoldCFException,
                                            java.io.IOException
        Output the specification body section. This method is called in the body section of a job page which has selected a repository connection of the current type. Its purpose is to present the required form elements for editing. The coder can presume that the HTML that is output from this configuration will be within appropriate <html>, <body>, and <form> tags. The name of the form is always "editjob". The connector will be connected before this method can be called.
        Specified by:
        outputSpecificationBody in interface IRepositoryConnector
        Parameters:
        out - is the output to which any HTML should be sent.
        locale - is the locale the output is preferred to be in.
        ds - is the current document specification for this job.
        connectionSequenceNumber - is the unique number of this connection within the job.
        actualSequenceNumber - is the connection within the job that has currently been selected.
        tabName - is the current tab name. (actualSequenceNumber, tabName) form a unique tuple within the job.
        Throws:
        ManifoldCFException
        java.io.IOException
      • processSpecificationPost

        public java.lang.String processSpecificationPost(IPostParameters variableContext,
                                                         java.util.Locale locale,
                                                         Specification ds,
                                                         int connectionSequenceNumber)
                                                  throws ManifoldCFException
        Process a specification post. This method is called at the start of a job's edit or view page, whenever there is a possibility that form data for a connection has been posted. Its purpose is to gather form information and modify the document specification accordingly. The name of the posted form is always "editjob". The connector will be connected before this method can be called.
        Specified by:
        processSpecificationPost in interface IRepositoryConnector
        Parameters:
        variableContext - contains the post data, including binary file-upload information.
        locale - is the locale the output is preferred to be in.
        ds - is the current document specification for this job.
        connectionSequenceNumber - is the unique number of this connection within the job.
        Returns:
        null if all is well, or a string error message if there is an error that should prevent saving of the job (and cause a redirection to an error page).
        Throws:
        ManifoldCFException
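        The return contract (null on success, an error message to block saving) is sketched below. SpecPostSketch is hypothetical: plain maps stand in for IPostParameters and Specification, and the "maxdepth" form field and its "s<number>_" prefix are invented for illustration.

```java
import java.util.Map;

// Hypothetical sketch; maps replace the real IPostParameters and Specification.
class SpecPostSketch {
    static String processSpecificationPost(Map<String, String> post,
                                           Map<String, String> spec,
                                           int connectionSequenceNumber) {
        String value = post.get("s" + connectionSequenceNumber + "_maxdepth");
        if (value == null) {
            // Nothing was posted for this connection; leave the specification alone.
            return null;
        }
        try {
            int depth = Integer.parseInt(value.trim());
            if (depth < 0) {
                return "Maximum depth must not be negative";
            }
            spec.put("maxdepth", Integer.toString(depth));
            return null;
        } catch (NumberFormatException e) {
            // A non-null result signals an error that should prevent saving the job.
            return "Maximum depth must be an integer";
        }
    }
}
```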
      • viewSpecification

        public void viewSpecification(IHTTPOutput out,
                                      java.util.Locale locale,
                                      Specification ds,
                                      int connectionSequenceNumber)
                               throws ManifoldCFException,
                                      java.io.IOException
        View specification. This method is called in the body section of a job's view page. Its purpose is to present the document specification information to the user. The coder can presume that the HTML that is output from this configuration will be within appropriate <html> and <body> tags. The connector will be connected before this method can be called.
        Specified by:
        viewSpecification in interface IRepositoryConnector
        Parameters:
        out - is the output to which any HTML should be sent.
        locale - is the locale the output is preferred to be in.
        ds - is the current document specification for this job.
        connectionSequenceNumber - is the unique number of this connection within the job.
        Throws:
        ManifoldCFException
        java.io.IOException