Log Aggregation Overview
========================

.. See subversion log for revision information.

Introduction
------------

Logs need to be aggregated from all Member Nodes and stored at each
Coordinating Node. Each Coordinating Node must replicate its logging data to
the other Coordinating Nodes. The process described below relies on several
technologies: Quartz Scheduler, Hazelcast Data Distribution, Metacat
Repository Storage, and the Solr Search platform.

DataONE use cases
-----------------

.. toctree::
   :maxdepth: 1

   UseCases/16_uc
   UseCases/17_uc

Log Aggregation Scheduling
--------------------------

Log aggregation will be performed once a day per Member Node. There is no
fixed time of day at which log records are harvested. Each run harvests all
records from the end of the previous run up to the beginning of the current
day (00:00:00). Consequently, the most recent aggregated record on a CN may
be up to 48 hours old, depending on when a query is executed and when the
last harvest ran. In the Log Scheduler class, Quartz is used as the mechanism
for scheduling log harvesting.

Log Aggregation Distributed Execution
-------------------------------------

The Log Aggregator will run as a distributed executable class for each Member
Node, as well as locally for the Metacat instance that acts as the storage
mechanism for the Coordinating Node. Hazelcast handles the distribution of
the execution. The number of Log Aggregator executions that may be running at
any given time is capped at the number of CNs multiplied by the maximum
allowed per CN.

The Log Aggregator task will run a getLogRecords query against a Member Node.
It will loop through the :class:`Types.Log` results, calling the Log Publisher
for each :class:`Types.LogEntry`. The Log Publisher places the
:class:`Types.LogEntry` on a distributed Hazelcast topic, from which it is
broadcast to all members of the Hazelcast cluster (including the publishing
node itself).

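
The harvest window and the per-entry broadcast described above can be
sketched as follows. This is a minimal illustration, not the actual
implementation: the ``Topic`` class is an in-memory stand-in for the
distributed Hazelcast topic, ``get_log_records`` is a hypothetical callable
standing in for the Member Node query, and the half-open treatment of the
window (start included, end excluded) is an assumption.

.. code-block:: python

   from datetime import datetime, time


   def harvest_window(last_harvested_to, now):
       """Return (from_date, to_date) for one run.

       The window runs from the end of the previous harvest up to
       00:00:00 of the current day, as described in the scheduling section.
       """
       start_of_today = datetime.combine(now.date(), time.min, tzinfo=now.tzinfo)
       return last_harvested_to, start_of_today


   class Topic:
       """In-memory stand-in for a distributed topic (hypothetical).

       Every registered listener receives every published message,
       including a listener registered by the publishing node.
       """

       def __init__(self):
           self._listeners = []

       def add_listener(self, listener):
           self._listeners.append(listener)

       def publish(self, message):
           for listener in self._listeners:
               listener(message)


   def aggregate(get_log_records, topic, last_harvested_to, now):
       """Harvest one Member Node and publish each log entry to the topic.

       Returns the new "harvested up to" timestamp for the next run.
       """
       from_date, to_date = harvest_window(last_harvested_to, now)
       if from_date >= to_date:
           # Already harvested up to the start of today; nothing to do.
           return last_harvested_to
       for entry in get_log_records(from_date, to_date):
           topic.publish(entry)
       return to_date

Returning the end of the window lets the next run resume exactly where this
one stopped, so a run that happens late on one day and early on the next does
not skip or repeat records.
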
Log Aggregation Indexing
------------------------

Each Coordinating Node will have a Log Aggregator running a Log Subscriber
that listens for events on the distributed Hazelcast topic. For each
:class:`Types.LogEntry` message received, it will call the Log Indexer. The
Log Indexer will take the :class:`Types.LogEntry`, create a Solr document from
it, and submit the document to Solr for addition to the Lucene index.

.. figure:: images/logAggregator_seq.png

   *Figure 1.* Sequence diagram of the Log Aggregator process.

..
  @startuml images/logAggregator_seq.png
  participant "Log Scheduler" as CnSchd << CN >>
  participant "Core API" as CnCore << CN >>
  participant "Log Aggregator" as CnAgg << CN >>
  participant "Core API" as MnCore << MN >>
  participant "Log Publisher" as CnPub << CN >>
  participant "Log Subscriber" as CnSub << CN >>
  participant "Log Indexer" as CnInd << CN >>

  CnSchd -> CnSchd : init()
  activate CnSchd
  CnSchd -> CnCore : CNCore.listNodes()
  activate CnCore
  CnCore -> CnSchd: NodeList
  deactivate CnCore
  CnSchd -> CnSchd : Schedule all MNs in quartz
  CnSchd -> CnAgg : Trigger Aggregation of MN
  activate CnAgg
  deactivate CnSchd
  CnAgg->MnCore : getLogRecords(session, fromDate, toDate)
  activate MnCore
  MnCore->CnAgg : Log
  deactivate MnCore
  CnAgg->CnPub : scheduleEvent(LogEvent)
  activate CnPub
  CnPub->CnPub : topic.publish(LogEvent)
  deactivate CnPub
  deactivate CnAgg
  CnSub -> CnSub : onMessage(LogEvent)
  activate CnSub
  CnSub -> CnInd : indexItem(LogEvent)
  activate CnInd
  CnInd -> CnInd : create LogEvent in Solr
  deactivate CnSub
  deactivate CnInd
  @enduml
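
The subscriber and indexer steps can be sketched as follows. This is an
illustration under stated assumptions rather than the actual implementation:
``InMemoryIndex`` is a hypothetical stand-in for the Solr index, the log entry
is modeled as a plain dictionary, and the field names (``entryId``,
``identifier``, ``event``, ``nodeIdentifier``, ``dateLogged``) are assumed for
the example.

.. code-block:: python

   from datetime import timezone


   class InMemoryIndex:
       """Hypothetical stand-in for the search index, keyed by document id."""

       def __init__(self):
           self.documents = {}

       def add(self, document):
           # Adding a document with an existing id replaces the earlier one.
           self.documents[document["id"]] = document


   def to_index_document(log_entry):
       """Flatten a log entry into a flat field/value document.

       Field names are illustrative. ``dateLogged`` is assumed to be a
       timezone-aware datetime and is rendered as a UTC ISO 8601 string.
       """
       logged_utc = log_entry["dateLogged"].astimezone(timezone.utc)
       return {
           "id": log_entry["entryId"],
           "pid": log_entry["identifier"],
           "event": log_entry["event"],
           "nodeId": log_entry["nodeIdentifier"],
           "dateLogged": logged_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
       }


   class LogSubscriber:
       """Receives entries from the topic and hands them to the indexer."""

       def __init__(self, index):
           self._index = index

       def on_message(self, log_entry):
           self._index.add(to_index_document(log_entry))

Keying each document by the entry identifier makes indexing idempotent in
this sketch: if the same entry is delivered twice, the second copy replaces
the first instead of creating a duplicate.
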