DataONE API - 2.0

Use Case 01 - Get Object Identified by PID

Goal

Retrieve an object identified by PID.

Summary

A client has an identifier for some object within the DataONE system and is attempting to retrieve the referenced object from a node (Member Node or Coordinating Node). If the object exists on the node and the user has READ permission on the object, then the bytes of that object are returned, otherwise an error condition occurs.

This low level operation assumes that the client knows that the desired object is available on the target node. The normal process for retrieving an object given only the identifier is to first resolve the object, then retrieve the object from one of the identified nodes. Resolution is described in UC36.

Actors

  • Client requesting object
  • Coordinating Node
  • Member Node

Preconditions

  • Client has authenticated to the desired level (e.g. client may not have authenticated, so access might be anonymous).

Triggers

  • An object is requested from the DataONE system.

Post Conditions

  • The client has a copy of the object bytes (or an error message in the case of failure)
  • The node event log is updated with the results of the operation
  • Watchers are notified of the event.

@startuml images/01_uc.png
actor "User" as client
usecase "12. Authentication" as authen
note top of authen
Authentication may be provided
by an external service
end note

  actor "Coordinating Node" as CN
  actor "Member Node" as MN
  usecase "13. Authorization" as author
  usecase "01. Get Object" as GET
  usecase "16. Log event" as log
  usecase "21. Notify subscribers" as subscribe
  usecase "36. Resolve object" as resolve
  client -- GET
  CN -- GET
  MN -- GET
  GET ..> author: <<includes>>\n[optional]
  GET ..> authen: <<includes>>\n[optional]
  GET ..> log: <<includes>>
  GET ..> subscribe: <<includes>>
  GET ..> resolve: <<includes>>\n[optional]

@enduml

Figure 1. Use case 01 diagram showing actors and components involved in this action.

@startuml images/01_seq.png
participant "Client" as app_client << Application >>
participant "Read API" as n_crud << Node >>
participant "Authorization API" as n_authorize << Node >>
participant "Object Store" as n_ostore << Node >>
app_client -> n_crud: get(session, PID)
n_crud -> n_authorize: isAuthorized(session, PID, READ)
n_crud <- n_authorize: True, False, Err.NotFound
alt NotFound
  app_client <- n_crud: Err.NotFound
else False
  app_client <- n_crud: Err.NotAuthorized
else True
  n_crud -> n_ostore: read(PID)
  n_crud <- n_ostore: bytes
  n_crud --> n_crud: log(PID, READ)
  app_client <- n_crud: bytes
end
@enduml

Figure 2. Sequence diagram for Use Case 01 illustrating the sequence for retrieving an object identified by a PID from the DataONE system. No distinction is made between Member Node and Coordinating Node implementation as they are identical at this level of detail.

Notes

  1. For the GET operation, should isAuth() be performed only by CNs? Relying on the MN system metadata requires trusted implementation of the MN system and consistency of system metadata across all MNs (which will be the case, though with uncertain latency). Requiring all isAuth() operations to be performed by CNs will increase trust in the operation (assuming the operation is not spoofed by a MN) though will increase load on CNs. This should be specified in the Authorization use case.
  2. Data sent to watchers might include: timestamp, object identifier, user id, IP of client.