Authorization in DataONE
========================

This document outlines the mechanism for specifying authorization policies for objects and services in DataONE and a set of services for controlling access to those objects on Member Nodes and Coordinating Nodes.

Overview
--------

Users and services authenticate in DataONE to confirm their identity. The identity is then used for controlling access to objects, systems, and services within the DataONE framework. Requirements for Authorization are listed here:

.. toctree::
   :maxdepth: 1

   AuthnAndAuthzRequirements

Privacy and access control in DataONE are primarily for the protection and integrity of user contributed data and metadata via Member Nodes. There are, however, other entities in DataONE that also need protection, including DataONE specific services and system resources, like system metadata and components of the general software stack (e.g., databases, web servers) for Coordinating and Member Nodes. For this reason, all resources in DataONE, from data and metadata objects to system services, have an access policy (:class:`Types.AccessPolicy`), made up of one or more *access control rules* (:class:`Types.AccessRule`), that is used to determine who may access the resource.

The process of confirming whether a user has privileges to access a resource in DataONE is called *authorization*. The act of authorization uses attribute information contained in the security token obtained by the user when authenticating with their identity provider, and compares that information to the access control rules of the resource. If a rule permits access by the :term:`principal` requesting the resource, then authorization succeeds and permission is granted to access the resource. The algorithm used to evaluate authorization for a resource is described in the section *Object Access Control* below.
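As a rough preview of that algorithm, the following Python sketch checks a request against the rights holder, the subjects of the authoritative Member Node, and the access rules, in the order given by the pseudocode in *Object Access Control*. The class and function names (``AccessRule``, ``SystemMetadata``, ``is_authorized``) and the example subject strings are illustrative assumptions, not part of any DataONE library. Treating a granted permission as covering every lower level is one reading of the statement that permissions are hierarchical, and the *execute* permission is left out.

```python
from dataclasses import dataclass, field
from typing import Iterable, List

# Assumed ordering: each permission also covers the ones ranked below it.
PERMISSION_LEVEL = {"read": 1, "write": 2, "changePermission": 3}


@dataclass
class AccessRule:
    """Hypothetical stand-in for Types.AccessRule: subjects plus permissions."""
    subjects: List[str]
    permissions: List[str]


@dataclass
class SystemMetadata:
    """Hypothetical stand-in holding only the fields used for authorization."""
    rights_holder: str
    authoritative_mn_subjects: List[str] = field(default_factory=list)
    access_policy: List[AccessRule] = field(default_factory=list)


def is_authorized(session_subjects: Iterable[str],
                  sysmeta: SystemMetadata,
                  permission: str) -> bool:
    """Return True if any of the caller's subjects holds `permission`."""
    subjects = set(session_subjects)
    needed = PERMISSION_LEVEL[permission]

    # The rights holder holds all permissions on the object.
    if sysmeta.rights_holder in subjects:
        return True
    # Subjects of the authoritative Member Node are treated the same way.
    if subjects & set(sysmeta.authoritative_mn_subjects):
        return True
    # Otherwise an allow rule must grant a sufficient permission to a subject.
    for rule in sysmeta.access_policy:
        granted = max((PERMISSION_LEVEL[p] for p in rule.permissions), default=0)
        if granted >= needed and subjects & set(rule.subjects):
            return True
    # No rule matched: access is denied by default.
    return False


sysmeta = SystemMetadata(
    rights_holder="CN=alice",
    authoritative_mn_subjects=["CN=mn-example"],
    access_policy=[AccessRule(subjects=["CN=bob"], permissions=["write"])],
)
print(is_authorized(["CN=bob"], sysmeta, "read"))              # True
print(is_authorized(["CN=bob"], sysmeta, "changePermission"))  # False
```

Passing a list of subjects rather than a single one lets the same function cover equivalent identities and group membership: the request succeeds if any one of the caller's subjects matches.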
Because nodes that form the DataONE federation are managed by various administrative domains and may cross multiple political boundaries, "trust" relationships are crucial for DataONE to succeed in its security plan. In simple terms, this means that access control rules that are defined by one member of the federation are upheld by another member. It also means that trust may be revoked if a particular member does not behave accordingly within the federation.

Access control rules may be dynamic and must be propagated with the resource they are designated to protect, such as when data or metadata objects are replicated to another Member Node. The language that specifies the policy for a given access control rule dictates only whether a user is allowed access to a given resource; the ability to explicitly deny access to a resource would overly complicate management of the authorization process and is seldom used in practice.

Access rules (:class:`Types.AccessRule`) consist of the system identity of the user, also known as the :term:`Subject`, the type of permission granted (e.g., *read*, *write*, or *changePermission*), and the :term:`identifier` of the resource being requested. An access policy is an optional element of the :term:`System Metadata` associated with an object. The default access policy is to deny access to the object to all users except the *subject* identified as the :attr:`Types.SystemMetadata.rightsHolder` in the System Metadata.

DataONE will provide, where reasonable, a conversion of the internal access control rule to a subset of one or more industry standard policy languages to support interoperability between different organizations.

Trust Relationships
-------------------

Any authorization system in a federation requires trust among participants. For DataONE, there are five types of trust relationships among nodes in the federation:

1. **MN to CN**: Member Nodes need to have trust that Coordinating Nodes will respect and enforce their authorization policies, including any restrictions placed on where and when to create replicas of objects, and on the presentation of search results for restricted content.

2. **CN to MN**: Coordinating Nodes rely upon Member Nodes for limited services, and mainly expect Member Nodes to accurately implement the DataONE Service API, including replication services.

3. **CN to CN**: Each Coordinating Node contains a replica of the content of the others, and all are configured to provide seamless failover and load-balancing for all incoming requests across the three nodes. Consequently, the Coordinating Nodes inherently trust one another fully. As the suite of Coordinating Node instances expands to other continents, this relationship may need to be re-examined.

4. **MN to MN**: Member Node to Member Node trust relies on one Member Node believing that another Member Node will respect the authorization policies that they publish for their objects and services. In the case of restricted access content, Member Nodes that house replicas of an object would need to faithfully enforce authorization policies that were expressed by the data owner. Because of this, Member Nodes can express replication policies for objects that indicate which other Member Nodes are acceptable targets for replication, and for which nodes they are willing to serve as replica stores.

5. **User to DataONE**: Users trust that the DataONE system, that is, the combination of Member and Coordinating Nodes interacting to provide the DataONE infrastructure and services, implements access control rules consistently and in compliance with the specifications provided when content was added to the system or subsequently modified. This implies minimal latency in propagation of rules between components of the system.
Verification of proper technical implementation of these trust relationships is achieved through integration testing of the various components. This involves exercising a wide array of combinations of users, groups, and access control rules to ensure expected behavior as content moves around the DataONE infrastructure.

The DataNet projects have a loosely defined requirement of interoperability between their respective implementations. This also implies that content and services *may* be shared between projects, and thus there will likely be additional trust relationships that need to be taken into consideration as the DataNet projects progress towards interoperability.

Object Access Control
---------------------

Access control for content managed by DataONE (:term:`Data` objects, :term:`Science Metadata` objects, and :term:`Resource Maps`) is determined by the :class:`Types.AccessPolicy` entry in the :class:`Types.SystemMetadata` associated with the object. In addition, the :term:`rightsHolder` of the System Metadata holds all permissions on the object, and the :term:`Authoritative Member Node` has privileges equivalent to those of the *rightsHolder*.

The *Authoritative Member Node* is identified by one or more :term:`Subjects` listed in the Member Node :class:`Types.Node` record registered in the DataONE :term:`node registry`. Thus, the :class:`Types.NodeReference` entry recorded as the *Authoritative Member Node* in the System Metadata references the *Node* entry in the node registry, which in turn contains a list of *Subjects* that, when used in a request to access or manipulate an object, identify the user as the *Authoritative Member Node*.

Permissions that can be associated with an object include:

:Read: The ability to view the content of this object.

:Write: The ability to change the content of this object via update services. Permissions are hierarchical, so *write* permission also includes *read* permission.

:ChangePermission: The ability to change the authorization policies for this object. Includes both *read* and *write* permissions.

Conceptually, an :class:`Types.AccessRule` is a tuple with three components: an *identifier* which indicates which object the rule applies to; a *subject* which indicates who the rule applies to; and a *permission* which indicates the level of access described by the rule. In practice, the *access rule* is contained in the System Metadata, and so each access rule contains a permission and list of subjects.

A set of *access rules* is contained in the :class:`Types.AccessPolicy`, and these together with the *rights holder* and *authoritative member node* determine which subjects may perform operations on an object.

Evaluation of a permission for an object proceeds as follows, where SUBJECT is the *subject* making the request, and PERMISSION is the *permission* being evaluated::

  Is SUBJECT == rightsHolder?
    Yes -> return True
  Is SUBJECT IN authoritativeMemberNode.Subject?
    Yes -> return True
  for each accessRule in accessPolicy
    if PERMISSION is IN accessRule.Permission
      Is SUBJECT IN accessRule.Subject?
        Yes -> return True
  return False

DataONE supports *equivalent identities*, where a single principal may have multiple subjects associated with them. As such, the ``SUBJECT`` in the algorithm described above is actually a list of one or more subjects. The list of subjects to be used for comparison is determined from the *Session* parameter of an API call as follows::

  SUBJECTS = [Session.subject, ]

Adjusting Object Access Control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Adjustments to access control for objects are made by altering the *accessPolicy* of the :class:`Types.SystemMetadata` for the object.
The process is to retrieve a current copy of the system metadata from a Coordinating Node using the :func:`CNRead.getSystemMetadata` method, edit the :class:`Types.AccessPolicy` entry as necessary, then send the updated *AccessPolicy* structure to a Coordinating Node using the :func:`CNAuthorization.setAccessPolicy` method.

Changes to *accessPolicy* are then propagated to other Coordinating Nodes through the Coordinating Node replication process (and hence to the search index), then to the Member Nodes that hold a copy of the object. Member Nodes are informed of a change to *accessPolicy* through the :func:`MNStorage.systemMetadataChanged` method which is called by a Coordinating Node. Member Nodes are expected to update the *accessPolicy* for an object as soon as possible after being informed of an update.

Log Record Access Control
-------------------------

Access to log records is evaluated in the same manner as access to objects. If the requesting *subject* does not have *read* permission for the *identifier* recorded in the log record, then they will be denied access to the log entry.

Adjustments to access control for log records are made indirectly by adjusting access control for the referenced object(s).

EDIT: Log records are now completely restricted to administrative users so as not to expose raw usage patterns for any/all public objects.

Service Access Control
----------------------

DataONE services are accessed through HTTPS connections. Restrictions on agents (i.e. clients) that may call the services may be imposed through network configuration (e.g. restricting IP addresses that may call the service) or preferably through the *restriction* property of the :class:`Types.Service` entry in the *services* property of the :class:`Types.Node` entry describing the registered Member or Coordinating Node.

The optional *restriction* property of the *service* lists subjects that have permission to invoke the service.
If a *restriction* is not included with the service description, then any agent may call that service endpoint.

NOTE: It is at the discretion of individual Node implementations as to whether these defined service restrictions will be enforced for the method in question. The service method restriction is meant only as a mechanism for node operators to record/manage restrictions to be enforced in a transparent manner.

Adjusting Service Access Control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Adjustments to access control for services, or more accurately, the methods exposed within a service, are made by altering the contents of the *restriction* property of the :class:`Types.Service` entry for the :class:`Types.Node` registration document for the node. These adjustments are made through the :func:`CNRegistration.updateNodeCapabilities` method by specifying a replacement node document. A current version of the node document should be retrieved from the Coordinating Node through the :func:`CNCore.listNodes` method.

Changes to node registration information can only be performed by subjects listed in the *subject* property of the :class:`Types.Node` document for the node.

Additional Authorization Constraints
------------------------------------

:TODO: Need to update this section to cover the additional constraints beyond subject authorization that will limit movement of content between components

Some nodes may also want to conditionally provide access to some services based on a principal's current usage of a resource such as node storage or node bandwidth.

* Create/Update constraints

  * MaximumStorageQuota
  * MaximumNetworkTransferQuota

* Embargoes

  * Add ability to specify an embargo period during which the access policies would not be in effect, and rather resources are only privately accessible

.. Note:: Add constraints and embargoes to the AccessPolicy language described below

Access Policy Language
-----------------------

:TODO: This section needs to be updated with the latest revisions to the AccessPolicy section. Also need to update / regenerate the example of access policy.

Several existing authorization policy languages were evaluated for use in the DataONE architecture. Given the simplicity of authorization rules that DataONE needs to express, these specifications were deemed overly complex and would impose too significant a cost on Member Node implementations.

.. toctree::
   :maxdepth: 2

   Authorization-technologies

.. Note:: Survey for additional policy languages to evaluate before deciding on a custom specification for DataONE.

DataONE has designed a simple access policy language that can be embedded in several contexts and can be used to express access rules. The definitions of the elements in this AccessPolicy language are:

.. attribute:: accessPolicy

  A set of rules that specifies as a whole the allowable permissions that a given user or system has for accessing a resource, including both data and metadata resources and service resources. An access policy consists of a sequence of allow rules that grant permissions to principals, which can be individual users, groups of users, symbolic users, or systems and services.

  :Cardinality: 1..1
  :ValueSpace: :class:`Types.AccessPolicy`
  :Generated By: Client

.. attribute:: allow

  A rule that is used to allow a principal to perform an action (such as read or write) on an object in DataONE. Rules are three-tuples (principal, permission, resource) specifying which permissions are allowed for the principal(s) for the resource(s) listed. Access control rules are specified by the OriginMemberNode when the object is first registered in DataONE. If no rules are specified at that time, then the object is deemed to be private and the only user with access to the object (read, write, or otherwise) is the RightsHolder.

  :Cardinality: 0..*
  :ValueSpace: :class:`Types.AccessRule`
  :Generated By: Client

.. Note:: The 'deny' directive has been removed for simplicity, and because a survey of existing member nodes indicates it is not being used by the community.

.. attribute:: principal

  The unique identifier representing a principal that is allowed or denied access to a resource. Principal identifiers are strings that are transported in the subject field of an identifying certificate produced by the authentication system. Users, groups, systems, and services can all be represented as principals.

  :Cardinality: 1..*
  :ValueSpace: :class:`Types.Principal`
  :Generated By: Client

.. attribute:: permission

  A string value indicating the set of actions that can be performed on a resource as specified in an access policy. The set of permissions includes the ability to read a resource, modify a resource (write), and to change the set of access control policies for a resource (changePermission). In addition, there is a permission that controls the ability to execute a service (execute).

  :Cardinality: 1..*
  :ValueSpace: :class:`Types.Permission`
  :Generated By: Client

.. attribute:: resource

  The unique identifier (pid) for a resource in the system to which the access rules in this access policy apply.

  :Cardinality: 1..*
  :ValueSpace: :class:`Types.Identifier`
  :Generated By: Client

An example instance of this syntax is:

.. literalinclude:: /d1_schemas/accesspolicy-example.xml
   :language: xml

Authorization Services
----------------------

:TODO: Update this section to include the latest revisions to the methods defined for managing and working with the access control for objects.

In this section, define a set of Authorization services to be implemented at CN and MN. The current Authorization Service is defined as a standalone service.

.. TODO:: Link these methods to the generated methods in the API specifications, eliminate redundancy of the description text between the two locations.
isAuthorized(token, pid, action):: boolean Determine if the user authenticated by the token can take the action specified (read, write, changePermission, execute) on the resource named by the identifier pid. setAccess(token, Types.AccessPolicy):: void Set the access policy for a series of resources as specified by the provided AccessPolicy document. The user identified by the authentication token must have changePermission permission on all resources named in the AccessPolicy. If so, then the policies for those resources will be replaced (or created as needed) by the policies specified in AccessPolicy. If the user does not have sufficient permission, then the NotAuthorized exception must be thrown, and none of the policies should be applied (it is not sufficient to have appropriate permissions on just one resource -- if permission is not present for all listed resources, then implementations must roll back any changes and return NotAuthorized). Interaction diagrams -------------------- :TODO: Need to update authorization use cases and include references to them. .. Implementation phases --------------------- During the first DataONE Federated Security workshop, four phases for development were identified that involve increasingly sophisticated authorization and access control mechanisms. The four phases are: - **Phase 1: Mostly public access (target date: January 2011)**: Only publicly readable content is replicated. Only publicly readable content is indexed for search and retrieval. Access to restricted content is through origin member node only. No authentication is required to search and retrieve public content. Authentication is required to upload (create) content. - **Phase 2: Access control supported for search and retrieval**: ACLs respected by coordinating nodes. Authenticated users can discover content that is restricted to them or their groups. Restricted access content is not replicated.
- **Phase 3: Access control supported for content replication**: Restricted access content is replicated to member nodes with compatible ACLs and pre-arranged trust agreements. - **Phase 4: Consistent semantic and functional interoperability for identity and security**: Restricted access content is replicated to any member node. Authentication by long-running workflows is supported. Phase 1 ~~~~~~~ .. @startuml images/authorization_seq.png actor User participant MN1 participant MN2 participant CN User -> CN: login(D1.username, password) activate CN CN --> MN1: token deactivate CN User -> MN1: create(token, pid, object, sysmeta) activate MN1 MN1 -> MN1: verify(token) MN1 -> MN1: isAuthorized(token, pid, OP_CREATE) MN1 --> User: pid deactivate MN1 @enduml .. image:: images/authorization_seq.png *Figure 1.* Only public objects are searchable and replicated in the system. Create, Read, Update, and Delete operations are controlled by member nodes for private objects, but read for public resources can be handled by any replicating member node, or a coordinating node in the case of metadata resources. .. figure:: images/anaz_phase1.png *Figure 2.* Trust relationships between components during phase 1 of Authz/Authn. Triangle = CN, Rectangle = MN, open circle = public data, filled circle = private data, dashed line = untrusted connection. A Coordinating Node retrieves only public content from a Member Node (A), and only publicly readable content is available to users through the Coordinating Nodes (B) and Member Nodes (C). A Coordinating Node must have a trusted relationship with Member Nodes to request replication operations (E) even though the content being replicated is publicly readable and does not require a trusted connection (D). Phase 2 ~~~~~~~ .. figure:: images/anaz_phase2.png *Figure 3.* Trust relationships between components during phase 2 of Authn/Authz. 
Triangle = CN, Rectangle = MN, open circle = public data, filled circle = private data, dashed line = untrusted connection, solid line = trusted connection, user with hat = authenticated user. Coordinating Nodes synchronize public and private content (A). Authenticated users can retrieve private data from the origin Member Node (B) and can discover and retrieve metadata from the Coordinating Nodes (C). Public content is replicated between Member Nodes (D) under the direction of a trusted connection from the Coordinating Nodes (E). Phase 3 ~~~~~~~ .. figure:: images/anaz_phase3.png *Figure 4.* Trust relationships between components during phase 3 of Authn/Authz. Triangle = CN, Rectangle = MN, open circle = public data, filled circle = private data, dashed line = untrusted connection, solid line = trusted connection, user with hat = authenticated user. Member Nodes of compatible technology (D) have a trust relationship that enables transfer of protected content from one member node to another (A). An authenticated user has the same access to private content replicated to other Member Nodes (B). Member Nodes with incompatible technology (i.e. unable to create a trusted relationship) are only able to replicate public content (C). Coordinating Nodes must have trusted relationships to all Member Nodes (E) to direct replication. Phase 4 ~~~~~~~ TBD Issues ------ - Located At CN or MN? 
* At CN requires global knowledge of ACLs * At CN requires a lot of network traffic for authorization on objects * At MN makes authorization of search results impossible * Compromise: Federated, each authoritative MN for an object keeps its ACL, which gets synchronized to the CN at sync time * Assume most object write is at MN level, so best to not have to go to CN * Assume MN will want to control their own write access * Requires MN Authorization services - Efficiency of search results authorization * Need to authorize large number of search results in each operation * Has implications for search results caching