Authorization in DataONE
========================

This document outlines the mechanism for specifying authorization policies for
objects and service in DataONE and a set of services for controlling access to
those objects on Member Nodes and Coordinating Nodes.

Overview
--------

Users and services authenticate in DataONE to confirm their identity. The
identity is then used for controlling access to objects, systems, and services
within the DataONE framework. Requirements for Authorization are listed here:

.. toctree::
   :maxdepth: 1

   AuthnAndAuthzRequirements

Privacy and access control in DataONE are primarily for the protection and
integrity of user contributed data and metadata via Member Nodes. There are,
however, other entities in DataONE that also need protection, including DataONE
specific services and system resources, like system metadata and components of
the general software stack (e.g., databases, web servers) for Coordinating and
Member Nodes. For this reason, all resources in DataONE, from data and metadata
objects to system services, have an access policy (:class:`Types.AccessPolicy`),
made up of one or more *access control rules* (:class:`Types.AccessRule`),
that is used to determine who may access the resource. The process of confirming
whether a user has privileges to access a resource in DataONE is called
*authorization*. The act of authorization uses attribute information contained
in the security token obtained by the user when authenticating with their
identity provider, and compares such information to the resource access control
rule. If the rule permits access by the :term:`principal` requesting the
resource, then authorization succeeds and permission is granted to access the
resource. The algorithm used to evaluate authorization for a resource is
described in the section *Object Access Control* below.

Because nodes that form the DataONE federation are managed by various
administrative domains and may cross multiple political boundaries, "trust"
relationships are crucial for DataONE to succeed in its security plan. In simple
terms, this means that access control rules that are defined by one member of
the federation are upheld be another member. It also means that trust may be
revoked if a particular member does not behave accordingly within the
federation. Access control rules may be dynamic and must be propagated with
the resource they are designated to protect, such as when data or metadata
objects are replicated to another Member Node.

The language that specifies the policy for a given access control rule
dictates only whether a user is allowed access to a given resource; to include
the ability to explicitly deny access to a resource overly complicates
management of the authorization process and is seldom used in practice. Access
rules (:class:`Types.AccessRule`) consist of the system identity of the user,
also known as the :term:`Subject`, the type of permission granted (e.g.,
*read*, *write*, or *changePermission*), and the :term:`identifier` of the
resource being requested. 

An access policy is an optional element of the :term:`System Metadata`
associated with an object. The default access policy is to deny access to the
object to all users except the *subject* identified as the
:attr:`Types.SystemMetadata.rightsHolder` in the System Metadata.

DataONE will provide, where reasonable, a conversion of the internal access
control rule to a subset of one or more industry standard policy languages to
support interoperability between different organizations.


Trust Relationships
-------------------

Any authorization system in a federation requires trust among participants.
For DataONE, there are five types of trust relationships among nodes in the
federation:


1. 

   **MN to CN**: Member Nodes need to have trust that Coordinating Nodes will
   respect and enforce their authorization policies, including any restrictions
   placed on where and when to create replicas of objects, and on the
   presentation of search results for restricted content.

2. 

   **CN to MN**: Coordinating Nodes rely upon Member Nodes for limited services,
   and mainly expect Member Nodes to accurately implement the DataONE Service
   API, including replication services.

3. 

   **CN to CN**: Each Coordinating Node contains a replica of the content of the
   others, and are configured to provide seamless failover and load-balancing
   for all incoming requests across the three nodes. Consequently, the
   Coordinating Nodes inherently trust one another fully. As the suite of
   Coordinating Node instances expands to other continents, this relationship
   may need to be re-examined.

4. 

   **MN to MN**: Member Node to Member Node trust relies on one Member Node
   believing that another Member Node will respect the authorization policies
   that they publish for their objects and services. In the case of restricted
   access content, Member Nodes that house replicas of an object would need to
   faithfully enforce authorization policies that were expressed by the data
   owner. Because of this, Member Nodes can express replication policies for
   objects that indicate which other Member Nodes are acceptable targets for
   replication, and for which nodes they are willing to serve as replica stores.

5. 

   **User to DataONE**: Users trust that the DataONE system, that is, the
   combination of Member and Coordinating Nodes interacting to provide the
   DataONE infrastructure and services, implements access control rules
   consistently and in compliance with the specifications provided when content
   was added to the system or subsequently modified. This implies minimal
   latency in propagation of rules between components of the system.

Verification of proper technical implementation of these trust relationships is
achieved through integration testing of the various components. This involves
exercising a wide array of combinations of users, groups, and access control
rules to ensure expected behavior as content moves around the DataONE
infrastructure.

The DataNet projects have a loosely defined requirement of interoperability
between their respective implementations. This also implies that content and
services *may* be shared between projects, and thus there will likely be
additional trust relationships that need to be taken into consideration as the
DataNet projects progress towards interoperability.


Object Access Control
---------------------

Access control for content managed by DataONE (:term:`Data` objects,
:term:`Science Metadata` objects, and :term:`Resource Maps`) is determined by
the :class:`Types.AccessPolicy` entry in the :class:`Types.SystemMetadata`
associated with the object.

In addition, the :term:`rightsHolder` of the System Metadata holds all
permissions on the object, and the :term:`Authoritative Member Node` has
equivalent privileges as the *rightsHolder*.

The *Authoritative Member Node* is identified by one or more :term:`Subjects`
listed in the Member Node :class:`Types.Node` record registered in the DataONE
:term:`node registry`. Thus, the :class:`Types.NodeReference` entry recorded
in the System Metadata *Authoritative Member Node* references the *Node* entry
in the node registry, which in turn contains a list of *Subjects* that, when
used in a request to access or manipulate an object, identify the user as the
*Authoritative Member Node*.

Permissions that can be associated with an object include:

:Read: 

  The ability to view the content of this object.

:Write: 

  The ability to change the content of this object via update services.
  Permissions are hierarchical, so *write* permission also includes *read*
  permission.

:ChangePermission: 

  The ability to change the authorization policies for this object.
  Includes both *read* and *write* permissions.

Conceptually, an :class:`Types.AccessRule` is a tuple with three components:
an *identifier* which indicates which object the rule applies to; a *subject*
which indicates who the rule applies to; and a *permission* which indicates
the level of access described by the rule.

In practice, the *access rule* is contained in the System Metadata, and so
each access rule contains a permission and list of subjects. A set of *access
rules* are contained in the :class:`Types.AccessPolicy`, and these together
with the *rights holder* and *authoritative member node* determine which
subjects may perform operations on an object.

Evaluation of a permission for an object is determined in a manner thus, where
SUBJECT is the *subject* making the request, and PERMISSION is the
*permission* being evaluated::

  Is SUBJECT == rightsHolder?  Yes -> return True
  Is SUBJECT IN authoritiveMemberNode.Subject?  Yes -> return True
  for each accessRule in accessPolicy
    if PERMISSION is IN accessRule.Permission 
      Is SUBJECT IN accessRule.Subject? Yes -> return True
  return False

DataONE supports *equivalent identities*, where a single principal may have
multiple subjects associated with them. As such, the ``SUBJECT`` in algorithm
described above is actually a list of 1 or more subjects. The list of subjects
to be used for comparison is determined from the *Session* parameter of an API
call as follows::

  SUBJECTS = [Session.subject, ]
  
  
Adjusting Object Access Control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Adjustments to access control for objects is made by altering the
*accessPolicy* of the :class:`Types.SystemMetadata` for the object. The
process is to retrieve a current copy of the system metadata from a
Coordinating Node using the :func:`CNRead.getSystemMetadata` method, edit the
:class:`Types.AccessPolicy` entry as necessary, then send the updated
*AccessPolicy* structure to a Coordinating Node using the
:func:`CNAuthorization.setAccessPolicy` method.

Changes to *accessPolicy* are then propagated to other Coordinating Nodes
through the Coordinating Node replication process (and hence to the search
index), then to the Member Nodes that hold a copy of the object. Member Nodes
are informed of a change to *accessPolicy* through the
:func:`MNStorage.systemMetadataChanged` method which is called by a
Coordinating Node. Member Nodes are expected to update the *accessPolicy* for
an object as soon as possible after being informed of an update.


Log Record Access Control
-------------------------

Access to log records is evaluated in the same manner as access to objects. If
the requesting *subject* does not have *read* permission for the *identifier*
recorded in the log record, then they will be denied access to the log entry.

Adjustments to access control for log records are made indirectly by adjusting
access control for the referenced object(s).

EDIT: Log records are now completely restricted to administrative users so as 
not to expose raw usage patterns for any/all public objects.


Service Access Control
----------------------

DataONE services are accessed through HTTPS connections. Restrictions on
agents (i.e. clients) that may call the services may be imposed through
network configuration (e.g. restricting IP addresses that may call the
service) or preferably through the *restriction* property of the
:class:`Types.Service` entry in the *services* property of the
:class:`Types.Node` entry describing the registered Member or Coordinating
Node.

The optional *restriction* property of the *service* lists subjects that have
permission to invoke the service. If a *restriction* is not included with the
service description, then any agent may call that service endpoint.

NOTE: It is at the discretion of individual Node implementations as to whether these 
defined service restrictions will be enforced for the method in question. The service method
restriction is meant only as a mechanism for node operators to record/manage 
restrictions to be enforced in a transparent manner.


Adjusting Service Access Control
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Adjustments to access control for services, or more accurately, the methods
exposed within a service, are made by altering the contents of the
*restriction* property of the :class:`Types.Service` entry for the
:class:`Types.Node` registration document for the node. These adjustments are
made through the :func:`CNRegistration.updateNodeCapabilities` method by
specifying a replacement node document. A current version of the node document
should be retrieved from the Coordinating Node through the
:func:`CNCore.listNodes` method.

Changes to node registration information can only be performed by subjects
listed in the *subject* property of the :class:`Types.Node` document for the
node.


Additional Authorization Constraints
------------------------------------

:TODO: 
  Need to update this section to cover the additional constraints beyond
  subject authorization that will limit movement of content between components

Some nodes may also want to conditionally provide access to some services based
on a principal's current usage of a resource such as node storage or node
bandwidth.

* Create/Update constraints

   * MaximumStorageQuota
   * MaximumNetworkTransferQuota

* Embargoes

   * Add ability to specify an embargo period during which the access policies
     would not be in effect, and rather resources are only privately accessible

.. Note::
    Add constraints and embargoes to the AccessPolicy language described below


Access Policy Language
-----------------------

:TODO: 
  This section needs to be updated with the latest revisions to the
  AccessPolicy section. Also need to update / regenerate the example of access 
  policy.

 
Several existing authorization policy languages were evaluated for use in the
DataONE architecture.  Given the simplicity of authorization rules that DataONE
needs to express, these specifications were deemed overly complex and would
impose too signification of a cost on Member Node implementations.

.. toctree::
   :maxdepth: 2

   Authorization-technologies

.. Note::
    Survey for additional policy languages to evaluate before deciding on a
    custom specification for DataONE.

DataONE has designed a simple access policy language that can be embedded in
several contexts and can be used to express access rules.  The definitions of
the elements in this AccessPolicy language are:

.. attribute:: accessPolicy

  A set of rules that specifies as a whole the allowable permissions that a
  given user or system has for accessing a resource, including both data and
  metadata resources and service resources. An access policy consists of a
  sequence of allow rules that grant permissions to principals, which can be
  individual users, groups of users, symbolic users, or systems and services.

  :Cardinality:
    1..1

  :ValueSpace:

    :class:`Types.AccessPolicy`

  :Generated By: Client

.. attribute:: allow

  A rule that is used to allow a principal to perform an action (such as
  read or write) on an object in DataONE. Rules are three-tuples (principal,
  permission, resource) specifying which permissions are allowed for the 
  principal(s) for the resource(s) listed. Access control rules are specified
  by the OriginMemberNode when the object is first registered in DataONE. If no
  rules are specified at that time, then the object is deemed to be private and
  the only user with access to the object (read, write, or otherwise) is the
  RightsHolder.

  :Cardinality:
    0..*

  :ValueSpace:
    :class:`Types.AccessRule`

  :Generated By: Client

.. Note::
    The 'deny' directive has been removed for simplicity, and because a survey
    of existing member nodes indicates it is not being used by the community.

.. attribute:: principal

  The unique identifier representing a principal that is allowed or denied
  access to a resource.  Principal identifiers are strings that are found
  transported in the subject field of an identifying certificate produced from
  the authentication system.  Users, groups, systems, and services can all be
  represented as principals.

  :Cardinality:
    1..*

  :ValueSpace:

    :class:`Types.Principal`

  :Generated By: Client

.. attribute:: permission

  A string value indicating the set of actions that can be performed on a
  resource as specified in an access policy.  The set of permissions include
  the ability to read a resource, modify a resource (write), and to change the
  set of access control policies for a resource (changePermission).  In
  addition, there is a permission that controls ability to execute a service
  (execute).

  :Cardinality:
    1..*

  :ValueSpace:
    :class:`Types.Permission`

  :Generated By: Client

.. attribute:: resource

  The unique identifier (pid) for a resource in the system to which the access
  rules in this access policy apply.

  :Cardinality:
    1..*

  :ValueSpace:

    :class:`Types.Identifier`

  :Generated By: Client


An example instance of this syntax is:

.. literalinclude:: /d1_schemas/accesspolicy-example.xml
   :language: xml

Authorization Services
----------------------

:TODO:
  Update this section to include the latest revisions to the methods defined
  for managing and working with the access control for objects.

In this section, define a set of Authorization services to be implemented at
CN and MN. The current Authorization Service is defined as a standalone
service.

.. TODO::

    Link these methods to the generated methods in the API specifications,
    eliminate redundancy of the description text between the two locations.

isAuthorized(token, pid, action):: boolean

    Determine if the user authenticated by the token can take the action
    specified (read, write, changePermission, execute) on the resource named by
    the identifier pid.

setAccess(token, Types.AccessPolicy):: void

    Set the access policy for a series of resources as specified by the
    provided AccessPolicy document.  The user identified by the authentication
    token must have changePermission permission on all resources named in the
    AccessPolicy.  If so, then the policies for those resources will be
    replaced (or created as needed) by the policies specified in AccessPolicy.
    If the user does not have sufficient permission, then the NotAuthorized
    exception must be thrown, and none of the policies should be applied (it is
    not sufficient to have appropriate permissions on just one resource -- if
    permission is not present for all listed resources, then implementations
    must roll back any changes and return NotAuthorized.


Interaction diagrams
--------------------

:TODO: Need to update authorization use cases and include references to them.

..
    Implementation phases
    ---------------------

    During the first DataONE Federated Security workshop, four phases for
    development were identified that involve increasingly sophisticated
    authorization and access control mechanisms. The four phases are:

    - **Phase 1: Mostly public access (target date: January 2011)**: Only publicly
      readable content is replicated. Only publicly readable content is indexed
      for search and retrieval. Access to restricted content is through origin
      member node only. No authentication is required to search and retrieve
      public content. Authentication is required to upload (create) content.

    - **Phase 2: Access control supported for search and retrieval**: ACLs
      respected by coordinating nodes. Authenticated users can discover content
      that is restricted to them or their groups. Restricted access content is not
      replicated.

    - **Phase 3: Access control supported for content replication**: Restricted
      access content is replicated to member nodes with compatible ACLs and
      pre-arranged trust agreements.

    - **Phase 4: Consistent semantic and functional interoperability for identity
      and security**: Restricted access content is replicated to any member node.
      Authentication by long-running workflows is supported.


Phase 1
~~~~~~~

.. 
   @startuml images/authorization_seq.png
   actor User
   participant MN1
   participant MN2
   participant CN
   User -> CN: login(D1.username, password)
   activate CN
   CN --> MN1: token
   deactivate CN
   User -> MN1: create(token, pid, object, sysmeta)
   activate MN1
   MN1 -> MN1: verify(token)
   MN1 -> MN1: isAuthorized(token, pid, OP_CREATE)
   MN1 --> User: pid
   deactivate MN1
   @enduml

.. image::  images/authorization_seq.png

*Figure 1.* Only public objects are searchable and replicated in the system. 
Create, Read, Update, and Delete operations are controlled by member nodes
for private objects, but read for public resources can be handled by any
replicating member node, or a coordinating node in the case of metadata
resources.

.. figure:: images/anaz_phase1.png

*Figure 2.* Trust relationships between components during phase 1 of
Authz/Authn. Triangle = CN, Rectangle = MN, open circle = public data, filled
circle = private data, dashed line = untrusted connection. A Coordinating Node
retrieves only public content from a Member Node (A), and only publicly
readable content is available to users through the Coordinating Nodes (B) and
Member Nodes (C). A Coordinating Node must have a trusted relationship with
Member Nodes to request replication operations (E) even though the content
being replicated is publicly readable and does not require a trusted
connection (D).


Phase 2
~~~~~~~

.. figure:: images/anaz_phase2.png

*Figure 3.* Trust relationships between components during phase 2 of
Authn/Authz. Triangle = CN, Rectangle = MN, open circle = public data, filled
circle = private data, dashed line = untrusted connection, solid line =
trusted connection, user with hat = authenticated user. Coordinating Nodes
synchronize public and private content (A). Authenticated users can retrieve
private data from the origin Member Node (B) and can discover and retrieve
metadata from the Coordinating Nodes (C). Public content is replicated between
Member Nodes (D) under the direction of a trusted connection from the
Coordinating Nodes (E).


Phase 3
~~~~~~~

.. figure:: images/anaz_phase3.png

*Figure 4.* Trust relationships between components during phase 3 of
Authn/Authz. Triangle = CN, Rectangle = MN, open circle = public data, filled
circle = private data, dashed line = untrusted connection, solid line =
trusted connection, user with hat = authenticated user. Member Nodes of
compatible technology (D) have a trust relationship that enables transfer of
protected content from one member node to another (A). An authenticated user
has the same access to private content replicated to other Member Nodes (B).
Member Nodes with incompatible technology (i.e. unable to create a trusted
relationship) are only able to replicate public content (C). Coordinating
Nodes must have trusted relationships to all Member Nodes (E) to direct
replication.

Phase 4
~~~~~~~

TBD

Issues
------

- Located At CN or MN?

   * At CN requires global knowledge of ACLs
   * At CN requires a lot of network traffic for authorization on objects
   * At MN makes authorization of search results impossible

   * Compromise: Federated, each authoritative MN for an object keeps its ACL
     list, which gets synchronized to the CN at sync time

     * Assume most object write is at MN level, so best to not have to go to
       CN
     * Assume MN will want to control their own write access
     * Requires MN Authorization services
     
- Efficiency of search results authorization

   * Need to authorize large number of search results in each operation
   * Has implications for search results cacheing