DataONE API - 2.0

Identity, Authentication, and Authorization Requirements

Category Requirement ID
API
765: Tools can access an API for authn and authz
820: Common API for authentication and authorization operations
Authentication
392: Identity and access control should be interoperable across datanets
764: Authentication and access control should be consistently available
765: Tools can access an API for authn and authz
820: Common API for authentication and authorization operations
Authorization
393: Access control rule evaluation must be highly scalable and responsive.
761: Users can specify authorization rules for data objects, science metadata objects, and process artifacts separately
764: Authentication and access control should be consistently available
765: Tools can access an API for authn and authz
766: Users should be able to easily assign proxy privileges to other users and to systems acting on their behalf for limited time durations
767: Users need to be able to express embargo rules for data
768: Need default authz policies that resolve problems associated with inaccessible principals
769: Authorization should support critical roles, such as curators and system administrators
770: Authorization system should be able to express the pseudo-principal concepts like ‘public’
772: Authentication services should be compatible with existing infrastructure and applications
777: Authorization rules should support common permission levels
795: System must support revocation of user permissions
820: Common API for authentication and authorization operations
xxx: Group Identifiers are equivalent to user identifiers in all ACL mechanisms
xxx: Local sites/data owners have ability to generate, populate, and modify groups
xxx: Group information can be replicated so that all MNs can see it and use it to
enforce ACLS
xxx: Need clear use cases for administering groups
xxx: Need clear business logic for replicating access control lists
xxx: Can access rules for Data Packages that cross operational bounds like MNs
xxx: System should support (and require?) transport-layer security
xxx: Need ability/clear policies to transfer ownership of abandoned objects
xxx: Support ability to use encryption to ensure restricted access from untrusted MNs
xxx: Can people with ‘SetPermission’ revoke access from original owners
– also does revoking SetPermission priv also revoke the privs for their
grantees
xxx: Need to establish the default set of permissions in absence of additional roles
xxx: Need to ensure that deleted accounts can not be replaced by new users (i.e.,
identities are globally unique over time)
xxx: Curator at institutional level has ability to create accounts for their group
xxx: Sites have ability to create groups
xxx: Sensitive data that is encrypted is generally not replicated except possibly to
avoid risks of leaks of that information (confused)
xxx: Need process for data consumer to request authorization for an object
xxx: Need process for data owner to receive and evaluate the requests
xxx: Need ability to create index of users so that clients can use that list to
assign access control rules, and ability to look up Identity for particular users
xxx: Should be able to assert write without read (controversial)
xxx: Users should be able to revoke their own access to an object
xxx: Allow users to create organic, self-created groups (e.g., for a lab or team)
xxx: Should have a user profile page to review/revise a user’s own identity, group, and
other Identity information
xxx: Nodes need to be able to assert minimum LOA re: who has write access
xxx: May have need for ‘Deny’ permissions on access for convenience (but probably
lower priority)
xxx: People can modify access control rules
xxx: Should be able to deposit content with embargo, but should be able to grant
anonymous access tokens for access to the data without the owner knowing who it
was that had access (use case for anonymous peer review in Dryad)
xxx: Need ability to query what I have access to
xxx: co-ownership model for permissions is needed for handling co-authorship
xxx: Should groups be able to own objects?
xxx: Need to restrict visibility to objects for which they don’t have access in all
services (e.g., search, listObjects, etc)
xxx: Member nodes should be able to restrict data access by individuals on Dept of
Commerce Embargo lists at high LOAs – possibly determine that we won’t support
this, but rather that we state these types of objects must not be uploaded
xxx: Anonymous access will be allowed for for publicly accessible objects
xxx: Can groups contain groups, and at what nesting depth? Yes, one level.
xxx: ID and acces control should be easy to use and not present barriers to adoption
and use
xxx: ID and access control should work in all geopolitical jurisdictions
xxx: ID and access control should comply with universal design standards
Identity
392: Identity and access control should be interoperable across datanets
IdentityProvision
390: Consistent mechanism for identifying users
391: Enable different classes of users commensurate with their roles.
762: User identities can be derived from existing institutional directory services
771: User identities should have simple string serializations that express both the user identity and namespace from which it is drawn
Interoperability
392: Identity and access control should be interoperable across datanets
765: Tools can access an API for authn and authz
820: Common API for authentication and authorization operations
Performance
393: Access control rule evaluation must be highly scalable and responsive.
763: Authentication and authorization services are geographically replicated
764: Authentication and access control should be consistently available

390: Consistent mechanism for identifying users

ID:https://trac.dataone.org/ticket/390

It is necessary to provide a mechanism for users to be identified in the DataONE system. There are several distinct roles that need to be supported for users.Rationale: Identity of users, contributors and other participants in DataONE is necessary to ensure appropriate policies for data sharing (read, write), attribution, and notification (e.g. subscription to types of data).

Fit Criteria

  • Users can identify themselves in the DataONE system
  • Identity is consistent across all nodes (i.e. identity associated with an object is consistent regardless of where the object is retrieved from or acted on)
  • Users can associate various accounts with a single identity
  • Identity information is sufficient to ensure appropriate attribution to content
  • Authentication and authorization mechanisms are recognized consistently by all participant nodes and services of the cicore.
  • Existing user directories in use in environmental science community can directly contribute identities (not “yet another” identity system)

391: Enable different classes of users commensurate with their roles.

ID:https://trac.dataone.org/ticket/391

There are several types of users that will be interacting with the DataONE infrastructure, as such it is necessary to ensure that user roles can be supported by the identity management infrastructure. Closely related to https://trac.dataone.org/ticket/390

Rationale: Different user classes or groups provides an effective mechanismfor indicating the types of interaction that might be supported by the system. The alternative is to specifically assign privileges for each user - an approach that is inefficient and potentially insecure as it is easy to miss an individual when setting privileges for a large number of users.

Fit Criteria

  • A well defined set of standard groups is identified and can be easily manage (e.g. administrators, data contributors, data readers)
  • Users can be assigned to and removed from groups
  • Additional groups can be created to support group functions as necessary
  • Users can create their own groups for ad-hoc collaboration when needed and without approval of system administrators
  • Access control rules can be associated with groups and operate as expected.

392: Identity and access control should be interoperable across datanets

ID:https://trac.dataone.org/ticket/392

There is a general requirement / suggestion by NSF that there should be interoperability between the various DataNet projects. Rationale: It seems like identity and access control is a good place where considerable value can be demonstrated to the user community if credentials and access control rules worked across the data net projects.

Fit Criteria
  • Users can sign into DataONE and DC with the same credentials
  • Once signed in to DataONE, access to DC services is seamless (no additional authentication required)

393: Access control rule evaluation must be highly scalable and responsive.

ID:https://trac.dataone.org/ticket/393

Access control for objects is evaluated for every object access in the DataONE infrastructure. As such, the mechanisms used to determine if a particular token (i.e. handle to an authenticated principle) must be very efficient and should not offer a barrier to the desired levels of access control in the system.

Rationale

Access control should not be an impediment to effective use of the content available through DataONE.

Fit Criteria

  • Access control rules can be evaluted for any token in an average of xxx milliseconds
  • Access control rules must not take longer than xxx milliseconds to evaluate
  • Access control must not block critical operations (e.g replications, synchronization)

761: Users can specify authorization rules for data objects, science metadata objects, and process artifacts separately

ID:https://trac.dataone.org/ticket/761

Users might be able to upload data and science metadata as an atomic operation, but each should be identified separately and access control rules should apply to the objects separately. For example, a user could grant public read access to a metadata object but only grant read access to certain colleagues for associated data objects.

Rationale: Enabling access control at the same level of granularity of objects in the system ensures that complete control over object conglomerations (packages, etc) is available.

Fit Criteria
  • All objects in the system have access control rules
  • Separate rules can be associated with the elements of a package during operations at the package level (e.g. create)

762: User identities can be derived from existing institutional directory services

ID:https://trac.dataone.org/ticket/762

Many existing directory services are in use in the environmental sciences, and participating member nodes should be able to expose their directories through a standardized mechanism to allow users to make use of existing identities. For example, the KNB LDAP server is a federation of multiple LDAP systems from around the world, and these identities have been used in access rules for many existing objects.Rationale: Re-use of existing infrastructure reduces cost of participation and minimizes confusion over which accounts to use and which rules are associated with what account.

Fit Criteria

  • The system provides a mechanism for exsiting directory services to join * The system provides a namespacing mechanism to differentiate users with the same id in different original directories (e.g., mjones@LTER, mjones@UCNRS)
  • The same email address can be associated with multiple identities
  • The same person or system can possess multiple identities
  • If a user has multiple identities, the user can express equivalence rules that indicate that they are linked, equivalent identities for the purposes of authentication and authorization

763: Authentication and authorization services are geographically replicated

ID:https://trac.dataone.org/ticket/763

Authentication and authorization are critical services that can not afford geographic delays, especially across continents, in order to allow adequate responsiveness. Users and developers of services should not have to know which authentication service is used (i.e. a load balancing and failover solution from a centralized address (probably the coordinating node address) should be able to access any of the replicated services. Replicas should be located at multiple trusted sites (probably coordinating nodes) that are geographically distributed (incl. across continents)

Fit Criteria

  • Authentication operations should be less than xxx milliseconds from any point in the network
  • Replicas of authentication and authorization services are geographically replicated
  • Failover across replicated services is automatic without client-side intervention

764: Authentication and access control should be consistently available

ID:https://trac.dataone.org/ticket/764

Authentication and authorization are a critical infrastructure bottleneck, and should be consistently available, likely through load balancing and failover solutions.

Fit Criteria

  • Authn and Authz should be available xx.xxxxx% (To be determined)

765: Tools can access an API for authn and authz

ID:https://trac.dataone.org/ticket/765

A standardized API allows us to build interoperable tools, and adapt existing tools to interoperate with the DataNets. All infrastructure components should be able to use these services, including CNs, MNs, and client tools and libraries.

Fit Criteria

  • Demonstrated interoperability of the API across 3 client and member node systems

766: Users should be able to easily assign proxy privileges to other users and to systems acting on their behalf for limited time durations

ID:https://trac.dataone.org/ticket/766

When users need to execute processes asynchronously, they need to be able to grant proxy privileges (e.g., to a workflow or grid system) to operate on their behalf in particular contexts. In addition, at times some users want others to be able to run and access data and operate on behalf of another, such as in a faculty student situation where the student acts as a proxy for the faculty member.

Fit Criteria

767: Users need to be able to express embargo rules for data

ID:https://trac.dataone.org/ticket/767

These embargo rules allow data to be published in the system, but not released until a particular date. By operating this way, users can use the system in their daily management of their data without worry of losing track of publication of the data at a later date. It encourages people to start using the system even long before they want to publicly release the data.

Fit Criteria

768: Need default authz policies that resolve problems associated with inaccessible principals

ID:https://trac.dataone.org/ticket/768

When principals die, retire, change careers, or lose interest in a research area, they may leave in the system data objects that would be otherwise useful to science but are restricted access. The authorization system should have carefully crafted default policies that encourage the public release and sharing of data, the expiration of embargo periods, and the movement of data into the public domain when it is legal and ethical to do so. Principals should be able to override these defaults to create more restrictive policies (e.g., for human subjects data) that will be respected indefinitely, but the defaults should encourage openness and sharing.

Fit Criteria

  • Defaults encourage openness and sharing, without alienating principals through unexpected release of their data, etc.

769: Authorization should support critical roles, such as curators and system administrators

ID:https://trac.dataone.org/ticket/769

While the principals contributing data should be able to specify access, they frequently struggle with the software systems intended to do so, and at times make mistakes. The system should support certain roles with elevated privielges for groups of objects to allow, e.g, a system administrator or data curator to change objects for which they are not otherwise granted access. For example, all objects that are associated with a particular field station might be managed by the information manager at that field station, and the person filling that role through time might change. Individual principals should be able to determine who has access to objects, both through explicit grants of access and through indirect roles that may be only implicitly defined.

Fit Criteria

  • Its possible for access by some roles to be assigned implicitly through certain membership criteria (e.g., a data object is part of an LTER site)

770: Authorization system should be able to express the pseudo-principal concepts like ‘public’

ID:https://trac.dataone.org/ticket/770

There should be well-known mechanisms in the authorization system to allow access rules that explicitly grant access to pseudo-principals, including:

  • public: anonymous, non-authenticated users
  • valid-user: authenticated user
  • registered-user: authenticated user with explicit minimal contact information
  • verified-user: authenticated user with explicit minimal contact information that has been verified as belonging to a real account holder

Fit Criteria

771: User identities should have simple string serializations that express both the user identity and namespace from which it is drawn

ID:https://trac.dataone.org/ticket/771

When user identities can be drawn from multiple providers, we need to be able to serialize both the id and the provider namespace, for example by encapsulating both in a single distinguished name (DN). Ideally this serialization would be relatively short, persistent, and human understandable, and ideally it should not contain spaces or other characters that make it difficult to utilize in a variety of contexts (such as command line applications).

An example DN that has worked for the KNB network is:

uid=jones,o=NCEAS,dc=ecoinformatics,dc=org

Fit Criteria

772: Authentication services should be compatible with existing infrastructure and applications

ID:https://trac.dataone.org/ticket/772

Many applications will need to be adapted to work with the authentication and authorization services provided. Ideally, the services chosen will be compatible with existing systems and support those systems through standard protocols. Applications will need to commonly connect to, for example, web applications using HTTP Basic Authentication for Apache and JAAS for servlets like Tomcat. In addition, some applications may want to connect via PAM and similar security mechanisms. Some identity services, such as LDAP, are commonly supported in these scenarios.

Fit Criteria
  • Software in common use at Member Nodes and as clients should be able to easily utilize the authentication and authorization services with minimal configuration

777: Authorization rules should support common permission levels

ID:https://trac.dataone.org/ticket/777

Several types of access directives are in common use in data packages in the environmental sciences, and the authorization system should support these. The most common authorization levels would include:

  • read: the ability to display or download an object
  • write: the ability to change the content of an object through an update operation (which does not mean it actually changes the object – it may just create a new version that obsoletes the old)
  • changePermission: the ability to change access control rules on the object

Often, the permission levels are nested, in that higher privilege levels encompass the lower levels as well (e.g., write access to an object implies read access).

See the EML access control module for a detailed explanation of these levels (eml-access module).

In addition to specifying levels of permissions on the individual data objects, the authorization system should allow node administrators to specify what services principals can utilize on their nodes, and any resource constraints that may apply. For example, a Member Node operator may want to specify for their node several rules, such as:

  • user joe can insert or update objects on node 32
  • user jack can not update objects on node 21
  • user joe has an aggregate storage limit of 1TB (may want to consider soft and hard resource limits)
  • user joe has a network bandwidth transfer limit of 10mb/s

Note that these types of node-level resource limitations may not be implemented currently on most member nodes, but the authorization system should be expressive enough to allow node operators to build in these restrictions.

795: System must support revocation of user permissions

ID:https://trac.dataone.org/ticket/795

The system should be able to revoke any user’s permissions and, ultimately, their direct access to the system, if the user is misbehaving within the system.

Although it is unclear as to who assigns permissions, I believe that the final responsibility and authority for access control is the DataONE administrator. As such, permissions and simple access to any part of the DataONE infrastructure, and perhaps member node infrastructure that is accessed through DataONE, should be revokable.

Fit Criteria

  • Administrator can change permissions for a user for any object
  • Permission changes are propagated through the system within XXX seconds
  • Read, write access rules can be altered for a user for all content in the system

820: Common API for authentication and authorization operations

ID:https://trac.dataone.org/ticket/820

There should be a common API utilized by the major software components of the infrastructure for DataONE (for all DataNet?) for authentication and authorization operations.

Rationale

A common API will help minimize inconsistencies that arise from functional and semantic mis-match when interacting across multiple systems.

Fit Criteria

  • CN, MN, and ITK libraries share a common API for authn and authz
  • Differing component implementations pass integration testing