Community Dictionary Service

This is the home page for the Community Dictionary Service (CDS), a project of the Identity Schemas Working Group.


There are two problems in large-scale data sharing that a CDS can solve.

Establishing Consensus

So far, the hardest problem in wide-area data sharing has been reaching consensus about schemas and vocabularies. This has proven impossible even among the narrower communities that drove the X.500 and LDAP directory standards.

Taking an open community dictionary approach solves this problem the same way human language does: it recognizes that the problem is beyond the scope of any single authority or standards body, and instead focuses on a mechanism for the community as a whole to harmonize on actual practice. It is the Internet equivalent of "vote with your feet" (we might call it "vote with your feed"). This is precisely the approach Wikipedia has taken and why it has been such a success.

Achieving Transport-Protocol Independence

There is a second, even deeper problem to be solved. In theory, a data sharing language can be independent of the underlying protocol used to transfer the data (e.g., HTTP, WS-Trust, OpenID Attribute Exchange, etc.). However, the deeper you get into the problem, the harder this becomes to achieve.

The reason is that unless the actors at the transport protocol level (digital subjects, RPs, identity agents) can identify and describe themselves and their own actions/authorizations directly in the data sharing language, you end up with an artificial dichotomy between the language used for the control layer (identification, authentication, authorization, access control) and the language used to describe the data. From a human language perspective, this would be the equivalent of people using one language (say, Latin) for all their basic grammatical constructs and a different language (say, English) for all the nouns and verbs they actually want to talk about. Over time the two languages would be forced together just for efficiency and ease of use.

The same convergence is imminent, and highly desirable, for the emergence of a large-scale Internet data sharing network. For this reason there needs to be a data sharing protocol that speaks the same language as the dictionaries used to define the language. This is essentially the equivalent of saying that a dictionary of the English language should be written, described, and communicated by English-speaking people (vs. Yiddish, Russian, Japanese, etc.).
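As a rough illustration of the point above, the following Python sketch keeps data statements and a control statement in one graph of subject/predicate/object triples, so that the access check is answered by the same query mechanism as the data lookup. The names used here (`GRAPH`, `match`, `read`, the `may_read` predicate, and the `alice/email` path notation) are invented for this example and are not XDI syntax or part of any CDS implementation.

```python
# Hypothetical sketch: data and access-control statements share one graph.
# None of these names come from XDI; they are assumptions for illustration.
GRAPH = {
    # Content statements
    ("alice", "email", "alice@example.org"),
    ("alice", "phone", "+1-555-0100"),
    # Control statement, expressed in the same triple shape as the content
    ("bob", "may_read", "alice/email"),
}


def match(graph, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]


def read(graph, requester, subject, predicate):
    """Return the requested values only if the graph itself grants access."""
    permission = match(graph, requester, "may_read", f"{subject}/{predicate}")
    if not permission:
        return []
    return [t[2] for t in match(graph, subject, predicate)]
```

In this sketch, `read(GRAPH, "bob", "alice", "email")` consults the `may_read` statement with the same `match` function it then uses to fetch the value, so no separate control-layer language is involved.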

About XDI

XDI (XRI Data Interchange) was developed to solve this problem. It is both a simple RDF-based data description format and an "abstract" data sharing protocol that uses this format. This ability to be self-describing, to interleave control and content, makes it especially appropriate to the CDS. However, because XDI is an RDF graph model, the CDS can also generate dictionary definitions in other data description formats (e.g., RDF/OWL/HOWL, Schemat, OpenID AX Metadata, etc.).
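To make the last point concrete, here is a minimal Python sketch of how a single set of graph statements might be rendered into more than one output format. It uses N-Triples and plain JSON as stand-ins, not the formats listed above, and it does not use XDI syntax. The namespace `http://example.org/cds/`, the term names, and the functions `to_ntriples` and `to_json` are all assumptions made for illustration.

```python
import json

# Hypothetical namespace, used only so that the N-Triples output has full IRIs.
BASE = "http://example.org/cds/"

# Each statement: (subject, predicate, object, object_is_literal)
STATEMENTS = [
    ("email", "is_a", "term", False),
    ("email", "definition", "An address for electronic mail.", True),
]


def to_ntriples(statements):
    """Render the statements as N-Triples lines, one statement per line."""
    lines = []
    for subject, predicate, obj, is_literal in statements:
        if is_literal:
            # Escape backslashes, quotes, and newlines inside the literal.
            escaped = (
                obj.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
            )
            rendered = f'"{escaped}"'
        else:
            rendered = f"<{BASE}{obj}>"
        lines.append(f"<{BASE}{subject}> <{BASE}{predicate}> {rendered} .")
    return "\n".join(lines)


def to_json(statements):
    """Render the same statements as a JSON object keyed by term name."""
    entries = {}
    for subject, predicate, obj, _ in statements:
        entries.setdefault(subject, {})[predicate] = obj
    return json.dumps(entries, indent=2, sort_keys=True)
```

Both functions read the same `STATEMENTS` list, which is the design point: the graph is the source of truth and each output format is just a different rendering of it.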

Schemas vs. Dictionaries

In standard XML architecture, data is described using a schema written in a schema description language such as W3C XML Schema (XSD), RELAX NG (RNG), etc. This schema description language has its own small vocabulary for describing a grammar, i.e., rules for defining relationships between elements and attributes. This grammar vocabulary is typically independent of and not reused in the resulting schemas.

Again, this introduces an artificial separation between "control" vocabulary (grammar) and "content" vocabulary (everything else) that does not exist in human language. While it is true that the fewer terms in a language that a machine needs to understand, the easier it is for developers to implement the language, it does not follow that the machine-understandable terms and human-understandable terms need to be in different languages. As with human languages, the grammar of the language can be encoded in a relatively small number of terms that enable the definition, understanding, and usage of all the other terms in the same language.

This is the approach taken with XDI. Since it combines control and content in one language, its base level of definition is not a schema but a dictionary where, just like human language, each term in the language is defined using the same language. This does not prevent the definition of higher-level semantic constructs such as schemas, forms, templates, queries, etc., just as those same constructs are supported by human languages. But it establishes a common lower layer of semantic consensus that can now be shared across all these constructs rather than redefined separately for each of them.
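The idea of a dictionary whose terms are defined in the same language can be sketched in a few lines of Python. This is an illustrative model only: the `Triple` type, the `is_a` and `definition` predicates, and the sample entries are hypothetical and do not reflect actual XDI or CDS vocabulary. The sketch checks that every predicate used in the dictionary is itself an entry in the dictionary.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A small "grammar" core (term, is_a, definition) plus one content term.
# All names are invented for illustration.
DICTIONARY = [
    Triple("term", "is_a", "term"),
    Triple("is_a", "is_a", "term"),
    Triple("definition", "is_a", "term"),
    Triple("term", "definition", "An entry in the dictionary."),
    Triple("is_a", "definition", "States that the subject is an instance of the object."),
    Triple("definition", "definition", "Human-readable meaning of a term."),
    Triple("email", "is_a", "term"),
    Triple("email", "definition", "An address for electronic mail."),
]


def vocabulary(triples):
    """Return every subject that the dictionary declares to be a term."""
    return {t.subject for t in triples if t.predicate == "is_a" and t.obj == "term"}


def undefined_predicates(triples):
    """Return predicates that are used but not declared as terms."""
    return {t.predicate for t in triples} - vocabulary(triples)


def is_self_describing(triples):
    """True when every predicate in use is itself a dictionary term."""
    return not undefined_predicates(triples)
```

Adding a statement such as `Triple("email", "format", "string")` without also declaring `format` as a term would make `undefined_predicates` report `format`, which is the kind of gap a separate schema language would otherwise have to cover.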

Links to XDI Technical Info

XDI is still a young open standards effort, with no formal specifications published yet. What has been developed by the OASIS XDI Technical Committee is a very simple graph model and RDF-based data interchange format. This is written up in The XDI RDF Model. The initial CDS implementation is based on this model.

CDS Use Cases, Feature Requests, and Roadmap

CDS Code, Documentation, and Operation

CDS Policies and Governance

  • CDS Policies discusses and documents specific technical and operational policies for the CDS.
  • CDS Governance discusses the larger governance questions for a "Wikipedia for machines".

See Also