Monday, October 18, 2010

JCR - JSR 170 - Introducing the Java Content Repository API

If you've ever tried to develop a content management application, you're all too aware of the difficulties inherent in implementing content systems. The landscape is fragmented, with numerous vendors offering proprietary repository engines. This fragmentation makes such systems more complex and harder to maintain, promotes vendor lock-in, and increases the need for long-term legacy support in the enterprise market. With the growing popularity of corporate weblogs and electronic corporate document management, the need for a standardized content repository interface is more apparent than ever.
The Content Repository for Java Technology specification, developed under the Java Community Process as JSR-170, aims to meet these industry needs. The specification provides a unified API under the javax.jcr namespace that allows you to access any specification-compliant repository implementation in a vendor-neutral manner.
But API standardization is not the only feature that the Java Content Repository (JCR) brings to the table. A major advantage of JSR-170 is that it is not tied to any particular underlying architecture. The back-end data storage for a JSR-170 implementation, for instance, may be a filesystem, a WebDAV repository, an XML-backed system, or even an SQL database. Furthermore, the export and import facilities of JSR-170 allow an integrator to switch seamlessly between content back ends and JCR implementations. Finally, the JCR provides a straightforward interface that can be layered on top of a wide variety of existing content repositories, while simultaneously standardizing complex functionality such as versioning, access control, and searching.
There are several approaches that I could take when discussing the JCR. In this article, I examine the features offered by the JSR-170 specification from a developer's perspective, focusing on the available API and the interfaces that allow a programmer to efficiently use the JSR-170 repository in designing a content application. As an artificial example, I'll implement a trivial back end for a Wikipedia-like encyclopedia system, called JCRWiki, with support for binary content, versioning, backup, and search. I use Apache Jackrabbit, an open source implementation of JSR-170, to develop this application.
I'll begin with a high-level discussion of the Repository model to familiarize you with the JCR. The Repository model is a simple hierarchy that looks much like an n-ary tree. It consists of a single content repository with one or more workspaces. (In this article, I'll limit the discussion to a single workspace.) Each workspace contains a tree of items; an item can be either a node or a property. A node can have zero or more child nodes and zero or more associated properties; the properties are where the actual content is stored.
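To make the tree shape concrete, here is a minimal sketch of the model in plain Java. It is a toy in-memory stand-in, not the javax.jcr API: the class names ToyNode and ToyRepositoryDemo, the property names, and the one-child-per-name simplification are all assumptions made for illustration. The method names addNode, setProperty, and getPath are chosen to echo methods of the same names on javax.jcr.Node.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy in-memory model of a workspace tree; NOT the javax.jcr API.
class ToyNode {
    final String name;
    final ToyNode parent; // null for the root node
    // Simplification (assumption): at most one child per name.
    final Map<String, ToyNode> children = new LinkedHashMap<>();
    final Map<String, Object> properties = new LinkedHashMap<>();

    ToyNode(String name, ToyNode parent) {
        this.name = name;
        this.parent = parent;
    }

    // Create a child node and attach it to this node.
    ToyNode addNode(String childName) {
        ToyNode child = new ToyNode(childName, this);
        children.put(childName, child);
        return child;
    }

    // Properties are the leaves of the tree, where content is stored.
    void setProperty(String propertyName, Object value) {
        properties.put(propertyName, value);
    }

    // Absolute path from the root, for example "/A".
    String getPath() {
        if (parent == null) {
            return "/";
        }
        String parentPath = parent.getPath();
        return parentPath.equals("/") ? "/" + name : parentPath + "/" + name;
    }
}

class ToyRepositoryDemo {
    // Build a small workspace: a root node with three children,
    // one of which carries two properties (hypothetical names).
    static ToyNode buildWorkspace() {
        ToyNode root = new ToyNode("", null);
        ToyNode a = root.addNode("A");
        a.setProperty("name", "John");
        a.setProperty("age", 22);
        root.addNode("B");
        root.addNode("C");
        return root;
    }

    public static void main(String[] args) {
        ToyNode a = buildWorkspace().children.get("A");
        System.out.println(a.getPath() + " " + a.properties);
    }
}
```

In a real repository, the root node would come from a logged-in session rather than from a constructor, and changes would have to be saved before they become persistent.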
Every node has one and only one primary node type. A primary node type defines the characteristics of the node, such as the properties and child nodes that the node is allowed to have. In addition to the primary node type, a node may also have one or more mixin types. A mixin type acts a lot like a decorator, providing extra characteristics to a node. A JCR implementation, in particular, can provide three predefined mixin types:
  • mix:versionable: allows a node to support versioning
  • mix:lockable: enables locking capabilities for a node
  • mix:referenceable: provides an auto-created jcr:uuid property that gives the node a unique, referenceable identifier
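The relationship between a node's single primary type and its optional mixins can be sketched as follows. This is again a toy model in plain Java rather than the javax.jcr API; the TypedNode class is hypothetical, and it ignores node type inheritance, which a real implementation takes into account. The method names addMixin and isNodeType mirror methods of the same names on javax.jcr.Node.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Toy model of primary and mixin types; NOT the javax.jcr API.
class TypedNode {
    // Exactly one primary type, fixed when the node is created.
    final String primaryType;
    // Zero or more mixin types, added later like decorators.
    final Set<String> mixinTypes = new LinkedHashSet<>();
    final Map<String, String> properties = new LinkedHashMap<>();

    TypedNode(String primaryType) {
        this.primaryType = primaryType;
    }

    void addMixin(String mixinName) {
        mixinTypes.add(mixinName);
        // mix:referenceable brings an auto-created jcr:uuid property.
        if (mixinName.equals("mix:referenceable")
                && !properties.containsKey("jcr:uuid")) {
            properties.put("jcr:uuid", UUID.randomUUID().toString());
        }
    }

    // Simplification (assumption): supertypes are not considered.
    boolean isNodeType(String typeName) {
        return primaryType.equals(typeName) || mixinTypes.contains(typeName);
    }

    public static void main(String[] args) {
        TypedNode entry = new TypedNode("nt:unstructured");
        entry.addMixin("mix:referenceable");
        System.out.println(entry.isNodeType("mix:referenceable"));
        System.out.println(entry.properties.keySet());
    }
}
```

The design point to notice is that the primary type is a constructor argument and never changes, while mixins accumulate in a set after the node exists.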
The Repository model is illustrated in Figure 1. Circles represent nodes, while rectangles represent properties. Of interest are nodes A, B, and C, descending from the single root node. Node A has two properties: a string, "John," and an integer, 22.

Figure 1. A repository model with multiple workspaces


Every repository must support the primary node type, nt:base. There are a number of other common node types that a repository may support:
  • nt:unstructured is the most flexible node type. It allows any number of child nodes or properties, which can have any names. This node type represents JCRWiki entries.
  • nt:file represents files. It requires a single child node, called jcr:content. This node type represents images and other binary content in a JCRWiki entry.
  • nt:folder represents folders, like those in a conventional filesystem.
  • nt:resource commonly represents the actual content of a file.
  • nt:version is a required node type for repositories that support versioning.
The entire node type hierarchy can be found in section 6.7.22.1 of the JSR-170 specification (see Resources for a link).
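As one example of how a node type constrains a node's shape, the rule that an nt:file node has a single child named jcr:content could be checked with a sketch like the one below. The FileNodeCheck class and its method are hypothetical plain-Java helpers, not part of javax.jcr; a compliant repository enforces its node type constraints itself.

```java
import java.util.Collections;
import java.util.Set;

// Toy check of the nt:file shape described above; NOT the javax.jcr API.
class FileNodeCheck {
    // Returns true when the child names are acceptable for the given
    // primary type. Only the nt:file rule is modelled here (assumption):
    // exactly one child, and it must be called jcr:content.
    static boolean hasValidChildren(String primaryType, Set<String> childNames) {
        if (!"nt:file".equals(primaryType)) {
            return true; // other node types are not constrained in this sketch
        }
        return childNames.size() == 1 && childNames.contains("jcr:content");
    }

    public static void main(String[] args) {
        Set<String> children = Collections.singleton("jcr:content");
        System.out.println(hasValidChildren("nt:file", children));
    }
}
```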
A useful but often overlooked feature of the Repository model is its support for namespaces. Namespaces prevent naming collisions among items and node types that come from different sources and application domains. Namespaces are defined with a prefix, delimited by a single : (colon) character. In the course of this article, you've already encountered the namespaces jcr for JCR internal properties, mix for mixin types, and nt for node types. In the JCRWiki, you'll use the wiki namespace for all your data.
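The prefix convention is easy to see in code. The sketch below splits a prefixed name at its colon and expands the prefix to a namespace URI. The QualifiedNames class is a hypothetical helper in plain Java, not the javax.jcr API. The URIs for jcr, nt, and mix are the ones JCR predefines; the wiki URI and the name wiki:entry are placeholders invented for this example. The {uri}localName notation is used here only as a convenient way to print the expanded form.

```java
import java.util.HashMap;
import java.util.Map;

// Toy prefix handling for names such as "nt:file"; NOT the javax.jcr API.
class QualifiedNames {
    static final Map<String, String> PREFIXES = new HashMap<>();
    static {
        PREFIXES.put("jcr", "http://www.jcp.org/jcr/1.0");
        PREFIXES.put("nt", "http://www.jcp.org/jcr/nt/1.0");
        PREFIXES.put("mix", "http://www.jcp.org/jcr/mix/1.0");
        // Hypothetical URI for the JCRWiki namespace (assumption).
        PREFIXES.put("wiki", "http://example.com/jcrwiki/1.0");
    }

    // The part before the colon, or "" when the name has no prefix.
    static String prefixOf(String name) {
        int colon = name.indexOf(':');
        return colon < 0 ? "" : name.substring(0, colon);
    }

    // The part after the colon, or the whole name when there is no prefix.
    static String localNameOf(String name) {
        int colon = name.indexOf(':');
        return colon < 0 ? name : name.substring(colon + 1);
    }

    // Replace the prefix with its URI, written as {uri}localName.
    static String expand(String name) {
        String prefix = prefixOf(name);
        if (prefix.isEmpty()) {
            return "{}" + name;
        }
        String uri = PREFIXES.get(prefix);
        if (uri == null) {
            throw new IllegalArgumentException("Unregistered prefix: " + prefix);
        }
        return "{" + uri + "}" + localNameOf(name);
    }

    public static void main(String[] args) {
        System.out.println(expand("nt:file"));
        System.out.println(expand("wiki:entry"));
    }
}
```

Because two sources can only collide if they share both the namespace and the local name, a wiki:entry item cannot clash with an entry item from another application domain.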
