DSpace’s Next Generation

John Mark Ockerbloom, University of Pennsylvania

The DSpace project has become a non-profit organization to continue development. It has hired a director and will be hiring a CTO to oversee development. The new organization will support better intellectual property management for the software.

Why a new DSpace architecture?

  • Use, scale, and dependence on DSpace are growing.
  • The community has grown, and so have its uses.
  • Repositories are growing older with greater preservation needs, and ad hoc patching can lead to dead-ends.

Architectural needs:

  • Set priorities.
  • Handle variety of content and metadata that institutions manage.
  • Make it easier to develop, customize, and compose with other systems.
  • No DSpace is an island.

Issue:

  • Set directions for an evolutionary, practical system design to serve community needs.

Process:

  • Discussions started in 2004
  • Summer 2006 group chosen to review complete architecture
  • Online discussion
  • Week-long summit in 2006
  • Follow-on activity

Move to integrate with other software; DSpace doesn’t have to provide all functionality.

Conducted survey: questions and comments about use and customization of DSpace repositories; 116 responses in one week.

Results:

  • Adaptation common
  • Customize metadata
  • About 1/4 changed the database schema
  • 1/2 made significant code changes
  • Problems keeping customizations in sync with new versions

Manifesto:

  • DSpace is primarily open source software for building digital repositories; the aim is to avoid scope creep.
  • DSpace will be usable based purely on free and open source software.
  • DSpace will have a decoupled, stable, and application-neutral core (not the full distribution), with applications and extensions built on it and more APIs into the core.
  • While usable for a variety of applications, DSpace will retain useful “out-of-the-box” functionality for common use cases as the standard distribution.

DSpace development:

  • DSpace will employ and support existing, open standards where possible and practical.
  • DSpace releases should be minimally disruptive.
  • DSpace will support an exit strategy for content.
  • DSpace will continue to evolve to reflect what users do.

Scalability can be measured by size, intensity of use, rate of ingestion, workflow processing, etc. Large-scale goals:

  • 10M items
  • 10 simultaneous depositors
  • 100 simultaneous users
  • 1 second response time

Interoperability can be defined as data interoperability, service interoperability, API-level interoperability, etc. What’s needed for this:

  • Publish concrete data model for content and metadata.
  • Content and metadata are fully exportable and importable.
  • Published, documented, stable core interface.
  • Common, standard protocols supported in release.

Highlights of DSpace2:

  • More powerful, flexible information model, diversity of content.
  • More ways to interact with, build on core.
  • Documentation.
  • Shift to XML-based configurable user interface model.
  • Focus on extended lifecycle of content.
  • More reuse of third-party development.
  • Multiple metadata records attached to items and sub-components.
  • Manifestations replace bundles.
  • Identifiers: DSpace currently uses handles, but persistent identifiers are also needed for Epeople.
  • Components within items should also have persistent identifiers.
  • Proposal: URIs based on the item identifier with various qualifiers for manifestation, content file, and version.
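
The URI proposal in the last bullet could look something like the sketch below. The notes do not give a concrete syntax, so the base URL, the `manifestation` and `file` path segments, the `version` query parameter, and the function name `build_uri` are all assumptions made for illustration only.

```python
from typing import Optional
from urllib.parse import quote


def build_uri(item_id: str,
              manifestation: Optional[str] = None,
              content_file: Optional[str] = None,
              version: Optional[int] = None,
              base: str = "http://repository.example.org/item") -> str:
    """Build a hypothetical URI: the item identifier plus optional qualifiers."""
    # The item identifier is the stable root; "/" is kept so handle-style
    # identifiers such as "1721.1/1234" stay readable.
    parts = [base.rstrip("/"), quote(item_id, safe="/")]
    # Each qualifier narrows the reference to a component inside the item.
    if manifestation is not None:
        parts += ["manifestation", quote(manifestation, safe="")]
    if content_file is not None:
        parts += ["file", quote(content_file, safe="")]
    uri = "/".join(parts)
    # The version qualifier is assumed to be a query parameter here.
    if version is not None:
        uri += f"?version={version}"
    return uri


if __name__ == "__main__":
    print(build_uri("1721.1/1234"))
    print(build_uri("1721.1/1234", manifestation="original",
                    content_file="thesis.pdf", version=2))
```

Because every qualified URI starts with the item's own identifier, a component reference can always be traced back to the item that contains it.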

Versioning:

  • Manage content over time.
  • Non-semantic revisions of content and metadata: format migrations, revised metadata, possibly minor content corrections (typos).
  • Semantic revisions can be separate items with relation metadata to link them.
  • Versions have identifiers.
  • Retain old versions? Matter of repository policy.
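
The versioning points above can be sketched as a small data model. This is only one possible reading of the notes: the class names, the `#v<n>` version identifier format, the `keep_old_versions` policy flag, and the use of the Dublin Core relation names `isVersionOf` and `hasVersion` are assumptions, not part of the DSpace2 proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Version:
    number: int
    content: str
    metadata: Dict[str, str]


@dataclass
class Item:
    item_id: str
    versions: List[Version] = field(default_factory=list)
    relations: Dict[str, List[str]] = field(default_factory=dict)
    keep_old_versions: bool = True  # repository policy (assumed flag)

    def revise(self, content: str, metadata: Dict[str, str]) -> Version:
        """Record a non-semantic revision (format migration, typo fix, ...)."""
        number = self.versions[-1].number + 1 if self.versions else 1
        version = Version(number, content, dict(metadata))
        if self.keep_old_versions:
            self.versions.append(version)
        else:
            self.versions = [version]  # policy: discard superseded versions
        return version

    def version_id(self, version: Version) -> str:
        """Every version gets its own identifier (format is hypothetical)."""
        return f"{self.item_id}#v{version.number}"


def semantic_revision(old: Item, new_id: str, content: str,
                      metadata: Dict[str, str]) -> Item:
    """A semantic revision becomes a separate item linked by relation metadata."""
    new = Item(new_id, keep_old_versions=old.keep_old_versions)
    new.revise(content, metadata)
    new.relations.setdefault("isVersionOf", []).append(old.item_id)
    old.relations.setdefault("hasVersion", []).append(new.item_id)
    return new
```

The split mirrors the distinction in the list: `revise` keeps a change inside one item, while `semantic_revision` creates a new item and records the link in both directions.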

Metadata:

  • Metadata is just as important as original content and may be key to understanding and using content. One size doesn’t fit all.
  • Metadata needs to be more flexible, preservable, serializable, not constrained to be flat.
  • Still needs to be easy and efficient to use.
  • Default schemas for items and content files; views of metadata can be projected into database schemas for efficiency of access.
  • Abstract data model separated from concrete data storage.
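
One way to picture "not constrained to be flat" metadata being projected into a database-friendly view is the sketch below. The dotted-path convention and the function name `project` are assumptions for illustration; the notes do not say how the projection would actually be done.

```python
from typing import Any, Iterator, Tuple


def project(metadata: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Flatten nested metadata into (field path, value) rows.

    The nested structure is the abstract model; the rows are one
    concrete view that a relational table could store and index.
    """
    if isinstance(metadata, dict):
        for key, value in metadata.items():
            path = f"{prefix}.{key}" if prefix else key
            yield from project(value, path)
    elif isinstance(metadata, list):
        # Repeated values share the same field path.
        for value in metadata:
            yield from project(value, prefix)
    else:
        yield (prefix, str(metadata))


if __name__ == "__main__":
    record = {
        "title": "A Thesis",
        "creator": [
            {"name": "Doe, J.", "affiliation": "Penn"},
            {"name": "Roe, R."},
        ],
    }
    for row in project(record):
        print(row)
```

The flat rows lose the grouping of each creator's name with its affiliation, which is one reason to keep the abstract model as the record of truth and treat the database view as a derived index.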

Aggregation:

  • User interface work, Manakin, at Texas A&M includes XML-based themes, aspects, self-contained components, packages. This software is available now and will be in DSpace 1.5.
  • Extension frameworks make it easier to integrate certain components. For now a simple add-on mechanism serves this purpose.
  • The core should include an event notification mechanism that allows loosely coupled, open-ended components. This function is currently in the core, but needs to come out of the core. There is a prototype for DSpace1, and they are working on one for DSpace2.
  • The notion of workflow needs to be applied to more than the ingestion process. This will be configurable with profiles and better tools for specifying and modifying repository workflows.
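
The event notification mechanism mentioned above is, in general terms, a publish/subscribe pattern. The following is a minimal sketch of that pattern, not the DSpace prototype itself: the class name `EventBus`, the event names, and the payload shape are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, str]], None]


class EventBus:
    """Minimal publish/subscribe hub for loosely coupled components."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a component's interest in one kind of event."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: Dict[str, str]) -> int:
        """Notify every subscriber; return how many handlers were called."""
        handlers = list(self._handlers.get(event_type, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


if __name__ == "__main__":
    bus = EventBus()
    # A hypothetical indexing add-on reacts to ingestion without the
    # publisher knowing that it exists.
    bus.subscribe("item.ingested", lambda p: print("reindex", p["item_id"]))
    bus.publish("item.ingested", {"item_id": "item-1"})
```

The publisher only knows the event name, so new components can be added or removed without touching the code that raises the event, which is the "loosely coupled, open-ended" property the notes call for.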

Road to DSpace2:

  • Core group details specs of core, documentation, and reimplementation within 2 years.
  • Architectural oversight committee monitors progress.
  • Wider community will support the DSpace distribution effort.
  • DSpace1 continues to evolve in the meantime with things like Manakin, event notification, etc.
  • Early DSpace2 work begins.

The report from the Architectural Review group is available on the DSpace site (http://wiki.dspace.org).


