A Content Repository for TYPO3 5.0

TYPO3 Developer Days 25.-29.04.2007, Dietikon / Switzerland

Inspiring people to share

Special guest: David Nuescheler

Responsible for the technology strategy and ongoing product development at Day. He joined Day in 1994.

Specification lead on JSR 170 and JSR 283.

Also a committer on the Apache Jackrabbit Project and a member of the Apache Software Foundation

He will now tell us more about JCR


Why a CR for TYPO3?

Flexible and extensible data structure

Object based storage and retrieval

Combines advantages of navigational and relational databases

Security can be enforced on a higher level

Cleaner and easier to use for the developer
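
To make the "navigational plus relational" point concrete, here is a minimal sketch of a content tree that can be reached both by path and by a property query. It is written in Python purely for illustration; the `Node` class, its methods, and the sample content are assumptions and not part of JSR 170 or TYPO3.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical in-memory content node: a name, properties, and children."""
    name: str
    properties: Dict[str, Any] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it for chaining."""
        self.children.append(child)
        return child

    def get(self, path: str) -> Optional["Node"]:
        """Navigational access: walk the tree along a slash-separated path."""
        node = self
        for part in filter(None, path.split("/")):
            node = next((c for c in node.children if c.name == part), None)
            if node is None:
                return None
        return node

    def walk(self) -> Iterator["Node"]:
        """Yield this node and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()

    def query(self, **criteria: Any) -> List["Node"]:
        """Query-style access: nodes whose properties match all criteria."""
        return [n for n in self.walk()
                if all(n.properties.get(k) == v for k, v in criteria.items())]


# Invented sample content.
root = Node("root")
help_page = root.add(Node("help", {"type": "page", "title": "Help"}))
help_page.add(Node("intro", {"type": "text", "body": "Welcome"}))
root.add(Node("news", {"type": "page", "title": "News"}))

intro = root.get("/help/intro")   # navigational access
pages = root.query(type="page")   # query access
```

The same objects serve both access styles, which is the property the slide attributes to a content repository.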


More reasons for a CR

Data source abstraction instead of database abstraction

Data can be stored in different ways, a database is only one of them

Due to the higher level of abstraction, database specific functions and specialties like transactions, stored procedures, partitioning ... can be used on the CR implementation level

Depending on the CR implementation the speed gain for read access to the content tree can be immense


The Jackrabbit “shortcut”

As no PHP-based CR implementation exists, we looked for alternatives

Jackrabbit is the JSR 170 reference implementation, providing all required and optional features of the specification

Using it from PHP is possible with the PHP-Java-Bridge

This provides a way to write and verify the PHP-based unit tests that are needed for implementing a pure PHP-based CR

Are we crazy?


A native PHP Content Repository

TYPO3 5.0 will still run completely without Java - by accessing the PHP-based TYPO3 CR, based on the APIs defined in JSR 170 and JSR 283

The goal: A flexible and powerful content repository for TYPO3 written in PHP

We are not crazy

It is not impossible

Maybe not all of the standard will be implemented – but don’t tell anyone...


Current status

phpCR: The JSR-170 API exists as PHP interfaces, thanks to Travis Swicegood

The Jackrabbit bridge has proven to be a working setup, although it does not handle the full API yet - maybe it never will

We have a large set of unit tests available for the phpCRJackrabbit package

A first batch of those tests has been generalized to be usable for any implementation of the phpCR interfaces
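
The idea of tests that work against any implementation of an interface can be sketched as follows. This is an illustration in Python, not the actual test suite: `Repository`, `InMemoryRepository`, and `run_contract_tests` are hypothetical names, and the interface is far smaller than the phpCR one.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Set


class Repository(ABC):
    """Hypothetical stand-in for a repository interface (not the phpCR API)."""

    @abstractmethod
    def add_node(self, path: str) -> None:
        ...

    @abstractmethod
    def has_node(self, path: str) -> bool:
        ...


class InMemoryRepository(Repository):
    """One possible implementation; a bridge-backed one would be another."""

    def __init__(self) -> None:
        self._paths: Set[str] = {"/"}

    def add_node(self, path: str) -> None:
        self._paths.add(path)

    def has_node(self, path: str) -> bool:
        return path in self._paths


def run_contract_tests(factory: Callable[[], Repository]) -> List[str]:
    """Run the same checks against whatever implementation the factory builds."""
    failures: List[str] = []
    repo = factory()
    if not repo.has_node("/"):
        failures.append("root node missing")
    repo.add_node("/help")
    if not repo.has_node("/help"):
        failures.append("added node not found")
    if repo.has_node("/nope"):
        failures.append("unknown node reported as present")
    return failures


failures = run_contract_tests(InMemoryRepository)
```

Because the checks only depend on the interface, swapping the factory is enough to run them against a different backend.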


Missing things

A domain model for the CMS part of the project

A way for defining node types based on that model


Defining the CMS domain model

We need to focus on the pure domain of the CMS

A first step is to find the common set of objects that form the domain of content management

So, let’s see...


Defining the CMS domain model

Page Content Element

Page Tree

Sitemap

Plugin

Backend Module

System Folder

Template Record

Category

Content Element

Workspace


A possible hierarchy of things

Assignment: try to come up with a hierarchy of objects that represent the content we currently have - and trim where possible

You have 10 minutes...


Node types

To make good use of a CR, one needs to provide useful node types

A node type specifies

allowed and/or required sub nodes of a node

allowed and/or required properties of a node

supertypes of a node, i.e. inheritance
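
The three things a node type specifies can be modelled roughly like this. The sketch is in Python for illustration only; `NodeType` here is a hypothetical, simplified class and not the JSR 170 `NodeType` interface, and the `example:` names are invented.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class NodeType:
    """Hypothetical, simplified node type with inheritance from supertypes."""
    name: str
    supertypes: List["NodeType"] = field(default_factory=list)
    required_properties: Set[str] = field(default_factory=set)
    allowed_properties: Set[str] = field(default_factory=set)
    required_children: Set[str] = field(default_factory=set)
    allowed_children: Set[str] = field(default_factory=set)

    def _collect(self, attr: str) -> Set[str]:
        """Merge an attribute with the same attribute of all supertypes."""
        result = set(getattr(self, attr))
        for parent in self.supertypes:
            result |= parent._collect(attr)
        return result

    def validate(self, properties: Set[str], children: Set[str]) -> List[str]:
        """Return violations for a node with these property and child names."""
        req_p = self._collect("required_properties")
        all_p = req_p | self._collect("allowed_properties")
        req_c = self._collect("required_children")
        all_c = req_c | self._collect("allowed_children")
        errors = [f"missing property: {p}" for p in sorted(req_p - properties)]
        errors += [f"property not allowed: {p}" for p in sorted(properties - all_p)]
        errors += [f"missing child: {c}" for c in sorted(req_c - children)]
        errors += [f"child not allowed: {c}" for c in sorted(children - all_c)]
        return errors


# Invented example types: a page inherits the required title from its supertype.
base = NodeType("example:base", required_properties={"title"})
page = NodeType("example:page", supertypes=[base],
                allowed_properties={"subtitle"}, allowed_children={"content"})

ok = page.validate({"title", "subtitle"}, {"content"})
bad = page.validate({"subtitle", "color"}, {"plugin"})
```

In this sketch `ok` is an empty list, while `bad` reports the missing inherited `title`, the unknown `color` property, and the disallowed `plugin` child.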


The node types of magnolia

<nt:hierarchyNode><mix:versionable>mgnl:metaData

<nt:hierarchyNode>mgnl:content

<nt:hierarchyNode>mgnl:contentNode

<nt:resource>mgnl:resource

<nt:base>*

<mgnl:content>mgnl:group

<mgnl:content>mgnl:role

<mgnl:content>mgnl:user

<nt:hierarchyNode>mgnl:reserve

All nodes can have arbitrary properties...


Our node types?

The node types should (partly) reflect the domain model

Specifically the parts of the domain model that need to be persisted

Coming up with a reasonable system of node types is not trivial

We need to work further on the domain model before the next steps make sense...


CR configuration from code

Currently MySQL tables are created when installing an extension

The definition is a plain SQL file

Further data comes from $TCA as defined in ext_tables.php and/or tca.php

Automation needs to stay around, of course

We need to create node types instead of tables and fields


CR configuration from code

Goals

Get rid of multiple places for defining things

Make it as transparent as possible

Create node types based on PHP objects

Use reflection to gather information about the objects

Create node type definition accordingly

What objects need a corresponding node type?
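
Deriving a node type from an object via reflection might look roughly like the following. The sketch uses Python for illustration, whereas the real implementation would use PHP reflection. The `TextElement` class, the `example` prefix, and the type mapping are all assumptions; only the property type names (STRING, LONG, DOUBLE, BOOLEAN, UNDEFINED) come from JCR.

```python
from dataclasses import dataclass, fields
from typing import Dict, get_type_hints

# Assumed mapping from Python types to JCR property type names.
PROPERTY_TYPES = {str: "STRING", int: "LONG", float: "DOUBLE", bool: "BOOLEAN"}


@dataclass
class TextElement:
    """Hypothetical domain object; the class and its fields are invented."""
    header: str
    body: str
    columns: int = 1
    hidden: bool = False


def node_type_from_class(cls: type, prefix: str = "example") -> Dict[str, object]:
    """Use reflection to gather field names and types into a node type description."""
    hints = get_type_hints(cls)
    properties = {
        f.name: PROPERTY_TYPES.get(hints[f.name], "UNDEFINED")
        for f in fields(cls)
    }
    return {"name": f"{prefix}:{cls.__name__}", "properties": properties}


definition = node_type_from_class(TextElement)
```

The class is then the single place where the structure is defined, which matches the goal of getting rid of multiple places for defining things.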


Changes to existing node types

Changing and removing a node type is possible

But what about node types that are in use?

Jackrabbit currently rejects nontrivial changes

We will probably only change node types on explicit request

Changing a node type may fail if the result would be inconsistent repository content

Existing data needs to be removed before a node type can be removed
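
The rule that existing data has to go before a node type can be removed can be sketched as a simple guard. This is an illustration in Python; `NodeTypeRegistry` and `NodeTypeInUseError` are hypothetical names and do not describe how Jackrabbit implements the check.

```python
from typing import Dict, Set


class NodeTypeInUseError(Exception):
    """Hypothetical error raised when a node type still has nodes."""


class NodeTypeRegistry:
    """Hypothetical registry tracking which node paths use which node type."""

    def __init__(self) -> None:
        self._nodes_by_type: Dict[str, Set[str]] = {}

    def register(self, type_name: str) -> None:
        self._nodes_by_type.setdefault(type_name, set())

    def is_registered(self, type_name: str) -> bool:
        return type_name in self._nodes_by_type

    def add_node(self, path: str, type_name: str) -> None:
        self._nodes_by_type[type_name].add(path)

    def remove_node(self, path: str, type_name: str) -> None:
        self._nodes_by_type[type_name].discard(path)

    def unregister(self, type_name: str) -> None:
        """Remove a node type, but only if no node uses it any more."""
        in_use = self._nodes_by_type[type_name]
        if in_use:
            raise NodeTypeInUseError(
                f"{type_name} is still used by {len(in_use)} node(s)")
        del self._nodes_by_type[type_name]


registry = NodeTypeRegistry()
registry.register("example:page")
registry.add_node("/help", "example:page")

try:
    registry.unregister("example:page")
    rejected = False
except NodeTypeInUseError:
    rejected = True  # the node at /help still uses the type

registry.remove_node("/help", "example:page")
registry.unregister("example:page")  # succeeds once the data is gone
```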


CR configuration from code

JSR 170 had no defined API for registering node types

JSR 283 will have it, and we will use that by

adding it to the phpCR interfaces

adding some wrapper for Jackrabbit

An intermediate step is the generation of a file containing the node type definition in Compact Namespace and Node Type Definition (CND) notation
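
Generating such a file could start from something like the sketch below, which renders a namespace declaration, a node type header with supertypes, and plain property definitions. It covers only a small subset of CND and is written in Python for illustration. The function name, the `example` prefix, and the namespace URI are assumptions.

```python
from typing import Dict, List


def to_cnd(namespaces: Dict[str, str], name: str,
           supertypes: List[str], properties: Dict[str, str]) -> str:
    """Render one node type in a small subset of CND notation."""
    lines = [f"<{prefix} = '{uri}'>" for prefix, uri in namespaces.items()]
    header = f"[{name}]"
    if supertypes:
        header += " > " + ", ".join(supertypes)
    lines.append(header)
    for prop_name, prop_type in properties.items():
        # Property definitions start with "-" and carry the type in parentheses.
        lines.append(f"  - {prop_name} ({prop_type})")
    return "\n".join(lines)


cnd = to_cnd(
    {"example": "http://example.org/ns"},          # invented namespace
    "example:text",                                 # invented node type name
    ["nt:base"],
    {"example:header": "STRING", "example:body": "STRING"},
)
print(cnd)
```

Attributes such as `mandatory` or `multiple`, child node definitions, and default values are left out of this sketch.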


Storing actual content

One way is to store e.g. the text of a text content element as we do today, i.e. as a string

What about links in the text?

To be aware of links, we’d need to parse it and maintain a reference index

A possible syntax:

<a href="${link:{uuid:{522c0cac-7d67-4324-869f-7553426f95b0},repository:{website},workspace:{default},path:{/help/user-mailing-list}}}">some link</a>
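
Parsing this syntax and maintaining a reference index could be sketched as follows. The code is Python for illustration; the regular expression assumes the four fields always appear in the order shown on the slide, and the function names and the `/content/text-1` source path are invented.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Matches ${link:{uuid:{...},repository:{...},workspace:{...},path:{...}}}
LINK_PATTERN = re.compile(
    r"\$\{link:\{"
    r"uuid:\{(?P<uuid>[0-9a-fA-F-]{36})\},"
    r"repository:\{(?P<repository>[^}]*)\},"
    r"workspace:\{(?P<workspace>[^}]*)\},"
    r"path:\{(?P<path>[^}]*)\}"
    r"\}\}"
)


def extract_links(text: str) -> List[Dict[str, str]]:
    """Return one dict per link placeholder found in the text."""
    return [m.groupdict() for m in LINK_PATTERN.finditer(text)]


def build_reference_index(contents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each target UUID to the set of source paths that link to it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for source, text in contents.items():
        for link in extract_links(text):
            index[link["uuid"]].add(source)
    return dict(index)


text = ('<a href="${link:{uuid:{522c0cac-7d67-4324-869f-7553426f95b0},'
        'repository:{website},workspace:{default},'
        'path:{/help/user-mailing-list}}}">some link</a>')

links = extract_links(text)
index = build_reference_index({"/content/text-1": text})
```

The index has to be rebuilt or updated whenever the text changes, which is the maintenance cost the slide refers to.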


Storing actual content

An alternative could be to break up the content into smaller nodes

A working example is the DOM tree of an HTML document

Advantages

No need to have a separate reference index

Queries for links are always easily possible

Disadvantages

Adds quite some complexity
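
A minimal sketch of the node-based alternative, assuming well-formed markup without void elements such as `<br>`: the text is parsed into a small tree, and links are then found by walking the nodes instead of consulting an index. It uses Python's standard `html.parser`; the `ContentNode` and `TreeBuilder` classes are hypothetical.

```python
from html.parser import HTMLParser
from typing import Dict, Iterator, List, Optional


class ContentNode:
    """Hypothetical content node holding a name, properties, and children."""

    def __init__(self, name: str, properties: Optional[Dict[str, str]] = None):
        self.name = name
        self.properties = dict(properties or {})
        self.children: List["ContentNode"] = []

    def walk(self) -> Iterator["ContentNode"]:
        """Yield this node and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


class TreeBuilder(HTMLParser):
    """Builds a ContentNode tree; assumes every start tag has an end tag."""

    def __init__(self) -> None:
        super().__init__()
        self.root = ContentNode("root")
        self._stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = ContentNode(tag, {k: v for k, v in attrs})
        self._stack[-1].children.append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        if len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        if data.strip():
            self._stack[-1].children.append(ContentNode("#text", {"value": data}))


def to_nodes(html: str) -> ContentNode:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


tree = to_nodes('<p>See the <a href="/help/user-mailing-list">mailing list</a> for help.</p>')
# Querying for links is a plain walk over the nodes.
links = [n.properties["href"] for n in tree.walk() if n.name == "a"]
```

Even this small example needs a parser, a node class, and a stack, which hints at the added complexity listed as the disadvantage.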


Open tasks & next steps

An awful lot of them...


Thanks for listening

Karsten Dambekalns <[email protected]>
