Content Migration: Quantum Leap

Content Migration: Quantum Leap

Plone Conference 2008
Washington, DC
October 10, 2008
Vitaliy Podoba
http://quintagroup.com

Good day! I'm going to present our approach to content migration in Plone.

But before that, I'd like to introduce myself.

Vitaliy Podoba
Plone Developer at Quintagroup.com
piv (plone.org)
vipod (#plone IRC channel)

Who am I?

My name is Vitaliy Podoba.
I'm a Zope/Plone developer at Quintagroup.
'piv' is my user name on plone.org,
and 'vipod' is my nickname on the #plone IRC channel.

As we all know, Plone is celebrating its 7th birthday these days. So Happy Birthday, Plone :-)

Plone Evolution

And all this time (7 years) Plone people have been working hard to build more and more Plone sites.
As time passed, not all of those sites were regularly upgraded to fresh Plone versions. That is a natural process.

But with the recent, brand new, superb Plone 3 release, customers want to move their old 2.0, 2.1 and 2.5 websites to Plone 3.

And this is also natural, because, as you can see, Plone 3 is really good ;-)

So, what to do?

Plone Portal Migration
http://plone.org/documentation/manual/upgrade-guide

For the time being, the main approach to Plone migration is its native, built-in migration procedure. This approach is vital for us and it is hard to overstate its value. We have been using it since the first Plone releases and we are sure we will still be using it for quite a while. Plone portal migration is very well described in the Plone migration guide at plone.org.
But the built-in Plone migration mechanism has several rough edges when it comes to content migration.

Plone Portal Migration
* can fail
* loose control
* iterative (step-by-step) process
* no way back

1. The first and worst thing about Plone portal migration is that it can fail. This especially concerns heavily customized sites and really old Plone versions like 2.0 or 2.1. Plone is a very flexible system, so the outcome of a migration depends on what changes you have made to your site.

For example:
* If you have a standard Plone site with simple customizations, it will likely work very well.
* But if you have installed and depend on a lot of third-party products from different developers, it's hard to say anything definite.

The migration tool handles most cases, but your mileage may vary. For heavily customized sites you should plan some extra time for the transition.

2. Site upgrades currently use a function registry. This process is labour-intensive and not introspectable with regard to the changes it makes to a site. This is another rough edge of portal migration, called "loose control" here.

3. You cannot migrate, for example, from Plone 2.1 to Plone 3.1 while skipping Plone 2.5. You can only do the migration step by step.

4. Moreover, the built-in migration procedure doesn't know how to migrate from a newer Plone to an older one. So you can't take a step back.
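
The step-by-step and one-way nature of such a migration can be pictured with a small Python sketch. It is only an illustration: the `UPGRADE_STEPS` table, the version list and the `upgrade_path` helper are assumptions made up for this example, not Plone's actual migration code.

```python
# Hypothetical registry: each version only knows the next version it can
# be upgraded to. There are no entries that point backwards.
UPGRADE_STEPS = {
    "2.0": "2.1",
    "2.1": "2.5",
    "2.5": "3.0",
    "3.0": "3.1",
}


def upgrade_path(registry, source, target):
    """Return every version that has to be visited between source and target."""
    path = [source]
    current = source
    while current != target:
        if current not in registry:
            # No registered step leads any further, e.g. when going backwards.
            raise LookupError(
                "no upgrade step from %s towards %s" % (current, target)
            )
        current = registry[current]
        path.append(current)
    return path


# Going from 2.1 to 3.1 has to pass through every intermediate version.
forward = upgrade_path(UPGRADE_STEPS, "2.1", "3.1")
```

Asking the same helper for a path from "3.1" back to "2.1" raises `LookupError`, which mirrors the "no way back" item above.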



Use Case: CMS.Info
http://www.contentmanagementsoftware.info

And now a few words about the use case that actually led us to the Quantum Leap approach.

We decided to test the content migration strategy on one of our well-known websites: contentmanagementsoftware.info, which contains listings of Plone and Zope products.

From now on I will call it just CMS.Info.

Historically, the website was migrated from Plone 2.0.5 to Plone 2.1 a long time ago. Content was converted from regular CMF to Archetypes, leaving a lot of cruft in the database.

So we had to migrate from Plone 2.1 to Plone 3.1, and we took care of the content only. We didn't care about the existing site's functionality, settings or preferences, because we wanted to replace the existing functionality and features with new ones.

And what is more: a) the CMS.Info portal had its own blog, built with the SimpleBlog product, which isn't supported in Plone 3; and b) the CMS.Info site had a few forms created with the help of PloneFormMailer, which is also not compatible with Plone 3.

These last two tasks could definitely not be accomplished using the standard Plone migration.

That's why CMS.Info was an ideal use case for Quantum Leap content migration.

Quantum Leap

This is a photo from my last mountain trip. Here I'm doing my quantum leap ;-) Taking off from Plone 2.1, flying over Plone 2.5 and landing on Plone 3.1. All of this is true for our use case: we migrated content from Plone 2.1 to Plone 3.1 without intermediate, version-by-version steps.

That is why this session is called Quantum Leap. It's about a large and significant jump forward from Plone 2.1 to Plone 3.1, skipping the intermediate major version 2.5.

This approach can be implemented with the help of the GenericSetup product, introduced by Tres Seaver.

Aside:
In physics, a quantum leap or quantum jump is a change of an electron from one energy state to another within an atom. In real physical systems a quantum leap is not necessarily a large change, and can in fact be very small. In the popular sense, the term usually means a large or significant change, which is thus not strictly correct.

GenericSetup
Tres Seaver

And now a short introduction to GenericSetup, just a few words for those who don't know what it is.

From the product's README.txt: This product provides a mini-framework for expressing the configured state of a Zope Site as a set of file system artefacts. These artefacts consist of declarative XML files, which spell out the configuration settings for each "tool" in the site.

GenericSetup introduces the idea of "setup profiles", which are collections of XML files that describe different aspects of the site configuration.

But we actually need only one step of the GenericSetup tool, called 'Content', which is responsible for exporting and importing site content.

Now let's consider why the GenericSetup approach is so good in our case.

Content Migration
* migrate only content
* clean database
* transform portal type on-the-fly
* manipulate/transform an exported XML
* transfer content back and forward
* and between different Plone versions

1. First of all, since we do not want to migrate any portal settings or other functionality, we care only about content. Thus, we won't be doing unnecessary work.

2. We get a clean database, created from scratch, without any trash that may have been collected by previous migrations.

3. With this migration we are able to transform one portal type into another if necessary. And I know it really is necessary, because we have some content based on Plone products that are no longer supported.

4. After export we have a bunch of XML files which we can manipulate or transform and then import back in as a new site. We can even exchange site content with other content consumers.

5. This approach is reciprocal. In other words, anything that can be exported can be re-imported.

6. And it can be done between different Plone versions.
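
Points 3 and 4 can be sketched in plain Python with the standard library. This is a minimal sketch under stated assumptions: the XML layout (`<item>` with `<field name="...">` children) and the type names `BlogEntry` and `WeblogEntry` are hypothetical and only stand in for whatever the real export produces.

```python
import xml.etree.ElementTree as ET

# Hypothetical exported item; the real export format may look different.
EXPORTED = """<item>
  <field name="portal_type">BlogEntry</field>
  <field name="title">Hello</field>
</item>"""

# Assumed mapping from an unsupported type to its replacement.
TYPE_MAP = {"BlogEntry": "WeblogEntry"}


def retype(xml_text, type_map):
    """Rewrite the portal_type field of one exported item, if it is mapped."""
    root = ET.fromstring(xml_text)
    for field in root.iter("field"):
        if field.get("name") == "portal_type" and field.text in type_map:
            field.text = type_map[field.text]
    return ET.tostring(root, encoding="unicode")


converted = retype(EXPORTED, TYPE_MAP)
```

Because the transformation happens on files between export and import, neither the old site nor the new one has to know about the other type.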

I think we now have enough arguments to start investigating what Plone gives us 'out-of-the-box'.

But at this point I should say that, unfortunately, nothing is perfect. Neither is Plone's export/import story.

Nothing is perfect

PSPS 7826: content import/export
http://thread.gmane.org/gmane.comp.web.zope.plone.devel/18968

As proof that not everything is good with export/import in Plone, here is a link to the Plone devel thread about PSPS 7826, called 'content import/export'. It is still in progress.

This means that at the moment Plone doesn't provide us with a solid content migration solution.

But anyway, let's consider what it does provide.

* CMFPlone
* GenericSetup
* CMFCore
* CMFTopic
* Archetypes & ATContentTypes
* Marshall

What do we have?

Well, Plone together with GenericSetup, CMFCore, CMFTopic, Archetypes and the Marshall product provides us with the following:

AT content types:
* Folder
* ATDocument
* ATNewsItem

AT schema fields:
+ UID
+ schema fields
-- reference fields
-- image/file fields

CMF:
* type
* workflow history
* local roles

'Out-of-the-box' we can migrate Folder, ATDocument and ATNewsItem. And that is all as far as content types go.

These content types are handled with almost all of their Archetypes schema fields and their unique id. You will probably ask: why almost all, and not all? Well, reference fields, for reasons unknown to me, are simply skipped by Marshall during content export.
File fields are not fetched properly either, because the Archetypes file field accessor does not return the file data, only a URL to that file.

Apart from Archetypes fields, Marshall processes the content object's type, workflow history and local roles for us.

That's all we have by default. As you can see, this list is far from complete. A lot of important pieces are still missing.
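
To make the file field problem concrete: for an export to be usable, the file's bytes have to end up in the exported XML, not just a link to them. The sketch below shows one possible way to do that with base64 in plain Python. The element and attribute names are assumptions made for this example; this is not the format Marshall or any of the packages discussed here actually uses.

```python
import base64
import xml.etree.ElementTree as ET


def export_file_field(name, filename, data):
    """Serialize a file field with its raw bytes embedded as base64 text."""
    element = ET.Element(
        "field", {"name": name, "filename": filename, "encoding": "base64"}
    )
    element.text = base64.b64encode(data).decode("ascii")
    return ET.tostring(element, encoding="unicode")


def import_file_field(xml_text):
    """Read back the field name, file name and raw bytes."""
    element = ET.fromstring(xml_text)
    # An empty file serializes to an element without text, hence the fallback.
    data = base64.b64decode(element.text or "")
    return element.get("name"), element.get("filename"), data


payload = b"\x89PNG\r\n\x1a\n fake image bytes"
exported = export_file_field("image", "logo.png", payload)
restored = import_file_field(exported)
```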

The main task, or what is missing
* transfer all standard types, including topics with criteria
* transfer AT References
* transfer AT File fields
* transfer Properties (PropertyManager)
* transfer Comments (Discussion items)

Here is a list of the issues we needed to solve to migrate CMS.Info.

As you see, the main task here is to make it possible to migrate all the standard Plone content types, like events, links, images, files, etc.

We don't have any mechanism to export/import AT references.

It is also vital to fix the problem with the AT file field accessor, which does not return the file data.

We have qSEOptimizer installed on CMS.Info. This product makes intensive use of object properties (PropertyManager),
and we have a lot of objects with custom qSEOptimizer properties. So it is important for us to transfer such properties too.

Last but not least, we don't have any migration handlers for comments, also known as discussion items.

Now that we have figured out what we need, we can look at the existing solutions.

Possible Solutions

collective.plone.gsxml
Ramon Bartl
Stefan Eletzhofer

collective.transmogrifier
plone.app.transmogrifier
Martijn Pieters

It seems that these two packages are the best we have for content import/export.

The first one, collective.plone.gsxml, was written by Ramon Bartl and Stefan Eletzhofer. The most important thing about this package is that it can export and import at least all standard Plone AT-based content. It relies heavily on Marshall, so the problems with Marshall persist with gsxml too. In addition, this package can transfer AT references.

The other one, collective.transmogrifier, written by Martijn Pieters, is really well designed and easy to extend. At the moment transmogrifier does not have any sections, handlers or sources that can help us export content; instead it contains a lot of useful import handlers.

That's why we finally chose transmogrifier as the base for our Quantum Leap content migration approach.
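
The reason transmogrifier is easy to extend is its pipeline design: each section iterates over the items produced by the previous section and passes them on, possibly changed or dropped. The following is a plain-Python sketch of that idea using generators. It does not use the collective.transmogrifier API; the section functions, the sample items and the type names are all hypothetical, and the `_path` and `_type` keys are used here only as an assumed convention.

```python
def source(items):
    """First section: feed copies of the raw items into the pipeline."""
    for item in items:
        yield dict(item)


def skip_types(previous, unwanted):
    """Drop items whose type should not be migrated at all."""
    for item in previous:
        if item.get("_type") in unwanted:
            continue
        yield item


def rename_type(previous, mapping):
    """Swap an old portal type for its replacement on the fly."""
    for item in previous:
        item["_type"] = mapping.get(item["_type"], item["_type"])
        yield item


items = [
    {"_path": "blog/first", "_type": "BlogEntry"},
    {"_path": "tmp/scratch", "_type": "TempFolder"},
    {"_path": "about", "_type": "Document"},
]

# Sections are chained by wrapping the previous one; nothing runs until
# the last section is consumed.
pipeline = rename_type(
    skip_types(source(items), {"TempFolder"}),
    {"BlogEntry": "WeblogEntry"},
)
result = list(pipeline)
```

Adding support for something new, such as comments or properties, then means writing one more section and inserting it into the chain.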

So thanks a lot to Martijn Pieters for his great collective.transmogrifier package.
And of course thanks to Ramon Bartl and Stefan Eletzhofer for their work, from which we could benefit.

And finally the result looks like this:

The Solution
quintagroup.transmogrifier
http://svn.quintagroup.com/products

This package provides a lot of extension handlers for collective.transmogrifier, such as:
* PropertyManager property handler
* handler for discussion items
* references
* file fields
* topics with criteria
* etc.
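
As an illustration of what a property handler has to do, here is a minimal round-trip sketch in plain Python: properties are written to XML on export and read back on import. The XML layout, the helper names and the sample property ids are assumptions made for this example and are not taken from quintagroup.transmogrifier.

```python
import xml.etree.ElementTree as ET


def export_properties(props):
    """Serialize (id, type, value) triples, with string values, to XML."""
    root = ET.Element("properties")
    for prop_id, prop_type, value in props:
        element = ET.SubElement(
            root, "property", {"id": prop_id, "type": prop_type}
        )
        element.text = value
    return ET.tostring(root, encoding="unicode")


def import_properties(xml_text):
    """Parse the XML back into (id, type, value) triples."""
    root = ET.fromstring(xml_text)
    return [
        (element.get("id"), element.get("type"), element.text or "")
        for element in root.findall("property")
    ]


# Hypothetical custom properties set on one content object.
props = [
    ("seo_title", "string", "Plone products"),
    ("seo_description", "text", ""),
]
round_tripped = import_properties(export_properties(props))
```

The round trip returning the same triples is the "reciprocal" property mentioned earlier: what can be exported can be re-imported.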

But this is not all we managed to accomplish...

Some extra tasks
* SimpleBlog to Quills
* PloneFormMailer to PloneFormGen
* migrate Users and Members
* migrate portlets
* transfer RedirectionTool settings

We had a few extra tasks during the migration.

The first task was to migrate SimpleBlog content to Quills.
The second one was similar: migrate PloneFormMailer forms to PloneFormGen forms.
Of course we had to move Users and Members.
We also had to migrate old-style portlets to Plone 3 ones, and transfer RedirectionTool settings.

For the first two items we created two more packages:

Extra Solutions
* quintagroup.transmogrifier.simpleblog2quills
* quintagroup.transmogrifier.pfm2pfg
* other.packages.follow

So if you need to migrate blogs based on the SimpleBlog product to Quills content, the quintagroup.transmogrifier.simpleblog2quills package will be useful.

To move PloneFormMailer forms to PloneFormGen ones, just use quintagroup.transmogrifier.pfm2pfg.

All these packages are available in the svn repository at http://svn.quintagroup.com/products.

We hope these packages will be useful for you. And we are looking forward to your feedback, to learn what to improve, what to fix and what is missing.

Roadmap
* content versions
* local permission settings
* blobs
* something else?

To conclude, I would like to mention what is currently missing from our solution.

These are: content versions, local permission settings and blobs.
And I think there are a lot of other things which I don't remember or don't even know about ;-)


Links
* Plone Mailing List: http://thread.gmane.org/gmane.comp.web.zope.plone.devel/18968
* Plone Feature Request: http://dev.plone.org/plone/ticket/7826
* CMS.Info: http://www.contentmanagementsoftware.info
* collective.transmogrifier: http://svn.plone.org/svn/collective/collective.transmogrifier
* collective.plone.gsxml: http://pypi.python.org/pypi/collective.plone.gsxml/0.4.5
* Quintagroup Repository: http://svn.quintagroup.com/products

Here are a few links mentioned earlier in this session.

Questions

Time for Questions.

And thank you!
