12/01/2007

Exploring SharePoint CMP Export Files

And how to restore a single file from the export!

The missing example project has been found! (see below)


This article started as a quick exploration of SharePoint site export files. Along the way I found a handy way to recover a single file from a CMP, and even wrote a utility to explore these files and extract individual files from them.

There are multiple strategies for backing up SharePoint: SharePoint’s native backup (Central Admin and stsadm.exe backup), stsadm.exe's site export, SQL Server backups and third-party products. As you are probably aware, there is no way to recover a single file from a SQL Server backup, and Microsoft did not supply a means to extract a single file from a native SharePoint backup. One of the features of the third-party tools is the ability to recover a single file, library or list item from their backups. Well, it turns out that there is at least a reasonably easy way to extract single files from SharePoint site exports.

The SharePoint CMP Export File
The SharePoint CMP backup file is a CAB file in disguise. Rename the .CMP file to .CAB and you can then explore the contents using Windows Explorer or extract it with extrac32.exe (C:\Windows\System32). Between the XML files and the DAT files SharePoint has recorded everything needed to recreate the site. Just an odd note... all of the files inside of the CAB file are date stamped "9/18/2002"!
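
If you would rather confirm the "CAB in disguise" claim before renaming anything, here is a minimal Python sketch. It relies on the fact that Microsoft cabinet files begin with the four bytes "MSCF". The function names are made up for this illustration and are not part of any SharePoint tool.

```python
from pathlib import Path

# Every Microsoft cabinet (CAB) file starts with these four bytes.
CAB_SIGNATURE = b"MSCF"


def looks_like_cabinet(path):
    """Return True if the file at `path` starts with the CAB signature."""
    with open(path, "rb") as handle:
        return handle.read(4) == CAB_SIGNATURE


def cab_name_for(cmp_path):
    """Suggest the .cab name to use when renaming an export (no file is touched)."""
    return Path(cmp_path).with_suffix(".cab")
```

Calling looks_like_cabinet on an export such as the one produced below should return True if the file really is a cabinet.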

Here is a sample export:

stsadm -o export -url http://yourservername/a_site -filename c:\a_sitebackup


The files in the CMP file include:

File | Contents | Notes
---- | -------- | -----
Manifest.xml | The list of “everything” in the export | Lists, Libraries, ASPX files, Library files, etc.
ExportSettings.xml | |
Requirements.xml | A list of “things” needed by the target server to support the exported site | Language, site template, web parts and features
RootObjectMap.xml | The URL the site was backed up from | Example: Url="/SiteDirectory/walkthrough"
SystemData.xml | Top level details of the backup | SharePoint and database version numbers, manifest file name…
UserGroup.xml | Site users and groups |
ViewFormsList.xml | A list of views and forms | (seems to be redundant as the same data is in Manifest.xml)
????????.dat | One DAT file for each file in the export | ASPX, Master, plus every file in document libraries and list attachments. Just rename and you have the original file

Restoring a Single File

(This is the manual approach… see below for a program to do it for you…) All of the files, including your library documents and every ASPX page in the site, are stored in the CMP file as sequentially named files starting with 00000001.dat. Each of these files is documented in the Manifest.xml file in its own SPObject node.

To find and extract a single file:
  1. Extract the Manifest.xml file:
    extrac32.exe backupfile.cmp /L outputPath manifest.xml /E /Y
    For example: extrac32.exe C:\MySiteBackup.cmp /L C:\Temp manifest.xml /E /Y
  2. Open the Manifest file in Notepad or some other editor and then search for the filename. Keep searching until you find the <File> element that includes the filename. In that element you will find, in the FileValue attribute, the name of the file as stored in the CMP file.
    Here’s what you might find:

    <File Url="Shared Documents/Walkthrough Presentation.pptx" Id="b089eb84-4bbd-4197-b14b-cdb9366f7b1c" ParentWebId="f41ba9ed-2298-45c4-a472-c166946776bc" ParentWebUrl="/SiteDirectory/walkthrough" Name="Walkthrough Presentation.pptx" ListItemIntId="1" ListId="bd6ac39c-8516-4f13-89d6-220a4b61d3e2" ParentId="eb24be0f-d49f-4e84-8b18-f8158bf47c63" TimeCreated="2006-09-08T02:41:09" TimeLastModified="2006-09-08T02:41:09" Version="1.0" FileValue="00000037.dat" Author="15" ModifiedBy="15">
      <Properties>
        <Property Name="vti_title" Type="String" Access="ReadWrite" Value="Walkthrough Presentation" />
        <Property Name="xd_Signature" Type="Boolean" Access="ReadWrite" Value="false" />
        <Property Name="ContentTypeId" Type="LongText" Access="ReadWrite" Value="0x0101002C9F3418595D5642A69D915990334521" />
        <Property Name="vti_parserversion" Type="String" Access="ReadOnly" Value="12.0.0.4518" />
        <Property Name="Slides" Type="Integer" Access="ReadWrite" Value="3" />
        <Property Name="vti_filetype" Type="String" Access="ReadWrite" Value="pptx" />
      </Properties>
    </File>

  3. Extract the file:
    extrac32.exe C:\MySiteBackup.cmp /L C:\Temp 00000037.dat /E /Y
  4. Rename the file and you are done! (from 00000037.dat to “Walkthrough Presentation.pptx”)
  5. If you need the metadata (custom columns) associated with the document then explore the <Properties> section of the XML.
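
The lookup in step 2 can also be scripted. The following is a minimal Python sketch, not the utility described later in this article. It assumes the Manifest.xml has already been extracted and read into a string, and it matches elements by local tag name so that it works whether or not the manifest declares an XML namespace. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def find_dat_files(manifest_xml, filename):
    """Return (Url, FileValue) pairs for File elements whose Name matches."""
    root = ET.fromstring(manifest_xml)
    matches = []
    for element in root.iter():
        if local_name(element.tag) != "File":
            continue
        # Compare case-insensitively, since URLs in SharePoint are not case sensitive.
        if element.get("Name", "").lower() == filename.lower():
            matches.append((element.get("Url"), element.get("FileValue")))
    return matches


def extract_command(cmp_path, out_dir, member):
    """Build the same extrac32 command line used in the steps above (Windows only)."""
    return ["extrac32.exe", cmp_path, "/L", out_dir, member, "/E", "/Y"]
```

On a Windows machine, the list returned by extract_command could be handed to subprocess.run(..., check=True), after which the extracted .dat file is renamed to the original filename as in step 4.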

Restoring a List Item
List items are not stored as files. The full text of a list item is recorded in the XML. Simply follow the first two steps above to extract the Manifest file and then search for the list name or some text from the item. Look for a <ListItem> element with a URL (DirName) that matches your list.

Here’s a sample:

<ListItem FileUrl="Lists/Announcements/1_.000" DocType="File"
    ParentFolderId="e893aa18-01a4-4e11-994f-7f58829154b5"
    Id="31d6d1cc-86eb-4822-aad9-9d22b068641c"
    ParentWebId="f41ba9ed-2298-45c4-a472-c166946776bc"
    ParentListId="675b5d94-79a4-41a0-9714-819647b7b48b" Name="1_.000"
    DirName="SiteDirectory/walkthrough/Lists/Announcements" IntId="1"
    DocId="d1ad54e5-c884-465d-b80c-fea560ea3a37" Version="1.0"
    ContentTypeId="0x010400ECBBB05350418B428CBF8BDE561A874B"
    Author="1073741823" ModifiedBy="1073741823"
    TimeLastModified="2006-09-08T00:26:59" TimeCreated="2006-09-08T00:26:59"
    ModerationStatus="Approved">
  <Fields>
    <Field Name="Title" Value="Get Started with Windows SharePoint Services!"
        FieldId="fa564e0f-0c70-4ab9-b863-0177e6ddd247" />
    <Field Name="_ModerationComments"
        FieldId="34ad21eb-75bd-4544-8c73-0e08330291fe" />
    <Field Name="File_x0020_Type"
        FieldId="39360f11-34cf-4356-9945-25c44e68dade" />
    <Field Name="Body" Value="&lt;div class=&quot;ExternalClass0F1CE1F788734CE3A8682DBFA3F34E5F&quot;&gt;Microsoft Windows SharePoint Services helps you to be more effective by connecting people, information, and documents. For information on getting started, see Help.&lt;/div&gt;"
        FieldId="7662cd2c-f069-4dba-9e35-082cf976e170" />
    <Field Name="Expires" Value="09/08/2006 00:26:57"
        FieldId="6a09e75b-8d17-4698-94a8-371eda1af1ac" />
  </Fields>
</ListItem>
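
Pulling the field values out of elements like the one above can be sketched in a few lines of Python. This is only an illustration under the assumption that list items look like the sample (a ListItem element with a DirName attribute and Field children). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix that ElementTree puts on a tag."""
    return tag.rsplit("}", 1)[-1]


def list_items(manifest_xml, list_path):
    """Yield one {field name: value} dict per ListItem whose DirName ends with list_path."""
    root = ET.fromstring(manifest_xml)
    for item in root.iter():
        if local_name(item.tag) != "ListItem":
            continue
        if not item.get("DirName", "").endswith(list_path):
            continue
        fields = {}
        for field in item.iter():
            if local_name(field.tag) == "Field":
                # Fields without a Value attribute are recorded as None.
                fields[field.get("Name")] = field.get("Value")
        yield fields
```

For the sample above, list_items(manifest_text, "Lists/Announcements") would yield a dictionary whose "Title" entry holds the announcement title and whose "Body" entry holds the HTML of the item.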

 


My SharePoint Export Explorer (and file extractor) Program

To help understand the Manifest file I wrote a small .Net program to list the SPObject elements and a handful of attributes from each one.

The first step is to open the CAB file and extract the Manifest.xml file. As I have always been amazed at the features found in the .Net libraries, I figured I would find a CAB extractor library in the Framework. Turns out there was one in one of the Betas, but it is not there now. (There is a Zip library!) So I ended up using the CAB extractor found in Windows, extrac32.exe. (For details type extrac32 /? at the command prompt.) Within the app I used System.Diagnostics.Process to run extrac32 to extract the files. The Manifest.xml file is then loaded into an XmlDocument object, parsed into a DataTable and then displayed in a GridView. I'll post a link to the EXE and source shortly...
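
As a rough illustration of the listing idea (this is not the original .Net program, and it swaps XmlDocument and DataTable for Python's ElementTree and plain dictionaries), the sketch below collects a few attributes from every SPObject element. The default column names ObjectType, Name and Url are assumptions about what the manifest contains. Adjust them to whatever attributes your Manifest.xml actually carries.

```python
import xml.etree.ElementTree as ET

# Assumed attribute names; change these to match your own manifest.
DEFAULT_COLUMNS = ("ObjectType", "Name", "Url")


def spobject_rows(manifest_xml, columns=DEFAULT_COLUMNS):
    """Return one dict per SPObject element, keeping only the requested attributes."""
    root = ET.fromstring(manifest_xml)
    rows = []
    for element in root.iter():
        # Compare the local tag name so a namespace on the root does not matter.
        if element.tag.rsplit("}", 1)[-1] != "SPObject":
            continue
        rows.append({column: element.get(column, "") for column in columns})
    return rows


def format_rows(rows, columns=DEFAULT_COLUMNS):
    """Render the rows as tab-separated text with a header line."""
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(row.get(column, "") for column in columns))
    return "\n".join(lines)
```

Printing format_rows(spobject_rows(manifest_text)) gives a simple grid that can be pasted into a spreadsheet, which is roughly what the GridView in the app shows.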

(What happened to the sample EXE and code? It was lost when my laptop crashed! I'll recreate it one day...)

It’s been found!  See here…



7 comments:

Anonymous said...

Whatever happened to the app? I would love to get my hands on that.

Interesting article, similar to what I have done with smigrate.fwp files in the past.

Mike Smith said...

Oops... I'll try to find it and get it linked from here in the next few days...

Thanks
Mike

Patrick said...

Yeah, would love to see the app myself. I'm creating cmp files for our site and would like to extract the documents.

Anonymous said...

Any news on the app? Would be very handy.

Anonymous said...

Great article! You have helped me to find a method of restoring a single file rather than having to restore the whole thing!

What happened to that app that you developed ?

KeithV said...

Okay - I did a restore a few weeks ago. Now I have a .. local settings\Temp\{GUID} folder with a huge number of .xml and .dat files. Eating up tons of space. It's okay to delete this folder?

Mike Smith said...

KeithV,

If you want me to make a guess, ;-), I'd say delete it.

But... it would be better, and safer, to post your question in the MSDN forums. Most likely this one:
http://social.msdn.microsoft.com/Forums/en-US/sharepoint2010setup/threads

Mike
