1/01/2010

SharePoint: Exploring SharePoint CMP Export Files (and a demo application)

 

I found the missing files! I've uploaded them to GitHub. See "Downloads" below. (These are the original files, unmodified since 2010.)


Nearly two years ago I wrote another blog post on this topic and referred to a little application I had written to explore a CMP file and extract individual files from it. Shortly after that, and before I could upload the app, the hard disk in my laptop died. I recovered most of the content of the drive, but thought I had lost my Visual Studio directories. Guess what! I revisited that drive (I never throw things away…) looking for some old pictures and I found the missing app!

 

First go read the original article: Exploring SharePoint CMP Export Files. There you can see what’s in a CMP and how to manually extract the files.

 

To help understand the CMP Manifest file I wrote a small .NET program to list the SPObject elements and a handful of attributes from each one.


image

The first step is to open the CAB file and extract the Manifest.xml file. As I have always been amazed at the features found in the .NET libraries, I figured I would find a CAB extractor library in the Framework. It turns out there was one in one of the betas, but it is not there now. (There is a Zip library though!) So I ended up using the CAB extractor found in Windows, extrac32.exe. (For details type “extrac32 /? | more” at the command prompt.) Within the app I used System.Diagnostics.Process to run extrac32 to extract the files. The Manifest.xml file is then loaded into an XmlDocument object, parsed into a DataTable and then displayed in a GridView.
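As a rough illustration of that flow (extract Manifest.xml with extrac32, then load it as XML), here is a minimal Python sketch. It is not the app's actual .NET code, and the function names and work folder are hypothetical. The switches used are /Y (overwrite without prompting), /E (extract all files) and /L (output folder), as listed by extrac32 /?.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def build_extract_command(cmp_path, dest_dir, filename=None):
    """Build an extrac32 argument list for a CMP (CAB) file.

    With no filename, /E asks extrac32 to extract every file;
    otherwise only the named file is extracted.
    """
    cmd = ["extrac32", "/Y"]
    if filename is None:
        cmd.append("/E")
    cmd += ["/L", str(dest_dir), str(cmp_path)]
    if filename is not None:
        cmd.append(filename)
    return cmd


def load_manifest(cmp_path, work_dir):
    """Extract Manifest.xml from the CMP and parse it (Windows only)."""
    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    # Passing a list instead of one string lets subprocess quote
    # paths that contain spaces.
    subprocess.run(build_extract_command(cmp_path, work_dir, "Manifest.xml"))
    manifest = work_dir / "Manifest.xml"
    if not manifest.exists():
        raise FileNotFoundError(f"Manifest.xml was not extracted from {cmp_path}")
    return ET.parse(manifest)
```

Calling load_manifest(r"C:\Backups\site.cmp", r"C:\Temp\cmp") would leave Manifest.xml in the work folder and return the parsed tree; both paths are only examples.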

The List Content button extracts the Manifest.xml file from the CMP file and then pulls out some of its descriptive content to display a fair amount of detail in the GridView. The manifest documents EVERYTHING about the site, so I added some checkboxes to filter the display.

  • Files Only – well… only displays manifest entries about files (including ASPX, master pages, and other non-content files)
  • Exclude ASPX – hides ASPX files (in the next update I’ll have it hide all non-content files)
  • Include List Items – I hide list content by default, as most of the interesting stuff is in the XML and is different for each list type. If you check this box then all list items are added to the grid. You can then scroll to the right and click in the XML column to see the full XML description of the list item.
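The listing and the three filters can be sketched in Python roughly as follows. This is an illustration only, not the app's code. The sample manifest is hypothetical and heavily simplified, and the element and attribute names in it (ObjectType, Url, FileValue) are assumptions about the manifest layout that you should check against a real Manifest.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified manifest; real files carry many more
# attributes, and the names used here are assumptions.
SAMPLE = """<SPObjects xmlns="urn:deployment-manifest-schema">
  <SPObject ObjectType="SPWeb" Url="/sites/demo"><Web Name="demo"/></SPObject>
  <SPObject ObjectType="SPFile" Url="/sites/demo/home.aspx">
    <File Name="home.aspx"/>
  </SPObject>
  <SPObject ObjectType="SPFile" Url="/sites/demo/Shared Documents/plan.docx">
    <File Name="plan.docx" FileValue="00000001.dat"/>
  </SPObject>
  <SPObject ObjectType="SPListItem" Url="/sites/demo/Lists/Tasks/1_.000">
    <ListItem Name="1_.000"/>
  </SPObject>
</SPObjects>"""


def local(tag):
    """Drop the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def list_content(root, files_only=False, exclude_aspx=False,
                 include_list_items=False):
    """Return one dict per SPObject, honouring the three checkboxes."""
    rows = []
    for obj in root.iter():
        if local(obj.tag) != "SPObject":
            continue
        kind = obj.get("ObjectType", "")
        url = obj.get("Url", "")
        if files_only and kind != "SPFile":
            continue
        if exclude_aspx and url.lower().endswith(".aspx"):
            continue
        if kind == "SPListItem" and not include_list_items:
            continue
        child = next(iter(obj), None)  # first child element, if any
        rows.append({
            "ObjectType": kind,
            "Url": url,
            # .get() returns None instead of raising when FileValue is absent
            "FileValue": child.get("FileValue") if child is not None else None,
            "Xml": ET.tostring(obj, encoding="unicode"),
        })
    return rows
```

With the sample above, list_content(ET.fromstring(SAMPLE), files_only=True, exclude_aspx=True) keeps only the plan.docx entry.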

Extract All Files extracts all of the files, then renames them and puts them into a SharePoint-like directory structure. Note that this example has three subsites in the backup.
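The rename step can be pictured with a small Python sketch. It assumes each manifest row has already been reduced to a dict with a "Url" (the file's site-relative path) and a "FileValue" (the name of the extracted .dat file). Those keys and the function name are hypothetical, and this is not how the app itself is written.

```python
import shutil
from pathlib import Path, PurePosixPath


def restore_layout(rows, extracted_dir, output_dir):
    """Move each extracted .dat file to a path that mirrors its site URL."""
    extracted_dir, output_dir = Path(extracted_dir), Path(output_dir)
    restored = []
    for row in rows:
        dat_name = row.get("FileValue")
        if not dat_name:
            continue  # some entries may have no stored file
        source = extracted_dir / dat_name
        if not source.exists():
            continue  # the .dat file was not extracted into this folder
        # "/sites/demo/Shared Documents/plan.docx" -> relative path parts
        parts = [p for p in PurePosixPath(row["Url"]).parts
                 if p not in ("/", "..")]
        target = output_dir.joinpath(*parts)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(source), str(target))
        restored.append(target)
    return restored
```

Each subsite ends up as its own folder under output_dir because its URL segments become directory names.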

             image

The extracted files are where you would expect them to be, Shared Documents in this example, and still have their correct Date Modified:

image

 

Extract Selected extracts individual files to any location you pick. To select a file click the “selectors” at the left of each row. To select multiple files use the normal click, shift-click, ctrl-click techniques.

            image

 

Downloads:

https://github.com/microsmith/SharePointBackupCMPExplorer

 

 

Downloads

I will probably move this into CodePlex.com, but for now you can download it from here:

  EXE only:   download  (9k)

  Project zip:  download  (107k)

 


16 comments:

Anonymous said...

Brilliant!!!!! Thank you for this great tool. It saves a lot of time and pain restoring my SharePoint files!

Frank said...

This is a great app for acquiring content. Any ideas how to overcome the dat change on the exported documents? The date modified changes to March 23 2003 on the documents.

Emily in Ecuador said...

Frank, I am also struggling with the .dat change. Did you figure out how to overcome it?

Anonymous said...

I tried your tool today. Almost worked perfectly. I had to add this to the source code to make it work:
case "File":
    // FileValue can be missing (for example on home.aspx), so ignore the failure
    try
    {
        row["FileValue"] = fileNode.Attributes.GetNamedItem("FileValue").InnerText;
    }
    catch { }

home.aspx did not have a FileValue attribute, so it threw an "Object reference not set to an instance of an object" exception.

Also, use the c:\temp folder, as it cannot handle long directory structures, or the code has to be altered to use quotes around "c:\program files\this is my directory".
Other than that it works nicely to recover the file I needed.

Anonymous said...

THANK YOU. I was able to recover the file I needed.

Charles said...
This comment has been removed by the author.
Anonymous said...

Mike, with this tool you really saved my day! I have a dying Sharepoint server here and needed to get out all data but didn't want to use WebDAV for some reasons. It worked like a charm to create a backup with stsadm and then extract the files with your tool.
Many thanks and cheers from Germany, Alex.

Unknown said...

great job, Mike ...
but unfortunately the program seems not to work with cmp-files from Exchange 2013 ...

I'm unable to restore the files ;(
Seems like the files didn't get extracted successfully because the error occurs while renaming the file.

Any hints for me?

regards, Rico

Mike Smith said...

Rüdiger,

This was only tested with, and intended to be used with, SharePoint CMP files. I don't work with Exchange and can't offer any help there.

Mike

Unknown said...

Sry ... my fault ... ;)

Not Exchange 2013 ;)
SharePoint 2013 Foundation !

But the problem is gone ;)
Don't know why extraction failed yesterday.
Today all worked perfectly.

Thx, Mike for this great tool.
Made me getting a full backup of our different libraries separately without any issues ;)

regards

Rico

Unknown said...

Sry ... back again ;(

Did same procedure with another DocLib.
Got some more files like:
mydocs.cmp
mydocs1.cmp
...
mydocs8.cmp

By opening mydocs.cmp and listing the items I got the content from the last successfully restored DocLib ...

thought, it was due to the "old" manifest.xml in the temp-folder ...
so i deleted the old one ...

After a fresh restart of SharePoint Backup Explorer i tried to open mydocs.cmp again.
But clicking "List Items" gave me : "Could not open C:\Temp\SharePointBackupExplorer\manifest.xml"
So it wasn't extracted?

Found the manifest.xml in mydocs8.cmp and extracted it to C:\Temp\SharePointBackupExplorer\manifest.xml and tried again ...
Unsuccessfully ... not all files were found ...

Do I need to "merge" the cmp-files manually to one file ?!

Hopefully awaiting an idea ;)

regards

Rico

Scot said...

Hi Mike,

Thanks for sharing your work. Nearly 12 years after your original post and it is still useful!

I ran into a similar problem that Rico described. Even though he posted a couple of years ago, I thought I would share what we did when our export consisted of multiple .cmp files.

One of our employees accidentally overwrote an Excel workbook in a SharePoint list, and unfortunately, the versioning had never been turned on and since it was replaced by another Excel file and not deleted, it did not exist in any user accessible SharePoint Recycle bins nor was it in the Site Settings Recycle Bin.

Fortunately we do daily backups of the content database. We were able to recover an earlier version of the database and use the SharePoint Central Administration Unattached Content Database Data Recovery tool to export the list where the Excel file once existed. The list had over 300 items/Excel files. It seems when the .cmp files are created, they are limited to a file size of about 25MB. In our case, we ended up with multiple .cmp files.

I surmised the manifest.xml was in the last .cmp file and your app was able to extract it and display all of the exported items, pages and files from the original list in the grid. The issue I ran into was that the .dat file that corresponded to the Excel file I wanted existed in one of the other .cmp files, so the extract failed. It seems your application can only extract from the currently open .cmp file.

I manually merged the multiple .cmp files into a single .cmp file and then used your app to open the newly created .cmp file. I was then able to successfully extract the file I needed.

If you ever have the inclination or time to post your code to a repo on GitHub, I would love to contribute with an update that would either do an automatic merge of multiple .cmp files, or maybe if the extract routine doesn't find the .dat file in the current .cmp file, the application could look for other .cmp files in the current directory and try the extraction on each one until it succeeds.

The merge process is fairly simple:
1) Extract all of the .dat and .xml files for each of the numbered .cmp files.
2) Repackage all the files into a single .cab file with makecab.exe, but without any file size limitations.
3) Change the extension on the new cabinet file from .cab to .cmp
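
Step 2 above can be sketched by generating a makecab directive (.ddf) file from Python. The function name is hypothetical, and the directives (CabinetNameTemplate, DiskDirectoryTemplate, MaxDiskSize=0 to lift the size limit) are the ones I understand makecab to accept, so treat them as assumptions and check them against the makecab documentation before relying on this.

```python
from pathlib import Path


def build_ddf(files, cab_name, out_dir):
    """Return makecab directive text that packs `files` into one cabinet."""
    lines = [
        f'.Set CabinetNameTemplate="{cab_name}"',
        f'.Set DiskDirectoryTemplate="{out_dir}"',
        ".Set Cabinet=on",
        ".Set Compress=on",
        # Assumption: 0 removes the default per-disk size limit
        ".Set MaxDiskSize=0",
    ]
    # One quoted source path per line; quoting keeps paths with spaces intact
    lines += [f'"{Path(f)}"' for f in files]
    return "\n".join(lines) + "\n"
```

Saving the returned text as, say, merged.ddf and running makecab /F merged.ddf should then produce the single cabinet, which can be renamed to .cmp as in step 3.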


Thanks again!

Scot said...

I learned a little bit more about cabinet files. In your code, each time the extrac32 exe is launched, you are including the /L, /E and /Y switches. I learned that if you also include the /A switch and you make sure the .cmp file you are including in the arguments is the first one in the chain, extrac32 will process ALL cabinets if the export consisted of more than one .cmp file.

I discovered this tidbit here:

https://www.thewindowsclub.com/extract-cab-file-using-command-line

To make sure the first cabinet file is always the one passed into the arguments, I take the browsed .cmp file and strip off the extension, trim off any numbers with the string.TrimEnd method, (passing in an array of the digits 0 to 9,) and then add .cmp back onto the file name.
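
The renaming described above might look like this in Python, where rstrip plays the role of string.TrimEnd. The function names are hypothetical, and the command only combines the /A switch with the /Y, /E and /L switches already mentioned.

```python
from pathlib import PureWindowsPath


def first_cabinet(cmp_path):
    """Map e.g. 'mydocs8.cmp' back to 'mydocs.cmp', the first file in the chain."""
    path = PureWindowsPath(cmp_path)
    stem = path.stem.rstrip("0123456789")  # trim trailing digits only
    return str(path.with_name(stem + ".cmp"))


def extract_all_command(cmp_path, dest_dir):
    """Build an extrac32 call that follows the whole cabinet chain (/A)."""
    return ["extrac32", "/Y", "/A", "/E", "/L", str(dest_dir),
            first_cabinet(cmp_path)]
```

One caveat: an export whose own name ends in digits (for example export2013.cmp) would be trimmed to export.cmp, so a real implementation would probably want to check that the resulting file exists.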

Again, my deepest appreciation for you sharing your application and the source code.

Russ Robinson said...

Is the cmp file viewer still available? It doesn't seem to download.

JD said...

Hi Mike, looks like these links are down again. Would you still happen to have a copy of the utility you wrote?

Mike Smith said...

I found the missing files! I've uploaded them to GitHub. See "Downloads" below. (These are the original files, unmodified since 2010.)

https://github.com/microsmith/SharePointBackupCMPExplorer
