Create in-memory OpenXml Document on-the-fly part-2

In create in-memory OpenXml Document on-the-fly part-1, We have create a .xlsx with a workbook, but that is not a valid document yet. A spreadsheet document must have spreadsheet, I will show how to do it in this article and how to add data to an empty spreadsheet. So stay put and here we go~

Adding worksheet is a little more work than adding a workbook, because it needs a name and a unique ID for the relationship. I generate a trivial name with an increment integer for worksheet name and use the autogenerated part ID as the relationship ID.I wrap these up in a method so that each time I need a new worksheet, I just invoke it and I am done.

/// <summary>
/// add a new worksheet to the package
/// with auto generate sheet name, id and relationship id
/// </summary>
/// <param name="package">package object</param>
public void AddNewWorksheet(SpreadsheetDocument package)
{
//add a worksheet to the workbook
WorkbookPart l_wbp = package.WorkbookPart;
var l_wsp = l_wbp.AddNewPart<WorksheetPart>();
GenerateWorksheetPart().Save(l_wsp);//get unique part id as relationship id
string l_rid = l_wbp.GetIdOfPart(l_wsp);//get unique sheetId
Workbook l_wb = l_wbp.Workbook;
Sheets l_sheets = l_wb.GetFirstChild<Sheets>();
UInt32Value l_sheetId = 1;
//count number of existing worksheets
foreach (Sheet l_sheet in l_sheets.Elements<Sheet>())
{
l_sheetId += 1;
}//generate sheetName
string l_sheetName = new StringBuilder("sheet").Append(l_sheetId.ToString()).ToString();
//add the relationship element to <sheets>
l_sheets.AppendChild<Sheet>(new Sheet() { Name = l_sheetName, SheetId = l_sheetId, Id = l_rid });
l_wb.Save(l_wbp);
}

Two things worth mentioning, remember to call Save() after you modified a part, a part is a .xml in your program, you must save it in order to make the modification effective. another thing is add System.Linq reference, part of my code needed it. It helps you to locate an element in a easier way. The method GenerateWorksheetPart() is copy from DocumentReflector, all default values, nothing special.

/// /xl/worksheets/sheet.xml
///
///
private static Worksheet GenerateWorksheetPart()
{
    var element =
        new Worksheet(    //root element worksheet
            new SheetDimension() { Reference = "A1" },
            new SheetViews(
                new SheetView() { TabSelected = true, WorkbookViewId = (UInt32Value)0U }),
            new SheetFormatProperties() { DefaultRowHeight = 16.5D },
            new SheetData(),
            new PhoneticProperties() { FontId = (UInt32Value)1U, Type = PhoneticValues.NoConversion },
            new PageMargins() { Left = 0.7D, Right = 0.7D, Top = 0.75D, Bottom = 0.75D, Header = 0.3D, Footer = 0.3D });
    return element;
}

Adding data is done by appending row element under sheetData element. You can use Worksheet.GetFirstChild() to get your sheetdata element. And then the structure of data is like this, rows can be under sheetData and cells can be under row. You will understand after reading the following code.

l_row = new Row() { RowIndex = (UInt32)rowIndex };
cellReference = columnName + rowIndex;    //A1 for example
l_cell = new Cell(new CellValue(col.ColumnName)) { CellReference = cellReference, DataType = CellValues.String };
l_row.AppendChild(l_cell);
columnName = getNextColName(columnName);

DataType can be assign using enum CellValues, no problem with that. getNextColName is a method I use to get the next column name, for example AZ will get BA.

/// 
/// get the next column name
/// 
/// origin column name e.g. AAA
/// next column e.g. AAB
private string getNextColName(string origin)
{
    return getNextColName(origin, origin.Length);
}

/// 
/// get the next column name
/// 
/// origin column name e.g. AAA
/// length of column name
/// next column e.g. AAB
private string getNextColName(string origin, int length)
{
    StringBuilder l_sb = new StringBuilder(origin.Length + 1);
    char l_endChar = origin.ToCharArray(length - 1, 1)[0];
    if (l_endChar == 'z' || l_endChar == 'Z')
    {
        if (origin.Length == 1)
        {
            return "AA";
        }
        //if ends with Z and length > 1
        l_sb.Append(getNextColName(origin, length - 1));
        l_sb.Append('A');
    }
    else
    {
        l_sb.Append(origin, 0, length - 1);
        l_sb.Append(getNextAlphabet(l_endChar));
    }
     return l_sb.ToString();
}

/// 
/// get the next alphabet, circular A->Z->A
/// 
/// origin Alphabet e.g. A
/// next Alphabet e.g. B
private char getNextAlphabet(char origin)
{
    int l_idx;
    Hashtable atoi = new Hashtable(26);
    char[] itoa =  'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z' };
     #region add mapping to hashtable
    atoi.Add('A', 0);
    atoi.Add('B', 1);
    atoi.Add('C', 2);
    atoi.Add('D', 3);
    atoi.Add('E', 4);
    atoi.Add('F', 5);
    atoi.Add('G', 6);
    atoi.Add('H', 7);
    atoi.Add('I', 8);
    atoi.Add('J', 9);
    atoi.Add('K', 10);
    atoi.Add('L', 11);
    atoi.Add('M', 12);
    atoi.Add('N', 13);
    atoi.Add('O', 14);
    atoi.Add('P', 15);
    atoi.Add('Q', 16);
    atoi.Add('R', 17);
    atoi.Add('S', 18);
    atoi.Add('T', 19);
    atoi.Add('U', 20);
    atoi.Add('V', 21);
    atoi.Add('W', 22);
    atoi.Add('X', 23);
    atoi.Add('Y', 24);
    atoi.Add('Z', 25);
    #endregion

    l_idx = (int)atoi[origin];
    //if the origin is Z, go back to A
    if (l_idx == 25)
    {
        l_idx = 0;
    }
    else
    {
        l_idx += 1;
    }
     return itoa[l_idx];
}

These three method is badly implement and I will tune them when I have time. So, this is all you need to make a .xlsx with data only. I will explain how to generate line chart and bar chart using drawingML in part 3, if I have time to do it.

Advertisements

Create in-memory OpenXml Document on-the-fly part-1

Requirement: Open XML SDK V2 CTP

Resource: OpenXml Developer

After you install this SDK, it contains 3 very useful tools which you help you a lot.

DocumentReflector – generate code from a document, the code will generate that document if you run it. It is very “hardcode” code but it gives you a good start.

OpenXmlClassesExplorer – tells you about parent-child relationship of those classes, the number of classes is vast, so this is the most import tool.

OpenXmlDiff – when you open your document and it is not correct, use this to compare with the origin version of it. Not really usefull.

I have only create spreadsheet document using open xml sdk, but I am sure the process is quite similar for word and powerpoint document.

It takes me days to understand the new document architecture of open xml, so let’s give you some background to save you some time.

All open xml documents, including spreadsheet, wordprocessing, powerpoint are actually .zip package. You can rename their extension to .zip, unarchive it to see what’s inside. 

The basic structure do not need to have folder, but you can use folder to make it more human understandable. the files that must be present are: .rels, document.xml

Let’s look at an example of .xlsx

_rels–.rels

xl–workbook.xml

xl–_rels–workbook.xml.rels

xl–worksheets–sheet1.xml

.rels is resposible for defining the relationships between contents. Act like glue to put those .xml together into a document. If you have a workbook.xml which reference to some worksheet .xml, then you will have a workbook.xml.rels to define that. Styles, Charts, Drawings, etc all works in the same way.

So, let’s start generating a .xlsx document. First step, I assume you are using some version of visual studio, I use VS 2008. Add two reference to you project. WindowsBase from .NET framework 3.0 and DocumentFormat.OpenXml from the Open XML SDK. It is because the Open XML SDK is base on DocumentFormat.OpenXml.Packaging which is in .NET framework 3.0.

Next Step, create your document package. Add the appropriate using statements, for example: 

using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Spreadsheet;

Create a memory stream for the package and add a workbook to it.

//create a package in memory
MemoryStream l_memStrm = new MemoryStream(512);
Package l_memPack = Package.Open(l_memStrm , FileMode.Create, FileAccess.ReadWrite);
SpreadsheetDocument l_package = SpreadsheetDocument.Create(l_memPack, SpreadsheetDocumentType.Workbook);
var workbookPart = l_package.AddWorkbookPart();
GenerateWorkbookPart().Save(workbookPart);

GenerateWorkbookPart() is a method that generate a workbookpart element. I get this from DocumentReflector and then remove unnecessary code. You can use mine if you want.

private static Workbook GenerateWorkbookPart()
{
var element =
new Workbook(   //root element workbook
new FileVersion() { ApplicationName = "xl", LastEdited = "4", LowestEdited = "4", BuildVersion = "4505" },   // fileversion, first child under workbook
new WorkbookProperties() { FilterPrivacy = true, DefaultThemeVersion = (UInt32Value)124226U },   //workbookpr, child under workbook
new BookViews(
new WorkbookView() { XWindow = 0, YWindow = 15, WindowWidth = (UInt32Value)19200U, WindowHeight = (UInt32Value)11865U }), //workbookview, child under bookviews
new Sheets(   //sheets, child under workbook, must have
), //sheet, child under sheets, must have
new CalculationProperties() { CalculationId = (UInt32Value)125725U });  //calcpr, the calculator engine version number, use to avoid implicit prompt for "save changes"
return element;
}

my comment pretty much explain the line of code. In part 2, I am going to add worksheet to make the spreadsheet complete and add data to it. see ya.