Synopsis of XML files stored in the archives supported by OpenTBS

Version 2020-05-10

Extensions: odt, odg, ods, odf, odp, docx, xlsx and pptx

This file is incomplete, feel free to send your own comments to: http://www.tinybutstrong.com/onlyyou.html

Table of Contents

LibreOffice & OpenOffice documents

That is extensions ODT, ODS, ODG, ODF, ODP, ODM.

All simple quotes "'" in texts are coded with "'" but they are automatically replaced by the OpenTBS plugin.

The main information is stored in the file 'content.xml'.

The pictures are stored in the directory 'Pictures' and should be registered into the file 'META-INF/manifest.xml'. (OpenTBS does it automatically for you when you use parameter "addpic").

Since OpenOffice 3.2, if a picture is not registered in the Manifest file, then it can produce a message error when opening the document.

Video and sound cannot be stored in OpenOffice documents.

TOP

Main file (content.xml)

Synopsis

<office:document-content>
   ...
   <office:body>
<office:text>
<text:p text:style-name="Standard">
   Normal new lines are made with a new paragraphs <text:p>...</text:p>
   Simple new lines are made with <text:line-break/>
   Tabs are made with <text:tab/>
   Page-breaks are made with a new paragraph having a style which has the attribute fo:break-before="page" or fo:break-after="page".
   Note that the page-break does not work if the attribute is in the paragraph element. A "break-before" at the first page, or a "break-after" on the last page has no effect.
   Local styles (bold, color,...) are made with <text:span text:style-name="T1">...</text:span>
</text:p>
<text:h>...</text:h> A paragraph typed as Header
   <text:list>...</text:list> A list of items
   <table:table>...</table:table> A table
   </office:text>
</office:body>
</office:document-content>

TOP

Sections

<text:p> ... </text:p>
<text:section text:style-name="Sect1" text:name="Section1">
<text:p> ... </text:p>
<text:p> ... </text:p>
</text:section>
<text:p> ... </text:p>

Slides in a presentation

<draw:page ....>
</draw:page>

TOP

Tables in a document

<table:table>
  <table:table-column ... />
  <table:table-column ... />
  <table:table-column ... />
  <table:table-row>
    <table:table-cell> ... </table:table-cell>
    <table:table-cell table:number-columns-spanned="2"> ... </table:table-cell>
    <table:covered-table-cell/> // virtual column that is covered by the span
    <table:table-cell> ... </table:table-cell>
  </table:table-row>
</table:table>

Cells merged vertically

<table:table-cell table:number-rows-spanned="2" office:value-type="string">
  <text:p>
    <text:span> CONTENT OF THE CELL </text:span>
  </text:p>
</table:table-cell>

Cell that are not displayed because of vertical merging or horizontal mergin, are replaced with <table:covered-table-cell/>. But such entities seems to be optionnal: LibreOffice display correclty the table if they are ommited.

Attributes table:number-rows-spanned="1" seems to be supported and has no effect.

Cells merged horizontally

<table:table-cell table:number-columns-spanned="2" office:value-type="string">
  <text:p>
    <text:span> CONTENT OF THE CELL </text:span>
  </text:p>
</table:table-cell>

<table:covered-table-cell/>

TOP

Comments

Comments in ODS workbooks (OpenOffice Calc)

<table:table-row table:style-name="ro2">

  <table:table-cell ...> ... </table:table-cell>

  <table:table-cell office:value-type="float" office:value="3.6">
    <office:annotation draw:style-name="gr1" draw:text-style-name="P1" svg:width="2.899cm" svg:height="0.991cm" svg:x="9.632cm" svg:y="2.786cm" draw:caption-point-x="-0.61cm" draw:caption-point-y="1.511cm">
      <dc:date>2011-02-02T00:00:00</dc:date>
      <text:p text:style-name="P1">Here is my comment.</text:p>
    </office:annotation>
    <text:p>3,60</text:p>
  </table:table-cell>

  <table:table-cell ...> ... </table:table-cell>

</table:table-row>

Comments in ODT documents (OpenOffice Write)

<text:p text:style-name="Standard">
   Here is the start of the text and now 
   <office:annotation>
     <dc:creator>Skrol29</dc:creator>
     <dc:date>2011-02-02T16:13:58.42</dc:date>
     <text:p text:style-name="P1">
       <text:span text:style-name="T1">Here is my comment.</text:span>
     </text:p>
   </office:annotation>
   comment is just inserted there.
 </text:p>

TOP

Headers and footers in a document

ODT and ODS

Header and footer contents are saved in the "sytles.xml" file: You can choose for each page to hide or display the header and footer. Nevertheless, the contents is always the same for each page.

<office:document-styles ...>
  <office:master-styles>
    <style:master-page ...>
      <style:header>
        <text:p text:style-name="Header">here is the header</text:p>
      </style:header>
      <style:footer>
        <text:p text:style-name="Footer">here is the footer</text:p>
      </style:footer> 
    </style:master-page>
  </office:master-styles>
</office:document-styles>

ODP

Header and footer contents are saved in "content.xml" file. They are defined as available styles and can be use for any slide and for the whole document in the "handout" view which is a set of several slide.

<office:presentation>

  <presentation:header-decl presentation:name="hdr1">Here is my header</presentation:header-decl>
  <presentation:footer-decl presentation:name="ftr1">Here is my footer for slide 1</presentation:footer-decl>
  <presentation:footer-decl presentation:name="ftr2">Here is my footer</presentation:footer-decl>

  <draw:page ... presentation:use-footer-name="ftr1">
  </draw:page>
  ...
</office:presentation> 

TOP

Charts

The location of the chart is defined in the main subfile "contents.xml".

<text:p text:style-name="Standard">
  <draw:frame draw:name="My name" text:anchor-type="paragraph" svg:x="2.401cm" svg:y="1.379cm" svg:width="14.102cm" svg:height="7.622cm" draw:z-index="0">
    <draw:object xlink:href="./Object 1" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" />
    <draw:image xlink:href="./ObjectReplacements/Object 1" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad" />
    <svg:title>My title</svg:title>
    <svg:desc>My description</svg:desc>
  </draw:frame>
</text:p>

An image preview is saved in the file "ObjectReplacements/Object 1", and this image will be displayed automatically instead of the real Chart view until the chart is manually changed in the document.

In order to avoid this preview, the followings must be deteled :

Data and properties of the chart are saved into the corresponding subdirectory "Object 1".

Data are saved in "Object 1/contents.xml" in a local table that groups data of all series of the chart.

If the document is an ODS workbook then the local table is still in the chart, but it is used as a cache for data. And series has a reference to cells in the workbook instead of cells in the local table.

With ODS, if the cached data are modified then, it is useless since the software will update the cached data with the cells in the workbook.

With ODS, if references of a series are replaced with references of the local table, then the document will be opened with no error but the software will assume that all series are linked to the local table.

<table:table table:name="local-table">

  <table:table-header-columns>
    <table:table-column />
  </table:table-header-columns>
  <table:table-columns>
    <table:table-column table:number-columns-repeated="3" />
  </table:table-columns>

  <table:table-header-rows>
    <table:table-row>
      <table:table-cell><text:p /></table:table-cell>
      <table:table-cell office:value-type="string"><text:p>column 1</text:p></table:table-cell>
      <table:table-cell office:value-type="string"><text:p>column 2</text:p></table:table-cell>
      <table:table-cell office:value-type="string"><text:p>column 3</text:p></table:table-cell>
    </table:table-row>
  </table:table-header-rows>

  <table:table-rows>

    <table:table-row>
      <table:table-cell office:value-type="string"><text:p>line 1</text:p></table:table-cell>
      <table:table-cell office:value-type="float" office:value="9.1"><text:p>9.1</text:p></table:table-cell>
      <table:table-cell office:value-type="float" office:value="3.2"><text:p>3.2</text:p></table:table-cell>
      <table:table-cell office:value-type="float" office:value="4.54"><text:p>4.54</text:p></table:table-cell>
    </table:table-row>

    <table:table-row>
      <table:table-cell office:value-type="string"><text:p>line 2</text:p></table:table-cell>
      <table:table-cell office:value-type="float" office:value="2.4"><text:p>2.4</text:p></table:table-cell>
      <table:table-cell office:value-type="float" office:value="8.8"><text:p>8.8</text:p></table:table-cell>
      <table:table-cell office:value-type="float" office:value="9.65"><text:p>9.65</text:p></table:table-cell>
    </table:table-row>

    <table:table-row>
      <table:table-cell office:value-type="string"><text:p>line 3</text:p></table:table-cell>
      <table:table-cell office:value-type="float" office:value="3.1"><text:p>3.1</text:p></table:table-cell>
      <table:table-cell office:value-type="float" office:value="1.5"><text:p>1.5</text:p></table:table-cell>
      <table:table-cell office:value-type="float" office:value="3.7"><text:p>3.7</text:p></table:table-cell>
    </table:table-row>

    <table:table-row>
      <table:table-cell office:value-type="string"><text:p>line 4</text:p></table:table-cell>
      <table:table-cell office:value-type="float" office:value="4.3"><text:p>4.3</text:p></table:table-cell>
      <table:table-cell office:value-type="float" office:value="9.02"><text:p>9.02</text:p></table:table-cell>
      <table:table-cell office:value-type="float" office:value="6.2"><text:p>6.2</text:p></table:table-cell>
    </table:table-row>

  </table:table-rows>

</table:table>

TOP

Textboxes

<text:p text:style-name="Standard">
  <draw:frame draw:style-name="fr1" draw:name="Frame1" text:anchor-type="paragraph" svg:width="4.535cm" draw:z-index="0">
    <draw:text-box fo:min-height="2.461cm">
      <text:p text:style-name="Frame_20_contents">Message in a textbox</text:p>
    </draw:text-box>
  </draw:frame>
  Usual text
</text:p>

TOP

Special to ODF (OpenOffice Math Formula)

Any comment in the formula must be entered between text delimiters which are the double quotes ("). Newlines are made with the keyword 'newline' outside the text delimiter.

TOP

Pictures/Images

Binary contents is saved as a file in "Pictures/".

Short synopsis of the control in the document:

<text:p ...>
  <draw:frame draw:style-name="fr1" draw:name="images1" text:anchor-type="paragraph" svg:width="0.847cm" svg:height="0.847cm" draw:z-index="0">
    <draw:image xlink:href="Pictures/100000000000002000000020A0D29467.jpg" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>
  </draw:frame>
</text:p>

TOP

Fields

Here is an example with a conditional field.

<text:p>
  Here is a field:
<text:conditional-text text:condition="0=1" text:string-value-if-true="Test is true" text:string-value-if-false="Test is false" text:current-value="false">
Test is false
</text:conditional-text>
<text:s/>
</text:p>

Microsoft Office documents

That is documents with extension DOCX, XLSX, PPTX.

TOP

Microsoft Word document (.DOCX)

The main file is usually "word/document.xml", but its actual location is defined in the file "[Content_Types].xml", in the element: <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>

Note: I've test to change the "word/document.xml" name in both the "[Content_Types].xml" file and the archive, but this makes Word 2010 to be unable to open the document, saying it is corrupted.

TOP

The main file "word/document.xml" (DOCX)

<w:document>
  <w:body>

    <w:p> // New paragraph

      <w:pPr> // Parameters of the paragraph
        <w:rPr> ... </w:rPr> // Set of parameters for a Run
        <w:sectPr> // Start a new section. Sections are a set of page layout (margin, columns, ...) available until the next section.
          <w:type w:val="continuous"/> // May be present whe the section is defined manually.
        </w:sectPr>
        <w:pageBreakBefore/> // Page break before the paragraph (way #1)
      </w:pPr>

     <w:r> // New run item. A run item is a set of content having common layout properties.
       <w:rPr> // Set of parameters for a Run. Examples: <w:i/> is italic, <w:b/> is bold. </w:rPr>
         <w:t>Your text is here</w:t>
          // Simple new lines are made with <w:br/>
          // Page breaks can also be made with <w:br w:type="page"/> (way #2)
     </w:r>

     <w:tab/> // Tabs are placed between <w:r> elements.

     <w:r>
       <w:t xml:space="preserve"> Next text </w:t>
     </w:r> // spaces between entities are dealt using attribute xml:space="preserve"

   </w:p>

  </w:body>
</w:document>

What are attributes "w:rsidR" and "w:rsidRPr" for?

Attribute "w:rsidR" is a Revision ID. Each new user on a doc has a new id, and each of its modification is marked with its RsID.

More info: http://blogs.msdn.com/brian_jones/archive/2006/12/11/what-s-up-with-all-those-rsids.aspx

TOP

Tables (DOCX)

<w:p>...</w:p>

<w:tbl> // a table takes the same place as a paragraph
  <w:tblPr></w:tblPr>
  <w:tblGrid>
    <w:gridCol ... /> // principally define colmun widths
    <w:gridCol ... />
  </w:tblGrid>
  <w:tr>
    <w:tc> ... </w:tc>
    <w:tc> ... </w:tc>
    ...
  </w:tr>
</w:tbl>

<w:p>...</w:p> 
Cells merged vertically
<w:tc> 
  <w:tcPr>
    <w:vMerge w:val="restart"/>  // marks the cell to start a new cell-merging
    <w:vMerge w:val="continue"/> // marks the cell to continue the cell-mergin (the cell is merged with a previous one having "restart" or "continue")
    <w:vMerge/> // same as above
    // no <w:vMerge> entity means the cell is not merged. 
  </w:tcPr>
  ...
</w:tc>
Cells mertged horizontally
<w:tc> 
  <w:tcPr>
    <w:gridSpan w:val="2"/> // It works like "colspan" in HTML.
  </w:tcPr>
  ...
</w:tc>

TOP

Headers and footers (DOCX)

They are 3 types of headers and footers in Microsoft Word : Default, Even (for even numbered pages only) and First (for the first page only). Event and First types are optional.

Each section of the document may have its own set of header/footer of the 3 types, but by default a new section has his headers/footers linked with the previous sections.

Each headers and footers are saved in separated XML file. If no header/footer is defined for the document, then they are no header nor footer XML files. Even and First headers are optional, they may not be defined for a section, and so have no corresponding XML files.

Example of header and footer files: "word/header1.xml" and "word/footer1.xml".

The actual type and locations of Headers and Footers are defined in the main document "word/document.xml"with the section's properties.

<w:sectPr>
  <w:headerReference w:type="default" r:id="rId13"/>
  <w:footerReference w:type="default" r:id="rId14"/>
  <w:headerReference w:type="first" r:id="rId15"/>
  <w:footerReference w:type="first" r:id="rId16"/>
  <w:headerReference w:type="even" r:id="rId17"/>
  <w:footerReference w:type="even" r:id="rId18"/>
<w:sectPr>
Relation files

The information related to r:id are stored in the file "word/_rels/document.xml.rels"

<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId8" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>
  <Relationship Id="rId13" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/header" Target="header1.xml"/>
  ...
</Relationships>

Locations are also appering in the file "[Content_Types].xml", in the elements like:

<Override PartName="/word/header1.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.header+xml"/>
   // and
<Override PartName="/word/footer1.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.footer+xml"/>

Some referenced header/footers may have no actual files because of no data.

Example of header source:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

<w:hdr ...>
  <w:p>
    <w:pPr><w:pStyle w:val="my_header"/></w:pPr>
    <w:r>
      <w:t>here is the text of the header</w:t>
    </w:r>
  </w:p>
</w:hdr>

TOP

Comments, footnotes and endnotes (DOCX)

Like headers and footers, they are aslo saved in separated XML files.

TOP

Bookmarks

<w:p>
  <w:r><w:t xml:space="preserve">Here is a </w:t></w:r>
  <w:bookmarkStart w:id="0" w:name="mybookmark"/>
  <w:r><w:t>bookmark</w:t></w:r>
  <w:bookmarkEnd w:id="0"/>
  <w:r><w:t>.</w:t></w:r>
</w:p>

TOP

Textboxes

<w:p>
  <w:r>
    <mc:AlternateContent> // indicate a liste of different possible choices

      <mc:Choice Requires="wps"> // first choice and its condition
        <w:drawing>
          <wp:anchor ...>
            ...
          </wp:anchor>
        </w:drawing>
      </mc:Choice>

      <mc:Fallback> // last choice if no condition is true
         <w:pict>
          <v:shapetype ...>
          ...
          </v:shapetype>
          <v:shape ...>
            <v:textbox ...>
              <w:txbxContent>
                <w:p> <w:r> <w:t>Here s a text box.</w:t>  </w:r> </w:p>
              </w:txbxContent>
            </v:textbox>
          </v:shape>
        </w:pict>
      </mc:Fallback>

    </mc:AlternateContent>
  </w:r>
</w:p>

TOP

Charts (DOCX)

The first chart is saved under "word/charts/chart1.xml", and so on for the next ones.

There is two way of storing the chart's data : directly in the chart (data is qualified as literal) or in an internal Excel spreadsheet (data is qualified as referenced).

By default, the software stores data in an internal Excel Spreadsheet embedded in the DOCX document
For example: "word/embeddings/Worksheet_Microsoft_Excel1.xlsx".). And the path of the Excel file is saved into "word/charts/_rels/chart1.xml.rels".
The chart file contains both the references to cells in the spreadsheet and the cached values.

If the cached values are modified, then the software will update them according to the internal Excel Spreadsheet when the document is opened.

If the internal Excel Spreadsheet and its link to the chart are deleted, then the software will open the docuement with no error, and will use the caced values as if it was literal values.

It possible to manually change the definition of a series to have it litral inseat of referenced.

Title of the chart, the axes and the series are saved in "chart1.xml". Other custom text boxes are saved in a shape file. For example : "word/drawings/drawing1.xml".

Example of a series saved in the XML (the tags are different for an XY series):

<c:ser>

  <c:idx val="0"/>
  <c:order val="0"/>

  <c:tx>
    <c:strRef>
      <c:f>Sheet1!$A$2</c:f>
      <c:strCache><c:ptCount val="1"/><c:pt idx="0"><c:v>Here is the name of the Series</c:v></c:pt></c:strCache>
    </c:strRef>
  </c:tx>

  <c:spPr>
    <a:solidFill><a:srgbClr val="9999FF"/></a:solidFill>
    <a:ln w="12700"><a:solidFill><a:srgbClr val="000000"/></a:solidFill><a:prstDash val="solid"/></a:ln>
  </c:spPr>
  <c:invertIfNegative val="0"/>

  <c:cat>
    <c:strRef>
      <c:f>Sheet1!$B$1:$E$1</c:f>
      <c:strCache>
        <c:ptCount val="4"/>
        <c:pt idx="0"><c:v>Category A</c:v></c:pt>
        <c:pt idx="1"><c:v>Category B</c:v></c:pt>
        <c:pt idx="2"><c:v>Category C</c:v></c:pt>
        <c:pt idx="3"><c:v>Category D</c:v></c:pt>
      </c:strCache>
    </c:strRef>
  </c:cat>

  <c:val>
    <c:numRef>
      <c:f>Sheet1!$B$2:$E$2</c:f>
      <c:numCache>
        <c:formatCode>General</c:formatCode>
        <c:ptCount val="4"/>
        <c:pt idx="0"><c:v>20.399999999999999</c:v></c:pt>
        <c:pt idx="1"><c:v>27.4</c:v></c:pt>
        <c:pt idx="2"><c:v>90</c:v></c:pt>
        <c:pt idx="3"><c:v>20.399999999999999</c:v></c:pt>
      </c:numCache>
    </c:numRef>
  </c:val>

</c:ser>

TOP

Pictures (DOCX)

Binary contents is saved as a file in "word/media/".

Image link saved into "word/_rels/document.xml.rels": <Relationship Id="rId6" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image" Target="media/image1.png"/>

They are two ways to insert a picture.

1) DrawingML (the new way that includes 2D/3D effects)

(short synopsis)

The <w:drawing> element is always included into an dedicated <w:r> element, itself always contained into a <w:p> element.

This is true even if the picture is positioned as a character or over the text. And it doesn’t matter on what is anchored the picture.

<w:drawing> 
  <wp:inline ...>
    <wp:extent cx="1130400" cy="1512000" /> // this element gives the size of the shape box that contains the picture
    <wp:docPr   
  id="1"
name="Image 1"
 descr="My description" // Alternative Text for the picture
 title="My title" // Title coming with the Alternative Text, no more available with Ms Office 2019
 /> <wp:docPr id="1" name="Image 0" descr="2884414-my_picture.jpg"> <a:hlinkClick xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main" r:id="rId4" tooltip="My tooltip"/> // this optional element gives the URL and tooltip of the link if any. In Word 2007 you cannot customize the title and desciption </wp:docPr> <a:graphic ...> <a:graphicData ...> <pic:pic ...>

 <pic:nvPicPr> // optional element found in version in 2022, the description is duplicated from above <pic:cNvPr id="1" name="Image 1" descr="My description"/> <pic:cNvPicPr/> </pic:nvPicPr>

 <pic:blipFill> <a:blip r:embed="rId4" /> // this element contains the link to the picture internal file </pic:blipFill> <pic:spPr ...> <a:xfrm> <a:off x="0" y="0" /> <a:ext cx="1130400" cy="1512000" /> // this element gives the size of the picture inside its shape box, it can be rescaled to fit in the shape box </a:xfrm> <a:prstGeom prst="rect"><a:avLst /></a:prstGeom> <a:noFill /> <a:ln><a:noFill /></a:ln> </pic:spPr> </pic:pic> </a:graphicData> </a:graphic> </wp:inline> </w:drawing>

Special contents for an SVG picture:

<pic:blipFill>
    <a:blip r:embed="rId4"> // link to an PNG picture that is a cached picture of the actual SVG picture

 <a:extLst> <a:ext uri="{28A0092B-C50C-407E-A947-70E740481C1C}"> <a14:useLocalDpi xmlns:a14="http://schemas.microsoft.com/office/drawing/2010/main" val="0"/> </a:ext> <a:ext uri="{96DAC541-7B7A-43D3-8B79-37D633B846F1}"> <asvg:svgBlip xmlns:asvg="http://schemas.microsoft.com/office/drawing/2016/SVG/main" r:embed="rId5"/> // link to the actual SVG </a:ext> </a:extLst>

 </a:blip> <a:stretch> <a:fillRect/> </a:stretch> </pic:blipFill>

Tech note: if value rId4 is replaced with rId5 then document is valid for both Ms Office and LibreOffice.

2) VML (old way)

(short synopsis)

<w:pict>
  <v:shapetype ...>
  ...
  </v:shapetype>
  <v:shape style="width:89.25pt;height:119.25pt" ...> // this element contains the size of the picture
    <v:imagedata r:id="rId6" o:title=""/> // this element contains the link to the picture internal file
  </v:shape>
</w:pict>

TOP

Fields (DOCX)

Simple fields

    <w:p>
<w:fldSimple w:instr="AUTHOR">
<w:r>
<w:t>Marcel Proust</w:t>
</w:r>
</w:fldSimple>
</w:p>

Complex fields

    <w:p>

<w:r>
<w:t>Some texte here </w:t>
</w:r>

<w:r>
<w:fldChar w:fldCharType="begin"/> // marks the start of a complex field
</w:r>

<w:r>
<w:instrText xml:space="preserve"> IF 0 = 1 "Test is true." "Test is false." \* MERGEFORMAT </w:instrText> // Example of a IF field.
</w:r>

<w:r>
<w:fldChar w:fldCharType="separate"/> // specifies that the character is a separator character which defines the end of the field codes or function and the start of the field result.
</w:r>

<w:r>
<w:rPr>
<w:noProof/>
</w:rPr>
<w:t>Test is false.</w:t>
</w:r>

<w:r>
<w:fldChar w:fldCharType="end"/> // marks the end of a complex field
</w:r>
<w:r>
<w:t>Some other texte here </w:t>
</w:r>
</w:p>

TOP

Microsoft Excel spreadsheet (XLSX)

General

An Excel workbook can have one or several worksheets. The contents of cells are saved in worksheets. Worksheets files are named "xl/worksheets/sheet1.xml", and also sheet2.xml, sheet3.xml...

The file names are not the names defined in Excel by the user, they are internal names. But it seems that there is always at least a worksheet named "sheet1.xml".

All string values of cells are stored in the file "xl/sharedStrings.xml". The cells contains in fact the index of the string in the sharedStrings.xml file. This separation will probably make difficulties to merge an Excel sheet.

All sheets of the workbook are listed in the file "xl/workbook.xml".

Synopsis of a sheet file like "xl/worksheets/sheet1.xml" (XLSX)

<worksheet>
   ...
   <sheetData>

     <row r="2" spans="2:2" ht="90">
       // A range of one row in wich several cells are defined
       <c r="B2" s="1" t="s">
         /* Definition of a cell:
          * Attribute r is the address if the cell in the sheet (format A1). This attribute is optional.
          * Attribute s is the style of the cell (the format). Styles are saved into the file 'xl/styles.xml' but I have not found the link yet.
          * Attribute t is the type of data, by default it is numerical
          * t="s" means that the displayed value is a string, the saved value is the index if the string taken in file "sharedStrings.xml".
          * b: boolean, d: date, e: error, n: number, s: shared string, inlineStr: inlinde string, str: string as the result of a formula
          */
         <f>B13+B14</f> // The formula if any. If there is no formula, this tag is absent. The type of <c> is the type of the result.
         <v>0</v> // The inner value without formatting.
                  // If t="s"   then the value is in fact the index of the string in the "xl/sharedStrings.xml" file.
                  // If t="str" then the value is the string result of the formula.
       </c>
       <c r="C2" s="1" t="inlineStr">
         /* The type "inlineStr" is a special value that allows the string to be stored in the cell instead of in the file "sharedStrings.xml".
          * It is used by OpenTBS to transert string with TBS fields from "sharedStrings.xml" into the XML of the sheet.
          */
          <is><t>This is a string</t></is>
       </c>
    </row>

  </sheetData>
</worksheet>

Synopsis of the Shared String file "xl/sharedStrings.xml": (XLSX)

<sst>
  <si>
    <t>value or text</t>
  </si>
  <si>
    <t>value or text</t>
  </si>
</sst>

TOP

Headers and footers (XLSX)

Like as DCX, they can be up to 6 headers/footers for each sheet.
Header and footers are saved in the sheet file.

<headerFooter differentOddEven="1" differentFirst="1">
  <oddHeader>My header for odd page in this sheet</oddHeader>
  <evenHeader>My header for even page in this sheet</evenHeader>
  <firstHeader>My header for first page in this sheet</firstHeader>
</headerFooter>

TOP

Charts (XLSX)

Same as Charts for DOCX, but there is no embedded spreadsheet, data are referenced to cells in any worksheet of the current workbook.

As for DOCX, data which is referenced to cells can be manually modified into literal data in a chart.

TOP

Pictures (XLSX)

Binary contents is saved as a file in "xl/media/".

The presence of pictures in the sheet is mentioned with a single <drawing> entity at the bottom of the <worksheet> entity. The <drawing> entity carries a reference id, which is defined the Rels file of the sheet. All properties of all the pictures in a sheet are finally saved in a third XML file.

File "xl\worksheets\sheet1.xml"
<worksheet ...>
   ...
   <drawing r:id="rId2"/> // (only one entity for all pictures in the sheet)
</worksheet> 
File "xl\worksheets\_rels\sheet1.xml.rels"
<Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/drawing" Target="../drawings/drawing1.xml"/>
// (only one entity for all pictures in the sheet)
File "xl\drawings\drawing1.xml"

(short synopsis, one entity per picture in the sheet)

<xdr:twoCellAnchor ...>
  <xdr:pic>
    <xdr:nvPicPr>
      <xdr:cNvPr id="2" name="Image 1" descr="xxx my description" title="xxx my title"/>
    </xdr:nvPicPr>
    <xdr:blipFill>
      <a:blip xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" r:embed="rId1">
      </a:blip>
    </xdr:blipFill>
  </xdr:pic>
</xdr:twoCellAnchor>

TOP

Microsoft PowerPoint presentation (PPTX)

General

Think to set all texts to "Tools » Language » No check" when you edit the PowerPoint presentation, otherwise some TBS fields can be split by XML tags about the language and spell checking.

Slides are listed in the file "ppt/_rels/presentation.xml.rels", where an internal id is affected to them.

The first slide is quite always corresponding to the file "ppt/slides/slide1.xml".

Synopsis of a slide file like "ppt/slides/slide1.xml": (PPTX)

<p:sld>
  <p:cSld>
    <p:spTree>
      <p:sp>
      
        <p:txBody>
          <a:p>
            <a:pPr eaLnBrk="1" hangingPunct="1"/>
            <a:r>
              <a:t>Some text here</a:t>
            </a:r>
          </a:p>
        </p:txBody>
      
      </p:sp>
    </p:spTree>
  </p:cSld>
</p:sld>

TOP

Headers and footers (PPTX)

They can be headers and footers for each slide, and aslo an header and footer for the "handout" view, wich is a set of several slides.

Header and footer of each slide is saved in the sub-file of the slide ("ppt/slides/slide1.xml" for example).

Header and footer of the handout is saved ins ubs-file "ppt/handoutMasters/handoutMaster1.xml".

TOP

Charts (PPTX)

Same as Charts for DOCX.

TOP

Pictures (PPTX)

Binary contents is saved as a file in "ppt/media/".

(short synopsis)

<p:pic>
  <p:nvPicPr>
    <p:cNvPr id="4" name="Image 3" descr="My description" title="My title">  // this optional element gives the description and title of the image
      <a:hlinkClick r:id="rId2" action="ppaction://hlinkfile" tooltip="My toolip"/>
    </p:cNvPr>
  </p:nvPicPr>
  <p:blipFill>
    <a:blip r:embed="rId2">
    </a:blip>
  </p:blipFill>
  <p:spPr>
    <a:xfrm>
      <a:off x="2667000" y="4365104" />
      <a:ext cx="3810000" cy="1200150" />
    </a:xfrm>
  </p:spPr>
</p:pic>