Categories > OpenTBS with XLSX >

Trouble merging with multiple worksheets

The forum is closed. Please use Stack Overflow for submitting new questions. Use tags: tinybutstrong , opentbs
By: Ken Tanzer
Date: 2014-07-09
Time: 00:24

Trouble merging with multiple worksheets

Hi. I'm having trouble merging with an xlsx document, and am also a little confused by the documentation.  There are three key blocks in the document, each in a separate worksheet.  This is the structure of the XLSX file:

    testing: [Content_Types].xml      OK
    testing: _rels/.rels              OK
    testing: xl/_rels/workbook.xml.rels   OK
    testing: xl/workbook.xml          OK
    testing: xl/worksheets/sheet4.xml   OK
    testing: xl/worksheets/sheet2.xml   OK
    testing: xl/worksheets/sheet3.xml   OK
    testing: xl/worksheets/sheet1.xml   OK
    testing: xl/styles.xml            OK
    testing: xl/theme/theme1.xml      OK
    testing: xl/sharedStrings.xml     OK
    testing: docProps/core.xml        OK
    testing: docProps/app.xml         OK

When I load the file using LoadTemplate($filename), only one of the blocks/sheets is merged, and the others are untouched.  When I instead tried LoadTemplate($filename.'#xl/workbook.xml;xl/worksheets/sheet1.xml;xl/worksheets/sheet2.xml;xl/worksheets/sheet3.xml;xl/worksheets/sheet4.xml'), then none of my blocks are detected or merged.

When I run OPENTBS_DEBUG_INFO, I see the sheets:

Sheets in the Workbook:
-----------------------
  - id: 4, name: [Building], state: visible, file: xl/worksheets/sheet1.xml
  - id: 1, name: [Clients], state: visible, file: xl/worksheets/sheet2.xml
  - id: 2, name: [Totals], state: visible, file: xl/worksheets/sheet3.xml
  - id: 3, name: [Sheet3], state: visible, file: xl/worksheets/sheet4.xml


Can someone explain what my mistake is, or what I'd have to do differently? 

Also, I'm working on code that should be able to merge with any document, so what is the good approach to make sure all content is loaded/selected such that a merge will work properly?

Thanks.

Ken
By: Skrol29
Date: 2014-07-09
Time: 14:03

Re: Trouble merging with multiple worksheets

Hi Ken,

See
http://www.tinybutstrong.com/opentbs.php?doc#opentbs_select_sheet

OpenTBS cannot merge several sub-file in the same time.
By: Ken Tanzer
Date: 2014-07-09
Time: 18:52

Re: Trouble merging with multiple worksheets

Thank you.  I had read that section, but what you said "OpenTBS cannot merge several sub-file in the same time" I didn't see anywhere, and is helpful.  A few follow-up questions:

Is there any conceptual reason why a merge couldn't be made to work over a whole document (including several sub-files) all at once, similar to the way LoadTemplate automatically loads the sub-documents?  I am working on code  that would work for any type of document, and if I have a block to merge I just want it merged regardless of which sub-file it's in.

For now, for a spreadsheet, it looks like my best option is to loop through all the sheets, select each one, test if the block is found, and if so merge the block.  So is there any call I can make to get back either the names or a count of the sheets/slides in a document?  I know I can display it with OPENTBS_DEBUG_INFO, but I'd like to get it returned instead so my program can work with it.

On a related note, is there any direct way to check if a block exists (within your selected document)?  I'm using GetBlockSource() and testing the result, but don't know if there's a more direct way.

Finally, just to be clear, are the SHEETS and SLIDES commands interchangeable, or just conceptually similar?  Several of the explanations refer to sheets/slides, but does for example OPENTBS_DISPLAY_SHEETS work on a PowerPoint, or will it throw an error?

Thanks!

Ken


By: Skrol29
Date: 2014-07-09
Time: 23:34

Re: Trouble merging with multiple worksheets

> "OpenTBS cannot merge several sub-file in the same time"

Ok, it is not that clear in the documentation, I will change that.

> Is there any conceptual reason why a merge couldn't be made to work over a whole document (including several sub-files) all at once,

Yes, they are several :
- Loading all XML sub-files in an archive can use too much memory and time.
- Merging a block can be long if you don't know in which sub-file it is.
- Knowing all XML sub-file to load can be complicated.

> names or a count of the sheets/slides in a document?

$TBS->PlugIn(OPENTBS_COUNT_SLIDES);

OPENTBS_COUNT_SHEETS is for the next released which is in stable beta.

> test if the block is found, and if so merge the block

TBS has an optimization: if MergeBlock() does not find the block, then the method ends without data processing and without error. That is SQL statements are not queried.
Otherwise you can use the method GetBlockSource(), but it is not faster.

> Finally, just to be clear, are the SHEETS and SLIDES commands interchangeable, or just conceptually similar?

They are designed to be conceptually similar but the code is totally different. They are not interchangeable.
They are also designed to work consistently between Ms Office and LibreOffice.
Ms Office has sheets and slides in different sub-file, while LibreOffice has them in the same sub-file, but your code can work indifferently with a XLSX or an ODS for example.

> for example OPENTBS_DISPLAY_SHEETS work on a PowerPoint, or will it throw an error?
Some will make an error, some will do nothing.


By: Ken Tanzer
Date: 2014-07-10
Time: 04:57

Re: Trouble merging with multiple worksheets

Thanks for the follow additional information.  I'm confused and/or curious about this:

> Is there any conceptual reason why a merge couldn't be made to work over a whole document (including several sub-files) all at once,

Yes, they are several :
- Loading all XML sub-files in an archive can use too much memory and time.
- Merging a block can be long if you don't know in which sub-file it is.
- Knowing all XML sub-file to load can be complicated.

When you say load here, is it in a different sense than LoadTemplate?  Because LoadTemplate figures out which XML sub-files to load, and "loads" them.  Or maybe to ask another way, what does "loading" with LoadTemplate but not selecting actually do?


> test if the block is found, and if so merge the block

TBS has an optimization: if MergeBlock() does not find the block, then the method ends without data processing and without error. That is SQL statements are not queried.
Otherwise you can use the method GetBlockSource(), but it is not faster.

Does MergeBlock return false or something identifiable if a block is not found?  At least if I use GetBlockSource I can merge only once, in the sub-file in which it is found.

Thanks!
By: Skrol29
Date: 2014-07-10
Time: 12:30

Re: Trouble merging with multiple worksheets

Hi,

> When you say load here, is it in a different sense than LoadTemplate?
>  Because LoadTemplate figures out which XML sub-files to load, and "loads" them.
> Or maybe to ask another way, what does "loading" with LoadTemplate but not selecting actually do?

I agree this is confusing. Thanks to point this out.

With OpenTBS, the method LoadTemplate() actually load the dictionary of the archive and then do some automatic actions according to the type of document.
After this, OpenTBS use a technical of barrel for loading and merging sub-file from the archive.
Loading a sub-file from the archive is the same as doing a LoadTemplate() from a usual file (HTML/XML/Texte).
You can do LoadTemplate('#my_sub_file.xml'), or you can use OpenTBS commands dedicated for such purpose and that avoid sub-file and XML considerations.

> Does MergeBlock return false or something identifiable if a block is not found?

Unfortunately not. In case of block not found, MergeBlock() return 0, as if the block was found but merged with empty data.
Returning "false" would be a nice enhancement.
By: Ken Tanzer
Date: 2014-07-13
Time: 00:48

Re: Trouble merging with multiple worksheets

Hi.  Thanks for your continued help.  I was still confused by your response though.  What is a "technical of barrel?"

Also, regarding these three items you posted:

1) Loading all XML sub-files in an archive can use too much memory and time.
2) Merging a block can be long if you don't know in which sub-file it is.
3) Knowing all XML sub-file to load can be complicated.

For #1, it seems like it does load all the sub-files and keep them in memory, yes?  (in TbsStoreLst)
For #3, doesn't it already figure out which sub-files to load as part of the auto-load?
#2 makes complete sense to me, and is a good explanation all by itself!

Now, if you can bear with me for a second here, as a bit of context I'm working on a report generator that executes SQL queries, loads them into arrays, and then merges them with templates.   (Or displays results on the screen.)  It's intended so people can write queries, and templates to match, and not have to know how to program or change the code.  So the code will never know in advance which blocks are in which sub-files of a template, and needs to handle this automatically.

Going back to your #2, I can see that MergeBlock's ability to merge the same data with multiple blocks makes this problematic.  If you had 5 blocks to merge, and 5 subsheets, that's 25 operations you have to do.  In my case though, at least for now, I'd only ever be merging one block at a time.  So 1 block times 5 sheets doesn't seem so bad.

What I'm thinking I want/need are 2 functions "MergeABlockAnywhere", and "MergeAFieldEverywhere".  MergeABlockAnywhere, for example, would take a single block name, and merge it in the sheet(s?) where that block exists.

I've got a preliminary version of this working in my application code:

if ($sub_sheets) { // MSOffice Handling
  foreach($sub_sheets as $ss) {
    $doc->PlugIn(OPENTBS_SELECT_SHEET, $ss);
    if ($doc->GetBlockSource($block_name)) {
      $doc->MergeBlock($block_name,$block_vals[0]);
    }
  }
} else { // OpenOffice Document handling
  if ($doc->GetBlockSource($block_name)) {
    $doc->MergeBlock($block_name,$block_vals[0]);
  }
}

For the moment, I had to hard-code $sub_sheets for my testing document.

To make this really work, I'd need the ability to get the sub-file/sheet/slide info from TBS.

The rest of this I could leave in my application code, but for simplicity or cleanliness I was thinking of adding this all into the opentbs class.    I'm figuring there would probably also need to be some doctype-specific handling in there.

I'm wondering what you think of this general approach, or if there's a better or easier way.   Also, do you think this functionality would be of more general use to others, and if I did put it together would you have any interest in incorporating it into opentbs?

Thanks.

Ken

By: Skrol29
Date: 2014-07-15
Time: 10:29

Re: Trouble merging with multiple worksheets

> For #1, it seems like it does load all the sub-files and keep them in memory, yes?  (in TbsStoreLst)

Yes

> For #3, doesn't it already figure out which sub-files to load as part of the auto-load?

Not really.
The OpenTBS auto-load does only load typical sub-files.
For a DOCX, it's the main content and the main header and footer (a docx can have lot of header and footers, and comments, and embedded worksheets, and charts, ...)
For an XLSX, it's the first sheet (an xlsx can have many worksheet, charts, ...)
For a PPTX it's the first slide (a pptx can have many slides, master slides, ...).

> Now, if you can bear with me for a second here, as a bit of context I'm working on a report generator
> So the code will never know in advance which blocks are in which sub-files of a template, and needs to handle this automatically.

I had such solution to build.

> I'm wondering what you think of this general approach, or if there's a better or easier way.

I think the GetBlockSource() is useless since MergeBlock() does the check.
You could also add:
  $tbs->MergeField('info', $my_info)
in case you'd like to merge single informations.
 
In my case I also added to possibility to merge single pictures and single charts.

> Also, do you think this functionality would be of more general use to others,

As I has the same solution to build, and as I saw it was very helpful, I guess such solutions are kind of a usual need.

> I'm figuring there would probably also need to be some doctype-specific handling in there.

Unfortunately yes.

>and if I did put it together would you have any interest in incorporating it into opentbs?

They are some difficulties :
- you may want to merge data in other special parts (comments, headers, master slides, document properties, ...)
- you may want to merge other data (several blocks, data items, pictures, charts, ...)
- some coders does no worry about optimization

I think a built-in feature won't cover enough need and will bring lazy coder to complain about time execution.
I would be better to give an example of such a solution. It could be part of the demo in the OpenTBS package.