OpenTBS merging docx template with HTML formatted source

By: Vincent T
Date: 2015-08-18
Time: 20:14

Hello,

I have a script that generates text formatted with HTML markup (for example "<strong>Hello</strong> World!").

I would like to merge this into a docx template. With a classic text merge (for example a [onshow.message] field), the HTML markup is not interpreted. How can I merge so that the HTML gets interpreted and produces bold, italic, etc. in the final .docx file?

Thanks in advance for your help

EDIT : I have tried strconv=no, but then the exported docx can no longer be opened in MS Word :/

EDIT2 : I also tried MergeBlock():
1) $TBS->MergeBlock('block1', 'text', "simple block replace");   => works fine
2) $TBS->MergeBlock('block1', 'text', "simple <strong>bold</strong> replace");   => MS Word can't open the new document

EDIT3 : I had a look at the XML generated using $TBS->PlugIn(OPENTBS_DEBUG_XML_SHOW); What's surprising is that "simple <strong>bold</strong> replace" does replace the original block name in the XML, but the file still won't open in MS Word :/

EDIT4 : While trying different things, I left a lone "<" by mistake, like this: "simple bold< replace", and it failed, so it looks like this particular character makes it fail.

EDIT5 : Now I found the issue, but don't know how to solve it... I have tried the following : $TBS->MergeBlock('block1', 'text', 'simple </w:t></w:r><w:r><w:rPr><w:color w:val="800000"/><w:i/></w:rPr><w:t>bold</w:t></w:r><w:r><w:rPr><w:color w:val="800000"/></w:rPr><w:t> replace');
And it works!!! So it looks like when I merge text this way, the value is expected to be Word XML, formatting info included. I guess I should merge differently, I mean with something other than text, but what?
By: Skrol29
Date: 2015-08-19
Time: 12:55

Re: OpenTBS merging docx template with HTML formatted source

Hi VincentT,

What you are trying to do is to convert HTML to DOCX XML.
This is a large problem that OpenTBS cannot solve.

By default, OpenTBS escapes special characters such as < and >.
With the parameter "strconv=no", special characters are not escaped, so the content must be valid DOCX XML.
With MergeBlock() and a 'text' source, special characters are not escaped either, so the content must be valid DOCX XML as well.
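
To illustrate, here is a minimal sketch in plain PHP, without OpenTBS. It uses htmlspecialchars() as a stand-in for the default escaping (an assumption, the real OpenTBS routine may differ in details), and the helper names escape_for_docx and is_well_formed_run are hypothetical. It only checks XML well-formedness, which explains the lone "<" case. A value like "simple <strong>bold</strong> replace" is well-formed XML, but <strong> is not a WordprocessingML element, which is presumably why Word still rejects it.

```php
<?php
// Stand-in for the default escaping (assumption: OpenTBS's own routine may differ).
function escape_for_docx(string $text): string
{
    return htmlspecialchars($text, ENT_NOQUOTES | ENT_XML1, 'UTF-8');
}

// Hypothetical helper: wraps a value in a run's text element and reports
// whether the result is well-formed XML.
function is_well_formed_run(string $value): bool
{
    $xml = '<w:r xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:t>'
         . $value . '</w:t></w:r>';
    $previous = libxml_use_internal_errors(true); // silence parser warnings
    $doc = new DOMDocument();
    $ok = $doc->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return $ok;
}

$raw = 'simple bold< replace';
var_dump(is_well_formed_run($raw));                  // raw "<" breaks the XML
var_dump(is_well_formed_run(escape_for_docx($raw))); // escaped value is accepted
```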

If you actually need an HTML to DOCX XML conversion, then you have to use the parameter "onformat" with a custom function that does the conversion,
as described in this similar problem:
http://www.tinybutstrong.com/forum.php?thr=3502
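
Such a custom function could look roughly like the sketch below. It is not taken from the linked thread. The names html_to_docx_runs and f_html2docx are hypothetical, only <b>, <strong>, <i>, <em> and <u> are handled (everything else is treated as plain text), and the callback signature is assumed from the TinyButStrong manual's description of "onformat", so it should be checked against the version in use. It also assumes the field sits inside a <w:t> element of the template and is declared along the lines of [onshow.message;onformat=f_html2docx;strconv=no].

```php
<?php
// Converts a tiny HTML subset into WordprocessingML runs.
// The output first closes the run that contains the field, then re-opens an
// empty run at the end so the template's own closing tags still match.
function html_to_docx_runs(string $html): string
{
    $tagMap = ['b' => 'b', 'strong' => 'b', 'i' => 'i', 'em' => 'i', 'u' => 'u'];
    $depth  = ['b' => 0, 'i' => 0, 'u' => 0];
    $runs   = '';

    // Split into supported tags and the text between them.
    $tokens = preg_split('~(</?(?:b|strong|i|em|u)>)~i', $html, -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);

    foreach ($tokens as $token) {
        if (preg_match('~^<(/?)(b|strong|i|em|u)>$~i', $token, $m)) {
            // Opening tag raises the nesting depth, closing tag lowers it.
            $key = $tagMap[strtolower($m[2])];
            $depth[$key] = max(0, $depth[$key] + ($m[1] === '/' ? -1 : 1));
            continue;
        }
        // Direct run formatting for the currently active tags.
        $props = ($depth['b'] > 0 ? '<w:b/>' : '')
               . ($depth['i'] > 0 ? '<w:i/>' : '')
               . ($depth['u'] > 0 ? '<w:u w:val="single"/>' : '');
        // Decode HTML entities, then escape the text for XML.
        $text = htmlspecialchars(
            html_entity_decode($token, ENT_QUOTES | ENT_HTML5, 'UTF-8'),
            ENT_NOQUOTES | ENT_XML1, 'UTF-8');
        $runs .= '<w:r>' . ($props !== '' ? '<w:rPr>' . $props . '</w:rPr>' : '')
               . '<w:t xml:space="preserve">' . $text . '</w:t></w:r>';
    }
    return '</w:t></w:r>' . $runs . '<w:r><w:t>';
}

// Assumed shape of a TBS "onformat" callback; the name is hypothetical.
function f_html2docx($FieldName, &$CurrVal, &$CurrPrm, &$TBS)
{
    $CurrVal = html_to_docx_runs((string) $CurrVal);
}

echo html_to_docx_runs('<strong>Hello</strong> World!');
```

The result has the same shape as the hand-written value in EDIT5 above: it closes the current run, emits one run per formatted slice, and leaves a run open for the template to close. Paragraphs, line breaks, nested lists and so on are not covered and would need much more work.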
By: Vincent T
Date: 2015-08-19
Time: 13:48

Re: OpenTBS merging docx template with HTML formatted source

Thanks a lot for the answer.
So basically I have to write a conversion function for <p>, <br>, <i>, <b>, <u>, ...

I've read that it is quite risky to merge a chunk of text this way, as MS Word usually slices paragraphs into different blocks, sometimes even single words. What's your opinion on that, knowing that I'll have to insert text that can consist of several paragraphs, with tabs, bold, italic and underlined text?

Another possibility would be to convert the HTML I have into base64. Would a merge work this way?
By: Skrol29
Date: 2015-08-26
Time: 17:31

Re: OpenTBS merging docx template with HTML formatted source

The problem you have is part of the general problem of converting rich text from one format to another.
This problem is complicated for several reasons, which is why some CMSs offer solutions like BBCode or Textile.

So my first advice would be not to save rich text as HTML unless the target is actually, and forever, only HTML.

You also have to consider how complicated OpenXML (the XML format for DOCX) is for managing display formatting. Each combination of <i>, <b> and <u> must be stored as a named format saved internally in the DOCX and applied to its corresponding slice of text. So we can tell that it's a lot of work.
By: Vincent T
Date: 2015-08-26
Time: 18:57

Re: OpenTBS merging docx template with HTML formatted source

Yes, that's what I ran into during my research. I finally decided to opt for another approach: I am generating formatted content with TinyMCE and then converting it to .doc files following this tutorial:
http://sebsauvage.net/wiki/doku.php?id=word_document_generation

And it works perfectly. I searched quite a lot, and that's IMO the best free way to convert rich HTML text to doc files.
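
The tutorial itself is not reproduced here. As a rough sketch of the general idea only, and assuming (not confirmed by the tutorial) that the approach amounts to saving an HTML document under a .doc extension so that Word opens and renders it, it could look like this. The helper name wrap_html_as_doc and the output path are hypothetical.

```php
<?php
// Hypothetical helper: wraps an HTML fragment (e.g. TinyMCE output) in a
// minimal HTML document meant to be saved with a .doc extension.
function wrap_html_as_doc(string $bodyHtml, string $title = 'Document'): string
{
    return "<html>\n<head>\n"
         . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\">\n"
         . '<title>' . htmlspecialchars($title, ENT_QUOTES, 'UTF-8') . "</title>\n"
         . "</head>\n<body>\n" . $bodyHtml . "\n</body>\n</html>\n";
}

$doc = wrap_html_as_doc('<p><strong>Hello</strong> World!</p>');

// Example output location only; any writable path would do.
file_put_contents(sys_get_temp_dir() . '/example.doc', $doc);
```

Note that the result is still an HTML file, not a real .docx, so this sidesteps the HTML to DOCX XML conversion instead of solving it.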