
Insert formatted (styled) text into MS Word from html tags in database

By: julesmim
Date: 2012-02-22
Time: 10:58

Insert formatted (styled) text into MS Word from html tags in database

Hello,
Some fields in my database contain text tagged with <em>, <sup> or <span class=smallCaps>.
Is it possible to have TBS insert those strings into the resulting docx file as italic, superscript or small-caps text? Another example would be <strong> for bold text.
Thanks for your help

Jules
By: Skrol29
Date: 2012-02-24
Time: 01:05

Re: Insert formatted (styled) text into MS Word from html tags in database

Hi,

There was a similar question on Stack Overflow.
Unfortunately, the answer is not cool.

http://stackoverflow.com/questions/9315531/opentbs-convert-html-tags-to-ms-word-tags
By: julesmim
Date: 2012-02-24
Time: 08:22

Re: Insert formatted (styled) text into MS Word from html tags in database

Hello,
Thanks a lot for this hint.
Since my data include only very simple html tags, I think the solutions offered are not uncool for me.
I think I will probably be able to extend the solution you gave for </br> tags to the other tags I use (since I never have combined tags such as bold italic, for example).
Thanks a lot.
Jules
By: bmadeiro
Date: 2012-04-16
Time: 01:48

Re: Insert formatted (styled) text into MS Word from html tags in database

Hello,

I think it is possible to extract the HTML and convert it to docx using this class (http://htmltodocx.codeplex.com/), but it is very complicated. I need help.

Thanks.
By: julesmim
Date: 2012-04-16
Time: 10:32

Re: Insert formatted (styled) text into MS Word from html tags in database

Hello,
I had a similar need: converting some HTML tags stored in my database into the corresponding text styles in a Microsoft Word .docx document.
I did not know about htmltodocx on CodePlex.
So, with the help of Skrol, I built my own function.

Here it is, if it can help you.

In my Word template, I add the following parameters to my block tag. Example for the blk_res.bibliografia tag: [blk_res.bibliografia;onformat=f_html2docx;strconv=no]
And my function definition is as follows:

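function f_html2docx($FieldName, &$CurrVal) {
   // Uncomment if the text passed to TBS is ISO-8859-1 rather than UTF-8.
   //$CurrVal = utf8_encode($CurrVal);

   // Wrap the whole value in a run so that every fragment ends up inside <w:r><w:t>.
   $CurrVal = '<w:r><w:t xml:space="preserve">' . $CurrVal . '</w:t></w:r>';

   // Make any bare <w:t> element keep its leading and trailing spaces.
   $CurrVal = str_replace('<w:t>', '<w:t xml:space="preserve">', $CurrVal);

   // Italics: close the current run, open a new run carrying <w:i/>, then resume a plain run.
   $CurrVal = str_replace('<em>', '</w:t></w:r><w:r><w:rPr><w:i/></w:rPr><w:t xml:space="preserve">', $CurrVal);
   $CurrVal = str_replace('</em>', '</w:t></w:r><w:r><w:t xml:space="preserve">', $CurrVal);

   // Superscript: same pattern with <w:vertAlign w:val="superscript"/>.
   $CurrVal = str_replace('<sup>', '</w:t></w:r><w:r><w:rPr><w:vertAlign w:val="superscript"/></w:rPr><w:t xml:space="preserve">', $CurrVal);
   $CurrVal = str_replace('</sup>', '</w:t></w:r><w:r><w:t xml:space="preserve">', $CurrVal);

   // Small caps: only <span class="autore"> is handled; every </span> ends the styled run.
   $CurrVal = str_replace('<span class="autore">', '</w:t></w:r><w:r><w:rPr><w:smallCaps/></w:rPr><w:t xml:space="preserve">', $CurrVal);
   $CurrVal = str_replace('</span>', '</w:t></w:r><w:r><w:t xml:space="preserve">', $CurrVal);
}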

I left a commented-out line in this function definition that you might need to uncomment, depending on the encoding of the text that is passed to TBS.
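
A related point, offered as a hedged sketch rather than as part of the function above: with strconv=no, TBS no longer escapes the value, so a raw & or < in the database text would most likely produce invalid XML in the docx. One possible approach is to escape the text first and then restore a whitelist of simple tags before converting them. The helper name escape_text_keep_tags is hypothetical, and the sketch assumes UTF-8 text that contains raw characters (not HTML entities) and tags without attributes, so a tag such as <span class="autore"> would need extra handling.

```php
<?php
// Hypothetical helper (not part of TBS): XML-escape the text but keep a
// whitelist of simple, attribute-free HTML tags intact for later conversion.
// Assumption: the input is valid UTF-8 and contains no HTML entities.
function escape_text_keep_tags(string $html, array $tags): string
{
    // Escape &, < and > so the value is safe inside a <w:t> element.
    $escaped = htmlspecialchars($html, ENT_NOQUOTES | ENT_XML1, 'UTF-8');
    foreach ($tags as $tag) {
        // Restore the opening and closing forms of each whitelisted tag.
        $escaped = str_replace(
            ["&lt;$tag&gt;", "&lt;/$tag&gt;"],
            ["<$tag>", "</$tag>"],
            $escaped
        );
    }
    return $escaped;
}

// Example: the ampersand is escaped, the <em> tags survive.
echo escape_text_keep_tags('Smith & Jones, <em>Title</em>', ['em', 'sup']);
```

Such a helper would be called at the top of the onformat function, before the str_replace calls that turn the HTML tags into Word runs.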

While doing this, I learned that for a Word template to work reliably, you have to type each TBS tag correctly the first time rather than correct it afterwards. If a TBS tag needs fixing, delete it and retype it from scratch. Every time you edit text that is already written, Word adds internal tags marking that text as revised, and those tags do not mix well with the tags that indicate formatting. This can confuse TBS.

As you can see, my function deals with only a few styles (italics, superscript and small caps), but the same syntax can be applied to other styles. To find the names and syntax of other properties, consult the specification for WordprocessingML, the Office Open XML format used by Microsoft Word. You can find it online.
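
As an illustration of applying the same syntax to other styles, here is a minimal sketch that drives the replacement from a table of tags. The names html_tags_to_docx_runs, simple_tag_map and f_html2docx_mapped are hypothetical and not part of TBS. The run properties used (w:b, w:i, w:u, w:vertAlign) are standard WordprocessingML elements. Like the function above, it assumes the tags are never nested or combined.

```php
<?php
// Hypothetical helper: convert simple, non-nested HTML tags into Word runs.
// $map associates a tag name with the run property it should produce.
function html_tags_to_docx_runs(string $html, array $map): string
{
    $open  = '<w:t xml:space="preserve">';
    $break = '</w:t></w:r><w:r>';
    foreach ($map as $tag => $rPr) {
        // Opening tag: close the current run, start a run carrying the property.
        $html = str_replace("<$tag>", $break . '<w:rPr>' . $rPr . '</w:rPr>' . $open, $html);
        // Closing tag: close the styled run and resume an unstyled one.
        $html = str_replace("</$tag>", $break . $open, $html);
    }
    return '<w:r>' . $open . $html . '</w:t></w:r>';
}

// Hypothetical tag table; extend it with other run properties as needed.
function simple_tag_map(): array
{
    return [
        'em'     => '<w:i/>',
        'strong' => '<w:b/>',
        'u'      => '<w:u w:val="single"/>',
        'sub'    => '<w:vertAlign w:val="subscript"/>',
        'sup'    => '<w:vertAlign w:val="superscript"/>',
    ];
}

// Same signature as the onformat function above.
function f_html2docx_mapped($FieldName, &$CurrVal)
{
    $CurrVal = html_tags_to_docx_runs($CurrVal, simple_tag_map());
}

$value = 'plain <strong>bold</strong> plain';
f_html2docx_mapped('demo', $value);
echo $value;
```

Keeping the tags in a table means a new style only needs one new entry instead of two more str_replace lines.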


Hope it can be useful.
By: Tim
Date: 2012-06-12
Time: 13:15

Re: Insert formatted (styled) text into MS Word from html tags in database

Hi everyone,

I'm also dealing with this problem. I'm trying to convert HTML data and merge it into my template.
Merging the data is not the problem, but converting the style to docx doesn't work.
In my template I have [c.onderbouwing;onformat=f_html2docx], where c.onderbouwing is my tag. When I add 'strconv=no' to the tag, the resulting file can't be opened...
I placed the function f_html2docx in my demo_merge. Is that the right place?
Does anyone have any ideas? And does someone have a function which converts HTML to docx (<b>, <img>, <a href> tags)?

Kind regards!
By: Skrol29
Date: 2012-06-14
Time: 23:39

Re: Insert formatted (styled) text into MS Word from html tags in database

Tim,

I don't have a solution, but I believe it would be a huge task to convert an HTML <img> to a Docx image.
By: Kim
Date: 2016-05-09
Time: 14:56

Re: Insert formatted (styled) text into MS Word from html tags in database

Hi,

I have an identical problem and I ended up using a function similar to Julesmim's. But now the font size changes from the point where the text switches to italic. Is there a way to force it to keep the template's font size (and styling)?

Thanks.

Kind regards,
Kim
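
One plausible cause, not confirmed in this thread: each run generated by the function carries only its own <w:rPr> (for example <w:i/>), so any size or font set directly on the template's run is not repeated in the new runs. A possible workaround is to copy the template's run properties by hand into every generated run. The sketch below assumes 11 pt text; the helper name styled_run and the value of $baseProps are hypothetical and should be replaced with what the template's word/document.xml actually contains.

```php
<?php
// Hypothetical helper: build one run whose properties combine a style
// (for example italic) with base properties copied by hand from the template.
function styled_run(string $text, string $styleProps, string $baseProps): string
{
    return '<w:r><w:rPr>' . $styleProps . $baseProps . '</w:rPr>'
         . '<w:t xml:space="preserve">' . $text . '</w:t></w:r>';
}

// Assumed value: w:sz is expressed in half-points, so 22 means 11 pt.
$baseProps = '<w:sz w:val="22"/>';

// An italic run that also keeps the assumed template size.
echo styled_run('Title', '<w:i/>', $baseProps);
```

The same base properties would also have to be added to the unstyled runs that follow each closing tag, otherwise the text after the italic part could still fall back to the default size.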