
  • "lizardman" is male
  • "lizardman" started this thread

Posts: 578

Date of registration: Jun 3rd 2011

Language Team: Bulgarian

Focus Group: Translation Proofreader
LTI Administration Group

Location: Plovdiv, Bulgaria

Thanks: 26058 / 483



Friday, November 30th 2012, 7:12pm

Transferring CSS/HTML to Open Office, layout preserved

Hi guys,

Yesterday we put up the 2012 version of the Activist Orientation Guide in Pootle for translation. It's all fine and good, but there's an impediment that will need to be bypassed. To explain it, let me first describe how we handle similar projects in Pootle.

If it's a PDF book like the AOG or Designing the Future, we usually obtain an HTML version of the original, either by running a PDF-to-HTML converter or by getting a good HTML file directly from the creator, and convert that to the PO format that Pootle works with (HTML to PO). After the PO file is translated, we convert it back to HTML, this time with all the English text replaced by the translation (PO to HTML). We then open the translated HTML file in a browser and copy/paste the content into an Open Office text document. The copy/paste preserves the formatting, so little additional manual work is needed. Finally, we simply export the Open Office document to PDF.
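To make the round trip concrete, here is a minimal sketch of the idea in Python, using only the standard library. It is not the converter we actually run (Pootle builds on the Translate Toolkit, whose html2po and po2html tools do the real work); the class and function names, the sample markup and the sample catalog are all made up for illustration. It pulls the text runs out of the HTML (what would become the source strings in the PO file) and then writes the same markup back with each run replaced by its translation.

```python
import html
from html.parser import HTMLParser


class SegmentExtractor(HTMLParser):
    """Collect translatable text runs, roughly the source side of a PO file."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.segments = []
        self._skip = False  # True while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip:
            self.segments.append(text)


class SegmentReplacer(HTMLParser):
    """Re-emit the markup unchanged, swapping text runs found in the catalog."""

    def __init__(self, catalog):
        super().__init__(convert_charrefs=True)
        self.catalog = catalog
        self.out = []
        self._raw = False  # True while inside <script> or <style>

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original tag text, untouched
        self._raw = tag in ("script", "style")

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")
        self._raw = False

    def handle_data(self, data):
        if self._raw:
            self.out.append(data)  # CSS and scripts pass through as-is
            return
        text = data.strip()
        if text and text in self.catalog:
            # Keep the surrounding whitespace, replace only the text itself.
            data = data.replace(text, self.catalog[text], 1)
        self.out.append(html.escape(data, quote=False))


def extract_segments(document):
    parser = SegmentExtractor()
    parser.feed(document)
    parser.close()
    return parser.segments


def apply_translations(document, catalog):
    parser = SegmentReplacer(catalog)
    parser.feed(document)
    parser.close()
    return "".join(parser.out)


# Hypothetical sample; the catalog stands in for a translated PO file.
sample = '<p class="ft1">Hello <b>world</b></p>'
catalog = {"Hello": "Zdravey", "world": "svyat"}
print(extract_segments(sample))
print(apply_translations(sample, catalog))
```

The point of the sketch is that the tags and attributes pass through untouched, which is why the translated HTML keeps the layout of the original.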

Being able to turn the HTML into a simple text document while preserving the formatting is a pretty big bonus, since that text document can then be modified by the language teams (everyone finds mistakes in translations).

Now what's the problem with the 2012 AOG? We got a somewhat messy HTML from the PDF, but it works. This HTML was converted to PO. When I tested the conversion from a "translated" PO (actually just a copy of the original text in place of the translation) back to HTML, it worked alright. The problem is with transferring this HTML file to an easily editable Open Office document with the formatting preserved. The HTML file contains a separate CSS section within the <head>, and I was told that for this reason copy/pasting from a browser will not preserve the formatting (I tried it with Firefox, Opera and IE).

So this inability to preserve the formatting (by this I mean the font size, placement, any bold, italics, links, etc.) when copy/pasting from the browser has to be overcome somehow: by replacing the CSS inside the HTML file with something else that has the same function, by converting the HTML to some other format and then transferring that to Open Office, by finding a copy/paste function somewhere that preserves the formatting, or by whatever else someone might think of. I'm just throwing out some ideas here, and I'm not sure any of them are viable. :)
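To illustrate the first idea (replacing the CSS section in the <head> with something that does the same job), here is a rough Python sketch that copies each CSS rule into style attributes on the matching tags and drops the <style> block. I have not verified that this fixes the Open Office paste; it only shows what "moving the CSS inline" could look like. It is standard library only, the names StyleInliner and inline_styles are made up, and it assumes the <style> block comes before the content and uses only simple selectors of the form p, .ft1 or p.ft1. Descendant selectors, IDs, @media rules and real CSS specificity are not handled.

```python
import html
import re
from html.parser import HTMLParser


def parse_simple_css(css):
    """Return (selector, declarations) pairs for simple, non-nested rules."""
    css = css.replace("<!--", "").replace("-->", "")   # legacy comment wrappers
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)    # CSS comments
    pairs = []
    for selectors, body in re.findall(r"([^{}]+)\{([^{}]*)\}", css):
        decl = "; ".join(part.strip() for part in body.split(";") if part.strip())
        if not decl:
            continue
        for selector in selectors.split(","):
            pairs.append((selector.strip(), decl))
    return pairs


class StyleInliner(HTMLParser):
    """Rewrite tags so matching CSS rules end up in their style attribute."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.rules = {}        # selector -> list of declaration strings
        self.out = []
        self._css = []         # text collected inside <style>
        self._in_style = False

    def _emit_start(self, tag, attrs, closing=""):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        # Tag rules first, then class rules, then tag.class rules; the
        # element's own style attribute goes last so that it wins.
        selectors = [tag] + ["." + c for c in classes] + [f"{tag}.{c}" for c in classes]
        decls = [d for sel in selectors for d in self.rules.get(sel, [])]
        if attrs.get("style"):
            decls.append(attrs["style"])
        if decls:
            attrs["style"] = "; ".join(decls)
        rendered = "".join(
            f" {name}" if value is None else f' {name}="{html.escape(value, quote=True)}"'
            for name, value in attrs.items()
        )
        self.out.append(f"<{tag}{rendered}{closing}>")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True      # swallow the <style> element
        else:
            self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, " /")

    def handle_endtag(self, tag):
        if tag == "style":
            for selector, decl in parse_simple_css("".join(self._css)):
                self.rules.setdefault(selector, []).append(decl)
            self._css = []
            self._in_style = False
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self._in_style:
            self._css.append(data)
        else:
            self.out.append(html.escape(data, quote=False))


def inline_styles(document):
    parser = StyleInliner()
    parser.feed(document)
    parser.close()
    return "".join(parser.out)


# Hypothetical sample in the style of PDF-to-HTML converter output.
sample = (
    "<html><head><style>p { margin: 0 } "
    ".ft1 { font-size: 14px; font-weight: bold }</style></head>"
    '<body><p class="ft1">Title</p></body></html>'
)
print(inline_styles(sample))
```

If the result pastes better, the same step could run right after the PO to HTML conversion. HTML comments and scripts are dropped by this sketch, which should not matter for a document that is only going to be pasted into a text editor.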

Anyway, if someone can find a way to overcome this difficulty, that would be awesome, since this is an integral part of putting our content out to the world in different languages.

I'm attaching the original HTML file here, in case you want to test options with it and see how they work.
lizardman has attached the following file:
  • (57.67 kB - 94 times downloaded - latest: Jun 21st 2019, 8:35pm)



  • "Ray" is male

Posts: 1,696

Date of registration: May 23rd 2011

Language Team: Global

Focus Group: LTI Administration Group

Location: Michigan, US

Thanks: 52988 / 7375



Sunday, December 9th 2012, 12:05am

A little more background: Prior to the launch of the TZM Blog, I was one of the original members of the TZM Newsletter Team, which was made up of six people, each handling different aspects of the overall process of article collection, collation, approval, proofreading, layout, etc. We had developed a wonderful systematic approach that worked very well to knock out fresh PDF content as soon as we had enough appropriate material to support each new issue, although partial credit for that also goes to strong communication among the members of the team. Gregory (Thunder) handled the original English newsletter layouts using Adobe InDesign, but even with such a dedicated app, we had difficulty with "templating" for the eventual translations that came back from here.

I believe the 'fix' for what's confronting us now lies somewhere in the original conversion from PDF to something that we can use for a variety of purposes (conversion to PO for Pootle, template development for each PDF project). The typical "PDF to text" conversion loses all formatting, and a standard "PDF to HTML" conversion produces results that are too varied for the consistency that's needed here. So, is anyone able to shed some much needed light on possible approaches that could be taken to mitigate all of this?
Signature from »Ray« Earth For Sale:
Slightly Used; inquire within

© Linguistic Team International 2019
Context In Motion