Using a PDF template for exporting a translated copy
Numerous teams are done or nearly done with PDF translations, so we need to build up our resources for reconstructing the PDFs. For each PDF, we need to search the web for the best version we can find (possibly, but not necessarily, the largest; a webpage made from the same material may also be an option) and then extract all of its images in the least destructive way. The images should be made available for public viewing on the web, so that eventual uploaders can compare them with what they have found and show that theirs is worth uploading. This is a stopgap until we can get the publisher to send us the original files.
To start with, the Greek team is putting together their own PDF of the Activist Orientation Guide and needs the best images we can find for it. I've also posted about this in a Facebook thread at https://www.facebook.com/groups/linguistictechteam/.
This post has been edited 1 times, last edit by "txetxo" (Nov 20th 2011, 10:38pm) with the following reason: The initial call for help in the subject is no longer necessary and now the thread will focus on the producing of a tutorial on a strategy for translating PDF documents and exporting the translations back to PDF documents out of a template, using certain techniques and tools, which is to be brainstormed and logged in this thread.
It is worth mentioning that the Greek team has had the AOG on their site for some time now. A big "Thanks" to everyone who helped to get the pictures. For anyone curious about how it looks, here is the result: http://static.tzm.gr/filemanager/Ebooks/…nGuideGreek.pdf. In addition, the process of putting together translated PDF documents out of a template is now the subject of this thread. The method Kostas described is as follows:
Quoted
i) Before the HTML extraction, make sure all suggestions and fuzzy translations are resolved, using the filters in Pootle.
ii) Extract the HTML and put all the files, together with the images, in the same folder.
iii) Concatenate the HTML files into a single file.
iv) Correct the image file names so the images appear when the single HTML file is displayed in a browser.
v) Open the single HTML file in a browser and search through the document for any English paragraph that has not been extracted and/or any images that have to be translated.
vi) Copy everything from the browser to a Word or equivalent document.
vii) Open the document and review it (comparing it to the original English version) to put all the references at the bottom of each page.
viii) Make any necessary corrections to the style of some parts of the document, such as bullet lists, center the headlines, and add the first page and table of contents.
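Steps iii and iv are the easiest ones to script. What follows is only a minimal sketch, not the procedure Kostas actually used. It assumes the extracted pages are UTF-8 `.html` files whose names sort in reading order, and that every image sits in the same folder as the pages, so each `src` can be reduced to a bare file name. The function names and the `merged.html` output name are made up for illustration.

```python
import re
from pathlib import Path

# Captures everything between <body ...> and </body>.
BODY_RE = re.compile(r"<body[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
# Captures the src value of an <img> tag, with either quote style.
IMG_SRC_RE = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2',
                        re.IGNORECASE | re.DOTALL)


def body_of(html: str) -> str:
    """Return the contents of <body>, or the whole text if there is no <body>."""
    match = BODY_RE.search(html)
    return match.group(1) if match else html


def flatten_image_paths(html: str) -> str:
    """Reduce every <img src> to its file name (assumes images sit next to the HTML)."""
    def fix(match):
        name = match.group(3).replace("\\", "/").rsplit("/", 1)[-1]
        return f"{match.group(1)}{match.group(2)}{name}{match.group(2)}"
    return IMG_SRC_RE.sub(fix, html)


def merge_html(folder: str, output_name: str = "merged.html") -> Path:
    """Concatenate the bodies of all .html files in `folder` into one file."""
    folder_path = Path(folder)
    parts = []
    # sorted() is plain alphabetical order: "page10" sorts before "page2".
    for page in sorted(folder_path.glob("*.html")):
        if page.name == output_name:
            continue  # do not merge a previous result into itself
        text = page.read_text(encoding="utf-8")
        parts.append(flatten_image_paths(body_of(text)))
    merged = ('<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body>\n'
              + "\n".join(parts)
              + "\n</body></html>\n")
    out_path = folder_path / output_name
    out_path.write_text(merged, encoding="utf-8")
    return out_path
```

Calling `merge_html("some_folder")` on the folder from step ii would write `merged.html` there, which can then be opened in a browser for step v. Regular expressions are a shortcut here; badly formed HTML may need a real parser.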
I'm playing around with automating the PDF generation so that it can run automatically every night. This requires read access to the backend .po files, or at least a raw dump of all translated text.
Who should I talk to in order to get this? I have strong software development skills, so access to the raw data is all I need.
I'm working on the Danish translation of TZM Defined. After posting the question, I noticed that Pootle has an API, so either read-only access to the .po files or access to the Pootle API would be nice.
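To give an idea of what read-only access to the PO files would allow, here is a rough sketch of pulling the finished translations out of one file with nothing but the Python standard library. It is a simplification and not a full gettext parser: it assumes entries are separated by blank lines, ignores plural forms and `msgctxt`, and decodes escapes with `ast.literal_eval`, which only approximates the C-style escapes PO files use. The function names are hypothetical. Entries flagged fuzzy are skipped, in line with step i of the method above.

```python
import ast
import re


def parse_po(text):
    """Return a list of (msgid, msgstr, is_fuzzy) tuples from PO-formatted text."""
    entries = []
    # Assumption: entries are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        fields = {"msgid": "", "msgstr": ""}
        fuzzy = False
        current = None
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#,"):
                # Flag line, e.g. "#, fuzzy" or "#, fuzzy, python-format".
                fuzzy = fuzzy or "fuzzy" in line
            elif not line or line.startswith("#"):
                continue  # other comments and obsolete entries
            elif line.startswith('"'):
                # Continuation of the previous msgid or msgstr.
                if current in fields:
                    fields[current] += ast.literal_eval(line)
            else:
                current, _, rest = line.partition(" ")
                if current in fields:
                    fields[current] += ast.literal_eval(rest.strip())
        entries.append((fields["msgid"], fields["msgstr"], fuzzy))
    return entries


def translated_texts(text):
    """Return the msgstr of every entry that is translated and not fuzzy."""
    return [msgstr for msgid, msgstr, fuzzy in parse_po(text)
            if msgid and msgstr and not fuzzy]
```

A nightly job could read each PO file, pass its text to `translated_texts`, and hand the result to whatever builds the PDF. For real use, a maintained PO library would be a safer choice than this hand-rolled parser.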