logistics question for programmers


dank
07-21-2001, 02:16 AM
For those of you who write or maintain code for programs, I'm curious what people do for staying on top of language file updates. Currently, I'm simply updating each file with new variables and keeping track of which ones are un-translated. This method is marginally bearable with a small number of language files, but will undoubtedly grow out of hand.

I haven't had any luck searching for a tool that could somewhat automate the process. I haven't found a combination of search terms that yields any results other than electronic dictionaries and generic file managers...

I'm thinking something could be done with a database. Enter a variable/string value in the default language (English, in this case) and it would be added in the same spot to each language file. There would either be a flag for each file's field indicating if it is un-translated, or a check could be run on corresponding fields of each file against the original (English) file -- if any of them match, then those fields are un-translated. Either way, a summary list could be generated.
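
A minimal sketch of the comparison check described above, written in Python purely for illustration. It assumes each language is held as a plain mapping from variable name to text; the function name `find_untranslated` and the sample data are hypothetical.

```python
def find_untranslated(default, translated):
    """Return variable names whose text is missing or identical to the default."""
    return sorted(
        name
        for name, text in default.items()
        # A missing entry is treated the same as one still holding the default text.
        if translated.get(name, text) == text
    )


# Hypothetical sample data: variable name -> display text.
english = {"greeting": "Hello", "farewell": "Goodbye", "ok": "OK"}
french = {"greeting": "Bonjour", "farewell": "Goodbye"}

untranslated = find_untranslated(english, french)
print(untranslated)
```

One caveat with the comparison approach: a string that is legitimately the same in both languages (such as "OK") is always reported as un-translated, which is the case an explicit per-field flag would handle better.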

Doesn't sound too tough in concept, but pretty much all of my database experience is with web stuff. Haven't dealt with Access and the like too much. If something like this is ready-made, that would be perfect. Otherwise, I may have to tackle it on my own.

Thanks,
Dan

dank
07-23-2001, 12:00 PM
I take the silence to mean either no one knows of any such product or those that do want to keep their secrets to themselves? ;)

I found a couple Windows programs that came kinda close, but they were more geared toward automatic translation into pre-set languages for anything you enter, not so much organizing what you already have.

So, I've started putting together a MySQL/PHP thingee to see if it will accomplish the task. Storing that sort of thing on a remote server may not be the best idea (nor being dependent on a 'net connection to work on it), but it's not the sort of thing that is likely to change often (i.e. occasional backups should be more than sufficient) and the concept seems simple enough that it beats trying to figure out how to do it in a different and largely unfamiliar programming environment. I tried putting something together in Excel, but it would have required quite a bit of VB to automate it in any usable fashion...

Dan

Bruce
07-23-2001, 12:07 PM
What exactly do you mean by "language file updates"? Are you referring to updates to the programming language itself (which happens pretty rarely), or changes to the source code for the program being developed? If you are referring to the latter, there are many solutions available, the (arguably) best known one being CVS (Concurrent Versions System). There are also other systems, such as BitKeeper, Aegis, RCS (on which CVS was originally built), and more. You can find more information about CVS at http://www.cvshome.org/

dank
07-23-2001, 12:26 PM
Hmm, I guess I wasn't clear about that. I was referring to translations of a program's output text strings into different languages.

CVS was my first thought, but last I checked, its support for Win 2000 was very sketchy at best. Seemed to be "use at your own risk..." I do have an Xitami server setup on my *old* Win 95 computer, but that ol' geezer will keel over trying to handle the occasional firing up of a post NS3 browser, and I'd rather not risk losing my main computer by installing new OS type of stuff on it...

As I read BitKeeper (http://www.bitkeeper.com/), it appears to be a CVS-like system with no specific mention of what I'm looking for (mostly file comparison stuff listed), and in beta mode.

I assume this is the Aegis you are referring to (there are lots of matches for that name)?

http://www.pcug.org.au/~millerp/aegis/aegis.html

Doesn't sound like I'll have much luck with RCS. "The source distribution is intended primarily for UNIX systems. Some people have been successful in porting it to other systems as well."

http://www.cs.purdue.edu/homes/trinkle/RCS/

Dan

Bruce
07-23-2001, 12:49 PM
OK, internationalization (more commonly referred to as i18n) is the problem you're describing. No, no source code management system will help with this task. I've never had to deal with this (yet). For UNIX systems, the GNU gettext system is one way of dealing with the problem, but I have no suggestions for Windows development. Sorry.

dank
07-23-2001, 01:01 PM
"internationalization (more commonly referred to as i18n)"
Ah yes, I recall seeing them terms way back when. I suppose it helps to speak the same language, no pun intended. :)

I'm not too keen on the idea of relying on built in translation systems. I'm of the mind that no translation is better than a bad one. Nothing like reading instructions in the form of a really poorly dubbed martial arts movie...

I don't think this will be too difficult of a project; I've already got a few of the building blocks in place. Surely other people have a need for something like this, so maybe I'll polish it up (if and when I get it working) and release it as something more than a custom hack.

Dan

Arthur
07-23-2001, 01:27 PM
As Bruce said, GNU gettext would be a solution. PHP has handles for it, but Terra isn't going to add support for it unless there's demand for it.
The Windows version of PHP also looks like it supports it, though I haven't tried it yet.

Anyone else want to sign the petition? :)

Arthur
-- l10n, i18n....f9t? --

dank
07-23-2001, 01:34 PM
Wouldn't gettext be no different than running text through a translation dictionary at runtime?
"a runtime library supporting the retrieval of translated messages"
No matter how extensive the library might be, I don't see how it could take into account the context words and strings are used in. Without that human touch, some really shoddy translations with choppy grammar are often produced.

Dan

Rich
07-23-2001, 01:46 PM
Dank, perhaps you could list some of the features/capabilities this type of application/utility would provide?

IMO, I don't think a database would help the task much unless you relied on some type of automatic translation utility, which you have said, and I agree, is not useful in practice. The only exception I would make to this is IF the end-user of the program (not the utility) had the capability to actually change the output text (which would be a very unique application).

It would seem that if you have 14 different languages to maintain and you change the output text, that there would be no other choice than to change 14 different text versions.

Rich

Bruce
07-23-2001, 01:58 PM
GNU gettext is not a translation program. It is a library for maintaining translations (which would be written by people), and using the most appropriate one automatically at run time. Indeed, if you wanted to support 14 languages, each change to a message or addition of messages would require changes to all 14 languages. Obviously, you would need to have a good translator as part of your support team...

dank
07-23-2001, 01:58 PM
The text itself definitely would have to be manually updated to keep any semblance of quality. What I'm trying to automate is not having to update each separate language file (and record which added items have yet to be translated, as I'm not exactly multi-lingual) whenever a language variable is added, removed, or modified. Here's what I'm working on:

- the database table contains a column for the variable name and a column for the text corresponding to that variable for each language.

- haven't fully decided yet how to display the interface, but it currently lists the translations for each language alongside the variable as rows in an html table. will probably add in options to display all variables/text for one language, all languages for one variable, and maybe other ways of narrowing it down.

- when you add a row to the database for a new variable, the text you add (in whatever you choose as your default language; English in this case) is added as the un-translated text for each of the other languages, eliminating the need to update individual files.

- edit forms that would be similar to the view options.

- output: 2 or 3 options. 1) print a language's translations to the browser in $variable = "text"; format, which can then be copied into whatever language file you are using, 2) print to file in tab-delimited format in the form of $variable [tab] text for easy inclusion in a spreadsheet, database, or whatever, and possibly 3) print #1 to a file (risky if you overwrite an existing language file of the same name that is in use, but no more so than #2).

- part of the output would check the selected language for un-translated variables (anything that still matches the default (English) entry) and print a summary report

I think that's the bulk of it.
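
A rough sketch of the table layout and two of the output options listed above. It uses Python with an in-memory SQLite database as a stand-in for the MySQL/PHP setup, so everything here (the `strings` table, the `LANGUAGES` list, the function names, and the escaping rules for the generated assignments) is an assumption for illustration rather than the actual program.

```python
import sqlite3

LANGUAGES = ["english", "french", "german"]  # hypothetical; "english" is the default


def create_table(conn):
    # One column for the variable name, plus one text column per language.
    columns = ", ".join(f"{lang} TEXT NOT NULL" for lang in LANGUAGES)
    conn.execute(f"CREATE TABLE strings (name TEXT PRIMARY KEY, {columns})")


def add_variable(conn, name, default_text):
    # Seed every language with the default text, so nothing is edited by hand.
    columns = ", ".join(LANGUAGES)
    marks = ", ".join("?" for _ in LANGUAGES)
    conn.execute(
        f"INSERT INTO strings (name, {columns}) VALUES (?, {marks})",
        [name] + [default_text] * len(LANGUAGES),
    )


def check_language(lang):
    # Column names cannot be bound as SQL parameters, so validate them here.
    if lang not in LANGUAGES:
        raise ValueError(f"unknown language: {lang}")


def set_translation(conn, name, lang, text):
    check_language(lang)
    conn.execute(f"UPDATE strings SET {lang} = ? WHERE name = ?", (text, name))


def quote_text(text):
    # Assumed escaping for a double-quoted string in the generated file.
    for ch in ("\\", '"', "$"):
        text = text.replace(ch, "\\" + ch)
    return text


def export_assignments(conn, lang):
    # Output option 1: $variable = "text"; lines for pasting into a language file.
    check_language(lang)
    rows = conn.execute(f"SELECT name, {lang} FROM strings ORDER BY name")
    return "\n".join('${} = "{}";'.format(n, quote_text(t)) for n, t in rows)


def export_tab_delimited(conn, lang):
    # Output option 2: $variable [tab] text, one entry per line.
    check_language(lang)
    rows = conn.execute(f"SELECT name, {lang} FROM strings ORDER BY name")
    return "\n".join(f"${n}\t{t}" for n, t in rows)


conn = sqlite3.connect(":memory:")
create_table(conn)
add_variable(conn, "greeting", "Hello")
add_variable(conn, "farewell", "Goodbye")
set_translation(conn, "greeting", "french", "Bonjour")
print(export_assignments(conn, "french"))
```

In this sketch "farewell" still carries the English text in the French column, which is exactly the condition the summary report would look for.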

Dan

Stephen
07-23-2001, 02:20 PM
well, because i happen to be facing the same problem right now here's my approach to this. but first, the background: mine is a perl-based web application (allowing customer reviewers in the web context) and all the text within the HTML needs to be translated for the required language. so i fashioned a templated editor which reads the template and allows one to customize (or translate) the text. one possible template design (this one for the HTML displayed when the review section is temporarily closed) is something like this:

{
    'template_type' => {
        'comment'  => q~required for template identification purposes. do not alter! must remain as closed_page~,
        'tags'     => q~~,
        'language' => {
            'en' => q~closed_page~,
            'fr' => q~closed_page~
        }
    },

    'page_layout' => {
        'comment'  => q~the page displayed by dynamically accessible scripts when your review area has temporarily been closed.~,
        'tags'     => q~<!--SITE_NAME-->~,
        'language' => {
            'en' => q~<!--SITE_NAME--> Review Section Closed for Maintenance...~,
            'fr' => q~Translated version of "<!--SITE_NAME--> Review Section Closed for Maintenance"~
        }
    }
}

so this is just a hash reference which can be assigned to a variable using the do operator. you have your variables, e.g. page_layout (which is a simplified version of the entire output page), with the different languages assigned to the two-letter language code. here, i've shown 'en' and 'fr' and have not tried to do the actual french translation! you could add lots of languages this way, and all the translations would appear in the one file for that particular translated string (or set of strings). you'd have qualified people do the actual translations (read customers).

actually, the approach i've taken is slightly different. instead of putting all languages into the one template, i'm going to be copying the one set of english templates to folders named after the two-letter codes. that'll keep file sizes down and put less burden on the server (because it has to read in the entire template). but it reduces maintainability. can't have it both ways unless your template variables are few, or you only have a couple of languages you want to support.

to get out the french version of the page_layout string you'd first assign the hash to a variable like $closed_page and specify:

$closed_page->{'page_layout'}->{'language'}->{'fr'}

your template structure could be simpler, but having attributes like 'comment' and 'tags', etc, allows you to attach more general information to each string in the hash.
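
For readers who don't use Perl, the same lookup can be sketched in Python with nested dictionaries. This is only an illustration under a few assumptions that are not part of the template design above: 'tags' is stored as a list rather than a single string, a missing language falls back to English, and the `render` function and "Example Shop" value are hypothetical.

```python
closed_page = {
    "page_layout": {
        "comment": "the page displayed when the review area is temporarily closed",
        "tags": ["<!--SITE_NAME-->"],
        "language": {
            "en": "<!--SITE_NAME--> Review Section Closed for Maintenance...",
        },
    },
}


def render(template, key, lang, values, default_lang="en"):
    entry = template[key]
    # Falling back to the default language is an assumption of this sketch.
    text = entry["language"].get(lang, entry["language"][default_lang])
    # Replace each known tag with its value; unknown tags are left in place.
    for tag in entry["tags"]:
        text = text.replace(tag, values.get(tag, tag))
    return text


page = render(closed_page, "page_layout", "fr", {"<!--SITE_NAME-->": "Example Shop"})
print(page)
```

Because no 'fr' entry exists in this sample, the English string is used and the site name is substituted into it.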

the trick to this, of course, is writing a decent template editor, but it doesn't take too long if you base your templates on hashes--sure makes reading/assigning of variables easy!

dank
07-24-2001, 02:17 AM
"you would need to have a good translator as part of your support team..."
Eh? Team, what's that? :)

Stephen, I think I follow what you're describing, but I'm not sure about one thing. Are you saying the script "owner" customizes the text or the end user does? Maybe I'm just tired, but every other time I read it, I interpret it the opposite way from the time before...

"instead of putting all languages into the one template, i'm going to be copying the one set of english templates to folders named after the two-letter codes. that'll keep file sizes down and put less burden on the server ... but it reduces maintainability."
That's exactly my current system, sans abbreviated language folder names. I haven't seen any approaches that make the actual language variable usage any simpler or faster than plain ol' variable assignments. An overall array of language variables would reduce having to pass individual variables throughout functions, but would just complicate other aspects. Is there an advantage to your hash system? It's not immediately obvious to me...

Aside from the step for preparing the final language file (either print to file or to the browser for copying into a file), I've got most of the other pieces working nicely. Makes adding the same variable for each language a snap, and editing/viewing/deleting is a breath of fresh air after having to go through each language file (only 4 of them for the main project, but 400 entries in each).

Dan

Stephen
07-24-2001, 01:34 PM
dan,

i tend these days to construct hashes like these as the easiest way to bundle related pieces of info and keep track of it. you could tidy up the implementation by going a step further and turning it into OO code, but i'm kinda lazy and don't usually take that step. to me, this approach seems the easiest.

what i intend to do is have customers initially translate the English templates themselves using the template editor. in exchange for a copy of the program the early ones provide me with their translations, which i then add to my set of language folders to be bundled with the program. the second generation of customers have access to the first generation translations which they can refine as needed. so yes, the "script owner" ultimately customizes the language as they see fit.

400 entries in each file. yikes, you definitely want to let your customers do the work!

dank
07-24-2001, 01:44 PM
"what i intend to do is have customers initially translate the English templates themselves ... in exchange for a copy of the program"
Precisely what I do. Actually, a couple of people offered early on to translate the program if I set it up (moving all the text to language files) to easily do so. Ends up being a great deal for everyone involved. They get a free copy of the program in their native tongue, and I achieve a wider audience with the only trade-off being initial setup of the language files and maintenance (which this new program should help with greatly).

"400 entries in each file. yikes"
399 now. I found 4-5 duplicates while sorting things to import into the db for the new program. :)

One thing I did previously with language files which will be more difficult with the new program (if not entirely impossible/implausible, given the work to payoff ratio) is grouping the variables by the page(s) they are used in. Of course, this also leads toward potential duplicates...
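
If the variables were grouped by page, a duplicate check along these lines could flag repeated strings. This is a sketch in Python under the assumption that each page maps variable names to text; the page names, variable names, and `find_duplicates` function are hypothetical.

```python
from collections import defaultdict


def find_duplicates(pages):
    """Map each repeated text to the sorted (page, variable) pairs that use it."""
    seen = defaultdict(list)
    for page, variables in pages.items():
        for name, text in variables.items():
            seen[text].append((page, name))
    return {text: sorted(uses) for text, uses in seen.items() if len(uses) > 1}


# Hypothetical grouping: page -> {variable name -> text}.
pages = {
    "login": {"login_submit": "Submit", "login_title": "Log In"},
    "search": {"search_submit": "Submit", "search_title": "Search"},
}

duplicates = find_duplicates(pages)
print(duplicates)
```

Each reported text could then be collapsed into one shared variable, or kept separate on purpose where the pages need different translations.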

Dan