
Post a Comment On: Backreaction

"Lost in Translation"

26 Comments

Anonymous Anonymous said...

Because the different languages are not isomorphic.

10:29 AM, March 30, 2008

Blogger Bee said...

Not if you translate word-wise, no. But if you consider strings of words, context, etc., shouldn't it at least come close to being invertible? I mean, the above is an almost complete loss of information within 5 iterations (except possibly for names or other words that weren't translated). I don't find this very convincing.

10:32 AM, March 30, 2008

Blogger William said...

Interesting that "string" becomes "series of the characters". Seems like an unfortunate side-effect of the fact that software is written by computer programmers.

10:50 AM, March 30, 2008

Blogger Bee said...

Yes. One would think it can't be so difficult to check the next word after 'string', which is 'theory', and pick the correct translation?

10:53 AM, March 30, 2008

Blogger Skavookie said...

The phrase "series of the characters" has a completely different meaning in mathematics (vs computer sci), and is definitely related to string theory, providing an alternative explanation, although not a very plausible one.

As for the "homology" of languages, the problem is far more complex than one might think. We are far from having an adequate model of syntax, let alone semantics, even for European languages, and even if we did, I would not expect languages to be even close to isomorphic. An interesting classic example is color: different languages classify colors differently, to the extent that native speakers of one language are unable to distinguish between colors that a native speaker of another language would. Even at a high level, languages are not even quasi-isomorphic.

And yes, I am both a mathematician and linguist.

11:27 AM, March 30, 2008

Blogger Bee said...

See also Disambiguation

Didn't I read somewhere that the study which claimed color-distinguishing is encoded in the language was either faulty or doubtful?

11:33 AM, March 30, 2008

Blogger Phil Warnell said...

Hi Bee,

This is exactly why in the past scholars only wrote and published their writings in Latin. The laws of gravity and motion are thus known as “Principia Mathematica”. I would contend that the natural solution lies not with algorithms, yet rather with standardizing human language. However, as with most things human there is little chance of this happening. When I was in high school, Latin was still required to be studied for at least two years, although I must admit that won’t take one very far :-)

Best,

Phil

11:52 AM, March 30, 2008

Blogger Skavookie said...

http://en.wikipedia.org/wiki/Character_theory

A recent article about color perception (yah, I know, Wired is not exactly a scholarly source - but the comments are often amusing in a sad way!): http://blog.wired.com/wiredscience/2008/03/babies-see-pure.html

Perhaps my choice of the color example was not the best, as it is slightly controversial (but only slightly). The idea that we perceive reality through the filter of language is an old idea, and the color example has been revisited many times, with the same results.

12:11 PM, March 30, 2008

Blogger Frank said...

I remember playing around with this precise thing and having it work away at Shakespeare sonnets, etc., when online translators first became available.

Some musings:

You agree that it is obvious that a translation based on words cannot be isomorphic. Translations are one-to-many, depending heavily on context. But then what should be the unit that is isomorphically mapped? In many ways language is engineered for fault tolerance. That means that whatever "unit of meaning" you come up with that should be isomorphically mapped between languages, it would by necessity be encoded redundantly in many subtle relationships within a text (and even outside the text itself).

You know this yourself, I'm sure, from translating texts between English and German: the problem doesn't factorize. Sometimes you need to take the whole sentence or even the whole paragraph and basically completely re-express it.

But then language is ambiguous due to the redundancy mentioned above so there will be many different texts that you would classify as equally good translations of one particular text, so even at the level of the whole text it is not clear that it can be isomorphic.

And a computer that starts building a translation from the one-to-many maps of a dictionary, trying to choose according to context, will obviously lose all the coherence between the words first, because it is not really translating this structure to begin with; the structure is only expressed in the dictionary choices made. Because this structure is so incredibly hard to model, it seems natural that it's lost very quickly.

12:12 PM, March 30, 2008
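
An illustration of the one-to-many point made above, as a minimal sketch. The two dictionaries and the selection rules are invented for this example and are not taken from any real translator; the point is only that a round trip through one-to-many maps depends on the choices made at each step and need not return the original word.

```python
# Toy word-level dictionaries. Every entry is an illustrative assumption,
# not the output of any real translation system.
EN_TO_FR = {"string": ["corde", "ficelle", "chaîne de caractères"]}
FR_TO_EN = {
    "corde": ["rope", "string"],
    "ficelle": ["twine", "string"],
    "chaîne de caractères": ["series of the characters", "string"],
}


def translate(word, dictionary, choose):
    """Return one candidate translation, selected by the `choose` rule."""
    candidates = dictionary.get(word, [word])  # unknown words pass through
    return choose(candidates)


def round_trip(word, choose_out, choose_back):
    """Translate English -> French -> English with the given choice rules."""
    there = translate(word, EN_TO_FR, choose_out)
    return translate(there, FR_TO_EN, choose_back)


def first(candidates):
    return candidates[0]


def last(candidates):
    return candidates[-1]


if __name__ == "__main__":
    # Same input word, three different results depending on the choices.
    print(round_trip("string", last, first))
    print(round_trip("string", first, first))
    print(round_trip("string", first, last))
```

Only one of the three choice combinations gets back to "string"; nothing in the word-level maps themselves says which combination is right, which is where context has to come in.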

Blogger stefan said...

Dear Bee,

that's a cool toy :-)... I've played around a bit with the first paragraph of your text, and I have the impression that the quality of the translations to the different languages involved is quite uneven.

For example, the first translation to French seems very reasonable to me (apart from the "chaîne de caractères", which didn't make sense to me at all - thanks for the hint about computer science), and the backconversion to English is still quite understandable.

But the second translation, from still quite reasonable English to German, is awful: "in more than four dimensions of spacetime" becomes "in mehr als vier Maßen von spacetime", and "Among the reasons for which it should be interesting to study this extension of the theory of Einstein, and in particular its solutions of black hole" is converted to "Unter den Gründen, aus denen es interessant sein sollte, diese Extension der Theorie von Einstein zu studieren und insbesondere seinen Lösungen der schwarzen Bohrung," ... it seems that a lot of sense gets lost in the step English-German and back ;-). And the translation to Italian then introduces further confusion ("space time" → "tempo dello spazio" → "time of the space")... It would be interesting to see the performance of the to and fro translation starting with the same text for the five languages separately.

The context when a string is a "chain of characters" and a, hum, "string" may still be possible to figure out automatically, and perhaps also when a hole is a hole or a drilling - but I always think about "field theory", where the to and fro via "Feldtheorie" or "Körpertheorie" may result in funny errors, and where the right context may be much harder to establish automatically ;-)

Best, Stefan

1:36 PM, March 30, 2008

Blogger Bee said...

Hi Phil,

Problem is I receive a lot of papers I am supposed to referee which sound like the second version of above example rather than the first. If that's the way English will be 'standardized' we will indeed end up like the tower of Babel. Latin might maybe be better in the sense that it's more precise when it comes to grammatical constructions, but it's also much more complicated to learn. Say about English what you want, at least it's easy (if one neglects the pronunciation problem). Best,

B.

1:38 PM, March 30, 2008

Blogger Phil Warnell said...

Hi Bee,

“Problem is I receive a lot of papers I am supposed to referee which sound like the second version of above example rather than the first.”

One day your honesty is going to get the better of you :-) Now can you imagine what it’s like when a novice such as I tries to understand what they contain? Of course I’m spared what it looks like before the referees have had their effect. On the other hand many scientists wonder why they are so misunderstood by the general public. I can tell you, humbly, as one who has held science as an important hobby all of my life, that there is nothing to wonder about.

“Latin might maybe be better in the sense that it's more precise when it comes to grammatical constructions, but it's also much more complicated to learn. Say about English what you want, at least it's easy (if one neglects the pronunciation problem).”

First, I was not proposing we return to a dead language to solve the problem, and yet I am somewhat surprised that you have such a high regard for English. Not that I would object, for it would spare me from being left further in the dust than I currently feel I am; yet I would do my best to learn anything that had been mandated and enforced as a standard, even Latin. Until then the scientist can only believe:

“Mathematics est verum”

Regards,

Phil

2:35 PM, March 30, 2008

Blogger Bee said...

Well, yeah, don't misunderstand me. Not being a native speaker myself I understand that it is difficult to argue in a foreign language, but there are just limits to what one can guess together from the equations that are provided, even with the best intentions.

2:43 PM, March 30, 2008

Blogger Phil Warnell said...

Hi Bee,

One should not be left to believe that your English serves to be in any way a handicap. Trust me, as one outside the halls of academia, your comprehension and communication far exceed more than just the average native’s ability (including my own). When it comes to understanding, I’m afraid the only form of communication currently more incomprehensible than science is that of diplomacy. I am then glad that you and Stefan not only continue to struggle to understand this for yourselves, yet also grateful you find time to aid others to perhaps sort some of it out.

“verum est verus”

Best,

Phil

3:50 PM, March 30, 2008

Blogger William said...

Bee,

While the software you linked to doesn't seem to be able to pick out the context, here's a sample from google translate:

piece of string -> Bout de ficelle
string theory -> La théorie des cordes
ascii string -> Chaîne ASCII

I'm impressed. I wonder how they do that.

3:56 PM, March 30, 2008
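
One way to get the behaviour in the three examples above is to look up the longest matching phrase before falling back to single words. The sketch below assumes a small hand-written phrase table (the `PHRASES` entries are made up for this illustration); it is not a description of how Google Translate works.

```python
# Hand-written phrase table. The entries are assumptions for illustration only.
PHRASES = {
    ("piece", "of", "string"): "bout de ficelle",
    ("string", "theory"): "théorie des cordes",
    ("ascii", "string"): "chaîne ASCII",
    ("string",): "chaîne de caractères",  # fallback when no context matches
}
MAX_LEN = max(len(key) for key in PHRASES)


def translate(text):
    """Greedy longest-match lookup; words without an entry pass through."""
    words = text.lower().split()
    out = []
    i = 0
    while i < len(words):
        # Try the longest window first so context wins over the bare word.
        for n in range(min(MAX_LEN, len(words) - i), 0, -1):
            key = tuple(words[i:i + n])
            if key in PHRASES:
                out.append(PHRASES[key])
                i += n
                break
        else:
            out.append(words[i])
            i += 1
    return " ".join(out)


if __name__ == "__main__":
    for phrase in ("piece of string", "string theory", "ASCII string", "string"):
        print(phrase, "->", translate(phrase))
```

The table has to be written by hand, so this only scales as far as someone is willing to list phrases.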

Anonymous Uncle Al said...

1) Time flies like an arrow, fruit flies like a banana.

2) "out of sight, out of mind" => "invisible idiot."

3) Given Geschwindigkeitsbegrenzung, where is there room on the sign for the number? Hence natural evolution of the Autobahn with no speed limit (and presumably all the signs at the end).

4:56 PM, March 30, 2008

Blogger Phil Warnell said...

Hi William,
“I'm impressed. I wonder how they do that.”

As I understand it, it’s basically a combination of statistical comparison and what I would call an evolutionary element, explained as follows:

“In order to make this happen, Google plans to bring some drastic reforms to the algorithms working behind scene. As is expected of a technology pioneer, they've devised a new approach to the whole issue - namely, "statistical machine translation" which differs from any of the past efforts in that it forgoes language experts who program grammatical rules and dictionaries into computers.

The new process involves feeding pre-translated parallel text in various languages into computers and then relying on them to discern patterns for future translations. At the moment the quality offered by this mechanism isn't perfect either - but there's a distinct advantage to such a statistical analysis engine. With time, "the more data we feed into the system, the better it gets" said Franz Och, the head of Google's translation effort at its Mountain View, California headquarters”

So one might ask, how do I know? Simple, I of course searched Google. Oh Sergey and Brin are clever fellows. Damn, they are also rich :-)

Best,

Phil

6:55 PM, March 30, 2008
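
The idea in the quoted passage, learning correspondences from pre-translated parallel text instead of hand-written rules, can be shown on a toy scale. The sketch below is an assumption-laden illustration, not Google's system: the three sentence pairs are invented, and the scoring rule (a Dice coefficient over sentence-level co-occurrence counts) is just one simple choice.

```python
from collections import Counter
from itertools import product

# Tiny invented parallel corpus (English, French). Illustration only.
PARALLEL = [
    ("black hole", "trou noir"),
    ("black cat", "chat noir"),
    ("deep hole", "trou profond"),
]


def learn_word_table(pairs):
    """Map each source word to the target word it co-occurs with most strongly."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        s_words, t_words = set(src.split()), set(tgt.split())
        src_count.update(s_words)
        tgt_count.update(t_words)
        joint.update(product(s_words, t_words))  # sentence-level co-occurrence

    def dice(s, t):
        # 1.0 when the two words always appear together, 0.0 when they never do.
        return 2 * joint[(s, t)] / (src_count[s] + tgt_count[t])

    return {s: max(tgt_count, key=lambda t: dice(s, t)) for s in src_count}


if __name__ == "__main__":
    for word, guess in sorted(learn_word_table(PARALLEL).items()):
        print(word, "->", guess)
```

No dictionary or grammar rule is supplied anywhere; the word pairs fall out of the counts alone, and more sentence pairs would sharpen them, which matches the "more data, better results" remark in the quote.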

Blogger Phil Warnell said...

Hi William,

Sorry, that should read "Sergey and Larry" are clever fellows not "Sergey and Brin", since they are Sergey Brin and Larry Page. This also is to remind that while they are clever I am not. Simply a practiced Googler :-)

Best,

Phil

7:07 PM, March 30, 2008

Blogger William said...

Thanks, Phil!

8:23 PM, March 30, 2008

Blogger Arun said...

Dear Bee,

Part of the meaning of language does not reside in the language itself, it resides in our heads.

That is, there is not an absolute intrinsic meaning to a text; any reading involves interpretation. (We get nasty forms of religious fundamentalism when people forget this fact when they read ancient texts.)

This is why, even when linguists and archaeologists have been able to decipher a dead language, the meaning of the texts can still be ambiguous.

Perhaps only the driest and most pedantic of scientific texts has an intrinsic meaning independent of the observer.

So until a computer can have some or all of our mental models, it cannot do a good job of translation.

----

An old joke is "The vodka is good, but the meat is rotten" as the English-Russian-English translation of "the spirit is willing but the flesh is weak".

----

Since I'm sure I'll be challenged on this, let me find a simple way of proving my main assertion above.

Best,
-Arun

10:06 PM, March 30, 2008

Anonymous Anonymous said...

Hi, I did my own research with Babelize. I tried "I had a dream". It goes back and forth perfectly thru German, French and Portuguese. I mean after the 3rd pass it still says in English "I had a dream". So far so good. Then it goes forward to translate into Spanish and returns back to English. The final pass. And this is the result:

"I was a sleepy"

Try it yourself. I can't imagine Rev. Martin Luther King saying "I was a sleepy"

Btw, other translators do a perfect English-Spanish-English translation of the same sentence. Curious.

7:33 AM, March 31, 2008

Blogger Arun said...

http://syrcom.cua.edu/Hugoye/Vol6No1/HV6N1PRPhenixHorn.html

(Not an easy read, I understand only the gist of it.)

An example which makes me wonder what the meaning of text is. For more than a thousand years, millions of men interpreted a text to mean nubile women would be available to them in heaven. In today's world, men are willing suicide bombers based on this belief. Now this guy comes along and tells them sorry, it was really white grapes all along.

Notice that one needs a theory of how the text was produced and how it was transmitted. I think in general there are whole hosts of assumptions we don't state when we read something. Fortunately, for the most part the assumptions are right and so the exercise works.

8:29 AM, March 31, 2008

Blogger Bee said...

Well, taken together it seems to me the translation algorithms used for the babelizer aren't quite the best, maybe deliberately so. I've used Google translate various times (e.g. to check trackbacks to blogposts written in languages I don't speak), and though it isn't perfect it is usually sufficient to understand what is written about.

11:57 AM, March 31, 2008

Anonymous Anonymous said...

Indeed an interesting program:

Original English Text:
I love you

Translated to French:
Je t'aime

Translated back to English:
I love you

Translated to German:
Ich liebe Dich

Translated back to English:
I love you

Translated to Italian:
Ti amo

Translated back to English:
I love to you

Translated to Portuguese:
Eu amo-lhe

Translated back to English:
I love to it

Translated to Spanish:
Amo a él

Translated back to English:
Master to him

3:30 PM, March 31, 2008

Blogger Arun said...

google tools:

This and that

English to German

Dies und Das

German to French

Si l'on y ajoute L'

French to English

If one adds'

-----------------

This and that

English to French

Que ce

French to German

Was

German to English

What

----

:)

10:19 PM, March 31, 2008

Blogger Oliver said...

This reminds me of that masterpiece of the English language, `English as She is Spoke', by Pedro Carolino. He had the worthy idea of creating a Portuguese - English phrase book for the particular benefit of the Portuguese youth. Thankfully for us, he was not deterred by the fact that he didn't speak English. However, he did have to hand both a Portuguese - French phrase book and a French - English dictionary.

There's an entry about the book on Wikipedia, which also has a link to some of Carolino's `Idiotism's and Proverbs' (you couldn't make that up).

4:59 AM, April 01, 2008
