What romanization scheme is used by programmers?

Question

I noticed that in this book on bioinformatics with Ruby (the programming language!), Rubyではじめるバイオインフォマティクス―生物系のためのプログラミング入門 ("Bioinformatics Starting with Ruby: An Introduction to Programming for Biologists"), the authors sometimes use romanized Japanese in code examples. For example, they sometimes use hairetsu_1 as the name of an array variable (which would translate as array_1).

When not programming in English, do Japanese programmers use a consistent romanization scheme, and if so, which one?

Perhaps we could expand this question to ask which scheme Japanese people in general most commonly use when they have to (or choose to) type in romaji on a computer? — Lukman, Oct 03 '11 at 07:05
Really not sure what you mean by "in code" (comments? variable names?), but the answer lies more in encoding and compiler issues (the vast majority of languages only support ASCII characters in parsable code). It has little to do with Japanese usage, at any rate. — Dave, Oct 03 '11 at 07:19
Somehow, the Amazon link is not working correctly. The method suggested by YOU on meta (http://meta.japanese.stackexchange.com/questions/256/) does not seem to work here. Can someone fix this? — , Oct 03 '11 at 07:54
@sawa I think he's referring to naming variables, functions and classes in romanized Japanese, for example: `gakusei.denwaBangou` instead of `student.phoneNumber` — Lukman, Oct 03 '11 at 09:05
@Dave: Actually, Unicode identifiers are supported in some of the more recent popular languages like Java and C#. There are thousands of programming languages out there, but most code is written in just a few... — Zhen Lin, Oct 03 '11 at 09:25
@ZhenLin: which is why I said "the vast majority", not "all". Java is the only major language I know that supports unicode variable names well (I don't know enough of C# to know about that, but seeing how it's, errm, heavily inspired by Java, it's not all that surprising). The question was tied to Ruby, which only partially supports unicode in source code (as of version 1.9) but whose reliance on capitalisation-dependent typing would make it a nightmare when used with full kanji variable/function names. — Dave, Oct 04 '11 at 03:54
Rewriting of Amazon links for foreign Amazon sites is now disabled, but you need to make an edit to your post to regenerate the URL. I don't have the rep to make a one-character change. — Troyen, Dec 20 '11 at 20:17

ento · Accepted Answer · 2011-10-03T17:31:51.373

I agree with Matt that there's no fixed standard for which romanization scheme to use. My guess is that it depends on the project, the author, the term, and the author's mood at the moment, just as in any other context of Japanese romanization.

[Personal point-of-view] If I were to use a Japanese variable name, I'd use Hepburn-style romanization, because it feels more phonetically consistent. However, when typing hiragana/kanji text through an IME, I mostly use Nihon-shiki to save keystrokes. [/Personal point-of-view]
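
To make the keystroke argument concrete, here is a minimal sketch. It assumes one keystroke per letter, ignores the IME's conversion step, and uses a hypothetical table covering only the handful of kana in the examples (a real IME table covers the whole syllabary, and most IMEs accept either spelling).

```python
# Hypothetical demo table: kana -> (Hepburn spelling, Nihon-shiki spelling).
# Only the kana used in the examples below; not a full syllabary.
ROMAJI = {
    "し": ("shi", "si"),
    "ち": ("chi", "ti"),
    "つ": ("tsu", "tu"),
    "ふ": ("fu", "hu"),
    "じ": ("ji", "zi"),
    "く": ("ku", "ku"),
    "ず": ("zu", "zu"),
}


def spell(word: str, scheme: str) -> str:
    """Spell a kana word letter by letter as it would be typed in the given scheme."""
    column = {"hepburn": 0, "nihon": 1}[scheme]
    return "".join(ROMAJI[kana][column] for kana in word)


def keystrokes(word: str, scheme: str) -> int:
    """Letters typed for the word, assuming one keystroke per letter."""
    return len(spell(word, scheme))


if __name__ == "__main__":
    for word in ["つくし", "ちず", "ふじ"]:
        h, n = spell(word, "hepburn"), spell(word, "nihon")
        print(f"{word}: {h} ({len(h)} keys) vs {n} ({len(n)} keys)")
```

In this sketch the savings come from し, ち and つ (tsukushi takes 8 keys vs. 6 for tukusi), while ふ and じ cost the same in either scheme (fuji and huzi are both 4).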

Be warned, though: according to my work partner, my point of view may be biased, especially the first part. Let me share his story here.

I first asked him why his code was entirely in English, even though he's not very fluent in it. He answered:

That's because at the first company I worked for, about ten or more years ago, the predominant attitude toward Japanese naming in source code was "embarrassing." (はずかしい)

To be precise, there were two groups of programmers in the game industry then: (1) those who habitually used Japanese, and (2) those who thought it embarrassing. I'm not sure, but there seemed to be a trend against using Japanese among the younger programmers around that time.

Me: Was there any fixed way to romanize those Japanese variable names? Like tsu vs. tu?

Yeah, they used tu [Nihon-shiki] exclusively. I think that's all they knew, what they learned at school. [*]

Me: But what about fuga, as in hoge, fuga, piyo? (Common metasyntactic variable names among Japanese programmers.) Shouldn't it be huga, if you want to be consistent?

Ah, these came from an entirely different class of programmers: those who'd been in the field for ages, near-bilingual programming-language lovers. They know English, and they're very careful about spelling. We [game programmers] didn't know or care about these meta-variables. We'd just use `a` if we needed a placeholder.

So to recap, any of the following can affect the choice of romanization scheme: perceived phonetic consistency, keystroke efficiency, local culture, school curriculum, or convention. (Again, be warned that this nice-looking summary is the result of a survey with a sample size of only two.)

[*] In fact, Kunrei-shiki, a variant of Nihon-shiki, is the one taught in elementary schools (ref: Wikipedia).
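
The gap between the two systems is small. For kana in modern use, Nihon-shiki and Kunrei-shiki differ essentially only for ぢ, づ and を (plus the combinations ぢゃ/ぢゅ/ぢょ, written dya/dyu/dyo vs. zya/zyu/zyo, which are omitted here). A minimal sketch, using a hypothetical demo subset of kana rather than a complete romanizer:

```python
# Spellings shared by Nihon-shiki and Kunrei-shiki (tiny demo subset only).
SHARED = {"は": "ha", "な": "na", "つ": "tu", "く": "ku", "し": "si"}

# Modern kana whose spelling differs: kana -> (Nihon-shiki, Kunrei-shiki).
# ぢゃ/ぢゅ/ぢょ follow the same pattern (dya vs. zya, ...) but are omitted
# because they would need two-character lookahead.
DIFFERENT = {
    "ぢ": ("di", "zi"),
    "づ": ("du", "zu"),
    "を": ("wo", "o"),
}


def romanize_kana(word: str, scheme: str) -> str:
    """Romanize a word of single-character kana in 'nihon' or 'kunrei' style."""
    column = {"nihon": 0, "kunrei": 1}[scheme]
    return "".join(
        DIFFERENT[kana][column] if kana in DIFFERENT else SHARED[kana]
        for kana in word
    )


if __name__ == "__main__":
    for word in ["はなぢ", "つづく", "を"]:
        print(word, romanize_kana(word, "nihon"), romanize_kana(word, "kunrei"))
```

Since つ is "tu" and し is "si" in both systems, the work partner's "tu" is consistent with either one.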

What are the literal meanings of the Japanese metasyntactic variables btw? Or are they as meaningless as the English ones? :) — Karl Knechtel, Oct 04 '11 at 11:00
@KarlKnechtel I don't think any of them has a concrete meaning, but `fuga` and `piyo`, when repeated, are valid imitatives (擬態語 [ぎたいご] / 擬音語 [ぎおんご] ): fugafuga, piyopiyo. `pakeratta` probably comes from a fictional character's stock phrase (ref: [chiebukuro](http://detail.chiebukuro.yahoo.co.jp/qa/question_detail/q1123854901)) — ento, Oct 04 '11 at 16:34
@KarlKnechtel "onomatopoeia" - Exactly. Japanese has separate terms for sound imitation words (ぎおんご) and sight imitation words (ぎたいご) and I hoped to capture both by "imitative". But all the English definitions of "imitative word" I can find seem to lean towards sound imitations. hummm — ento, Oct 05 '11 at 01:41
I can't imagine how a word can imitate the visual appearance of something, so... maybe it's a cultural thing? — Karl Knechtel, Oct 05 '11 at 03:52
@KarlKnechtel: Much more common in Japanese than in English, but Wikipedia cites "bling" as an example - it refers to the light glinting off something. In English-language comics I've occasionally seen words like "glint", "flutter" used in the same way that someone might use sound effect words like "bang" or "plop". — nkjt, Oct 05 '11 at 10:28
@KarlKnechtel in fact, you do imagine it, it's *very* common: http://en.wikipedia.org/wiki/File:Booba-Kiki.svg — Axioplase, Oct 07 '11 at 11:17
I've always wondered what `foo`, `bar`, `baz` meant too. But I always come across `hoge`, `fuga` in CGI-bin tutorials, so probably the first adopters of CGI-bin in Japan were among those *old-timers*, or maybe they were in turn influenced by the C tutorials written by the old-timers. — syockit, Dec 31 '11 at 13:54

score 4 · Answer 2 · answered Oct 03 '11 at 08:53

I don't think that there is an absolute industry standard ("programmers" can't even agree on the best way to indent code...), but in my admittedly limited experience, word-processor style, influenced by Nihon-shiki, is most common. Thus, 東京 is "toukyou", 情報 is "zyouhou", and 普通 is "hutuu".
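
Word-processor style amounts to "spell the reading the way you would type it into an IME". The minimal sketch below assumes the reading is already available in hiragana and uses a hypothetical, deliberately tiny kana table (just enough for the three examples above plus a small っ case). Long vowels are written out as typed (とう becomes "tou"), which is one visible difference from macron-style Hepburn (Tōkyō).

```python
# Hypothetical mini-table: only the kana needed for the examples.
# Spellings follow Nihon-shiki-style IME input (ふ -> hu, つ -> tu, じ -> zi).
KANA = {
    "と": "to", "う": "u", "き": "ki", "ほ": "ho", "ふ": "hu",
    "つ": "tu", "じ": "zi", "が": "ga", "こ": "ko",
}
# Small ya/yu/yo replace the final "i" of the preceding kana: き+ょ -> kyo.
SMALL_Y = {"ゃ": "ya", "ゅ": "yu", "ょ": "yo"}


def ime_romanize(reading: str) -> str:
    """Romanize a hiragana reading the way it would be typed into an IME."""
    out = []
    geminate = False  # a preceding small っ doubles the next consonant
    i = 0
    while i < len(reading):
        kana = reading[i]
        if kana == "っ":
            geminate = True
            i += 1
            continue
        syllable = KANA[kana]
        # Contracted syllable: i-row kana followed by small ゃ/ゅ/ょ.
        if i + 1 < len(reading) and reading[i + 1] in SMALL_Y:
            syllable = syllable[:-1] + SMALL_Y[reading[i + 1]]
            i += 1
        if geminate:
            syllable = syllable[0] + syllable
            geminate = False
        out.append(syllable)
        i += 1
    return "".join(out)


if __name__ == "__main__":
    for reading in ["とうきょう", "じょうほう", "ふつう", "がっこう"]:
        print(reading, ime_romanize(reading))
```

Getting from 東京 to とうきょう in the first place needs a reading dictionary, which is the hard part an IME actually does; the sketch skips it.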

Pure speculation: This might be because if you romanize things this way, it's exactly the same as entering the words in your IME (without the IME step, of course), and so the amount of effort required is minimal.

(Incidentally, I am talking about function and variable names here, in languages where these can be set relatively freely, like C++ and Ruby.)

Matt

I agree that the romanization in code probably comes from romanization used in IMEs. However, IMEs accept both “zyouhou” and “jouhou,” and therefore I do not know whether/why many people choose Nihon-shiki romanization. – Tsuyoshi Ito Oct 03 '11 at 10:31
@TsuyoshiIto That's a good point. In general I concur with ento's (superior) answer: the people I'm thinking of are relatively young and don't care about English at all, so it doesn't bother them that English speakers would find "joho" easier to read than "zyoho". They just go with the logical system that they learned in school (at least in the years they weren't taught by a teacher who preferred Hepburn or whatever). – Matt Oct 03 '11 at 23:48