0

I'm curious what percentage of native Japanese words contain digraphs, or, to put it another way, what the average number of digraphs (or individual kana) in a Japanese word is. I'm a big math nerd, so this popped into my head as I was learning Japanese.

John
JShoe
  • 3
    I am curious what the term “digraph” refers to in the context of hiragana, which is a syllabary and not an alphabet. Things like きゃ, きゅ, きょ, etc? – aguijonazo Mar 15 '22 at 06:34
  • 3
    The term digraph is usually used for sequences that together tend to represent a sound, for example , , in English. I'm not aware that Japanese consider sequences such as きゃ, しゃ, ちゅ to be digraphs though. Perhaps you can clarify what you intend by "digraph". – jogloran Mar 15 '22 at 07:33
  • 1
    @jogloran You are both on the right track, in that I do mean things like りゃ、りゅ、りょ – JShoe Mar 15 '22 at 13:54
  • For that you would need to get a comprehensive list of words from somewhere and somehow extract only "native" words. Excluding recent loan words might be easy because they use katakana. Do you also need to exclude 漢語? – aguijonazo Mar 15 '22 at 22:20

1 Answer

2

From a 68,000-word dictionary, I counted 22,000 words whose readings include one or more of 「っ, ゃ, ゅ, ょ」. Unfortunately, I triple-counted unusual words like 出張(しゅっちょう), whose reading contains three of these characters. 「ょ」 was in 11,000 of the words, while 「ゃ」 was in just 2,200. I ignored all katakana.
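One way to sidestep the multiple-counting issue is to test each word once for membership instead of tallying per character. The following is only a minimal sketch, not the method used for the numbers above: it assumes the dictionary readings are already available as a list of hiragana strings, and the function name `digraph_stats` and the five sample readings are made up for illustration. It also keeps 「っ」 separate, since it is not part of a digraph like きゃ or りょ.

```python
# Small kana that form digraphs such as きゃ, しゅ, りょ.
YOON = "ゃゅょ"
# The small っ, counted in the answer above but not part of a digraph.
SOKUON = "っ"


def digraph_stats(readings):
    """Summarise small-kana usage in a list of hiragana readings.

    Each word is counted at most once per category, so a reading like
    しゅっちょう does not inflate the word totals.
    """
    words = list(readings)
    total = len(words)

    # Words containing at least one of ゃ, ゅ, ょ (counted once each).
    with_yoon = sum(1 for w in words if any(ch in YOON for ch in w))
    # Words containing at least one of っ, ゃ, ゅ, ょ (counted once each).
    with_any = sum(1 for w in words if any(ch in YOON + SOKUON for ch in w))
    # Total number of digraphs across all words (one per small ゃ/ゅ/ょ).
    digraphs = sum(sum(1 for ch in w if ch in YOON) for w in words)

    return {
        "total_words": total,
        "words_with_digraph": with_yoon,
        "words_with_any_small_kana": with_any,
        "digraph_count": digraphs,
        "percent_with_digraph": 100 * with_yoon / total if total else 0.0,
        "digraphs_per_word": digraphs / total if total else 0.0,
    }


# Hypothetical sample readings, not taken from any real dictionary file.
sample = ["しゅっちょう", "さくら", "きょう", "がっこう", "やま"]
stats = digraph_stats(sample)
print(stats)
```

Note that the full-size や in やま is a different character from the small ゃ, so it is not counted. Restricting the input to native words (excluding 漢語 and loan words) would still have to be done beforehand and is not handled here.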

davewp