I am looking for statistics (raw data) on the most common readings of kanji used in names (family names or given names), and on the most common gender for each given name. Existing dictionaries tend to just spit out every possible reading for a name even when only one or two are likely; I'd like to improve on that.
There are loads of websites that offer this on a per-lookup basis, e.g. https://namegen.jp/yomikata - so the raw data must be out there somewhere. I checked the census data but could not find it there. I also googled for research papers and browsed various websites. http://www.myj7000.jp-biz.net/1000/0100f.htm looked promising, but either it is broken or I can't figure out how to use it.
I have searched for combinations of terms like 名字調査 (surname survey), 統計 (statistics) and データ (data), but to no avail. I would really appreciate any help.
I also considered crawling Wikipedia, which would work (if I can figure out how to filter out fictional characters), but the result won't be as granular or accurate as the data these other sites are using.
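To make the goal concrete, here is a minimal sketch of the tallying step, assuming the crawl (or any other source) has already been reduced to `(kanji, reading, gender)` records. The function name `rank_readings`, the record layout, and the sample rows are all hypothetical and only there for illustration; the sample rows are made up, not real statistics.

```python
from collections import Counter, defaultdict


def rank_readings(records):
    """Tally readings per kanji name and genders per (kanji, reading) pair.

    `records` is an iterable of (kanji, reading, gender) tuples.
    This layout is an assumption, not a format any site is known to publish.
    """
    readings = defaultdict(Counter)  # kanji -> Counter of readings
    genders = defaultdict(Counter)   # (kanji, reading) -> Counter of genders
    for kanji, reading, gender in records:
        readings[kanji][reading] += 1
        genders[(kanji, reading)][gender] += 1

    # Convert raw counts to relative frequencies, most common reading first.
    ranked = {}
    for kanji, counts in readings.items():
        total = sum(counts.values())
        ranked[kanji] = [(r, c / total) for r, c in counts.most_common()]
    return ranked, genders


# Made-up sample rows, purely illustrative.
sample = [
    ("翔", "しょう", "m"),
    ("翔", "しょう", "m"),
    ("翔", "かける", "m"),
    ("葵", "あおい", "f"),
    ("葵", "あおい", "f"),
    ("葵", "あおい", "m"),
]
ranked, genders = rank_readings(sample)
```

With counts like these, a dictionary could list only the readings above some frequency threshold instead of every possible reading, and `genders[(kanji, reading)].most_common(1)` would give the most common gender for that name.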