I'm dealing with a site that was hacked a while back. Google indexed thousands of spam pages with Japanese content. I have used my robots.txt file to disallow everything except the pages that actually exist on my site, and used .htaccess to return 404s for pages that don't exist.
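One way to check whether these two measures conflict is to run the robots.txt rules against a few of the spam URLs. A crawler that obeys robots.txt never requests a disallowed URL, so it never sees the 404 that .htaccess would return for it. The following is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents and the URLs are hypothetical placeholders, not my actual file. Python's parser is only an approximation of Googlebot: it applies the first matching rule and ignores Google's `*` and `$` wildcards, so the Allow lines are listed first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt in the style described above: allow the real pages,
# disallow everything else. Allow lines come first because Python's parser
# stops at the first rule whose path is a prefix of the requested path.
ROBOTS_TXT = """\
User-agent: *
Allow: /index.html
Allow: /about.html
Allow: /contact.html
Disallow: /
"""

# Hypothetical URLs: two real pages and two injected spam pages.
URLS = [
    "https://www.example.com/index.html",
    "https://www.example.com/about.html",
    "https://www.example.com/cheap-bags/12345.html",
    "https://www.example.com/?p=shop-98765",
]


def crawlable(robots_txt: str, urls: list[str], agent: str = "Googlebot") -> dict[str, bool]:
    """Map each URL to True if a robots.txt-obeying crawler may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(agent, url) for url in urls}


if __name__ == "__main__":
    for url, ok in crawlable(ROBOTS_TXT, URLS).items():
        # A blocked spam URL is never requested, so its 404 is never seen.
        status = "crawlable" if ok else "BLOCKED (404 never seen)"
        print(f"{status:26} {url}")
```

If the spam URLs show up as blocked, a crawler that respects robots.txt has no way to learn that they now return 404.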
Google continues to show sitelinks (in Japanese) to pages on my site. In Webmaster Tools, thousands of pages are still indexed, and the content keywords are mainly Japanese terms.
There is no Japanese version of the site and no Japanese text anywhere on it.
What is different about this issue is that Google now shows the sitelink text in Japanese while linking to my top pages, which do exist. I can't disallow those pages. I also need to get all this foreign content out of Google's index, which still contains URLs that don't exist on the site.
The URLs in the snippets Google serves all return a 404, but they are still in the index.
How is Google still indexing this content?