I have a pretty good background in data analysis and statistics in the social sciences, including both frequentist and Bayesian paradigms, and I have recently been introduced to computational linguistics.

I think this would be a really good thing for me to learn, especially as I'm trying to come up with computational ways to analyze political discourse and political arguments. However, I've come across maybe 5 different terms for what I think I'm trying to learn, all with their own books and websites.

Might someone be able to point me in the direction of a good starting place for someone just learning about this field?


EDIT: Thank you so much to those of you that have responded!

RickyB

3 Answers

I have found the works of R.H. Baayen and K. Johnson invaluable on the application of statistical methods in Linguistics. They have respectively written the following books on this matter:

  1. Analyzing Linguistic Data: A Practical Introduction to Statistics Using R, by R.H. Baayen.
  2. Quantitative Methods in Linguistics, by K. Johnson.

Both books are "easy reads" in terms of Stats for a person with a basic Stats background. Their value is that they contextualize standard statistical concepts within Linguistics (e.g. shrinkage in terms of a linear mixed effects model, or Zipf's law in terms of distributional assumptions). They should put you in good standing on how to apply Statistics to a linguistic problem.

Having said that, no list of Computational Linguistics books is ever complete without a nod to Jurafsky and Martin's Speech and Language Processing. It comes from a firm CS view of the whole issue (so expect Hidden Markov Models, Context Free Grammars, Machine Translation, etc.).

I mention this book because you say you have come across several different terms that seem to refer to the same notion. My advice would be to use Jurafsky and Martin's book as a reference. It is widely accepted as one of the best textbooks on Computational Linguistics, and it won't steer you wrong on definitions or on distinctions between terms. Its reference list is enormous if you want to follow something up in more detail; please note, though, that the book does not focus on Statistics.

usεr11852

First and foremost, I would recommend an excellent introductory book, "Computational Linguistics: Models, Resources, Applications", which is available in print as well as free online and as a download. Although some examples lean toward the Spanish language, most of the content is of a general nature and is thus applicable to any human language and subject domain.

In addition to the above, I'd recommend reviewing the information scattered across various pages on the website of The Stanford Natural Language Processing Group, which, among other things, contain references to research papers and to relevant open source software developed by the group.

While it is not focused on computational linguistics per se, I think the book "Introduction to Information Retrieval", available both in print and free online and as a download, contains some information relevant to the question. If you are interested in text classification, the topic covered in one of that book's sections gets an alternative and much more detailed presentation in an excellent blog post by Sebastian Raschka.

Finally, another interesting and useful (but less introductory) resource is the open-access research journal Computational Linguistics, published by MIT Press. All of the resources mentioned above (with the exception of the first one) are shared as examples. They barely scratch the surface of the field of computational linguistics and do not represent the rich diversity of its research streams.

Aleksandr Blekh

I liked this book very much:

Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.

Tim