The aim of the module is to equip students with a fundamental understanding of automated methods for processing linguistic data in textual form (natural language processing) from different sources (newswire, the web, social media, academic publications), and of the challenges associated with these sources. The module will also give students the skills to analyse textual data and familiarise them with state-of-the-art tools and applications.
By the end of the module the student should be able to:
- Demonstrate knowledge of the fundamental principles of natural language processing.
- Demonstrate understanding of methods and algorithms used to process different types of textual data, as well as the challenges involved.
- Demonstrate understanding of the state of the art in the core areas of natural language processing and in related applications.
- Show a working knowledge of state-of-the-art tools available for analysing linguistic data.
- Demonstrate the computational skills to build NLP pipelines using existing NLP libraries, retrain models and extend existing NLP tools.
The module covers the following topics:
- Regular expressions, word tokenisation, stemming, sentence segmentation
- N-grams and language models
- Part-of-speech tagging
- Hidden Markov models and maximum entropy models
- Semantics: lexical semantics, distributional semantics, word sense disambiguation and vector space models
- Spelling correction
- Text classification
- Sentiment analysis
- Information extraction: named entity recognition, relation extraction
- Information retrieval
- Syntactic parsing
- Semantic parsing
- Question answering and summarisation
- Text processing in social media