This course covers foundational concepts in computational linguistics and is designed for students with a strong background in formal linguistic methods but little or no programming experience. Topics include introductory formal language theory and probability theory, finite state phonological and morphological analysis, generative and discriminative approaches to shallow syntactic parsing (e.g. part-of-speech tagging, chunking) and shallow semantic parsing (e.g. semantic role labeling), and bottom-up and top-down algorithms for syntactic and semantic parsing. Major focus is placed on deploying techniques used in computational linguistics to advance linguistic theory and developing students’ ability to implement these techniques.
|Instructor||Aaron Steven White|
|Classroom||513 Lattimore Hall|
|Time||Tuesday & Thursday 9:40-10:55am|
|Office||511A Lattimore Hall|
|Office hours||Tuesday & Wednesday 11am-12pm (or by appointment)|
There will be a substantial amount of reading each week. The schedule below is tentative and will almost certainly change as the semester goes forward; consult the table for the most current list.
Homeworks will be assigned weekly or biweekly throughout the semester, except during the first three weeks, during which there will be a short programming homework due every class. Most homeworks after the first three weeks of the course will involve both a programming component and a written component, though the relative balance between the two will vary. A template to guide you through both components will be provided through Overleaf.
All written work must be produced in LaTeX (using the provided template), and it must be submitted through Overleaf (where templates will be hosted). Do not hand in handwritten work or email me electronic documents. If you do, you will not receive credit for the assignment unless you subsequently turn it in through Overleaf (subject to late penalties).
All coding work must be done in Python, and it must be submitted through Blackboard. Do not submit coding work by email unless there is some issue with Blackboard.
There will be a take-home midterm, which will be made available on Thursday, March 1 and will be due one week later (Thursday, March 8).
For LIN224 students, there will be a take-home final exam, which will be made available on Tuesday, May 1 and will be due one week later (Tuesday, May 8).
For LIN424 students, there will be a final paper, which will be due on Tuesday, May 8. A 500-word prospectus for this paper will be due Tuesday, April 10. More information will be made available closer to that date.
LIN224 students may opt to write a final paper in place of the final exam, but they should do so with the knowledge that this paper will be graded relative to a rubric designed for graduate student papers.
One major aim of this class is to familiarize you with the computational linguist’s tools of the trade. The two we will be working with are LaTeX, for all written work, and Python, for programming assignments.
You will be expected to learn LaTeX on your own. There are plenty of tutorials online for doing this, including a few targeted specifically at linguists. Please do not ask me if you can use a word processor instead. The answer will be no.
You will not be expected to learn Python on your own; we will spend the first two full weeks of class working through a Python tutorial. But because there is a lot of content to cover in this course, this section of the course will be relatively brief and fast-paced. If it turns out that you need extra help on some basic programming concept, it is your responsibility to seek out help as soon as that becomes clear to you. This is especially important because programming assignments will build in complexity, so if you get stuck early, you will have trouble throughout the rest of the course. Please do not wait to get help.
Note for Windows 10 users: I strongly recommend that you install the Ubuntu Linux subsystem available through the Windows Store and install Python in the subsystem, following the instructions for Ubuntu Linux. This will require you to gain a basic competence in using the command line. You are responsible for building this competence if you do not already possess it.
The grading breakdown is: (bi)weekly assignments (50%), midterm (15%), final (25%), participation (10%). (Percentages represent percent of total grade.)
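As a rough illustration of how these weights combine, here is a minimal Python sketch that computes a weighted total. The function name `course_grade`, the 0-100 scale for each component, and the example scores are assumptions made for illustration only; they are not part of the official grading procedure, and late penalties are not modeled.

```python
# Weights taken from the grading breakdown above (fractions of the total grade).
WEIGHTS = {
    "assignments": 0.50,
    "midterm": 0.15,
    "final": 0.25,
    "participation": 0.10,
}


def course_grade(scores):
    """Return the weighted total for component scores on an assumed 0-100 scale."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"assignments": 90, "midterm": 80, "final": 85, "participation": 100}
print(round(course_grade(example), 2))
```

Because assignments carry half of the total weight, a change in the assignment average moves the total twice as much as the same change in the final.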
Assignments should be submitted by 11:59pm the day they are due. An automatic 10% deduction will be applied after this time. Starting from the end of class, assignments will lose 10% per day late according to the UTC time-stamp of submission that Overleaf reports. Late assignments may not be turned in for credit after a week unless explicit permission was sought before the due date.
Students will not be penalized because of important civic, ethnic, family or religious obligations, or university service. You will have a chance, whenever feasible, to make up within a reasonable time any assignment that is missed for these reasons. Absences for these reasons will count as excused for the sake of the participation grade. But it is your job to inform me of any expected absences in advance, as soon as possible.
All assignments and activities associated with this course must be performed in accordance with the University of Rochester’s Academic Honesty Policy. More information is available at: http://www.rochester.edu/college/honesty/.
Any student who needs special accommodations due to a disability should let me know privately, at the start of the semester.
|Date||Topic||Reading||Assignment|
|Jan. 18||Introduction to Computational Linguistics||-||-|
|Jan. 23||First steps in Python||Downey 2015, Ch. 1-4||HW0|
|Jan. 25||Control flow in Python||Downey 2015, Ch. 5-9||HW1|
|Jan. 30||Collections in Python||Downey 2015, Ch. 10-12, 14||HW2|
|Feb. 01||Classes in Python||Downey 2015, Ch. 15-18||HW3|
|Feb. 06||Finite state automata||Sipser 2013, Ch. 1||-|
|Feb. 08||Finite state transducers||Jurafsky & Martin 2009, Ch. 3||-|
|Feb. 13||Finite state morphology||Kaplan & Kay 1994, Sec. 1-3||-|
|Feb. 15||Finite state phonology||Kaplan & Kay 1994, Sec. 4-8||HW4|
|Feb. 20||Context free grammars||Sipser 2013, Ch. 2||-|
|Feb. 22||Bottom-up and top-down parsers||Jurafsky & Martin 2009, Ch. 13.1-13.4.2|
|Feb. 27||Shift-reduce parsing||Jurafsky & Martin 2018, Ch. 14.1-14.4||-|
|Mar. 01||Mildly context sensitive formalisms||Clark 2014||-|
|Mar. 06||Minimalist Grammars||Stabler 2010||-|
|Mar. 08||Combinatory Categorial Grammars||Steedman & Baldridge 2011, Sec. 1-6||HW6|
|Mar. 20||Basic probability theory||Manning & Schütze 1999, Ch. 2.1||-|
|Mar. 22||N-gram models||Jurafsky & Martin 2018, Ch. 4||MT|
|Mar. 27||Basic information theory||Manning & Schütze 1999, Ch. 2.2||-|
|Mar. 29||Collocation measures||Manning & Schütze 1999, Ch. 5||HW7|
|Apr. 03||Naïve Bayes||Jurafsky & Martin 2018, Ch. 6||-|
|Apr. 05||Probabilistic Topic Models||Steyvers & Griffiths 2007||HW8|
|Apr. 10||Hidden Markov Models||Jurafsky & Martin 2018, Ch. 9.1-9.4||(P)|
|Apr. 12||The Forward-Backward algorithm||Jurafsky & Martin 2018, Ch. 9.5||-|
|Apr. 17||Probabilistic context free grammars||Jurafsky & Martin 2018, Ch. 13.1-13.7||-|
|Apr. 19||The Inside-Outside algorithm||Manning & Schütze 1999, Ch. 11||HW9|
|Apr. 24||Logistic regression||Jurafsky & Martin 2018, Ch. 7||-|
|Apr. 26||Conditional random fields||Sutton & McCallum 2011, Sec. 1-2||HW10|
|May 01||Estimation and inference for CRFs||Sutton & McCallum 2011, Sec. 3-4||-|
- Clark, A. 2014. An introduction to multiple context free grammars for linguists. ms.
- Downey, A.B. 2015. Think Python: How to Think Like a Computer Scientist 2nd ed. Green Tea Press.
- Heinz, J. 2011. Computational Phonology – Part I: Foundations. Language and Linguistics Compass. Blackwell.
- Manning, C. & H. Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press.
- Joshi, A.K., and Y. Schabes. Tree-adjoining grammars. Handbook of Formal Languages 3: 69-124.
- Jurafsky, D., & Martin, J. H. 2009. Speech and Language Processing 2nd ed. Pearson.
- Jurafsky, D., & J.H. Martin. 2018. Speech and Language Processing 3rd ed.
- Kaplan, R. & M. Kay. 1994. Regular models of phonological rule systems. Computational Linguistics 20:3. 331–378.
- Sipser, M. 2013. Introduction to the Theory of Computation. 3rd ed. CENGAGE Learning.
- Sutton, C. & A. McCallum 2011. An Introduction to Conditional Random Fields. Foundations and Trends in Machine Learning 4:4. 267–373.
- Stabler, E. 2010. Computational perspectives on minimalism. In C. Boeckx, ed. Oxford Handbook of Linguistic Minimalism. 616-641. Oxford University Press.
- Steedman, M. & J. Baldridge. 2011. Combinatory Categorial Grammar. In R. D. Borsley and K. Börjars (eds), Non-Transformational Syntax: Formal and Explicit Models of Grammar. Wiley-Blackwell, Oxford, UK.
- Steyvers, M. & T. Griffiths. 2007. Probabilistic Topic Models. In T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds), Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum.