Description

This course covers foundational concepts in computational linguistics and is designed for students with a strong background in formal linguistic methods but little or no programming experience. Topics include introductory formal language theory and probability theory, finite state phonological and morphological analysis, generative and discriminative approaches to shallow syntactic parsing (e.g. part-of-speech tagging, chunking) and shallow semantic parsing (e.g. semantic role labeling), and bottom-up and top-down algorithms for syntactic and semantic parsing. Major focus is placed on deploying techniques used in computational linguistics to advance linguistic theory and developing students’ ability to implement these techniques.

Logistics

Instructor: Aaron Steven White
Classroom: 513 Lattimore Hall
Time: Tuesday & Thursday, 9:40-10:55am
Office: 511A Lattimore Hall
Office hours: Tuesday & Wednesday, 11am-12pm (or by appointment)
Email: aaron.white@rochester.edu

Readings

There will be a substantial amount of reading each week. A tentative schedule is provided below; it will almost certainly change as the semester progresses, so consult the table for the most current list.

Homework

Homeworks will be assigned weekly or biweekly throughout the semester, except during the first three weeks, during which there will be a short programming homework due every class. Most homeworks after the first three weeks of the course will involve both a programming component and a written component, though the relative balance between the two will vary. A template to guide you through both components will be provided through Overleaf.

All written work must be produced in LaTeX (using the provided template), and it must be submitted through Overleaf (where templates will be hosted). Do not hand in handwritten work or email me electronic documents. If you do, you will not receive credit for the assignment unless you subsequently submit it through Overleaf (subject to late penalties).

All coding work must be done in Python, and it must be submitted through Blackboard. Do not submit coding work by email unless there is some issue with Blackboard.

Midterm

There will be a take-home midterm, which will be made available on Thursday, March 1 and will be due one week later (Thursday, March 8).

Final

For LIN224 students, there will be a take-home final exam, which will be made available on Tuesday, May 1 and will be due one week later (Tuesday, May 8).

For LIN424 students, there will be a final paper, which will be due on Tuesday, May 8. A 500-word prospectus for this paper will be due Tuesday, April 10. More information will be made available closer to that date.

LIN224 students may opt to write a final paper in place of the final exam, but they should do so with the knowledge that this paper will be graded relative to a rubric designed for graduate student papers.

Tools

One major aim of this class is to familiarize you with the computational linguist’s tools of the trade. The two we will be working with are LaTeX, for all written work, and Python, for programming assignments.

You will be expected to learn LaTeX on your own. There are plenty of tutorials online for doing this, including a few targeted specifically at linguists. Please do not ask me if you can use a word processor instead. The answer will be no.

You will not be expected to learn Python on your own; we will spend the first two full weeks of class working through a Python tutorial. But because there is a lot of content to cover in this course, this section of the course will be relatively brief and fast-paced. If it turns out that you need extra help on some basic programming concept, it is your responsibility to seek out help as soon as that becomes clear to you. This is especially important because programming assignments will build in complexity, so if you get stuck early, you will have trouble throughout the rest of the course. Please do not wait to get help.

Note for Windows 10 users: I strongly recommend that you install the Ubuntu Linux subsystem available through the Windows Store and install Python in the subsystem, following the instructions for Ubuntu Linux. This will require you to gain a basic competence in using the command line. You are responsible for building this competence if you do not already possess it.

Final grades

The grading breakdown is: (bi)weekly assignments (50%), midterm (15%), final (25%), participation (10%). (Percentages represent percent of total grade.)
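As an illustration of how this breakdown combines into a single number, here is a minimal Python sketch. The weights come from the breakdown above; the function name `final_grade`, the 0-100 score scale, and the example scores are assumptions made for illustration, not part of the official grading procedure.

```python
# Weights taken from the grading breakdown above; they sum to 1.0.
WEIGHTS = {
    "assignments": 0.50,
    "midterm": 0.15,
    "final": 0.25,
    "participation": 0.10,
}


def final_grade(scores):
    """Return the weighted course grade for component scores on a 0-100 scale.

    `scores` maps each component name in WEIGHTS to a number (hypothetical
    interface, chosen for this sketch).
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    # Weighted sum: each component contributes weight * score.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical scores, for illustration only.
example = {"assignments": 90, "midterm": 80, "final": 85, "participation": 100}
print(final_grade(example))
```

Because assignments carry half of the total weight, a ten-point change in the assignment average moves the course grade by five points, more than the same change in any other component.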

Late work

Assignments should be submitted by 11:59pm the day they are due. An automatic 10% deduction will be applied after this time. Starting from the end of class, assignments will lose 10% per day late according to the UTC time-stamp of submission that Overleaf reports. Late assignments may not be turned in for credit after a week unless explicit permission was sought before the due date.
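One possible reading of this policy can be sketched in Python. The sketch rests on assumptions that the policy text does not settle: lateness is counted from the 11:59pm deadline (setting aside the "end of class" wording), any partial day counts as a full day, the deduction is taken from the credit available on the assignment, and no permission for a later submission was granted. The function name `late_multiplier` is hypothetical.

```python
import math
from datetime import datetime, timedelta, timezone

PENALTY_PER_DAY = 0.10      # 10% per day late, from the policy above
CUTOFF = timedelta(days=7)  # no credit after a week without prior permission


def late_multiplier(deadline, submitted):
    """Return the fraction of credit retained (assumed reading of the policy).

    Both arguments should be timezone-aware datetimes so that a local
    deadline can be compared against a UTC submission time-stamp.
    """
    lateness = submitted - deadline
    if lateness <= timedelta(0):
        return 1.0  # on time: full credit
    if lateness > CUTOFF:
        return 0.0  # more than a week late: no credit
    # Assumption: every started 24-hour period counts as one full day late.
    days_late = math.ceil(lateness / timedelta(days=1))
    return max(0.0, 1.0 - PENALTY_PER_DAY * days_late)


# Hypothetical deadline expressed in UTC, with a submission three hours later.
deadline = datetime(2018, 2, 16, 4, 59, tzinfo=timezone.utc)
print(late_multiplier(deadline, deadline + timedelta(hours=3)))
```

Under these assumptions, a submission a few hours after the deadline keeps about 90% of its credit and one made 25 hours after keeps about 80%. When in doubt about a specific case, ask rather than rely on this sketch.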

Exceptions

Students will not be penalized because of important civic, ethnic, family or religious obligations, or university service. You will have a chance, whenever feasible, to make up within a reasonable time any assignment that is missed for these reasons. Absences for these reasons will count as excused for the sake of the participation grade. But it is your job to inform me of any expected absences in advance, as soon as possible.

Honesty

All assignments and activities associated with this course must be performed in accordance with the University of Rochester’s Academic Honesty Policy. More information is available at: http://www.rochester.edu/college/honesty/.

Personal needs

Any student who needs special accommodations due to a disability should let me know privately, at the start of the semester.

Schedule

Date | Topic | Reading | Due
Jan. 18 | Introduction to Computational Linguistics | - | -
Jan. 23 | First steps in Python | Downey 2015, Ch. 1-4 | HW0
Jan. 25 | Control flow in Python | Downey 2015, Ch. 5-9 | HW1
Jan. 30 | Collections in Python | Downey 2015, Ch. 10-12, 14 | HW2
Feb. 01 | Classes in Python | Downey 2015, Ch. 15-18 | HW3
Feb. 06 | Finite state automata | Sipser 2013, Ch. 1 | -
Feb. 08 | Finite state transducers | Jurafsky & Martin 2009, Ch. 3 | -
Feb. 13 | Finite state morphology | Kaplan & Kay 1994, Sec. 1-3 | -
Feb. 15 | Finite state phonology | Kaplan & Kay 1994, Sec. 4-8 | HW4
Feb. 20 | Context free grammars | Sipser 2013, Ch. 2 | -
Feb. 22 | Bottom-up and top-down parsers | Jurafsky & Martin 2009, Ch. 13.1-13.4.2 | HW5
Feb. 27 | Shift-reduce parsing | Jurafsky & Martin 2018, Ch. 14.1-14.4 | -
Mar. 01 | Mildly context sensitive formalisms | Clark 2014 | -
Mar. 06 | Minimalist Grammars | Stabler 2010 | -
Mar. 08 | Combinatory Categorial Grammars | Steedman & Baldridge 2011, Sec. 1-6 | HW6
Mar. 20 | Basic probability theory | Manning & Schütze 1999, Ch. 2.1 | -
Mar. 22 | N-gram models | Jurafsky & Martin 2018, Ch. 4 | MT
Mar. 27 | Basic information theory | Manning & Schütze 1999, Ch. 2.2 | -
Mar. 29 | Collocation measures | Manning & Schütze 1999, Ch. 5 | HW7
Apr. 03 | Naïve Bayes | Jurafsky & Martin 2018, Ch. 6 | -
Apr. 05 | Probabilistic Topic Models | Steyvers & Griffiths 2007 | HW8
Apr. 10 | Hidden Markov Models | Jurafsky & Martin 2018, Ch. 9.1-9.4 | (P)
Apr. 12 | The Forward-Backward algorithm | Jurafsky & Martin 2018, Ch. 9.5 | -
Apr. 17 | Probabilistic context free grammars | Jurafsky & Martin 2018, Ch. 13.1-13.7 | -
Apr. 19 | The Inside-Outside algorithm | Manning & Schütze 1999, Ch. 11 | HW9
Apr. 24 | Logistic regression | Jurafsky & Martin 2018, Ch. 7 | -
Apr. 26 | Conditional random fields | Sutton & McCallum 2011, Sec. 1-2 | HW10
May 01 | Estimation and inference for CRFs | Sutton & McCallum 2011, Sec. 3-4 | -

References