# Description

This course covers foundational concepts in computational linguistics and is designed for students with a strong background in formal linguistic methods but little or no programming experience. Topics include introductory formal language theory and probability theory, finite state phonological and morphological analysis, generative and discriminative approaches to shallow syntactic parsing (e.g. part-of-speech tagging, chunking) and shallow semantic parsing (e.g. semantic role labeling), and bottom-up and top-down algorithms for syntactic and semantic parsing. Major focus is placed on deploying techniques used in computational linguistics to advance linguistic theory and developing students’ ability to implement these techniques.

# Logistics

Instructor | Aaron Steven White |
---|---|
Classroom | 513 Lattimore Hall |
Time | Tuesday & Thursday 9:40-10:55am |
Office | 511A Lattimore Hall |
Office hours | Tuesday & Wednesday 11am-12pm (or by appointment) |
Email | aaron.white@rochester.edu |

# Readings

There will be a substantial amount of reading each week. A tentative schedule is provided below; it will almost certainly change as the semester goes forward, so consult the table for the most current list.

# Homework

Homeworks will be assigned weekly or biweekly throughout the semester, except during the first three weeks, during which there will be a short programming homework due every class. Most homeworks after the first three weeks of the course will involve both a programming component and a written component, though the relative balance between the two will vary. A template to guide you through both components will be provided through Overleaf.

All written work *must* be produced in LaTeX (using the provided
template), and it must be submitted through
Overleaf (where templates will be hosted). Do not
hand in handwritten work or email me electronic documents. If you do, you
will not receive credit for the assignment unless you subsequently turn
it in through Overleaf (subject to late
penalties).

All coding work must be done in Python, and it must be submitted through Blackboard. Do not submit coding work by email unless there is some issue with Blackboard.

# Midterm

There will be a take-home midterm, which will be made available on Thursday, March 1 and will be due one week later (Thursday, March 8).

# Final

For *LIN224 students*, there will be a take-home final exam, which will
be made available on Tuesday, May 1 and will be due one week later
(Tuesday, May 8).

For *LIN424 students*, there will be a final paper, which will be due on
Tuesday, May 8. A 500-word prospectus for this paper will be due
Tuesday, April 10. More information will be made available closer to
that date.

LIN224 students may opt to write a final paper in place of the final exam, but they should do so with the knowledge that this paper will be graded relative to a rubric designed for graduate student papers.

# Tools

One major aim of this class is to familiarize you with the computational linguist’s tools of the trade. The two we will be working with are LaTeX, for all written work, and Python, for programming assignments.

You will be expected to learn LaTeX on your own. There are plenty of tutorials online for doing this, including a few targeted specifically at linguists. Please do not ask me if you can use a word processor instead. The answer will be no.

You will **not** be expected to learn Python on your own; we will spend the
first two full weeks of class working through a Python tutorial. But
because there is a lot of content to cover in this course, this section
of the course will be relatively brief and fast-paced. If it turns out
that you need extra help on some basic programming concept, it is your
responsibility to seek out help as soon as that becomes clear to you.
This is especially important because programming assignments will build
in complexity, and so if you get stuck early, you will have trouble
throughout the rest of the course. Please do not wait to get help.

*Note for Windows 10 users:* I strongly recommend that you install the
Ubuntu Linux subsystem available through the Windows Store and install
Python in the subsystem, following the instructions for Ubuntu Linux.
This will require you to gain a basic competence in using the command
line. You are responsible for building this competence if you do not
already possess it.

# Final grades

The grading breakdown is: (bi)weekly assignments (50%), midterm (15%), final (25%), participation (10%). (Percentages represent percent of total grade.)
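As a rough illustration of how these weights combine, here is a minimal Python sketch. It assumes each component is scored on a 0-100 scale; the names `WEIGHTS`, `final_grade`, and `example_scores` are hypothetical, and the example scores are made up. It is not an official grade calculator.

```python
# Weights taken from the grading breakdown above (fractions of the total grade).
# The dictionary keys are illustrative names, not official labels.
WEIGHTS = {
    "assignments": 0.50,
    "midterm": 0.15,
    "final": 0.25,
    "participation": 0.10,
}


def final_grade(scores):
    """Return the weighted course grade from per-component scores (0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    # Multiply each component score by its weight and add up the results.
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Made-up scores for a hypothetical student.
example_scores = {
    "assignments": 90,
    "midterm": 80,
    "final": 85,
    "participation": 100,
}
print(round(final_grade(example_scores), 2))  # 45 + 12 + 21.25 + 10 = 88.25
```

For LIN424 students, and for LIN224 students who opt for the paper, the "final" score in this sketch would stand for the final paper rather than the final exam.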

# Late work

Assignments should be submitted by 11:59pm the day they are due. An automatic 10% deduction will be applied after this time. Starting from the end of class, assignments will lose 10% per day late according to the UTC time-stamp of submission that Overleaf reports. Late assignments may not be turned in for credit after a week unless explicit permission was sought before the due date.

# Exceptions

Students will not be penalized because of important civic, ethnic, family or religious obligations, or university service. You will have a chance, whenever feasible, to make up within a reasonable time any assignment that is missed for these reasons. Absences for these reasons will count as excused for the sake of the participation grade. But it is your job to inform me of any expected absences in advance, as soon as possible.

# Honesty

All assignments and activities associated with this course must be performed in accordance with the University of Rochester’s Academic Honesty Policy. More information is available at: http://www.rochester.edu/college/honesty/.

# Personal needs

Any student who needs special accommodations due to a disability should let me know privately, at the start of the semester.

# Schedule

Date | Topic | Reading | Due |
---|---|---|---|
Jan. 18 | Introduction to Computational Linguistics | - | - |
Jan. 23 | First steps in Python | Downey 2015, Ch. 1-4 | HW0 |
Jan. 25 | Control flow in Python | Downey 2015, Ch. 5-9 | HW1 |
Jan. 30 | Collections in Python | Downey 2015, Ch. 10-12, 14 | HW2 |
Feb. 01 | Classes in Python | Downey 2015, Ch. 15-18 | HW3 |
Feb. 06 | Finite state automata | Sipser 2013, Ch. 1 | - |
Feb. 08 | Finite state transducers | Jurafsky & Martin 2009, Ch. 3 | - |
Feb. 13 | Finite state morphology | Kaplan & Kay 1994, Sec. 1-3 | - |
Feb. 15 | Finite state phonology | Kaplan & Kay 1994, Sec. 4-8 | HW4 |
Feb. 20 | Context free grammars | Sipser 2013, Ch. 2 | - |
Feb. 22 | Bottom-up and top-down parsers | Jurafsky & Martin 2009, Ch. 13.1-13.4.2 | |
Feb. 27 | Shift-reduce parsing | Jurafsky & Martin 2018, Ch. 14.1-14.4 | - |
Mar. 01 | Mildly context sensitive formalisms | Clark 2014 | - |
Mar. 06 | Minimalist Grammars | Stabler 2010 | - |
Mar. 08 | Combinatory Categorial Grammars | Steedman & Baldridge 2011, Sec. 1-6 | HW6 |
Mar. 20 | Basic probability theory | Manning & Schütze 1999, Ch. 2.1 | - |
Mar. 22 | N-gram models | Jurafsky & Martin 2018, Ch. 4 | MT |
Mar. 27 | Basic information theory | Manning & Schütze 1999, Ch. 2.2 | - |
Mar. 29 | Collocation measures | Manning & Schütze 1999, Ch. 5 | HW7 |
Apr. 03 | Naïve Bayes | Jurafsky & Martin 2018, Ch. 6 | - |
Apr. 05 | Probabilistic Topic Models | Steyvers & Griffiths 2007 | HW8 |
Apr. 10 | Hidden Markov Models | Jurafsky & Martin 2018, Ch. 9.1-9.4 | (P) |
Apr. 12 | The Forward-Backward algorithm | Jurafsky & Martin 2018, Ch. 9.5 | - |
Apr. 17 | Probabilistic context free grammars | Jurafsky & Martin 2018, Ch. 13.1-13.7 | - |
Apr. 19 | The Inside-Outside algorithm | Manning & Schütze 1999, Ch. 11 | HW9 |
Apr. 24 | Logistic regression | Jurafsky & Martin 2018, Ch. 7 | - |
Apr. 26 | Conditional random fields | Sutton & McCallum 2011, Sec. 1-2 | HW10 |
May 01 | Estimation and inference for CRFs | Sutton & McCallum 2011, Sec. 3-4 | - |

# References

- Clark, A. 2014. An introduction to multiple context free grammars for linguists. ms.
- Downey, A.B. 2015. *Think Python: How to Think Like a Computer Scientist*, 2nd ed. Green Tea Press.
- Heinz, J. 2011. Computational Phonology – Part I: Foundations. *Language and Linguistics Compass*. Blackwell.
- Manning, C. & H. Schütze. 1999. *Foundations of Statistical Natural Language Processing*. MIT Press.
- Joshi, A.K. & Y. Schabes. Tree-adjoining grammars. *Handbook of Formal Languages* 3: 69–124.
- Jurafsky, D. & J.H. Martin. 2009. *Speech and Language Processing*, 2nd ed. Pearson.
- Jurafsky, D. & J.H. Martin. 2018. *Speech and Language Processing*, 3rd ed.
- Kaplan, R. & M. Kay. 1994. Regular models of phonological rule systems. *Computational Linguistics* 20:3. 331–378.
- Sipser, M. 2013. *Introduction to the Theory of Computation*, 3rd ed. Cengage Learning.
- Sutton, C. & A. McCallum. 2011. An Introduction to Conditional Random Fields. *Foundations and Trends in Machine Learning* 4:4. 267–373.
- Stabler, E. 2010. Computational perspectives on minimalism. In C. Boeckx (ed), *Oxford Handbook of Linguistic Minimalism*. 616–641. Oxford University Press.
- Steedman, M. & J. Baldridge. 2011. Combinatory Categorial Grammar. In R. D. Borsley and K. Börjars (eds), *Non-Transformational Syntax: Formal and Explicit Models of Grammar*. Wiley-Blackwell, Oxford, UK.
- Steyvers, M. & T. Griffiths. 2007. Probabilistic Topic Models. In T. Landauer, D. McNamara, S. Dennis, and W. Kintsch (eds), *Latent Semantic Analysis: A Road to Meaning*. Lawrence Erlbaum.