
Linguistic Data Consortium

The Linguistic Data Consortium is an open consortium of universities, companies and government research laboratories. It creates, collects and distributes speech and text databases, lexicons, and other resources for research and development purposes. The University of Pennsylvania is the LDC's host institution. The LDC was founded in 1992 with a grant from the Advanced Research Projects Agency (ARPA), and is partly supported by grant IRI-9528587 from the Information and Intelligent Systems division of the National Science Foundation.

Please visit the Linguistic Data Consortium at the University of Pennsylvania for more information and a catalog of corpora.

The Linguistics Department at Northwestern University has online access to many LDC corpora and maintains a collection of physical copies. Any graduate student, faculty member or researcher at Northwestern University may request access. If you are interested in obtaining any corpora, please first go to LDC and create a new account. Department staff will add new members with a northwestern.edu address to the set of authorized users. Once you receive notification that your status has changed, you can go to the LDC Catalog to download corpora.

If you are interested in corpora that are not available for download, check whether we have a physical copy via the LDC Inventory. Then send an email from your NU email account to the Linguistics Department containing the following information: