Resources and Links
Linguistic corpora
Jointly with the NU Library and related departments, the Linguistics Department holds a membership in the Linguistic Data Consortium (LDC), which provides access to a wide variety of text and speech corpora in many languages. In addition, a number of proprietary and specially licensed corpora are in use for research purposes. For instructions on accessing and using currently installed corpora, or for questions about obtaining corpora, follow this link.
The following corpora are currently available on Box (formerly Babel). Please contact Chun Chan for access.
Directory | Corpus Name | Description |
---|---|---|
ANC | American National Corpus | Large, multi-genre American English text collection |
Aix-Marsec | Aix-Marsec database | Spoken British English, annotated from phone to intonation |
brown-untagged | Brown Corpus | Francis & Kucera (1979) corpus, written English texts |
BU_Radio | Boston University Radio News | Read speech, annotated from phone to intonation |
buckeye | Buckeye Corpus | Spontaneous American English speech, orthographically transcribed, labeled from phone to intonation |
Celex | CELEX (Release 2) | Lexical information for English, German, Dutch |
celex-old | CELEX (Release 1) | Lexical information for English, German, Dutch (archival) |
challenger-raw | Challenger-Raw | Transcript of the Space Shuttle Challenger commission |
challenger-tagged | Challenger-Tagged | Syntactically annotated transcript of the Challenger commission |
childes | CHILDES | CHILDES database (N.B. not the most current version) |
elberfelder-bible | Elberfelder Bible | Text of the Revidierte Elberfelder Bible (German) |
gutenberg | Project Gutenberg | Classic texts (1993) |
helsinki | Helsinki Corpus | Old, Middle, and Early Modern English texts |
hist-docs | Historical documents | Text of historical documents (e.g., Magna Carta) |
hkust_mandarin_1 | HKUST Mandarin Telephone Transcripts | Orthographically transcribed Mandarin conversational speech |
hoosier | Hoosier Mental Lexicon | Lexical database for American English |
IndianEng | Indian English | Indian English speech database |
kolhapur | Kolhapur Corpus | Indian English written texts |
oxford-text-archive | Oxford Text Archive | |
nxt_switchboard_ann | NXT Switchboard Annotations | Linked annotations for the Switchboard corpus: syntactic structure, disfluencies, phonetic transcripts, noun phrase animacy, word timing information, focus/contrast and prosodic structure, phone/syllable alignment, information structure |
ppcme | Penn-Helsinki Parsed Corpus of Middle English | Syntactically parsed Middle English texts |
rnc | Russian National Corpus | Russian texts |
rst_discourse_treebank | Rhetorical Structure Theory Discourse Treebank | WSJ articles, annotated with discourse structure |
spanish | Madrid Corpus | Annotated Spanish text |
susanne | SUSANNE | Geoffrey Sampson's annotated corpus of written English |
Treebank3 | Penn Treebank, release 3 | Syntactically annotated written and spoken English texts |
Zalizniak | Zalizniak Dictionary | Russian-English dictionary |

Software
A variety of software for the analysis of spoken and written texts is available in the department's research labs. This includes Praat, a program for speech analysis and synthesis; a variety of part-of-speech taggers, morphological analyzers, parsers, and programs for the statistical analysis of written texts; and site-licensed packages for large-scale modeling, visualization, and numerical analysis, such as Mathematica, MATLAB, SAS, and SPSS. The use of these computational tools in student work is strongly encouraged at Northwestern.
In addition to these externally developed packages, several software applications for audio processing and experimental design have been developed at Northwestern. For more information, please see Chun Chan's website.
Useful links
Institutional support
- NU Office for Research
- Office for the Protection of Research Subjects, Institutional Review Board
- NU Information Technology
- NUIT Research Computing Services
- Humanities and Social Science Research Guide to Technical Resources