LenCor, a Corpus of Lenin's Works
LenCor is a text corpus based on the 5th edition of Lenin's Complete Works. (This edition includes almost all of Lenin's works. However, a) a considerable number of speeches and documents are not included in it, and b) there are suspicions that some works were censored and that the 2nd edition is more reliable than the later editions.)
The documents were collected from the website leninism.su by web scraping.
Size of the corpus: 2,300 documents, 4.3 million running words
Metadata: title of each work and the volume in which it was published.
Hosting of the corpus: Linux server, Tampere University
Corpus manager: NoSketch Engine (non-commercial version of Sketch Engine)
Morpho-syntactic parsing: Turku neural parser pipeline
Parsing of corpus files: Python script developed by Juho Härme
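To illustrate how parser output can be prepared for the corpus manager, here is a minimal sketch. It is not the script mentioned above. It assumes that the parser output is in CoNLL-U format and that the target is the tab-separated "vertical" format read by NoSketch Engine, with one token per line and structures marked by XML-like tags. The function name conllu_to_vertical and the attribute names title and volume are hypothetical and were chosen only to mirror the metadata listed above.

```python
from xml.sax.saxutils import quoteattr


def conllu_to_vertical(conllu_text, title, volume):
    """Convert the CoNLL-U text of one document into vertical format.

    Each token becomes one line: word form, lemma, universal POS tag.
    The attribute names `title` and `volume` are illustrative assumptions.
    """
    # quoteattr() escapes the value and wraps it in quotes.
    lines = [f"<doc title={quoteattr(title)} volume={quoteattr(str(volume))}>"]
    in_sentence = False
    for raw in conllu_text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current sentence.
            if in_sentence:
                lines.append("</s>")
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            # Not a well-formed CoNLL-U token line; ignore it.
            continue
        token_id, form, lemma, upos = cols[0], cols[1], cols[2], cols[3]
        if "-" in token_id or "." in token_id:
            # Skip multiword-token ranges (1-2) and empty nodes (1.1).
            continue
        if not in_sentence:
            lines.append("<s>")
            in_sentence = True
        lines.append(f"{form}\t{lemma}\t{upos}")
    if in_sentence:
        # Close the last sentence if the input lacks a final blank line.
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines) + "\n"
```

Calling conllu_to_vertical(text, "Sample title", 1) on the parsed text of one document returns a string that starts with a doc tag carrying the two metadata values, followed by the sentences wrapped in s tags. The actual attribute set and tag names depend on the corpus configuration and may differ from this sketch.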
To obtain access to the corpus, contact Mikhail Mikhailov (mikhail.mikhailov(at)tuni.fi).