Lenin in Translation: The Global Journey of Revolutionary Discourse

LenCor, a Corpus of Lenin's Works

LenCor is a text corpus based on the 5th edition of Lenin's Complete Works. This edition includes almost all of Lenin's works, with two caveats: (a) a considerable number of speeches and documents are not included in it, and (b) there are suspicions that some works were censored and that the 2nd edition is more reliable than the later editions.

The documents were collected from the website leninism.su by web scraping.
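The text-extraction step of such a scrape can be sketched as follows. This is only an illustration that uses Python's standard library; it is not the script used to build LenCor, and the class name `TextExtractor` and the function `html_to_text` are hypothetical. The page markup of leninism.su is not described here, so the sketch makes no assumptions about it beyond skipping `<script>` and `<style>` content.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from a page, skipping <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

In a real scrape the HTML string would first be downloaded, for example with `urllib.request.urlopen`, and paragraph boundaries would probably need to be preserved rather than flattened into spaces.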

Size of the corpus: 2,300 documents, 4.3 million running words

Metadata: the title of each work and the volume in which it was included.

Hosting of the corpus: Linux server, Tampere University

Corpus manager: NoSketch Engine (non-commercial version of Sketch Engine)

Morpho-syntactic parsing: Turku neural parser pipeline

Parsing of corpus files: Python script developed by Juho Härme
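As an illustration of what preparing parsed files for the corpus manager can involve, the sketch below converts CoNLL-U output (the format produced by the Turku neural parser pipeline) into the vertical format read by NoSketch Engine: one token per line with tab-separated attributes, wrapped in structure tags. It is not Juho Härme's script. The function name `conllu_to_vertical`, the choice of attributes (word, lemma, part of speech, features), the `title` and `volume` attribute names on `<doc>`, and the escaping of attribute values are all assumptions.

```python
from html import escape


def conllu_to_vertical(conllu, title, volume):
    """Convert the CoNLL-U text of one document into vertical format."""
    lines = ['<doc title="{}" volume="{}">'.format(
        escape(title, quote=True), escape(str(volume), quote=True))]
    in_sentence = False
    for line in conllu.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if in_sentence:
                lines.append("</s>")
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # skip sentence-level comments
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed CoNLL-U token line
        if "-" in cols[0] or "." in cols[0]:
            continue  # skip multiword-token ranges and empty nodes
        if not in_sentence:
            lines.append("<s>")
            in_sentence = True
        # Columns used: FORM, LEMMA, UPOS, FEATS.
        lines.append("\t".join((cols[1], cols[2], cols[3], cols[5])))
    if in_sentence:
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines) + "\n"
```

The attribute order chosen here would have to match the attribute declarations in the corpus configuration file, which is not shown.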

To obtain access to the corpus, contact Mikhail Mikhailov (mikhail.mikhailov(at)tuni.fi).
