ParVLen, a Parallel Corpus of Lenin's Works
ParVLen consists of Lenin's original works aligned with their translations into other languages. The originals were collected from the website leninism.su, and the translations mainly from marxists.org. Older translations were scanned from paper editions and processed with OCR software by members of our research group. Alignment was performed with LF Aligner.
Currently, ParVLen consists of three subcorpora: Russian-English, Russian-Finnish, and Russian-Spanish.
ParVLen.En, Russian-English
39 pairs of documents, 600,000 X 2 running words
Translations from Collected Works (Progress) (marxists.org)
ParVLen.Fi, Russian-Finnish
468 pairs of documents, 1,900,000 X 2 running words
Translations from Collected Works (Progress) + other editions of important works, including early translations of the 1920s
ParVLen.Es, Russian-Spanish
22 pairs of documents, 600,000 X 2 running words
Translations of important works from three different publishers: Progress (Moscow), Akal (Madrid), Cartago (Buenos Aires)
All subcorpora include rich metadata: titles of the works, publishers, translators, and years of publication.
Hosting of the corpus: Linux server, Tampere University
Corpus manager: NoSketch Engine (non-commercial version of Sketch Engine)
Morpho-syntactic parsing: Turku neural parser pipeline
Parsing of corpus files: Python script developed by Juho Härme
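The toolchain above implies a conversion step between parser output and corpus manager input: the Turku neural parser pipeline writes CoNLL-U, while NoSketch Engine indexes "vertical" text (one token per line, tab-separated attributes, sentence boundaries marked with tags). The following is only a minimal sketch of such a step, not the actual script by Juho Härme, whose details are not described here. The function name conllu_to_vertical, the choice of attributes (word, lemma, UPOS), and the sample sentence are all assumptions made for illustration.

```python
from typing import Iterable, Iterator


def conllu_to_vertical(lines: Iterable[str]) -> Iterator[str]:
    """Yield vertical-format lines (word, lemma, UPOS) wrapped in <s> tags.

    Hypothetical helper: the attribute selection is an assumption, not the
    configuration actually used for ParVLen.
    """
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence in CoNLL-U.
            if in_sentence:
                yield "</s>"
                in_sentence = False
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are skipped.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            # CoNLL-U token lines have exactly 10 columns; skip anything else.
            continue
        token_id, form, lemma, upos = cols[0], cols[1], cols[2], cols[3]
        if "-" in token_id or "." in token_id:
            # Skip multiword-token ranges (1-2) and empty nodes (1.1).
            continue
        if not in_sentence:
            yield "<s>"
            in_sentence = True
        yield f"{form}\t{lemma}\t{upos}"
    if in_sentence:
        # Close a sentence that was not followed by a blank line.
        yield "</s>"


# Invented two-token sample; columns not used by the sketch are left as "_".
sample = (
    "# text = Мир народам\n"
    "1\tМир\tмир\tNOUN\t_\t_\t_\t_\t_\t_\n"
    "2\tнародам\tнарод\tNOUN\t_\t_\t_\t_\t_\t_\n"
    "\n"
)
vertical = list(conllu_to_vertical(sample.splitlines()))
print("\n".join(vertical))
```

In a parallel corpus, a step like this would presumably be run separately for each language side, with the document metadata (title, publisher, translator, year) attached as attributes of an enclosing document tag; that part is omitted here because its actual layout is not documented in this description.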
To obtain access to the corpus, contact Mikhail Mikhailov (mikhail.mikhailov(at)tuni.fi).