ParVLen, a Parallel Corpus of Lenin's Works

ParVLen consists of Lenin's original works aligned with their translations into other languages. The originals were collected from the website leninism.su, and most of the translations from the website marxists.org. Older translations were scanned from paper editions and processed with OCR software by members of our research group. Alignment was performed with LF Aligner.
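As an illustration of what aligned data of this kind can look like, here is a minimal sketch that reads segment pairs from a tab-separated file, with the Russian original in the first column and the translation in the second. The file layout, the function name read_aligned_pairs and the sample titles are assumptions made for this example; they do not describe the actual ParVLen files.

```python
import csv
import io


def read_aligned_pairs(handle):
    """Yield (source, target) segment pairs from tab-separated text.

    Assumed layout (hypothetical): one aligned pair per line,
    original in column 1, translation in column 2.
    """
    # QUOTE_NONE keeps quotation marks inside the text as ordinary characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        source, target = row[0].strip(), row[1].strip()
        if source or target:
            yield source, target


# Hypothetical sample data, used only to show the call.
sample = io.StringIO(
    "Что делать?\tWhat Is To Be Done?\n"
    "\n"
    "Шаг вперёд, два шага назад\tOne Step Forward, Two Steps Back\n"
)
pairs = list(read_aligned_pairs(sample))
for source, target in pairs:
    print(source, "|", target)
```

For a real file, the same function can be called on an object returned by open(path, encoding="utf-8", newline="").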

Currently, ParVLen consists of three subcorpora: Russian-English, Russian-Finnish, and Russian-Spanish.

ParVLen.En, Russian-English

39 pairs of documents, 600,000 X 2 running words

Translations from Collected Works (Progress) (marxists.org)

ParVLen.Fi, Russian-Finnish

468 pairs of documents, 1,900,000 X 2 running words

Translations from Collected Works (Progress) + other editions of important works, including early translations of the 1920s

ParVLen.Es, Russian-Spanish

22 pairs of documents, 600,000 X 2 running words

Translations of important works from three different publishers: Progress (Moscow), Akal (Madrid), Cartago (Buenos Aires)

All corpora include rich metadata: titles of the works, publishers, translators, years of publication.

Hosting of the corpus: Linux server, Tampere University

Corpus manager: NoSketch Engine (non-commercial version of Sketch Engine)

Morpho-syntactic parsing: Turku neural parser pipeline

Parsing of corpus files: Python script developed by Juho Härme
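The following is a minimal sketch of the kind of conversion such a step may involve, not the script mentioned above. It turns CoNLL-U output (the format produced by the Turku neural parser pipeline) into the one-token-per-line vertical format read by NoSketch Engine. The function name conllu_to_vertical, the choice of attributes (word, lemma, part of speech) and the sample metadata are assumptions made for this example.

```python
import html


def conllu_to_vertical(conllu_text, doc_attrs=None):
    """Convert CoNLL-U text to vertical format with <doc> and <s> structures.

    Assumption: each token line keeps three attributes
    (word form, lemma, universal POS tag), separated by tabs.
    """
    attrs = "".join(
        f' {key}="{html.escape(str(value), quote=True)}"'
        for key, value in (doc_attrs or {}).items()
    )
    lines = [f"<doc{attrs}>"]
    in_sentence = False
    for raw in conllu_text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current sentence.
            if in_sentence:
                lines.append("</s>")
                in_sentence = False
            continue
        if line.startswith("#"):
            continue  # sentence-level comments are dropped
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        if not in_sentence:
            lines.append("<s>")
            in_sentence = True
        lines.append("\t".join((cols[1], cols[2], cols[3])))
    if in_sentence:
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines)


# Hypothetical sample: one parsed sentence plus document metadata.
sample = (
    "# text = Что делать?\n"
    "1\tЧто\tчто\tPRON\t_\t_\t2\tobj\t_\t_\n"
    "2\tделать\tделать\tVERB\t_\t_\t0\troot\t_\tSpaceAfter=No\n"
    "3\t?\t?\tPUNCT\t_\t_\t2\tpunct\t_\t_\n"
    "\n"
)
vertical = conllu_to_vertical(sample, {"title": "Что делать?", "year": "1902"})
print(vertical)
```

Metadata such as titles, publishers, translators and years of publication can be carried as attributes of the doc structure, which is how the sketch passes them through.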

To obtain access to the corpus, contact Mikhail Mikhailov (mikhail.mikhailov(at)tuni.fi).
