Current Challenges in Web Corpus Building


Authors

JAKUBÍČEK Miloš, KOVÁŘ Vojtěch, RYCHLÝ Pavel, SUCHOMEL Vít

Year of publication 2020
Type Article in Proceedings
Conference Proceedings of the 12th Web as Corpus Workshop
MU Faculty or unit

Faculty of Informatics

Citation
Web: Article in proceedings
Keywords Web corpora; corpus building
Description In this paper we discuss some of the current challenges in web corpus building that we have faced in recent years while expanding the corpora in Sketch Engine. The purpose of the paper is to provide an overview and to raise discussion on possible solutions, rather than to present ready-made solutions to the reader. For each issue, we try to assess its severity and briefly discuss possible mitigation options.