SEGRE: An automatic tool for grapheme-to-allophone transcription in Catalan




Segre is a rule-based system for automatic phonetic transcription of Catalan. It was jointly developed by the Universitat Politècnica de Catalunya, the Universitat Autònoma de Barcelona and the Universitat de Barcelona within the framework of the Catalan Reference Centre for Language Engineering (CREL, Centre de Referència en Enginyeria Lingüística).

The rule syntax was designed to produce phonetic transcriptions for four major dialects of Catalan: Central (or Eastern), spoken in eastern Catalonia; North-Western (or Western), spoken in western Catalonia, including the south; Balearic, spoken in the Balearic Islands; and Valencian, spoken in the Valencian Community.
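
To illustrate what a dialect-parameterised rule can look like, the following Python sketch applies one simplified allophonic rule to a toy phonemic input. The rule is the classic Eastern/Western split in unstressed vowels: Central and Balearic reduce unstressed /a/ and /e/ to [ə], whereas North-Western and Valencian keep them distinct. The dialect labels, the `Segment` type and the `reduce_vowels` function are hypothetical; this is not Segre's rule syntax, which is not described here.

```python
from dataclasses import dataclass

# Hypothetical dialect labels (assumption); not taken from Segre.
REDUCING_DIALECTS = {"central", "balearic"}


@dataclass(frozen=True)
class Segment:
    """One phoneme of a toy phonemic input, with a stress flag for vowels."""
    phone: str
    stressed: bool = False


def reduce_vowels(word, dialect):
    """Apply one simplified rule: in reducing dialects, unstressed /a/ and /e/
    are realised as [ə]; every other segment is copied unchanged."""
    out = []
    for seg in word:
        if dialect in REDUCING_DIALECTS and not seg.stressed and seg.phone in {"a", "e"}:
            out.append("ə")
        else:
            out.append(seg.phone)
    return "".join(out)


# "casa" (house), stressed on the first vowel; toy phonemic input.
casa = [Segment("k"), Segment("a", stressed=True), Segment("z"), Segment("a")]
for dialect in ("central", "north-western", "balearic", "valencian"):
    print(dialect, reduce_vowels(casa, dialect))
```

This prints `kazə` for Central and Balearic and `kaza` for North-Western and Valencian. Keeping the dialect as a parameter lets one rule set serve all four varieties, with each rule stating which dialects it applies to.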

On new texts, measured against transcriptions produced by human experts, Segre achieves an accuracy of 99.1% for isolated words and 99.39% for running text.
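
The unit over which accuracy is counted (words or phones) is not stated here. Assuming word-level exact match, the figure is the percentage of words whose automatic transcription is identical to the expert reference; under that assumption, 99.1% would correspond to roughly 9 mismatches per 1,000 words. A minimal sketch, with a hypothetical `word_accuracy` function and invented toy transcriptions:

```python
def word_accuracy(hypothesis, reference):
    """Percentage of positions where the automatic transcription exactly
    matches the expert reference (word-level exact match is an assumption)."""
    if len(hypothesis) != len(reference):
        raise ValueError("transcription lists must be aligned word by word")
    if not reference:
        raise ValueError("reference must not be empty")
    correct = sum(h == r for h, r in zip(hypothesis, reference))
    return 100.0 * correct / len(reference)


# Invented toy data, not Segre output: one of four words differs.
reference = ["ˈkazə", "əls", "ˈɔməs", "ˈgat"]
hypothesis = ["ˈkazə", "əls", "ˈɔmes", "ˈgat"]
print(word_accuracy(hypothesis, reference))  # 75.0
```

A phone-level measure would instead align the two symbol strings (for example with edit distance) and count substitutions, insertions and deletions, which usually yields a higher figure than exact word match.
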
Segre can also be used to model how coarticulation modifies the transcription of words, as obtained in isolation, when they occur in real sentences. It is therefore useful not only for building speech synthesis systems but also for training subword-based speech recognition systems.
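
As an example of a cross-word effect that isolated-word transcription cannot capture, the sketch below applies one well-known Catalan sandhi process in simplified form: a word-final [s] is voiced to [z] when the next word begins with a vowel, so "els" [əls] followed by "homes" [ˈɔməs] becomes [əlz ˈɔməs]. The function name `voice_final_s` and the vowel inventory are assumptions for illustration, and the rule deliberately ignores other contexts (such as a following voiced consonant) that a full system would need to handle.

```python
# Simplified vowel inventory for the toy transcriptions (an assumption).
VOWELS = set("aeiouɛɔə")
STRESS = "ˈ"


def voice_final_s(words):
    """Voice a word-final [s] to [z] when the following word starts with a
    vowel, ignoring a leading stress mark. Returns a new list."""
    out = list(words)
    for i in range(len(out) - 1):
        next_onset = out[i + 1].lstrip(STRESS)[:1]
        if out[i].endswith("s") and next_onset in VOWELS:
            out[i] = out[i][:-1] + "z"
    return out


isolated = ["əls", "ˈɔməs"]  # "els homes", each word transcribed in isolation
print(voice_final_s(isolated))  # ['əlz', 'ˈɔməs']
```

The rule runs as a second pass over the sequence of isolated transcriptions, which mirrors the distinction drawn above between word-level transcription and its modification in running text.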

