NannyRecord: Speech recording platform


NannyRecord records read-style speech databases. The tool takes a text file of prompts; each line of that file is a sentence or short paragraph that the speaker utters and that is stored in a separate audio file. The user can select the sampling frequency, the sample format (bits per sample) and the file format (RIFF/WAV or RAW/PCM), and several channels can be recorded synchronously. Two versions of the tool are provided: the first uses the standard Windows sound drivers and records mono or stereo signals; the second uses ASIO drivers and supports a multichannel configuration. For each utterance, a label file is produced with information about the prompt and the recording date and time.
The graphical user interface (GUI) shows the prompt text together with signal information (level for each channel, clipping). The tool also supports acoustic prompts: in this configuration the speaker talks after a prompt signal has been played, which can be used to record speech that mimics a given style or to record question/answer databases. In professional recordings, an operator controls the tool (start/stop recording, repeat, advance). Graphical settings such as font size can be adjusted so that the prompts can be read from a distance.
The tool has been used extensively to produce large speech synthesis databases. For instance, the Spanish TC-STAR synthesis database and the Catalan Festcat synthesis database were both recorded with NannyRecord using three channels (two microphones and one laryngograph) at 96 kHz and 24 bits per sample. The Catalan Speecon, a multichannel database designed to train speech recognizers, was also recorded with this tool.
The tool is implemented in C++. Separate threads handle the user/operator input that controls the tool and the speech acquisition and analysis. Incoming speech samples are stored in a memory buffer, and a dedicated thread writes the buffered signal to the output files. The state of the buffer is monitored to ensure that no samples are lost, even at the most demanding transfer rates (high sampling frequencies, several channels).
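
The buffering scheme described above can be illustrated with a minimal C++ sketch. This is not NannyRecord's source code: the names BlockQueue, simulateRecording, RunStats and bytesPerSecond are hypothetical, the acquisition side is simulated by a loop instead of a sound driver, and the writer thread only counts bytes where a real tool would append them to a WAV or RAW file. It assumes a bounded queue whose fill level and dropped-block count stand in for the buffer monitoring mentioned in the text.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: data rate of an interleaved PCM stream in bytes/second.
constexpr std::size_t bytesPerSecond(std::size_t sampleRateHz,
                                     std::size_t channels,
                                     std::size_t bitsPerSample) {
    return sampleRateHz * channels * (bitsPerSample / 8);
}

using Block = std::vector<std::uint8_t>;

// Bounded queue shared by an acquisition thread (push) and a writer thread (pop).
class BlockQueue {
public:
    explicit BlockQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Never blocks the acquisition side: a full queue drops the block and counts it.
    bool push(Block block) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (queue_.size() >= capacity_) {
            ++dropped_;
            return false;
        }
        queue_.push_back(std::move(block));
        highWater_ = std::max(highWater_, queue_.size());
        notEmpty_.notify_one();
        return true;
    }

    // Blocks until a block is available; returns false once closed and drained.
    bool pop(Block& out) {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return false;
        out = std::move(queue_.front());
        queue_.pop_front();
        return true;
    }

    // Signals the writer that no more blocks will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        notEmpty_.notify_all();
    }

    std::size_t dropped() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return dropped_;
    }

    std::size_t highWater() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return highWater_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable notEmpty_;
    std::deque<Block> queue_;
    std::size_t capacity_;
    std::size_t dropped_ = 0;    // blocks lost because the queue was full
    std::size_t highWater_ = 0;  // largest fill level observed
    bool closed_ = false;
};

struct RunStats {
    std::size_t bytesWritten;
    std::size_t dropped;
    std::size_t highWater;
};

// Simulated session: this thread plays the acquisition role, a second thread the writer.
RunStats simulateRecording(std::size_t blockCount, std::size_t blockBytes,
                           std::size_t capacity) {
    BlockQueue queue(capacity);
    std::size_t bytesWritten = 0;
    std::thread writer([&] {
        Block block;
        while (queue.pop(block)) {
            bytesWritten += block.size();  // a real writer would append to the file here
        }
    });
    for (std::size_t i = 0; i < blockCount; ++i) {
        queue.push(Block(blockBytes, std::uint8_t{0}));
    }
    queue.close();
    writer.join();
    return {bytesWritten, queue.dropped(), queue.highWater()};
}
```

For the three-channel, 96 kHz, 24-bit configuration mentioned above, bytesPerSecond(96000, 3, 24) evaluates to 864000, which is the sustained rate the writer thread has to keep up with. In this sketch a non-zero dropped count, or a high-water mark close to the capacity, is the signal that the buffer is too small or the writer too slow; dropping is used only to keep the example short, and a real recorder would more likely enlarge the buffer or abort the take.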
