Voice input - Setup details, hints and tricks

microphone

Be hard on your computer. When it says "speak loudly", speak very loudly, but do not shout. When it says "be silent", don't wait for a particularly quiet moment (e.g. until your fridge turns off). Instead, let it calibrate in the natural environment in which you will actually be giving answers.

Keep your microphone as close to your mouth as possible. But if your microphone has no foam cover (sponge), be careful not to blow directly on it, or the sound will be badly distorted. For me, keeping the microphone by my cheek worked best: I can record a clear, loud sound without ever blowing directly onto the microphone.

On some sound cards, microphone calibration does not detect the input gain (i-gain) control. In that case it leaves this setting unchanged. You can then start an external mixer (e.g. kmix) and experiment with calibrating the microphone at different i-gain values set in the mixer, until you get good-quality recordings. I-gain values of about 10% are reasonable.

Note that, as I am NOT the author of the voice recognition engine used here (praise Daniel Kiecza, please), some of the hints below are based on my guesses and experiments rather than on solid knowledge. For example, I recommend changing i-gain only before running microphone calibration, because it might influence other microphone settings that the calibration sets up. For the same reason, hand-editing ~/.cvoicecontrol/config directly is not recommended either (different settings in this file may depend on one another). Instead, run microphone calibration again and simply speak louder or quieter until you get good-quality recordings afterwards.

what to record?

I don't care. You have to understand that this program does not "recognize" your speech by comparing it to any known language. It simply checks whether you say what you said during voice-model recording.

Think of this as "clicking" with your voice. I don't care if you click on the namebutton "C" with the sound "see", "see-see", "note C" or "uba-buba" (although the last possibility is not very educational, I must say). YOU choose the sound or phrase with which you want to "click" on the namebutton "C", you record it 4 or more times in the voice-model editor, and then, when you say that phrase, the computer will understand it as "clicking" on the namebutton "C".

This makes things very flexible. For example, you may use expressions from your own language (rather than English ones). And if the computer keeps misrecognizing an expression, you may change it to something easier for the computer (see common problems below).
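The "clicking with your voice" idea can be sketched as plain template matching: the utterance is compared with every recorded sample, and the label of the closest sample wins. The sketch below is only an illustration, not the engine's actual code. The function names, the toy number sequences standing in for audio, and the choice of dynamic time warping as the distance are all assumptions made for this example.

```python
def dtw_distance(a, b):
    """Dynamic-time-warping distance between two sequences of numbers."""
    inf = float("inf")
    # cost[i][j] = cheapest alignment of a[:i] against b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # stretch b
                cost[i][j - 1],      # stretch a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[len(a)][len(b)]


def recognize(utterance, model):
    """Return the label whose recorded sample is closest to the utterance."""
    best_label, best_dist = None, float("inf")
    for label, samples in model.items():
        for sample in samples:
            dist = dtw_distance(utterance, sample)
            if dist < best_dist:
                best_label, best_dist = label, dist
    return best_label


# Hypothetical model: four toy "recordings" per namebutton. The numbers say
# nothing about language; they only need to resemble what is spoken later.
model = {
    "C": [[1, 5, 5, 1], [1, 5, 5, 5, 1], [1, 4, 5, 1], [1, 5, 4, 1]],
    "D": [[9, 2, 9], [9, 2, 2, 9], [9, 3, 9], [8, 2, 9]],
}

print(recognize([1, 5, 5, 1, 1], model))
print(recognize([9, 2, 9, 9], model))
```

Nothing in the sketch knows what "C" sounds like in any language. Whatever was recorded under a label is what triggers it.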

wrong recognitions

"But I did say 'B', and this stupid machine again clicked on 'D'!" Sorry - this will happen from time to time. You can minimize such situations (see common problems below), but you should get mentally prepared for being misunderstood.

Using your voice to answer is very convenient. But it means that, from time to time, your right answer will be misunderstood and reported as a wrong one. If this happens once every 50 questions, please just ignore it. :) If the program says you should try harder, just say to yourself "It's not me who erred, it is this stupid software". And don't get discouraged because of this.

You use this program to learn the names of notes. It doesn't matter what the computer thinks: if you answered right, you should be glad, even if the computer misunderstood you and counted your answer as a mistake! :)

multiple model files

Using multiple voice-model files may be useful for example

To record another model, simply run the voice model editor from the Options menu, start a new model or load an old one, modify it and save it under a new name.

Then, whenever you want to switch model files, just open the voice model editor, load a model and click OK (without even opening the 'Notes' and 'Commands' tabs).

program commands

You have probably noticed by now that the voice model editor has an additional tab: "Program commands". Yes, madam (sir), you can now start/stop a test, turn sound on/off etc. with your voice! Just record a minimum of four samples for each program command you want to use.

Note that the more options there are to choose from, the more misrecognitions you will get. But for me this works fine. :) You may want to have two voice-model files: one with only notes, and one with both notes and commands. You can then use either of these depending on your needs.

The following commands correspond directly to some menuitems and have quite obvious meaning:

The other commands are:

QUIETER

makes midi output 10% quieter; note that this modifies only Midi Volume, and not Master Volume

LOUDER

makes midi output 10% louder; note that this modifies only Midi Volume, and not Master Volume

set all NOTES NOT ACTIVE

makes all namebuttons inactive, by checking all name checkboxes off

set all NOTES ACTIVE

makes all namebuttons active, by checking all name checkboxes on

The last two commands, 'NO' and 'YES', might be used in the future for answering some dialogs. Right now they are unused.
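The effect of the commands described above can be sketched as a small dispatcher. This is a hypothetical illustration only: the state dictionary, its keys and the function name are invented for the example, and "10%" is assumed here to mean ten points on a 0-100 scale, clamped at both ends.

```python
def apply_command(command, state):
    """Apply one recognized program command to a (hypothetical) state dict."""
    if command == "QUIETER":
        # Only the MIDI volume changes; the master volume is never touched.
        state["midi_volume"] = max(0, state["midi_volume"] - 10)
    elif command == "LOUDER":
        state["midi_volume"] = min(100, state["midi_volume"] + 10)
    elif command == "set all NOTES NOT ACTIVE":
        # Equivalent to unchecking every name checkbox.
        state["active"] = {name: False for name in state["active"]}
    elif command == "set all NOTES ACTIVE":
        # Equivalent to checking every name checkbox.
        state["active"] = {name: True for name in state["active"]}
    # 'NO' and 'YES' are currently unused, so they change nothing.
    return state


state = {
    "midi_volume": 95,
    "master_volume": 70,
    "active": {"A": True, "B": False},
}
apply_command("LOUDER", state)
print(state["midi_volume"], state["master_volume"])
```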

German/English notation

First of all: the voice-model editor never modifies anything in the model file that it does not understand.

Consider this: if you choose English notation, the notes are "ABCDEFG". Say you recorded samples for all of these.

Now you switch to German notation. The notes are "AHCDEFG". The note B is not there, and will not be accessible in the voice model editor. You will see a model with no samples recorded for note H. You may now record and use your samples for this note.

But the samples for note B were not erased! They are still there. You now have a model with 8 note names: "ACDEFG" plus both "B" and "H". If you choose German notation, "B" will be ignored; if you choose English notation, "H" will be ignored. Nice? :)

The drawback is that voice recognition sees all samples. If you use German notation and say "B", it will be recognized and reported to the main program; the main program will simply not consider it a possible answer and will ignore it. This should not be a problem, unless you have the "B-D" problem described below. In that case: you use German notation, the note is D, and you answer "D". This may be recognized as "B", which is not a possible answer, so it is ignored and nothing happens. You answer "D" again, and again it may be recognized as "B", and so on.
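The filtering step described above can be sketched as follows. This is a minimal illustration under assumptions: the table and function names are hypothetical, and the recognizer is taken to report a plain note letter to the main program.

```python
# Note names that are valid answers in each notation (from the text above).
NOTATIONS = {
    "English": set("ABCDEFG"),
    "German": set("AHCDEFG"),
}


def accept_answer(recognized, notation):
    """Return the recognized note if it is a possible answer, else None."""
    if recognized in NOTATIONS[notation]:
        return recognized
    return None  # reported by the recognizer, but ignored by the main program


# In German notation a "D" misheard as "B" is silently dropped, so nothing
# happens and the user has to answer again.
print(accept_answer("B", "German"))
print(accept_answer("B", "English"))
```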

This shouldn't be a problem for most users, but if it is for you, simply make a model without the note you are not going to use. In the above example: choose English notation (otherwise you would not have access to "B" in the voice-model editor), load your model, choose note "B" and delete all its samples (the status should change from "(OK)" to "4 lacking" or "-"), and save the model as, say, "model_without_b.cvc". Then, when you choose German notation again, B-D misrecognition cannot happen, because the note B (which is not used in German notation) is not present in the model at all. And you have ALL 7 notes ("AHCDEFG") well recognized.
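Why deleting the samples helps can be sketched with a toy model: a label with no samples left can never be the recognition result. The dictionary layout and function names below are assumptions made for illustration; they do not describe the real .cvc file format.

```python
def delete_samples(model, note):
    """Return a copy of the model in which `note` has no samples left."""
    pruned = {label: list(samples) for label, samples in model.items()}
    pruned[note] = []
    return pruned


def recognizable(model):
    """Labels that can still be returned by recognition (have samples)."""
    return sorted(label for label, samples in model.items() if samples)


# Hypothetical model holding samples for both "B" and "H".
model = {
    "A": ["a1", "a2", "a3", "a4"],
    "B": ["b1", "b2", "b3", "b4"],
    "D": ["d1", "d2", "d3", "d4"],
    "H": ["h1", "h2", "h3", "h4"],
}

model_without_b = delete_samples(model, "B")
print(recognizable(model_without_b))
```

The original model is left untouched, which mirrors saving the pruned model under a new name and keeping the old file.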

In some languages (e.g. Polish) there may be a similar problem with "A-H" misrecognition. In English notation it can easily be avoided by deleting all samples for "H". This is not a helpful hint for Poles, though, since they (we!) use German notation, where both H and A are desirable answers ... Other solutions listed below may apply in such a situation.