So, take a look at the simplest scheme vocoder:

The signal from the microphone (speech) is fed to a bank of bandpass filters, each of which passes only a small part of the frequency band of the speech signal. The greater the number of filters - the better speech intelligibility. At the same time, the carrier signal (e.g. ramp) is also passed through the same filter bank. Filter output speech signal is fed to envelope detectors which control modulators and outputs a filter carrier signal passes to the other input of the modulator. As a result, each band speech signal adjusts the level of the corresponding band carrier (modulates it). Further, output signals from all modulators are mixed and sent to the output. Further, all signal modulators are mixed and sent to the output. In order to improve speech intelligibility also apply additional blocks, such as the detector "sizzling" sound. So, to begin development necessary to determine the source signals, where they will take. It is possible for example to capture data from a file or directly processed in real-time from a microphone or line input. To test very easy to use file, so we will do and so and so. As the carrier will use an external file looped in a circle, to adjust the tone simply add the ability to change the playback speed, which will change the tone. To capture the sound of the file will use Audio Compression Manager (ACM), with it very convenient to make conversion between formats (because the file can be in any format, you would have to write some functions to different formats). It may be that to convert to the desired format will not correct ACM drivers, then play this file will not be available (although you can try to do it in 2 stages). As input files will use the wav - files, because to work with them in the system has special features to facilitate retrieving data from them.