# Java – Voice segmentation


## Voice segmentation

I’m helping a farm group its roosters by their crows, so that roosters with similar calls live together. The farmer wants to know whether chickens learn calls from one another; if they do, he will put every newly caught chick into a good flock, hoping it has a positive influence on the chick. My job is to record the similarity within each group, compare the results after a few weeks, and see whether the similarity within the group has increased.

The idea is to write a program that assigns a similarity score to two input WAV files, so that each rooster can be paired with its most similar roommate. Similar pairs are then merged into larger groups, and finally the roosters are divided into flocks.
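The pairing step can be sketched as follows, assuming the similarity scores have already been computed and stored in a symmetric matrix. The class `RoommateMatcher`, the method `mostSimilar`, and the scores in `main` are hypothetical and only illustrate the idea; how the score itself is computed is left open.

```java
import java.util.Arrays;

/** Hypothetical helper: picks each rooster's most similar partner from a score matrix. */
final class RoommateMatcher {

    /**
     * For each rooster i, returns the index j != i with the highest
     * similarity[i][j], or -1 if there is no other rooster.
     */
    static int[] mostSimilar(double[][] similarity) {
        int n = similarity.length;
        int[] best = new int[n];
        Arrays.fill(best, -1);
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                if (j == i) {
                    continue; // a rooster is not its own roommate
                }
                if (best[i] == -1 || similarity[i][j] > similarity[i][best[i]]) {
                    best[i] = j;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Made-up scores for roosters A, B, C (higher means more similar).
        double[][] scores = {
            {1.0, 0.8, 0.3},
            {0.8, 1.0, 0.5},
            {0.3, 0.5, 1.0}
        };
        System.out.println(Arrays.toString(mostSimilar(scores)));
    }
}
```

Merging pairs into larger groups could reuse the same matrix, for example by repeatedly joining the two most similar groups, but that choice depends on how the flocks are meant to be sized.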

I recorded 3 roosters crowing and analyzed the recordings with a spectrogram (each rooster crowed twice):

Rooster A:

Rooster B:

Rooster C:

Before calculating the similarity, I want to split each crow into segments so that each segment holds a roughly constant frequency (the segments will be used later when calculating the similarity). My current approach is:

Step 1: Where the intensity line is not continuous, split the sound at the gap.
Step 2: Where the frequency changes sharply, treat that time as a segment boundary.
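A minimal sketch of these two steps, assuming the spectrogram has already been reduced to one intensity value and one dominant frequency per frame. The class `CrowSegmenter` and the parameters `silenceThreshold` and `maxJumpHz` are hypothetical names, and suitable threshold values would have to be tuned on real recordings.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical two-step segmenter working on per-frame spectrogram summaries. */
final class CrowSegmenter {

    /**
     * Returns segments as {startFrame, endFrameExclusive} pairs.
     * Step 1: frames whose energy is below silenceThreshold are gaps and end a segment.
     * Step 2: inside a voiced run, a jump in dominant frequency larger than
     *         maxJumpHz between consecutive frames starts a new segment.
     */
    static List<int[]> segment(double[] energy, double[] dominantHz,
                               double silenceThreshold, double maxJumpHz) {
        if (energy.length != dominantHz.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        List<int[]> segments = new ArrayList<>();
        int start = -1; // -1 means we are currently inside a gap
        for (int i = 0; i < energy.length; i++) {
            boolean voiced = energy[i] >= silenceThreshold;
            if (!voiced) {
                if (start >= 0) {
                    segments.add(new int[] {start, i}); // gap closes the segment
                    start = -1;
                }
            } else if (start < 0) {
                start = i; // first voiced frame after a gap
            } else if (Math.abs(dominantHz[i] - dominantHz[i - 1]) > maxJumpHz) {
                segments.add(new int[] {start, i}); // sharp frequency change
                start = i;
            }
        }
        if (start >= 0) {
            segments.add(new int[] {start, energy.length});
        }
        return segments;
    }
}
```

Comparing only consecutive frames keeps the sketch short, but it is sensitive to noise in the frequency estimate; smoothing the dominant frequency over a few frames before thresholding is one possible refinement.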

I’m wondering if the steps above are enough. I hope others have better suggestions on how to improve the splitting. Is there any method or algorithm that suits my situation? Thanks!

### Solution

The best way to do this is to use speech recognition techniques. I used them in a project that identified birdsong: I used HTK (the Hidden Markov Model Toolkit) to build HMMs that recognize bird songs.
You can adapt the Mel scale to better fit your case.
The Mel scale (used in MFCC features) is designed around human speech and hearing. If you search Google, you’ll find bird-related papers that modify the Mel or Bark (PLP) scale to match the animal’s vocal tract.
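To illustrate what adapting the scale could mean, here is a small sketch built on the common Mel formula `mel = 2595 * log10(1 + f / 700)`. Replacing the 700 Hz constant with a configurable `breakHz` is only one hypothetical way to warp the scale; it is not taken from a specific paper, and the class `WarpedMelScale` is an illustrative name.

```java
/** Hypothetical Mel-like frequency warp with a configurable break frequency. */
final class WarpedMelScale {

    private final double breakHz;

    /** breakHz = 700 gives the common Mel formula; other values are an assumption. */
    WarpedMelScale(double breakHz) {
        if (breakHz <= 0) {
            throw new IllegalArgumentException("breakHz must be positive");
        }
        this.breakHz = breakHz;
    }

    /** Maps a frequency in Hz onto the warped scale. */
    double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / breakHz);
    }

    /** Inverse of hzToMel. */
    double melToHz(double mel) {
        return breakHz * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /** Filter centre frequencies in Hz, evenly spaced on the warped scale. */
    double[] centerFrequencies(int count, double lowHz, double highHz) {
        double lo = hzToMel(lowHz);
        double hi = hzToMel(highHz);
        double[] centers = new double[count];
        for (int i = 0; i < count; i++) {
            double mel = lo + (hi - lo) * (i + 1) / (count + 1);
            centers[i] = melToHz(mel);
        }
        return centers;
    }
}
```

A larger `breakHz` keeps the scale close to linear up to higher frequencies, which places more filters in the upper band; whether that suits rooster crows would need to be checked against the spectrograms.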

You will need a large number of samples to train the HMM parameters robustly and to determine the optimal number of states. I recommend at least 100 samples for each of the three songs, using 3 emitting HMM states.