I have a list of diseases for my research. For each disease, I have a list of ages associated with that disease. For example, "Breast Carcinoma" may be the list [1,2,2,3,4,5,5,5,5,5], while "Adrenal Cortex Neoplasms" may be a different list with a thousand elements, but with the same general shape in a bincount (a high number of 5s, a few 2s). I would like to stratify these diseases based on the shape of their bincount distributions. By stratification, I mean clustering the diseases using these distributions. I am quite new to machine learning and honestly have no idea where to begin. However, if you can give me a general approach that I can research in more detail, I can write the Python code myself.
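To illustrate what I mean by "shape", here is a minimal sketch of how I currently look at the data. It assumes the ages are small non-negative integers. The helper name `bincount_shape` and the long "Adrenal Cortex Neoplasms" list are made up for illustration (my real list is different); only the "Breast Carcinoma" list is the example from above.

```python
import numpy as np

def bincount_shape(ages, n_bins):
    """Bincount of `ages`, normalized so the counts sum to 1.

    Normalizing removes the effect of list length, leaving only the shape.
    (Hypothetical helper, not from any library.)
    """
    counts = np.bincount(ages, minlength=n_bins)
    return counts / counts.sum()

# Example list from the question.
breast = [1, 2, 2, 3, 4, 5, 5, 5, 5, 5]

# Made-up stand-in for a much longer list (1000 elements) with the same shape.
adrenal = breast * 100

n_bins = 6  # assumed: ages take values 0..5
shape_breast = bincount_shape(breast, n_bins)
shape_adrenal = bincount_shape(adrenal, n_bins)

print(shape_breast)
print(shape_adrenal)
```

After normalizing, the two lists give the same vector even though one is 100 times longer, and that is the sense in which I would like them to end up in the same group.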
Thank you for taking the time to help a beginner!