At the moment I'm trying to predict adverse events in the next 8 hours for hospital patients receiving a certain type of treatment, using Python and pandas. Every row in my dataset represents one treatment and contains features like blood values and settings of the machines being used. The outcome of the model is a ratio of two blood values, which can either be regressed directly or classified in a binary way (the event is adverse if this ratio exceeds 2.5).
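For concreteness, the target is constructed roughly like this (the column names and file name are placeholders, not my real ones):

```python
import pandas as pd

# Each row is one treatment; features are blood values and machine settings.
df = pd.read_csv("treatments.csv")  # placeholder file name

# Outcome: the ratio of two blood values, optionally binarised at 2.5.
df["ratio"] = df["blood_value_a"] / df["blood_value_b"]
df["adverse"] = (df["ratio"] > 2.5).astype(int)  # 1 = adverse event
```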
The problem I'm having is that only 437 rows (treatments) are available, and patients are unevenly represented: one patient alone accounts for 54 of them.
I'd prefer to validate my models on unseen data and to keep patients separate in every split, but I'm not sure whether that's feasible with so little data.
At the moment I'm splitting my data 70/30 and making sure that no patient appears in both the train and test set. This approach feels rather weak, and I'd prefer to cross-validate my models.
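For reference, my current split looks roughly like this (using scikit-learn's GroupShuffleSplit; `patient_id` is a placeholder for my patient identifier column):

```python
from sklearn.model_selection import GroupShuffleSplit

# One 70/30 split that keeps every patient's treatments entirely
# on one side of the split.
gss = GroupShuffleSplit(n_splits=1, test_size=0.3, random_state=42)
train_idx, test_idx = next(gss.split(df, groups=df["patient_id"]))
train, test = df.iloc[train_idx], df.iloc[test_idx]
```

For cross-validation I would presumably swap this for GroupKFold (or StratifiedGroupKFold, to also balance the adverse class across folds), but with only 437 rows I'm not sure that's sound. Therefore my questions are: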
1. How can I improve this validation approach in general, given the small amount of data and the imbalance in patient representation?
2. Would it hurt to drop the 'keep patients separate in every split' constraint in this kind of setting?