I would like to assess how well different measurement devices agree with each other. In the medical literature this is often done with a Bland-Altman plot (also called a Tukey mean-difference plot).
I want to compare more than two devices (four, to be exact). Three of the devices are commercially available and one has been newly developed. I want to find out how well the newly developed device performs compared to the other three. None of the commercially available devices can be considered a "gold standard". What would be the best way to do this?
I am currently considering the following two options (but I am open to any other solutions):
1. Calculating a new variable "commercially_available" as the mean of the three commercially available devices, and then plotting the mean of "commercially_available" and the new device against the difference between "commercially_available" and the new device. (So basically, combining the measurements obtained by the commercially available devices into one measurement using the MEAN function.)
2. Creating three separate plots, each comparing one commercially available device with the newly developed device.
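To make the two options concrete, here is a minimal sketch in Python of how the Bland-Altman quantities (bias and 95% limits of agreement) could be computed for each. The data are simulated purely for illustration, and the names `bland_altman`, `commercial`, and `new_device` are hypothetical. The sketch also assumes one paired measurement per subject per device, with no repeated measures.

```python
import numpy as np

def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b.

    The limits are the usual 95% limits of agreement: bias +/- 1.96 * SD
    of the paired differences.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Simulated data, NOT real measurements: 50 subjects, 3 commercial devices
# and 1 new device, each measuring the same unknown quantity with noise.
rng = np.random.default_rng(0)
truth = rng.normal(100.0, 15.0, size=50)
commercial = {name: truth + rng.normal(0.0, 3.0, size=50) for name in ("A", "B", "C")}
new_device = truth + rng.normal(1.0, 3.0, size=50)

# Option 1: average the commercial devices into one composite reference.
commercially_available = np.mean(list(commercial.values()), axis=0)
option1 = bland_altman(new_device, commercially_available)

# Option 2: one comparison per commercial device.
option2 = {name: bland_altman(new_device, values) for name, values in commercial.items()}

print("Option 1 (new vs. mean of commercial): bias=%.2f, LoA=[%.2f, %.2f]" % option1)
for name, stats in option2.items():
    print("Option 2 (new vs. device %s): bias=%.2f, LoA=[%.2f, %.2f]" % ((name,) + stats))
```

For the plot itself, the x-axis would be the mean of the two compared measurements and the y-axis their difference. One thing to keep in mind with option 1: if the commercial devices have roughly independent errors, their average is less noisy than any single device, so the limits of agreement against the composite may look narrower than those from any of the pairwise comparisons in option 2.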