This general area is called "continual", "incremental" or "life-long" learning, and it's quite an active area of research.
There are many approaches to continual learning, including different forms of regularization which penalize forgetting, dynamically expanding architectures, and explicit model memories. For more detail, I would refer you to this survey paper: Continual Lifelong Learning with Neural Networks: A Review
Methods such as fine-tuning and learning without forgetting seem to
require all objects in the new images annotated though.
Well, that is not always the case. Let me give an explicit continual learning approach to make this clear: you can train a classifier with a binary output for whether the image has pear or not, and apple or not. You can add a new binary output for whether it has orange or not. Then when you start training on your orange dataset, impose a regularization loss which penalizes the pear and apple predictions from changing much (freeze a copy of your model from before you started training on oranges, and penalize the MSE or cross-entropy between your old and new pear and apple predictions).