Two questions on Cross Validated with relevant answers:
What is the difference between "testing of hypothesis" and "test of significance"?
Is the "hybrid" between Fisher and Neyman-Pearson approaches to statistical testing really an "incoherent mishmash"?
Papers that explain the difference in depth, with historical context:
Goodman, S. N. (1999). Toward evidence-based medical statistics. 1: The P value fallacy. Annals of Internal Medicine, 130(12), 995–1004. https://pubmed.ncbi.nlm.nih.gov/10383371/
Hurlbert, S., & Lombardi, C. (2009). Final collapse of the Neyman-Pearson decision theoretic framework and rise of the neoFisherian. Annales Zoologici Fennici, 46(5), 311–349.
Lew, M. J. (2012). Bad statistical practice in pharmacology (and other basic biomedical disciplines): you probably don't know P. British Journal of Pharmacology, 166(5), 1559–1567. doi:10.1111/j.1476-5381.2012.01931.x
A paper that explains the difference and also puts it into the context of scientific inference:
Lew, M. J. (2019). A Reckless Guide to P-values. In A. Bespalov, M. Michel, & T. Steckler (Eds.), Good Research Practice in Non-Clinical Pharmacology and Biomedicine. Handbook of Experimental Pharmacology, vol. 257. Springer, Cham. https://doi.org/10.1007/164_2019_286