The doc2vec implementation in the Python gensim library works the following way:
It basically trains word vectors just like word2vec, but additionally trains document vectors at the same time.
That is, if you run just word2vec, every observation is a sample text (= a document), and you learn word vectors for all words that occur in the sample texts (minus the ones you exclude, e.g. common words like "the"). This is done by iterating over all observations one by one, multiple times.
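As a minimal sketch of the plain word2vec case (gensim 4.x API; the toy corpus and parameter values here are made up for illustration):

```python
from gensim.models import Word2Vec

# Each observation is one tokenized sample text.
texts = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["dogs", "and", "cats", "are", "pets"],
]

model = Word2Vec(
    sentences=texts,
    vector_size=50,   # dimensionality of the word vectors
    min_count=1,      # keep even rare words in this toy corpus
    epochs=10,        # number of passes over all observations
)

print(model.wv["cat"])  # learned word vector for "cat"
```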
If you run doc2vec, every observation is again a sample text, and you learn word vectors for all words that occur in the sample texts. But in addition, you learn one vector for the observation (= sample text = document) itself. That is, you still iterate over all observations multiple times, but in every step, when you update the word vectors with the data from one observation, you also update the document vector corresponding to that particular document. Basically, the document itself is treated as a word that occurs only in this document. See Figure 2 in the paper that introduced the doc2vec algorithm, Le & Mikolov (2014), "Distributed Representations of Sentences and Documents".
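The same toy setup with doc2vec looks roughly like this (again gensim 4.x; the tags and parameter values are just illustrative assumptions). Each observation now carries a tag identifying the document, and that tag gets its own vector:

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Each observation is a tokenized sample text plus a tag naming the document.
docs = [
    TaggedDocument(words=["the", "cat", "sat", "on", "the", "mat"], tags=["doc_0"]),
    TaggedDocument(words=["dogs", "and", "cats", "are", "pets"], tags=["doc_1"]),
]

model = Doc2Vec(
    documents=docs,
    vector_size=50,   # word vectors and document vectors share this dimensionality
    min_count=1,
    epochs=10,
    dm=1,             # PV-DM mode: the document vector is updated alongside the word vectors
)
```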
In essence, you get word vectors AND document vectors when running doc2vec, but of course it can take longer than just running word2vec. And in theory, the word vectors should be the same between word2vec and doc2vec (or rather, hold the same information; since they are randomly initialized, no two runs will ever produce identical vectors).
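Continuing the sketch above, after training you can read out both kinds of vectors, and infer a vector for an unseen document without retraining (still gensim 4.x attribute names):

```python
print(model.wv["cat"])     # word vector, as in word2vec
print(model.dv["doc_0"])   # document vector for the observation tagged "doc_0"

# Vector for a new, unseen document.
new_vec = model.infer_vector(["a", "new", "text", "about", "cats"])
```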