Questions tagged [doc2vec]

Doc2vec (aka paragraph2vec, aka sentence embeddings) extends the word2vec algorithm to the unsupervised learning of continuous representations for larger blocks of text, such as sentences, paragraphs, or entire documents.
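For readers new to the tag, here is a minimal training sketch using gensim's Doc2Vec (the class, parameter names, and infer_vector call are gensim's own API; the toy corpus and integer tags are made up for illustration):

```python
# Minimal sketch: training paragraph vectors with gensim's Doc2Vec.
# The toy corpus and integer tags are invented for illustration.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

raw_docs = [
    "the cat sat on the mat",
    "dogs and cats are common pets",
    "stock prices fell sharply on monday",
]

# Each training document is a list of tokens plus at least one tag.
corpus = [TaggedDocument(words=doc.split(), tags=[i]) for i, doc in enumerate(raw_docs)]

model = Doc2Vec(
    corpus,
    vector_size=50,   # dimensionality of the document vectors
    window=2,         # context window for the word-prediction task
    min_count=1,      # keep all words in this tiny example
    epochs=40,        # extra passes help on small corpora
)

# Infer a vector for a new, unseen piece of text.
new_vec = model.infer_vector("a dog sat on the mat".split())
print(new_vec.shape)  # (50,)
```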

19 questions
6 votes, 0 answers

Understanding Object2Vec

AWS released an interesting SageMaker feature called Object2Vec that lets you build an embedding for search out of pretty much anything: documents, users, products, recommendations, time-series data, DNA, etc. The official…
Ryan Zotti • 5,927 • 6 • 29 • 33
5 votes, 1 answer

Why have a tanh layer, a max-pooling layer and then another tanh layer?

I have been reading a Facebook paper (linked here) and am confused about certain features of the architecture. I do not understand why they have a tanh layer, a max-pooling layer, and then another tanh layer. I understand what each layer does, but I…
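A generic sketch of the layer sequence being asked about, written in PyTorch; this is not the paper's exact architecture, and the embedding size, hidden size, and pooling-over-time choice are assumptions for illustration:

```python
# Generic sketch of a tanh -> max-pooling -> tanh stack (not the paper's exact model).
import torch
import torch.nn as nn

class TanhPoolTanh(nn.Module):
    def __init__(self, embed_dim=128, hidden_dim=64):
        super().__init__()
        self.proj1 = nn.Linear(embed_dim, hidden_dim)   # first non-linear projection
        self.proj2 = nn.Linear(hidden_dim, hidden_dim)  # second non-linear projection

    def forward(self, x):                 # x: (batch, seq_len, embed_dim)
        h = torch.tanh(self.proj1(x))     # tanh layer: position-wise non-linearity
        h, _ = h.max(dim=1)               # max pooling over the sequence dimension
        return torch.tanh(self.proj2(h))  # second tanh layer on the pooled vector

x = torch.randn(4, 20, 128)               # batch of 4 sequences, 20 tokens each
print(TanhPoolTanh()(x).shape)             # torch.Size([4, 64])
```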
4 votes, 1 answer

Word2Vec vs. Doc2Vec Word Vectors

I am doing some analysis on document similarity and was also interested in word similarity. I know that doc2vec inherits from word2vec and by default trains word vectors, which we can access. My question is: should we expect these word vectors…
Tylerr • 1,225 • 5 • 16
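For context, both gensim models expose their learned word vectors through the .wv attribute, so the two sets can be compared directly. A small sketch (the toy corpus is made up; note that PV-DBOW mode, dm=0, only trains word vectors when dbow_words=1, while PV-DM, dm=1, trains them alongside the document vectors):

```python
# Sketch: comparing word vectors learned by Word2Vec and by Doc2Vec (gensim API).
from gensim.models import Word2Vec
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

sentences = [["the", "cat", "sat"], ["the", "dog", "ran"], ["cats", "and", "dogs"]]
tagged = [TaggedDocument(words=s, tags=[i]) for i, s in enumerate(sentences)]

w2v = Word2Vec(sentences, vector_size=50, min_count=1, epochs=40)
d2v = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=40, dm=1)  # PV-DM trains word vectors

# Both models expose their word vectors through the .wv attribute.
print(w2v.wv.most_similar("cat", topn=2))
print(d2v.wv.most_similar("cat", topn=2))
```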
4 votes, 1 answer

How to train sentence/paragraph/document embeddings?

I'm well aware of word embeddings (word2vec or GloVe) and I know of four papers treating the subject of more general embeddings: Distributed Representations of Sentences and Documents - Quoc V. Le, Tomas Mikolov …
4 votes, 2 answers

Doc2Vec for large documents

I have about 7,000,000 patents for which I would like to find the document similarity. Obviously, with a sample set that big it will take a long time to run. I am just taking a small sample of about 5,600 patent documents and I am preparing to use…
www3 • 601 • 8 • 16
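For corpora of this size, one common pattern is to stream documents from disk rather than hold them in memory, since gensim's Doc2Vec accepts any restartable iterable of TaggedDocument. A sketch under that assumption (the directory layout, file naming, and whitespace tokenisation are hypothetical):

```python
# Sketch: streaming a large document collection into Doc2Vec without loading it all at once.
import os
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

class PatentCorpus:
    """Yields one TaggedDocument per text file in a directory, on every pass."""
    def __init__(self, directory):
        self.directory = directory

    def __iter__(self):
        for name in sorted(os.listdir(self.directory)):
            with open(os.path.join(self.directory, name), encoding="utf-8") as fh:
                tokens = fh.read().lower().split()   # naive whitespace tokenisation
            yield TaggedDocument(words=tokens, tags=[name])

corpus = PatentCorpus("patents/")   # hypothetical directory of .txt files, one per patent
model = Doc2Vec(corpus, vector_size=100, min_count=5, workers=4, epochs=10)
model.save("patents.d2v")
```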
2 votes, 1 answer

Generating Sentence Vectors from Word2Vec

I know that I can use doc2vec and other resources to get sentence vectors. But I am very curious to generate sentence vectors using Word2Vec. I read a lot of material and found that averaging the embeddings is the baseline approach, but it is not…
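The averaging baseline mentioned above fits in a few lines. This sketch assumes an already-trained gensim Word2Vec model named w2v (an assumption, not shown here) and simply skips out-of-vocabulary words:

```python
# Baseline sketch: a sentence vector as the average of its word2vec word vectors.
import numpy as np

def average_sentence_vector(tokens, w2v):
    # Collect vectors only for words the model knows; skip the rest.
    vectors = [w2v.wv[t] for t in tokens if t in w2v.wv]
    if not vectors:                                    # no known words at all
        return np.zeros(w2v.vector_size, dtype=np.float32)
    return np.mean(vectors, axis=0)

# Usage (assuming a trained model `w2v`):
# sent_vec = average_sentence_vector("the cat sat on the mat".split(), w2v)
```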
2 votes, 1 answer

Pre-processing: do lemmatizing and stemming make a better doc2vec?

I have a project in which I will turn the documents of a corpus into doc2vec vectors. I was reading that when people convert a document to a bag of words, they try to improve the bag of words by removing stopwords, lemmatizing, and stemming. I was going to do this…
zipline86 • 235 • 2 • 11
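For reference, the preprocessing steps mentioned (stopword removal, lemmatizing, stemming) typically look like the NLTK sketch below; whether they actually help doc2vec is exactly the open question:

```python
# Sketch of a typical bag-of-words-style preprocessing pipeline with NLTK.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer, PorterStemmer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()
stemmer = PorterStemmer()

def preprocess(text):
    tokens = text.lower().split()                        # naive whitespace tokenisation
    tokens = [t for t in tokens if t.isalpha() and t not in stop_words]
    tokens = [lemmatizer.lemmatize(t) for t in tokens]   # e.g. "documents" -> "document"
    tokens = [stemmer.stem(t) for t in tokens]           # e.g. "running" -> "run"
    return tokens

print(preprocess("The cats were running across the documents"))
```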
2 votes, 1 answer

How to improve a doc2vec model

I would like to do some sentence embedding on around 500 sentences. The purpose is to find, for new sentences, the most similar ones within the 500 sentences. Unfortunately, for now it's definitely not working. Indeed, to test my model I simply looked…
miki • 212 • 2 • 10
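A sketch of the similarity test being described, using gensim (version 4 or later, where document vectors live under model.dv; older versions use model.docvecs); the three placeholder sentences stand in for the 500:

```python
# Sketch: querying a trained Doc2Vec model for the most similar training sentences.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

sentences = ["the cat sat on the mat", "dogs bark loudly", "the kitten slept on the rug"]
tagged = [TaggedDocument(s.split(), [i]) for i, s in enumerate(sentences)]
model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=100)

query = "a cat was on a mat".split()
query_vec = model.infer_vector(query, epochs=100)   # more inference epochs -> steadier vectors
for tag, score in model.dv.most_similar([query_vec], topn=2):
    print(score, sentences[tag])
```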
1 vote, 0 answers

Using doc2vec embeddings as model input, or perhaps for similarity comparison?

Doc2vec is an extension of word2vec which creates vector representations of documents. One can use these representations as input to some classifier/regressor (Logistic Regression, XGBoost, LightGBM ...). What about using the similarity as a…
Borut Flis • 221 • 1 • 8
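A sketch of the first option, feeding document vectors into a downstream classifier; scikit-learn's LogisticRegression is used here, and the vectors and labels are random placeholders standing in for real Doc2Vec output and real classes:

```python
# Sketch: Doc2Vec document vectors as features for a downstream classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# X: one learned/inferred document vector per document, y: the label to predict.
X = np.random.randn(200, 50)           # stand-in for model.dv.vectors or infer_vector output
y = np.random.randint(0, 2, size=200)  # stand-in for real class labels

clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, y, cv=5).mean())   # cross-validated accuracy
```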
1 vote, 0 answers

NLP for customer reviews and summaries

I'm trying to develop a model in R that will compare a customer review with a summary of that review written by an employee. The purpose is to ensure that the employee is accurately tagging and summarizing the customer review. In more…
pr478 • 11 • 1
1 vote, 1 answer

How should I formalize the Doc2Vec matrix dimensions?

Below, I have a simple diagram explaining the matrix dimensions of word2vec. My goal is to expand this diagram to incorporate document vectors for doc2vec. However, I'm having trouble understanding the original paper, specifically how to…
alpaca • 163 • 1 • 6
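A hedged sketch of the dimensions in the PV-DM variant of the Le & Mikolov paper, in my own notation (V = vocabulary size, N = number of documents, d = embedding dimension, k = number of context words), not the asker's diagram:

```latex
% My notation, not the paper's figure:
% V = vocabulary size, N = number of documents, d = embedding dim, k = context words.
\begin{aligned}
W &\in \mathbb{R}^{V \times d} && \text{word embedding matrix, as in word2vec} \\
D &\in \mathbb{R}^{N \times d} && \text{document (paragraph) matrix, the addition in doc2vec} \\
h &= \tfrac{1}{k+1}\Bigl( D_i + \textstyle\sum_{j=1}^{k} W_{c_j} \Bigr) \in \mathbb{R}^{d}
  && \text{averaged context; concatenation instead gives } h \in \mathbb{R}^{(k+1)d} \\
\hat{y} &= \operatorname{softmax}(U h + b), \quad U \in \mathbb{R}^{V \times d}
  && \text{softmax weights (} U \in \mathbb{R}^{V \times (k+1)d} \text{ if concatenating)}
\end{aligned}
```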
0 votes, 1 answer

Normalizing Topic Vectors in Top2vec

I am trying to understand how Top2Vec works. I have some questions about the code that I could not find an answer to in the paper. A summary of what the algorithm does is that it embeds word and document vectors in the same semantic space and normalizes…
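On the normalization step: length-normalizing vectors makes cosine similarity equivalent to a plain dot product, which is the usual motivation for this kind of step. A plain numpy sketch (not Top2Vec's own code):

```python
# Sketch: L2-normalizing a matrix of vectors so cosine similarity reduces to a dot product.
import numpy as np

def l2_normalize(vectors, eps=1e-12):
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.maximum(norms, eps)   # guard against zero-length vectors

vecs = np.random.randn(5, 300)
unit = l2_normalize(vecs)
print(np.allclose(np.linalg.norm(unit, axis=1), 1.0))  # True: all rows have unit length
```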
0 votes, 0 answers

Document subsimilarity matching

I'm looking to classify subsections of "full" documents based on their similarity to a set of subsections that have been manually curated and assigned labels (let's call these short documents). There are about 50 categories with 5-10 short documents…
COM • 101 • 2
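One simple baseline for this setup is a nearest-neighbour vote over the labeled short documents, using cosine similarity on whatever embeddings are chosen (Doc2Vec or otherwise); the vectors, label counts, and k below are placeholders:

```python
# Sketch: label a subsection by a majority vote over its most similar labeled short documents.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

short_doc_vecs = np.random.randn(300, 50)              # ~50 categories x 5-10 short docs each
short_doc_labels = np.random.randint(0, 50, size=300)  # category of each short document

def classify_subsection(subsection_vec, k=5):
    sims = cosine_similarity(subsection_vec.reshape(1, -1), short_doc_vecs)[0]
    top = np.argsort(sims)[::-1][:k]                   # k most similar short documents
    labels, counts = np.unique(short_doc_labels[top], return_counts=True)
    return labels[np.argmax(counts)]                   # majority vote among neighbours

print(classify_subsection(np.random.randn(50)))
```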
0 votes, 1 answer

Doc2vec Corpus Size Recommendation

I'm trying to make a semantic search engine with Doc2Vec, where you query the model with a document and it returns the N most similar documents from its training corpus. I'm having trouble pushing accuracy past 60% when the model is given a document it's…
fpt • 123 • 3
0 votes, 0 answers

What to make of high R-squared and non-significant p-value of a linear model?

I am using doc2vec to produce $\mathbb{R}^{50}$ vector representations of short bits of text. I am then using those vectors in a linear model to predict a continuous outcome variable. The $R^2$ is 0.25, which I believe is good considering what I am…
Ashish • 296 • 1 • 3 • 12
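For reference, statsmodels reports the R² and the overall F-test p-value side by side, which is the pair being compared above; the data here is random noise purely to show where the numbers come from:

```python
# Sketch: OLS on 50-dimensional doc2vec features, reporting R^2 and the F-test p-value.
import numpy as np
import statsmodels.api as sm

X = np.random.randn(120, 50)   # stand-in for the doc2vec vectors
y = np.random.randn(120)       # stand-in for the continuous outcome

ols = sm.OLS(y, sm.add_constant(X)).fit()
print(ols.rsquared)            # R^2 of the fit
print(ols.f_pvalue)            # p-value of the overall F-test
# With 50 predictors and relatively few observations, a sizeable R^2 can coexist
# with a non-significant overall F-test.
```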