How do I analyze the effects of the pandemic on the tone of online comments?

Question

I am analyzing the effects of the Covid pandemic on online communications. My hypothesis is that user comments on discussion forums are significantly more negative in tone during the Covid pandemic period (i.e., Mar 01, 2020 - Apr 15, 2020) than before (Jan 2019 - Feb 2020). I am new to time series analysis and don't know how to approach this question.

I have text data for the period from January 2018 to April 15, 2020. Each record in my data has the following fields:

User_ID;
Comment_NegativeTone (number of words in the comment that belong to a negative-tone dictionary);
Comment_Timestamp;
Comment_InResponseTo_UserID (null in case it is a new comment);
DiscussionForum_ID;
DiscussionForum_NumberofRegisteredMembers;
DiscussionForum_AverageCommentsPerDay;
DiscussionForum_AveragePageViewsPerDay

How do I test the hypothesis while controlling for the fact that comments are nested in users, parent comments, and discussion forums? How do I control for discussion forum level variables (i.e., number of registered members, average comments per day, and average page views per day)?

I was thinking of the following mixed-effects negative binomial model in R:

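library(glmmTMB)

# Mixed-effects negative binomial model for the negative-word count,
# with random intercepts for user, parent-comment user, and forum
glmmTMB(Comment_NegativeTone ~
    CovidTimePeriod +
    (1 | User_ID) +
    (1 | Comment_InResponseTo_UserID) +
    (1 | DiscussionForum_ID) +
    DiscussionForum_NumberofRegisteredMembers +
    DiscussionForum_AverageCommentsPerDay +
    DiscussionForum_AveragePageViewsPerDay +
    offset(log(wordCountInComment)),  # offset: log of the comment's word count
  data = data, family = nbinom2)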

where CovidTimePeriod = 1 for comments with Comment_Timestamp between Mar 01, 2020 and Apr 15, 2020, and 0 for earlier comments.
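For concreteness, the indicator described above could be derived from Comment_Timestamp roughly as follows. This is only a sketch in base R: the toy rows, the `covid_start` name, and the UTC time zone are assumptions for illustration, not part of the actual data.

```r
# Hypothetical toy rows; only the column name Comment_Timestamp
# comes from the field list above.
data <- data.frame(
  Comment_Timestamp = as.POSIXct(
    c("2019-06-15 10:00:00", "2020-02-29 23:59:59",
      "2020-03-01 00:00:00", "2020-04-15 18:30:00"),
    tz = "UTC"
  )
)

# Assumed cutoff: the start of Mar 01, 2020 (UTC).
covid_start <- as.POSIXct("2020-03-01 00:00:00", tz = "UTC")

# 1 for comments on or after the cutoff, 0 for earlier comments.
data$CovidTimePeriod <- as.integer(data$Comment_Timestamp >= covid_start)
```

Because the data end on April 15, 2020, no upper bound is needed in the comparison; with a longer series the condition would also need an end date.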

I am not sure whether this model is the right way to analyze differences between time periods. How do I address the concern that negative tone may already have been increasing or decreasing during the non-Covid period, so that an observed increase in CovidTimePeriod might only reflect a pre-existing trend? Should I be using alternative models to test such a time-based hypothesis (e.g., a latent change score model)?
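One way to make the pre-existing-trend concern concrete is to add a continuous time term alongside the period indicator, in the spirit of an interrupted time series (segmented regression) design. The sketch below uses simulated data and a plain Poisson `glm()` from base R purely for illustration; it is not the mixed negative binomial model above. The names `toy`, `CommentDate`, and `TimeYears` and all simulated effect sizes are hypothetical.

```r
set.seed(1)

# Simulated toy data: one row per comment, dates spanning the study window.
n <- 2000
start <- as.Date("2018-01-01")
end <- as.Date("2020-04-15")
toy <- data.frame(
  CommentDate = start + sample(0:as.integer(end - start), n, replace = TRUE)
)

# Period indicator, as defined in the question.
toy$CovidTimePeriod <- as.integer(toy$CommentDate >= as.Date("2020-03-01"))

# Years since the start of the data, to absorb a linear pre-existing trend.
toy$TimeYears <- as.numeric(toy$CommentDate - start) / 365.25

# Simulated word counts and negative-word counts (assumed effect sizes).
toy$wordCountInComment <- sample(20:200, n, replace = TRUE)
mu <- toy$wordCountInComment *
  exp(-3 + 0.1 * toy$TimeYears + 0.3 * toy$CovidTimePeriod)
toy$Comment_NegativeTone <- rpois(n, mu)

# The CovidTimePeriod coefficient now estimates a level shift
# over and above the linear trend captured by TimeYears.
fit <- glm(
  Comment_NegativeTone ~ TimeYears + CovidTimePeriod +
    offset(log(wordCountInComment)),
  family = poisson, data = toy
)
coef(summary(fit))
```

In the real analysis the same `TimeYears` term could presumably be added to the glmmTMB formula, keeping the random intercepts and forum-level covariates unchanged.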

Thank you!

I think it's always good to start with a plot. If you can visualize what's happening in a plot, the test result may not even be that important. — Tim Mak, May 04 '20 at 04:06
With your research question -- contrasting the two periods -- does it even matter what trend may have occurred within either period? I think you can thus simplify your model. — rolando2, May 08 '20 at 17:26
