I am analyzing the effects of the Covid pandemic on online communications. My hypothesis is that user comments on discussion forums are significantly more negative in tone during the Covid pandemic period (i.e., March 1, 2020 to April 15, 2020) than before it (January 2019 to February 2020). I am new to time series analysis and don't know how to approach this question.

I have text data for the period from January 2018 to April 15, 2020. Each record in my data has the following fields:

  1. User_ID;
  2. Comment_NegativeTone (number of words in the comment that belong to a negative-tone dictionary);
  3. Comment_Timestamp;
  4. Comment_InResponseTo_UserID (null in case it is a new comment);
  5. DiscussionForum_ID;
  6. DiscussionForum_NumberofRegisteredMembers;
  7. DiscussionForum_AverageCommentsPerDay;
  8. DiscussionForum_AveragePageViewsPerDay

How do I test the hypothesis while controlling for the fact that comments are nested in users, parent comments, and discussion forums? How do I control for discussion forum level variables (i.e., number of registered members, average comments per day, and average page views per day)?

I was thinking of the following mixed-effects negative binomial model in R:

library(glmmTMB)

# Negative binomial GLMM; the offset turns the count into a rate per word
glmmTMB(
  Comment_NegativeTone ~ CovidTimePeriod +
    DiscussionForum_NumberofRegisteredMembers +
    DiscussionForum_AverageCommentsPerDay +
    DiscussionForum_AveragePageViewsPerDay +
    (1 | User_ID) +
    (1 | Comment_InResponseTo_UserID) +
    (1 | DiscussionForum_ID) +
    offset(log(wordCountInComment)),
  data = data, family = nbinom2
)

where CovidTimePeriod = 1 for comments with Comment_Timestamp between March 1 and April 15, 2020, and 0 for earlier comments.
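As a minimal sketch of how that indicator could be derived from Comment_Timestamp, the base R snippet below codes each comment against a cutoff. The toy timestamps, the `comments` data frame, the `covid_start` name, and the choice of a UTC midnight boundary are all assumptions made for illustration; they are not part of the data described above.

```r
# Toy timestamps for illustration only (not real data).
comments <- data.frame(
  Comment_Timestamp = as.POSIXct(
    c("2019-06-15 10:00:00", "2020-02-29 23:59:59",
      "2020-03-01 00:00:00", "2020-04-15 18:30:00"),
    tz = "UTC"
  )
)

# Assumed boundary: the Covid period starts at 2020-03-01 00:00 UTC.
covid_start <- as.POSIXct("2020-03-01 00:00:00", tz = "UTC")

# 1 for comments on or after the boundary, 0 for earlier comments.
comments$CovidTimePeriod <- as.integer(comments$Comment_Timestamp >= covid_start)

print(comments)
```

Because the data end on April 15, 2020, a single lower bound is enough here; a dataset that runs past the period of interest would also need an upper bound.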

I am not sure whether this model is the right way to analyze differences between time periods. How do I address the concern that negative tone may already have been increasing or decreasing during the non-Covid period, so that any increase observed for CovidTimePeriod is not meaningful? Should I use an alternative model to test such a time-based hypothesis (e.g., a latent change score model)?
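One possible way to make the pre-existing-trend concern concrete is a segmented (interrupted time series) specification, which adds a linear time trend and a post-cutoff slope term next to the period indicator. The sketch below only builds those terms and a candidate formula in base R. The toy timestamps and the names `weeks_between`, `weeks_since_start`, `weeks_since_covid`, and `its_formula` are hypothetical, the forum-level covariates are left out for brevity, and this is a swapped-in technique for illustration rather than a recommended final model.

```r
# Toy timestamps for illustration only (not real data).
stamps <- as.POSIXct(
  c("2018-01-01", "2019-01-01", "2020-03-01", "2020-03-15"), tz = "UTC"
)
data_start  <- as.POSIXct("2018-01-01", tz = "UTC")  # first month of data
covid_start <- as.POSIXct("2020-03-01", tz = "UTC")  # assumed cutoff

# Elapsed time between two timestamps, in weeks.
weeks_between <- function(from, to) {
  as.numeric(difftime(to, from, units = "weeks"))
}

toy <- data.frame(Comment_Timestamp = stamps)
# Underlying trend: weeks elapsed since the start of the data.
toy$weeks_since_start <- weeks_between(data_start, toy$Comment_Timestamp)
# Level shift at the cutoff.
toy$CovidTimePeriod <- as.integer(toy$Comment_Timestamp >= covid_start)
# Slope change: weeks elapsed since the cutoff, 0 before it.
toy$weeks_since_covid <- pmax(0, weeks_between(covid_start, toy$Comment_Timestamp))

# Candidate formula; it could be passed to glmmTMB::glmmTMB() with
# family = glmmTMB::nbinom2 once these columns exist in the real data.
its_formula <- Comment_NegativeTone ~
  weeks_since_start + CovidTimePeriod + weeks_since_covid +
  (1 | User_ID) + (1 | DiscussionForum_ID) +
  offset(log(wordCountInComment))

print(toy)
```

Under this coding, the coefficient on `weeks_since_start` captures the trend that was already present, the coefficient on `CovidTimePeriod` captures a jump at the cutoff, and the coefficient on `weeks_since_covid` captures a change in slope afterwards. With only about six weeks of post-cutoff data, the slope-change term may be poorly estimated.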

Thank you!

carlo
SanMelkote
  • I think it's always good to start with a plot. If you can visualize what's happening in a plot, the test result may not even be that important. – Tim Mak May 04 '20 at 04:06
  • With your research question -- contrasting the two periods -- does it even matter what trend may have occurred within either period? I think you can thus simplify your model. – rolando2 May 08 '20 at 17:26
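Following the suggestion to start with a plot, here is a minimal base R sketch that aggregates comments by calendar week and plots negative-tone words per 100 words. The toy records, the `weekly` data frame, and the `neg_per_100_words` column are assumptions for illustration only; the field names come from the question.

```r
# Toy records for illustration only (not real data).
toy <- data.frame(
  Comment_Timestamp = as.POSIXct(
    c("2020-02-24", "2020-02-26", "2020-03-02", "2020-03-04"), tz = "UTC"
  ),
  Comment_NegativeTone = c(1, 3, 4, 6),
  wordCountInComment = c(50, 50, 40, 60)
)

# Label each comment with the Monday that starts its calendar week.
toy$week <- as.Date(cut(as.Date(toy$Comment_Timestamp), breaks = "week"))

# Sum negative-tone words and total words within each week.
weekly <- aggregate(
  cbind(Comment_NegativeTone, wordCountInComment) ~ week,
  data = toy, FUN = sum
)
weekly$neg_per_100_words <-
  100 * weekly$Comment_NegativeTone / weekly$wordCountInComment

# Weekly rate over time, with a dashed line at the assumed cutoff.
plot(weekly$week, weekly$neg_per_100_words, type = "b",
     xlab = "Week", ylab = "Negative-tone words per 100 words")
abline(v = as.Date("2020-03-01"), lty = 2)
```

On the real data, the same aggregation could be run per DiscussionForum_ID to see whether any shift is shared across forums.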

0 Answers