Data structure issue with plm: multiple observations with the same year-country pair

Question

I am trying to run fixed-effects and random-effects regressions on data that are not a true panel but independent cross-sections pooled over several years. The yearly cross-sections also differ in size. Each row of the dataset is a syndicated loan with loan-level variables and borrower-country-level variables. I'd like to estimate country fixed-effects and random-effects models while also controlling for year and industry fixed effects. In the end I aim to compare the two sets of results with each other and with the results from a pooled OLS model.

The question is: how can I apply pdata.frame() so that I can estimate the models with plm() when my data have multiple observations (loans) per country-year pair? If I use borrower country and year as the index, I get the following warning:

Warning in pdata.frame(df, index = c("borrower_country", "year")) :
duplicate couples (id-time) in resulting pdata.frame
to find out which, use, e.g., table(index(your_pdataframe), useNA = "ifany")
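As a minimal sketch of what the warning is pointing at, the snippet below counts loans per country-year pair in base R. The data frame `loans`, the `spread` column and all values are invented for illustration; only the column names `borrower_country` and `year` come from the call above.

```r
# Hypothetical toy data: several loans can share one country-year pair.
loans <- data.frame(
  borrower_country = c("FI", "FI", "FI", "SE", "SE", "DE"),
  year             = c(2005, 2005, 2006, 2005, 2005, 2006),
  spread           = c(120, 95, 110, 80, 140, 100)
)

# Count the observations in each country-year cell.
pair_counts <- as.data.frame(
  table(borrower_country = loans$borrower_country, year = loans$year),
  responseName = "n_loans"
)

# Cells holding more than one loan are the "duplicate couples (id-time)".
duplicated_pairs <- pair_counts[pair_counts$n_loans > 1, ]
print(duplicated_pairs)
```

In loan-level data like this, nearly every cell would be expected to show up, which is why the warning cannot be removed by cleaning the data.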

I know that the warning is caused by duplicate id-time pairs, but since these are a central feature of the data I don't know how to resolve it. I know that panel data methods have been used in studies with a similar data structure, but I don't understand how to apply them to my data. I've searched online (Stack Exchange, Stack Overflow) and in the textbooks listed below, but I can't find a solution; there seems to be a gap in this area.
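For the fixed-effects part only, one possible route that sidesteps the id-time index is the dummy-variable (LSDV) form with lm(), which gives the same slope estimates as the within transformation. The sketch below is only an illustration under assumptions: the data are simulated, and the names `loans`, `spread`, `loan_size` and `industry` are hypothetical. It does not cover the random-effects model.

```r
# Simulated loan-level data (hypothetical); country-year pairs repeat by design.
set.seed(1)
n <- 300
loans <- data.frame(
  borrower_country = sample(c("DE", "FI", "SE", "US"), n, replace = TRUE),
  year             = sample(2001:2005, n, replace = TRUE),
  industry         = sample(c("energy", "finance", "retail"), n, replace = TRUE),
  loan_size        = rnorm(n)
)
loans$spread <- 100 + 5 * loans$loan_size + rnorm(n)

# Within transformation: subtract country means (ave() defaults to the mean).
loans$spread_dm <- loans$spread - ave(loans$spread, loans$borrower_country)
loans$size_dm   <- loans$loan_size - ave(loans$loan_size, loans$borrower_country)
within_fit <- lm(spread_dm ~ size_dm - 1, data = loans)

# LSDV: country dummies in place of demeaning.
lsdv_fit <- lm(spread ~ loan_size + factor(borrower_country), data = loans)

# The two slope estimates agree up to numerical tolerance.
same_slope <- isTRUE(all.equal(
  unname(coef(within_fit)["size_dm"]),
  unname(coef(lsdv_fit)["loan_size"])
))

# Country fixed effects plus year and industry dummies, as described above.
full_fit <- lm(
  spread ~ loan_size + factor(borrower_country) + factor(year) + factor(industry),
  data = loans
)
print(coef(full_fit)["loan_size"])
```

Neither regression needs unique country-year pairs, because the transformation only groups observations by country.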

Any help would be much appreciated.

Wooldridge (2012). Introductory Econometrics: A Modern Approach.
Wooldridge (2010). Econometric Analysis of Cross Section and Panel Data.
Baltagi (2005). Econometric Analysis of Panel Data.
Tsionas (2019). Panel Data Econometrics: Empirical Applications.

PS. Would this issue be easier to solve in Stata? I'm used to working with R, but I am ready to try Stata if it would help. Based on previous questions I've read, though, I suspect I'd face the same issue there.

Try using lme4, as in this answer: https://stackoverflow.com/a/49168979/2824732 — Robert, Sep 07 '21 at 23:19
I'm not sure if that is directly applicable to my issue since it uses maximum likelihood and not GLS, [link](https://stats.stackexchange.com/questions/446361/plm-in-fixed-effects-model-doesnt-work-with-id-and-time). I'm not experienced with ML methods and studies on similar datasets have used GLS so ideally I'd like to do the same. Multiple studies say they use country random/fixed effects in their models but don't expand on the topic, e.g. "Creditor Rights, Enforcement, and Bank Loans" by Bae & Goyal (2009). It makes my issue seem trivial but I can't find information on how to solve it. — Moz, Sep 08 '21 at 08:51
(extended) crosspost https://stackoverflow.com/questions/69087503/data-structure-issue-with-plm-multiple-observations-with-the-same-year-country — Helix123, Sep 09 '21 at 07:14
