My DataFrame consists of 2919 rows.

For example, take the column "2ndFlrSF":

2ndFlrSF: Second floor area in square feet

These are the values it contains, obtained by running this pandas command:

conc1['2ndFlrSF'].value_counts()

where conc1 is my DataFrame.

Output:

0       1668
546       23
728       18
504       17
672       13
600       13
720       13
896       11
886       10
756        9
780        9
862        8
601        7
702        7
840        7
754        6
462        6
676        6
744        6
804        6
630        6
878        6
739        6
567        6
689        6
858        5
741        5
704        5
684        5
678        5
        ... 
605        1
591        1
1150       1
1152       1
1158       1
1160       1
1074       1
1072       1
1066       1
1060       1
956        1
966        1
679        1
980        1
673        1
990        1
992        1
994        1
998        1
1000       1
1004       1
1008       1
661        1
1028       1
659        1
1036       1
1038       1
1042       1
1048       1
1721       1
Name: 2ndFlrSF, Length: 635, dtype: int64

As you can see, the column is mostly filled with 0s (1668 of the counted values), which seems irrelevant. I have many more columns like this. What should I do with such columns, and how should I impute the NaN values in them?
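To make "many more such columns" concrete, here is a minimal sketch of how the share of zeros and the share of NaN values could be tabulated per numeric column. The small DataFrame is a made-up stand-in for conc1, the column name `other_col` is hypothetical, and the 0.5 cutoff is an arbitrary assumption, not a recommended threshold.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for conc1 (the real DataFrame has 2919 rows).
conc1 = pd.DataFrame({
    "2ndFlrSF": [0, 0, 0, 546, 728, 0, np.nan, 0],
    "other_col": [856, 1262, 920, 1717, 2198, 1362, 1694, 1077],  # hypothetical column
})

# Restrict to numeric columns, then compute per-column shares.
numeric = conc1.select_dtypes(include="number")
summary = pd.DataFrame({
    # NaN == 0 evaluates to False, so NaN rows are not counted as zeros.
    "zero_share": (numeric == 0).mean(),
    "nan_share": numeric.isna().mean(),
})

# Columns where more than half of the rows are exactly 0 (arbitrary cutoff).
zero_heavy = summary.index[summary["zero_share"] > 0.5].tolist()

print(summary)
print(zero_heavy)
```

Keeping the zero share and the NaN share separate matters here, because a 0 may be a genuine measurement while a NaN is missing, and the two may call for different handling.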

  • It seems that the values of 0 in `2ndFlrSF` could come from buildings with only 1 floor, so that the value is necessarily 0. If so, [this answer](https://stats.stackexchange.com/a/6565/28500) should cover your situation, also. If you still have an outstanding question about your particular situation, please edit your question to specify what is still at issue. – EdM Jul 22 '18 at 22:05
  • Possible duplicate of [80% of missing data in a single variable](https://stats.stackexchange.com/questions/6563/80-of-missing-data-in-a-single-variable) – EdM Jul 22 '18 at 22:06