I'm doing a time series analysis. I'm doing most of my analysis in R, where I can use "NA" to represent "not available" (e.g. a missing data point). But I'm doing some data preparation in OpenOffice; currently, I'm leaving cells blank for missing data. Is there a better way to "declare" that a cell is NA?
-
NA is not the same thing as NaN. See http://stats.stackexchange.com/questions/5686/what-is-the-difference-between-nan-and-na. – onestop Jan 13 '11 at 16:56
-
Why are you doing "data preparation" in openoffice? Do you really have to? I try and load things into R as soon as possible and make all changes within R itself. – csgillespie Jan 13 '11 at 17:44
-
If you save a file as CSV and import it into R, blank cells will be represented as NA. – mpiktas Jan 13 '11 at 18:20
-
@csgillespie I find it convenient to edit some data in a spreadsheet UI as opposed to a text editor UI. – David J. Jan 13 '11 at 18:24
-
@onestop Thanks for pointing out that NA is different from NaN. I edited the question accordingly. – David J. Jan 13 '11 at 18:24
-
@mpiktas Thanks, I think your response is useful as an "answer" (not just a comment) -- want to submit it? – David J. Jan 13 '11 at 18:26
-
@David, ok, I've posted it as an answer and included additional comments. Hope it helps. – mpiktas Jan 13 '11 at 19:39
2 Answers
This won't be specific to OpenOffice, but I have found that the ways different spreadsheet software and more traditional stats packages (such as R or SPSS) handle missing data are not always intuitive, or even uniform within the same software. Any time I have missing data, I typically check certain functions with toy data to see how they handle it.
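A toy-data check of this kind might look like the following minimal R sketch. The vector `x` is made up for illustration; the point is only to see how a few base functions treat `NA` before relying on them in a real analysis.

```r
# Made-up toy series with one missing observation
x <- c(3.2, NA, 4.1, 5.0)

mean(x)                 # NA: missing values propagate by default
mean(x, na.rm = TRUE)   # 4.1: mean of the three observed values only
sum(is.na(x))           # 1: number of missing points
cumsum(x)               # 3.2 NA NA NA: NA carries forward in running totals
```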
In Excel, IMO, it is a waste of time to make explicit missing-data declarations, as they make it more difficult to use the built-in functions (especially if you use integers to represent missing data). Hence, if you were asking about Excel, I would just say leave the cells blank (although I can't say for sure whether this translates directly to OpenOffice).
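If integer markers do end up in a sheet anyway, R has to be told about them on import. A minimal sketch, assuming a CSV export in which `-999` (a hypothetical marker chosen for illustration) stands for a missing value; the `text` argument of `read.csv` is used only to keep the example self-contained:

```r
# Hypothetical CSV export in which -999 was typed as a missing-data marker
csv_text <- "date,value
2011-01-01,3.2
2011-01-02,-999
2011-01-03,4.1"

# Without na.strings, -999 is read as an ordinary number and distorts results
raw <- read.csv(text = csv_text)
mean(raw$value)                      # dragged far below the observed range

# Declaring the marker via na.strings turns it into NA on import
clean <- read.csv(text = csv_text, na.strings = c("NA", "-999"))
is.na(clean$value)                   # FALSE TRUE FALSE
mean(clean$value, na.rm = TRUE)      # 3.65
```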
As always, you will probably be well served to learn to do the data manipulation in R that you are currently doing in spreadsheet software.

If you want to import the data into R, leave the cells blank. If the file is saved as CSV and imported into R, blank cells will be represented as NA automatically.
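A minimal R sketch of that import, assuming a small CSV with a hypothetical `date`/`value` layout and one blank cell, passed inline via the `text` argument. One caveat from R's `read.table` documentation: blank fields become `NA` automatically in logical, integer, numeric and complex columns, but in a text column a blank stays `""` unless `""` is listed in `na.strings`.

```r
# Hypothetical CSV saved from OpenOffice with the 2011-01-02 cell left blank
csv_text <- "date,value
2011-01-01,3.2
2011-01-02,
2011-01-03,4.1"

series <- read.csv(text = csv_text)
class(series$value)    # "numeric"
is.na(series$value)    # FALSE TRUE FALSE

# In a text column a blank cell is read as "" rather than NA...
notes_text <- "date,note
2011-01-01,ok
2011-01-02,"
read.csv(text = notes_text, stringsAsFactors = FALSE)$note
# ...unless "" is declared as a missing-value string
read.csv(text = notes_text, na.strings = c("NA", ""),
         stringsAsFactors = FALSE)$note
```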
If you want to do some analysis in OpenOffice, I think you will find @Andy W's advice useful. Built-in OpenOffice functions may behave unexpectedly if you use a custom NA declaration.
Finally, as @csgillespie pointed out, it is better to do all data preparation in R, even if it is slightly harder. The foremost reason is that this way you can track every change made to the original data, which is highly desirable for debugging. Furthermore, if you work out the preparation with care, it will be very easy to include new data. For this reason alone I now always do my data preparation only in R. I advise looking into the reshape package; it saves me a lot of time.
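To illustrate the "track changes, re-run on new data" argument, here is a minimal base-R sketch (it does not use reshape). `prepare_series` and its cleaning rule, treating negative values as missing, are hypothetical; the idea is only that every edit lives in code while the raw export stays untouched.

```r
# Hypothetical preparation step: the raw export is never edited by hand,
# so every change is recorded (and re-runnable) in this function.
prepare_series <- function(csv_text) {
  raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)
  raw$date <- as.Date(raw$date)
  # Hypothetical cleaning rule, kept in code instead of typed into a cell:
  # negative readings are treated as missing
  raw$value[which(raw$value < 0)] <- NA
  # Sort chronologically, as a time series analysis will expect
  raw[order(raw$date), ]
}

# First batch of data (rows deliberately out of order)
batch1 <- "date,value
2011-01-02,4.1
2011-01-01,3.2"
prepare_series(batch1)

# When new rows arrive, the same steps are simply re-run
batch2 <- paste(batch1, "2011-01-03,-1", sep = "\n")
prepare_series(batch2)
```

Because the corrections are a function of the raw input, adding a new batch costs one call, and the history of every change is whatever version control keeps for the script.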

-
Thank you so much for the advice about R's import of CSV data with blank cells. I didn't know that, and have been spending a lot of time putting NAs into my spreadsheets. – richiemorrisroe Jan 15 '11 at 14:06
-
@richiemorrisroe, you're welcome. I usually find if something would be convenient, R implements it this way :) – mpiktas Jan 15 '11 at 16:19