10

I've got R running on Amazon EC2, using a modified version of the Bioconductor AMI. Currently, I use PuTTY to ssh into my server, start R from the command line, and then copy and paste my script from Notepad++ into my PuTTY session.

The thing is, I hate cutting and pasting. It feels stone-age, and I occasionally get weird buffering issues that screw up my code. I can't use RStudio, because it doesn't support multicore, which I heavily depend on.

What's the more elegant way to do this?

/Edit: Thanks for all the great suggestions. For now, I've switched over to using foreach with the doRedis backend, which works great on my Mac, my PC, and on Amazon through RStudio. This switch was pretty easy once I learned how to write a function that emulates "lapply" using "foreach". (Also, doRedis is awesome!)

Zach
  • 2
    BTW, such a question would be very on-topic on the yet-to-be-created [Computational Science SE](http://area51.stackexchange.com/proposals/28815/computational-science?referrer=4pEy7Pj-D8kbaDTh4NmFiQ2). –  Jun 02 '11 at 09:53
  • http://www.r-statistics.com/2013/07/analyzing-your-data-on-the-aws-cloud-with-r/ – JohnRos Oct 15 '15 at 13:40
  • Imo, [screen](https://www.gnu.org/software/screen/) + [vim](http://www.vim.org/) is the winning combo, but that might not appeal to everyone. – Marc Claesen Oct 15 '15 at 13:56

6 Answers

13

The most convenient way is just to install a VNC server and a light desktop environment like XFCE, and make yourself a virtual session that you can use from wherever you want (it persists across disconnects).

Additional goodies are that you can use your local clipboard in the virtual desktop and see R plots way faster than via X11 forwarding or copying image files.

It takes some effort to set everything up right (X init, ssh tunnel), but the internet is full of tutorials on how to do that.

  • 1
    Even better is NX, which has generally much improved performance characteristics. – scw Jun 02 '11 at 10:42
  • Could you link to a tutorial you like, or should I just look at the top couple of results on Google? – Zach Jun 09 '11 at 16:58
12

I can think of a few ways. I've done this quite a bit and here are the ways I found most useful:

  1. Emacs daemon mode. ssh into the EC2 instance with the -X switch so it forwards X windows back to your local machine. Using daemon mode ensures that you don't lose state if your connection times out or drops.
  2. Instead of using the multicore package, use a different parallel backend with the foreach package. That way you can use RStudio, which is fantastic. Foreach is great because you can test your code in non-parallel mode, then switch to parallel mode by simply changing your backend (1 or 2 lines of code). I recommend the doRedis backend. You're in the cloud, might as well fire up multiple machines!
JD Long
  • +1 for Emacs Daemon mode. I've been doing quite a bit of ssh lately, and this looks really useful. – richiemorrisroe Jun 01 '11 at 22:55
  • Is there an easy way to emulate lapply using foreach? I've written a lot of code that leverages apply, and I like the multicore package because I can simply replace lapply with mclapply. Is there a 'foreachlapply', or am I going to have to rewrite a lot of code? Thanks! – Zach Jun 02 '11 at 00:43
  • 1
    Well, `doRedis` can only do redis stuff; huge input is not the only reason for HPC calculations. –  Jun 02 '11 at 08:31
3

I don't know how Amazon EC2 works, so maybe my simple solutions don't work. But I normally use scp or sftp (through WinSCP if I'm on Windows) or git.

Thomas Levine
3

I'd use rsync to push the scripts and data files to the server, then `nohup Rscript myscript.R > output.out &` to run things, and, when finished, rsync to pull the results.
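
The push, run, pull steps above could be sketched roughly as the script below. This is only an illustration under assumptions: the host name, the `analysis` directory, and the `run` helper with its `DRY_RUN` switch are hypothetical and not from the original answer, and the extra `2>&1 < /dev/null` redirections are an addition intended to let ssh return immediately instead of waiting on the background job.

```shell
#!/bin/sh
# Sketch of a push / run / pull workflow. All names below are hypothetical.
REMOTE="ubuntu@ec2-host.example.com"   # hypothetical user@host of the EC2 instance
REMOTE_DIR="analysis"                  # hypothetical working directory on the server
LOCAL_DIR="."                          # local project directory
DRY_RUN="${DRY_RUN:-1}"                # 1 = only print the commands, 0 = run them

# Print the command in dry-run mode, otherwise execute it.
# Note: the printed form drops the quoting around multi-word arguments.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 1. Push scripts and data files to the server.
run rsync -az "$LOCAL_DIR/" "$REMOTE:$REMOTE_DIR/"

# 2. Start the script detached from the ssh session so it survives logout.
run ssh "$REMOTE" "cd $REMOTE_DIR && nohup Rscript myscript.R > output.out 2>&1 < /dev/null &"

# 3. Later, once the job has finished, pull the results back.
run rsync -az "$REMOTE:$REMOTE_DIR/" "$LOCAL_DIR/"
```

With the default `DRY_RUN=1` the script only prints the three commands, which makes it easy to check paths before anything is copied; setting `DRY_RUN=0` runs them for real. In practice step 3 would be run separately, after checking `output.out` on the server to confirm the job is done.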

Martin
  • 2
    `screen` or `tmux` are better than `nohup`: they also detach the script so it won't be killed by logout, but they let you reattach to the session later and carry on, even from another client computer. `tmux` can even be used as a kind of text-mode window manager. –  Jun 02 '11 at 08:28
0

I use RStudio on EC2 all the time thanks to the AMIs created by Louis Aslett. You don't have to know any SSH or anything (other than R, of course). You just need an EC2 account. As mentioned in one of the other answers, RStudio does support parallel computing, via the foreach package for instance. This really enables harnessing the power of EC2. By using a compute-optimized instance (32 cores), I was able to significantly cut down training time for my ML models at almost no cost (a few bucks an hour).

Antoine
0

VIM + tmux + VIM Slime. You get the greatest text editor and the ability to send code from the editor to the R command line (just like in RStudio).

bdeonovic