Deploying web applications on the cloud with GitHub and Heroku

 

Introduction and Background

Do today’s web applications truly exploit the facilities provided by the cloud? Not many do. Although web applications are increasingly deployed on the cloud, most of them do not exploit its full power: they still use traditional (on-premises) APIs and infrastructure for development as well as deployment.

Meanwhile, the world has moved on, and new cloud-based APIs and deployment options are becoming available. Heroku is a cloud deployment platform built on top of Amazon Web Services. It enforces standard practices for developing and deploying applications on the cloud. Many projects use cloud-based APIs for development and Heroku for deployment.


Experienced programmers who have not used Heroku and GitHub may need a quick background on the architecture of these systems and the interplay between them. This document attempts to provide that.

Git Essentials

Git is a distributed version control system (DVCS). It differs from earlier version control systems in that it can work in disconnected mode and the entire repository is stored on the client machine.

This section takes material from http://git-scm.com/book, with references to the corresponding chapters. Where a point is not completely clear, the reader can check the corresponding chapter in the book.


Structure of Git and differences from conventional version control systems

Entire repository is copied

Conventional systems store all the different source code versions and metadata on the server (called the repository). Git stores the entire repository on each client as shown in the diagram below.


[Figure: distributed repository]

Because of this, clients do not check out code from the server; they clone it. When a repository is cloned onto the local machine:

  1. A directory with name of the repository is created
  2. A hidden subdirectory called .git is created; the entire repository, with all version history, is stored here
  3. A checkout is performed into the top level directory and the latest version of the source code is available here.
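
The three steps above can be observed on a scratch repository. The following is a minimal sketch, assuming git is installed; the names myproject.git, seed and README.txt and the identity values are invented for the example, and a bare repository in a temporary directory stands in for the server.

```shell
set -eu
work=$(mktemp -d)   # scratch area, so no real project is touched
cd "$work"

# Stand-in for the server: a bare repository seeded with one commit.
git init -q --bare myproject.git
git init -q seed
cd seed
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"
echo "hello" > README.txt
git add README.txt
git commit -q -m "first version"
git push -q "$work/myproject.git" HEAD    # publish the current branch
cd "$work"

# Cloning creates myproject/, the hidden myproject/.git, and a checkout
# of the latest version into the top-level directory.
git clone -q myproject.git
ls -A myproject
```

The listing should show both the hidden .git directory and the checked-out README.txt.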

Versions are stored as snapshots and not as deltas

As the following diagrams illustrate, versions are stored as snapshots in Git, not as deltas.

[Figure: Versions in CVS]

[Figure: Versions in Git]


As can be seen from the pictures above, whenever a new version is created in a conventional version control system, the delta (the difference from the previous version) is stored. With Git, each version records a snapshot of the entire directory structure: unchanged files are stored as links to the copies already saved for previous versions, while a new copy is stored for each changed file.
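
One way to see the snapshot model at work is to compare the object IDs that two consecutive versions record for the same files. The sketch below assumes git is installed; the file names and identity values are made up for the illustration. An unchanged file should keep the same object ID in both versions (a link to the stored copy), while a changed file should get a new one.

```shell
set -eu
repo=$(mktemp -d)   # scratch repository
cd "$repo"
git init -q
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"

# Version 1: two files.
echo "stable content" > unchanged.txt
echo "v1" > changed.txt
git add unchanged.txt changed.txt
git commit -q -m "version 1"

# Version 2: only one of the files is edited.
echo "v2 with an edit" > changed.txt
git commit -q -am "version 2"

# Object IDs recorded for each file in each snapshot.
old_unchanged=$(git rev-parse HEAD~1:unchanged.txt)
new_unchanged=$(git rev-parse HEAD:unchanged.txt)
old_changed=$(git rev-parse HEAD~1:changed.txt)
new_changed=$(git rev-parse HEAD:changed.txt)

echo "unchanged.txt: $old_unchanged -> $new_unchanged"
echo "changed.txt:   $old_changed -> $new_changed"
```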

Checkins are local

In Git, all checkins done via git commit are local and do not change the remote server. Since all checkins, checkouts, modifications and commits are local, the central Git server is usually termed a remote.

States of git files

                                


A file is in the untracked state when it is first created locally. Adding it with git add <filename> moves it into a special state called staged, and committing it with git commit makes it part of the local repository, where it is in the unmodified state. If changes are then made to the file, it goes to the modified state. Running git add <filename> again moves it back to staged, and on the next git commit the file returns to the unmodified state and the changes become part of the local repository.
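
The state of a file can be inspected at each step with git status. The sketch below assumes git is installed and uses an invented file name, notes.txt. It records the short (--porcelain) status after every transition: ?? marks an untracked file, a letter in the first column marks a staged change, a letter in the second column marks an unstaged modification, and empty output means nothing differs from the last commit.

```shell
set -eu
repo=$(mktemp -d)   # scratch repository
cd "$repo"
git init -q
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"

echo "draft" > notes.txt
s_untracked=$(git status --porcelain)     # untracked: "?? notes.txt"

git add notes.txt
s_staged_new=$(git status --porcelain)    # staged new file: "A  notes.txt"

git commit -q -m "add notes"
s_unmodified=$(git status --porcelain)    # unmodified: empty output

echo "more text" >> notes.txt
s_modified=$(git status --porcelain)      # modified: " M notes.txt"

git add notes.txt
s_staged=$(git status --porcelain)        # staged change: "M  notes.txt"

git commit -q -m "update notes"
printf '[%s]\n' "$s_untracked" "$s_staged_new" "$s_unmodified" "$s_modified" "$s_staged"
```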


All commits are local. To change the common remote server, the git push command is required. To fetch the latest versions from the remote server, the git pull command is required.

The standard practice for using Git is:

  1. Developer first initializes the repository, adds files and pushes to the Git remote server (usually github.com)
  2. Other developers create a clone of the remote repository using the clone command
  3. Developers make changes
  4. When a developer has finished with changes, the code needs to be pushed to the remote server. To do this:
    1. Developer first commits changes locally
    2. Developer pulls the latest changes from the remote server using git pull (not clone), in case other developers have made changes to the same files. In case of conflict, the developer resolves it.
    3. Developer pushes the changes to the remote server.
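
The workflow above can be rehearsed entirely on one machine. This is a sketch under stated assumptions: git is installed, a bare repository in a temporary directory plays the part of the remote server, and the developers "alice" and "bob" and all file names are invented. The two developers edit different files, so the pull merges cleanly without conflict resolution.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare remote.git             # stand-in for the remote server

# Step 1: the first developer initializes, adds files and pushes.
git init -q alice
cd alice
git config user.name "Alice"              # hypothetical identity
git config user.email "alice@example.com"
echo "line 1" > app.txt
git add app.txt
git commit -q -m "initial version"
git remote add origin "$work/remote.git"
git push -q -u origin HEAD
cd "$work"

# Step 2: another developer clones the remote repository.
git clone -q "$work/remote.git" bob
cd bob
git config user.name "Bob"                # hypothetical identity
git config user.email "bob@example.com"

# Steps 3 and 4.1: make a change and commit it locally.
echo "bob's work" > bob.txt
git add bob.txt
git commit -q -m "add bob.txt"
cd "$work"

# Meanwhile the first developer pushes another change.
cd alice
echo "line 2" >> app.txt
git commit -q -am "extend app.txt"
git push -q origin HEAD
cd "$work"

# Step 4.2: pull (merge) the latest changes; step 4.3: push.
cd bob
git pull -q --no-rebase --no-edit
git push -q origin HEAD
git log --oneline
```

The --no-rebase and --no-edit options only make the pull non-interactive for the demonstration; in day-to-day use a plain git pull is typical.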

Git Server

Git can be installed as a server, which multiple developers can use to collaborate. GitHub.com is the most popular Git server, but Git can be installed as a server on any machine. When a repository is pushed to a Git server, only the contents of the .git directory are stored; the current working copy is not.
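
A repository that holds only the .git contents and no working copy is called a bare repository. As a small illustration (assuming git is installed; the name server.git is arbitrary), one can be created locally and inspected:

```shell
set -eu
work=$(mktemp -d)
cd "$work"

# A bare repository has no working copy, only the repository data.
git init -q --bare server.git
is_bare=$(git -C server.git rev-parse --is-bare-repository)
echo "bare: $is_bare"

# The top level looks like the inside of a .git directory
# (HEAD, config, objects, refs, ...).
ls server.git
```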

Git Server Hooks

A Git server can run server-side scripts (hooks) when a push is initiated or after a push is complete. This is very important in the context of Heroku, because Heroku uses this feature to run deployment scripts.
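
To make the hook mechanism concrete, the sketch below installs a post-receive hook in a local bare repository and then pushes to it. It assumes git is installed; the hook body, the deploy.log file and all other names are invented for the illustration and are not Heroku's actual scripts. Git feeds the hook one line per updated ref (old ID, new ID, ref name) on standard input, and runs push hooks with the repository directory as the working directory.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
git init -q --bare server.git
mkdir -p server.git/hooks

# post-receive runs on the server side after a push has completed.
cat > server.git/hooks/post-receive <<'EOF'
#!/bin/sh
# A real deployment script would build and release the code here;
# this toy version only records what was pushed.
while read old new ref; do
  echo "received $ref at $new" >> deploy.log
done
EOF
chmod +x server.git/hooks/post-receive

# A client pushes; the hook fires on the "server".
git init -q client
cd client
git config user.name "Example Dev"        # hypothetical identity
git config user.email "dev@example.com"
echo "v1" > app.txt
git add app.txt
git commit -q -m "v1"
git push -q "$work/server.git" HEAD

cat "$work/server.git/deploy.log"
```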

 

Working with Git – common commands

  1. Cloning a repository
  2. Making changes
  3. Updating server with changes
  4. Reverting
  5. Branching
  6. Merging

Heroku Essentials

This section contains excerpts from the book “Hacker’s guide to Heroku”, an excellent guide to Heroku. Please refer to this book for more detailed information on Heroku.


Heroku is a cloud-based deployment system that encapsulates best practices for deploying SaaS applications. It runs on top of Amazon Web Services and provides facilities to deploy and massively scale applications easily.

Heroku components

The following diagram illustrates Heroku components.



Users can create accounts in Heroku. Once an account is created, they can create applications on Heroku. Whenever an application is created, a corresponding Git repository is automatically created. This repository is created within a Git server running on the Heroku system.


Whenever an application is created, a virtual server running Ubuntu (512 MB RAM, 4 CPU cores) is created along with it. This virtual server, in practice a lightweight Linux container, is termed a Dyno (short for Dynosaur). Heroku can create more Dynos on demand, and takes care of load balancing and directing requests to the corresponding Dynos.

Whenever an application is deployed to Heroku, Heroku runs a deployment script called the slug compiler, which creates a binary version of the deployment; this is termed a Slug.


Whenever a new Dyno is created/provisioned, Heroku automatically deploys the slug on the Dyno and makes it ready to run.


Heroku provides a set of infrastructure components: the database component creates instances of database servers, Cloudmail provides the facility to receive emails at specified email addresses and forward them to the application, and Memcache provides a cache facility. Heroku also provides the facility to attach these infrastructure components to applications.

Interacting with Heroku

Users would need to interact with Heroku to

  1. Configure application
  2. Attach infrastructure components to application
  3. Push and pull source code to/from Heroku
  4. Deploy application to Heroku
  5. Configure runtime Dynos
  6. Dial into a runtime Dyno to check health, logs, etc.

Heroku provides a set of client tools called the Heroku toolbelt for interacting with Heroku. Once the toolbelt is installed, the heroku command is available.
The following are the most important commands:

  1. heroku login – to log in to Heroku. A Heroku account is required
  2. heroku accounts – to switch between accounts. Plugin heroku:accounts needs to be installed
  3. heroku apps – app-related commands. The most important app commands are create and destroy
  4. heroku config – to set environment variables in the Dynos
  5. heroku run bash – to open a shell on the virtual server (Dyno). A file system consisting only of the application files is displayed, and many shell commands can be executed here
  6. heroku run console – to open a Rails console, from which debugging can be done

Heroku Git Interplay

How are applications pushed to the git repository within Heroku?  After pushing an app to Heroku, how does it get automagically deployed?


Whenever an application is created in Heroku, it automatically creates a corresponding Git repository for it, and the URL of that repository is returned. After the developer creates the first version of the code in the local Git repository, this application can be pushed to the Heroku Git repository using a git push heroku command.

Heroku intercepts a git push through the Git hooks mechanism, by hooking a language-specific script called a Buildpack up to Git. As soon as the push completes, the buildpack is run. It essentially

  1. runs a language-specific slug compiler and creates a slug (or binary)
  2. deploys it in all the Dynos (virtual machines) configured for the application

Deployment can be customized by providing a Procfile in the root directory of the application. An example Procfile could contain the line web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb -E $RACK_ENV
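
As a small illustration, the Procfile line quoted above can be written from the shell and split into its two parts. The temporary directory stands in for the application's root directory; nothing here contacts Heroku.

```shell
set -eu
app_dir=$(mktemp -d)   # stands in for the application's root directory
cd "$app_dir"

# Quoted heredoc: $PORT and $RACK_ENV must reach the file unexpanded,
# because they are resolved in the Dyno's environment when the process
# starts, not on the developer's machine.
cat > Procfile <<'EOF'
web: bundle exec unicorn -p $PORT -c ./config/unicorn.rb -E $RACK_ENV
EOF

# The part before the colon is the process type; the rest is the
# command that starts it.
process_type=$(cut -d: -f1 Procfile)
start_cmd=$(cut -d: -f2- Procfile | sed 's/^ //')
echo "type:    $process_type"
echo "command: $start_cmd"
```

The Procfile is committed along with the rest of the code, so it travels to Heroku with the next git push.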

Deployment Steps

First Time

The following steps (and commands) need to be executed the first time an application is deployed. Beforehand, a Heroku account needs to be created by visiting heroku.com, and the Heroku client toolbelt needs to be installed.

  1. Clone existing code from github.com
  2. Login to heroku
  3. Create an application
  4. Push code to heroku
  5. Set configuration variables
    1. Environment – staging or production
    2. Gmail credentials
  6. Import data into the application database
  7. Change some parameters in configuration files for CSS (once the system is fine-tuned, this should go away)

System is ready to go!
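
The first-time steps might translate into commands roughly as follows. This is a sketch, not the project's actual script: the GitHub URL, the application name, the branch name main and the GMAIL_* variable names and values are all placeholders, and steps 6 and 7 are left as comments because they depend on the application. Commands are only printed unless DRY_RUN=0 is set.

```shell
set -eu
# Print each command by default; run with DRY_RUN=0 to execute for real.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

app="example-app"                                             # hypothetical
repo_url="https://github.com/example-user/example-app.git"    # hypothetical

run git clone "$repo_url" "$app"      # 1. clone existing code from github.com
run cd "$app"
run heroku login                      # 2. log in to Heroku
run heroku apps:create "$app"         # 3. create an application (inside a Git
                                      #    repository this also adds a remote
                                      #    named heroku)
run git push heroku main              # 4. push code (master on older repositories)

# 5. set configuration variables (names and values are placeholders)
run heroku config:set RACK_ENV=staging --app "$app"
run heroku config:set GMAIL_USERNAME=user@example.com GMAIL_PASSWORD=change-me --app "$app"

# 6. importing data and 7. adjusting CSS configuration are specific to
#    the application and are not sketched here.
```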

Subsequent times

  1. Pull code from heroku
  2. Make changes
  3. Push code to heroku
  4. Import data into the application database (if data has changed)

 

References

Git book - http://git-scm.com/book
Heroku book - http://www.theherokuhackersguide.com/
Article on Heroku push deployment - http://www.jamesward.com/2012/07/18/the-magic-behind-herokus-git-push-deployment
Buildpacks - https://devcenter.heroku.com/articles/buildpacks