Software consulting – Why project managers and product managers are so important

The company I work for has been hiring engineers, and has been pretty aggressive at it. We’ve spared no cost or effort in finding talented and interested people. Unlike a lot of companies, we value enthusiasm and attitude over sheer brilliance. The ‘why’ (do we prefer enthusiasm and attitude over brilliance) has been covered several times elsewhere; my reasons align with most of those articles and would be tangential to the point of this post.

I had been oblivious to the stigma associated with software consulting firms, especially with the sudden boom in product startups in all major cities. During one of the interviews I was told that an engineer we were interested in speaking with wasn’t keen on working with a services company because he didn’t want to be tied down, checking off tickets assigned to him. While this isn’t uncommon at consulting firms, the way he put it brought back a lot of repressed memories of my time at one of my first jobs, at an MNC.

This also brought me to a more recent turn of events where I chose to give up on an assignment because it stopped feeling right. I had recently stepped off one of my longest engagements (and one of my personal favourites) to take up a new project, which after 2 months still seemed odd. It felt a lot like my first job. In fact my first job had projects which kept me going and interested, but this one brought me to a point where it seemed wiser to step down.

I feel it all boils down to the product/project managers. 

As a project/product manager it’s vital to involve your teammates/engineers in the vision of your product/project. It’s important to ensure there is no disconnect between what you envision as the goal for your product and what your teammates think it is.

So what made one project a personal favourite while I couldn’t wait to get off the other?

The better product manager ensured that I was always involved in the day-to-day of the project. Things like keeping me in the loop on customer feedback, the issues the accounting team faced, and what the company aimed to accomplish gave me a sense of purpose. It defined my role in the success of the company and made my work feel purposeful. Every commit/fix seemed to validate my contribution to the success of the business.

The other just ensured that I knew which issues needed completion and how the module had to work. This was, very literally, checking off tickets. I had very little vision of how it fit into the grander scheme of things. There always seemed to be a level of disconnect that I couldn’t get around. I wasn’t aware of what the project was planning next or why. So my interest was solely in getting my features and tests working. I was rarely aware of when my features were merged and was generally notified only when there was an issue.

While some may argue that it’s impractical to have everyone involved in the big picture, I feel it’s up to the people managers in each team to carefully abstract the information to the right level, so that everyone feels their contributions are purposeful.

So what if the information was already abstracted to the level that I needed to know?

This is where great product managers shine through. If you’ve made it this far, it’s worth getting into this one.

As part of my first job (at the MNC) I was part of a 4-person team responsible for automating any and all kinds of tests that needed to run on the company’s QA boxes. At that point, I was a clueless fresh grad who wasn’t sure how I could be of any use to this project.

My project manager (one of the most experienced engineers at the company) told me that I had to look into a set of tools that would convert any kind of user activity on any web page into a set of reproducible scripts. He then went on to let the other guys know what they would be working on. It seemed like a fairly straightforward task given that the tool did most of the stuff and I had to pretty much just follow the documentation to get things working.

At the end of the day, when we all had dug into the nuggets of information that he had distributed, he came in to tell us how each of the modules that the 4 of us had been working on would interact with each other, to ensure that all the tests were run and reports generated. There have been very few moments in my life where I was hit with such clarity of thought. The beauty of it all was that if he had shared this bit of information at the onset, it would’ve seemed a lot more daunting. This way he broadened the scope of our work and vision at the right time.

So, my point is that being a project/product manager is hard. Being able to understand the motivations and passions of each of your teammates is tough but important. If the vision and goals of upper management are not conveyed correctly down the chain, you’re pretty much running on zombie legs.

Mongodb – Understanding how journal mode works with j:true

One of the things that has recently confused me with Mongodb, while using Mongoid with the Moped driver, is how journal mode and the j:1 write concern work together.

Journal Mode

Journal mode ensures that your writes are recorded in a redo/write-ahead log after they are written to memory but before they are pushed to the data files on disk. It provides a level of durability: if the server crashes before certain data is written to disk, on restart that data is read from the journal logs and written to disk, making your data more resilient.

Journaling is enabled by default on all 64-bit installations since version 2.0, but not on 32-bit ones.

What kind of stuff does journaling record

Journaling takes care of

  • Create/update commands
  • Index creation
  • Namespace changes

Journal Commit Interval

The journal commit interval (journalCommitInterval) is the time between subsequent writes to the journal files. It defaults to 100 ms when the journal and the data files are on the same block device, and to 30 ms when they are on different block devices.

This means you can lose data that has not yet been journaled within the last 100 ms. You can set journalCommitInterval in your mongodb.conf file to anywhere between 2 ms and 300 ms. A lower value increases the frequency with which data is journaled, but at the cost of disk performance.
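
For example, in mongodb.conf (the 50 ms here is just an illustration; pick whatever suits your write volume and disks):

journalCommitInterval = 50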

To force Mongodb to commit to the journal sooner without updating the journalCommitInterval setting, one could use the j:1 write concern. As per the docs:

When a write operation with j:true is pending, mongod will reduce journalCommitInterval to a third of the set value.

So a reasonably low journalCommitInterval (whatever works best for you), combined with saves that use j:1, ensures that your writes become durable soon after they are acknowledged.

To verify that your data was written to the journal you could call the getLastError command with j set to true.

Most drivers run this for you so this isn’t something that you need to worry about.
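
For illustration only, here is a rough sketch of what a driver does under the hood, assuming Mongoid 3 on the Moped driver and a placeholder Order model; you wouldn't normally call this yourself, since the driver issues it on the same connection as the write:

Order.create!(code: 'ABC-123')

# Ask the server whether the preceding write has hit the journal;
# with j: true, getLastError blocks until the journal commit happens.
Mongoid.default_session.command(getlasterror: 1, j: true)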

Mongoid

The Mongoid documentation for versions 3.0 and above does not talk about the j:1 write concern for some reason, but here is what I found based on this discussion.

Calling save with the default configuration means the write is acknowledged once the data is written to the primary’s memory, but it is not necessarily journaled or flushed to disk at that point. If you want to trigger a journal write as well, you have to ask for it explicitly, as in the sketch below.
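
A minimal sketch of the two variants, assuming Mongoid 3 on the Moped driver, where the safe options are passed through to getLastError (the Order model and its field are placeholders of mine):

# A plain save: no journal acknowledgement.
Order.create!(code: 'ABC-123')

# Ask for the write to be acknowledged only after a journal commit.
Order.with(safe: { j: true }).create!(code: 'ABC-123')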

Most of the data was collected and posted as an answer on stackoverflow by mnemosyn and is available on the mongodb docs. My goal was to create a quick reference.

References:

  1. http://docs.mongodb.org/manual/reference/write-concern/
  2. http://docs.mongodb.org/manual/core/journaling/
  3. http://docs.mongodb.org/manual/tutorial/manage-journaling/
  4. http://docs.mongodb.org/manual/reference/configuration-options/#journalCommitInterval
  5. http://docs.mongodb.org/manual/core/write-concern/#write-concern
  6. http://rsmith.co/2012/11/05/mongodb-gotchas-and-how-to-avoid-them/
  7. https://groups.google.com/forum/#!msg/mongoid/Bybccfc0gLI/Aei0VkMAZqoJ

Mongodb – not evil, just misunderstood

Lately I’ve been reading a lot about Mongodb, and a lot of posts dissuading you from ever using it. Some of these articles are seriously outrageous and make me wonder what got the team to start using Mongodb in the first place. Sarah Mei’s recent article was one such piece that upset me a lot, especially since the title was so inflammatory.

My post, however, aims at highlighting the areas where Mongodb works and how it performed brilliantly for us. As someone leading the engineering efforts for a shipping and logistics company, I wasn’t too happy initially to see Mongodb being used as the primary datastore, but after 2 years I’m more than sure that this was the right datastore for us. I’ve outlined areas that confused me when I first encountered them, only to learn that they were actually invaluable features.

“No migrations” is that all you have?
The advantages of schemaless documents are priceless. Not having to migrate is just one of the perks. Our schemas were largely of the form Orders (having many) Shipments (going_from) ShipPoint (to) ShipPoint.

We rarely used these entities independently of each other, and it served us extremely well to manage them as self-contained documents embedding the others.
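
As a rough sketch (class and field names are illustrative, and the real schema was richer than this), the embedded layout looked something like:

class Order
  include Mongoid::Document
  field :reference, type: String
  embeds_many :shipments
end

class Shipment
  include Mongoid::Document
  embedded_in :order
  embeds_many :ship_points   # e.g. the going_from and to points
end

class ShipPoint
  include Mongoid::Document
  embedded_in :shipment
  field :name,    type: String
  field :address, type: String
end

A whole order, with its shipments and their ship points, lives in a single document, so it is read and written in one go.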

Mongodb writes are fire and forget? WTF?
This doesn’t always have to be the case, though it significantly contributes to Mongodb’s fast writes. Mongodb’s write concern configuration lets you pick the precise level of persistence that must be reached before a write is considered successful. So if the write fails, you know it failed. The fact that you can know whether your write has propagated to replicas or has been journaled is a pretty neat feature.

How can the default writes be fire and forget?
(Version 2.4.8 changes this; however, it is valid up to version 2.2.6.)
It just made sense: given all the information needed to configure it the way you prefer, I would always go with this approach. We add a lot of Notes to each shipment as it gets reviewed at different levels by the sales, accounts and other teams. These notes generally serve as a reminder, or a single line indicating that it’s been viewed, and they don’t critically affect the business workflows of the application. It just seemed logical that these were fire-and-forget operations and could be stored as quickly as possible.

Another place where this is extremely handy for us is during tracking. We track several hundred shipments each day, logging every tracking status, location and time while the shipment is in transit. This information is handy for customers keeping an eye on how far their shipment has got. Chances are that some of this information is not saved the first time it is fetched, but we expect it to be picked up during a second fetch 30 minutes later. The default write concern works brilliantly there.

Read locks and write locks – don’t they slow you down
They do, but since most of the data is memory mapped this doesn’t affect you in a major way. However, I did notice people always working off the primary of a replica set and never querying the secondaries for fear of inconsistent data. If you have sufficient memory your replication lag should be pretty small, and besides, if you don’t need the data to be consistent at every instant, querying a secondary is a sensible option to reduce the load on your primary. Which brings me to the primaryPreferred read preference: it allows you to query a secondary in your replica set when your primary is not available. It’s a fairly safe choice in my opinion.

We began querying secondaries for ShipPoints which didn’t change that often.
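
A sketch of how that looks through Mongoid's with (model names are illustrative, and the primary_preferred mode assumes a driver version that exposes it):

# ShipPoints rarely change, so slightly stale reads from a secondary are fine.
ShipPoint.with(read: :secondary).where(country: 'US').to_a

# Prefer the primary, but fall back to a secondary if it is unavailable.
Order.with(read: :primary_preferred).find(order_id)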

All the memory usage is killing me!
This is one of the things that took me time to accept. Mongodb expects your working set to fit into RAM along with the indexes for your database. Your working set is the data that is frequently queried and updated. Since Mongodb works with memory-mapped files, most of your working set is mapped into memory. When data is not available in memory, a page fault occurs and it has to be fetched from disk. This results in a performance penalty, but as long as it only happens occasionally the data is simply pulled back in and you carry on.

While our working set was fairly small, our reporting application needed access to the entire set of shipment records to generate reports. This resulted in Mongo running out of memory and spitting out OperationFailure errors on a regular basis.

Our initial approach was naive: we started using Redis (another datastore that’s pure gold) to store snapshots of information, but soon realised we could just use Mongodb to make it work.

So can I never generate reports without having my dataset fit in memory?
Rollups to the rescue. Rollups are pre-aggregated statistics that speed up your aggregation process. This makes life significantly easier, as you only query short time ranges to generate micro-reports.

Here is a simplified snapshot of how we generated daily and monthly aggregates with map-reduce.
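
As a simplified sketch along those lines (collection, field and output names are placeholders, and the monthly rollup works the same way with a coarser date key):

map = <<-JS
  function() {
    var d = this.created_at;
    emit(new Date(d.getFullYear(), d.getMonth(), d.getDate()), { count: 1 });
  }
JS

reduce = <<-JS
  function(key, values) {
    var total = 0;
    values.forEach(function(v) { total += v.count; });
    return { count: total };
  }
JS

# Count shipments per day and merge the results into a rollup collection.
Mongoid.default_session.command(
  mapreduce: 'shipments',
  map:       map,
  reduce:    reduce,
  query:     { created_at: { '$gte' => Time.now.utc - 30 * 24 * 60 * 60 } },
  out:       { merge: 'daily_shipment_rollups' }
)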

So you mean this can’t be realtime?
Yes it can, through atomic updates. Just like we generated rollups to speed up reporting, we can maintain pre-aggregated snapshots of the information, like this.
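
Roughly, the idea is one pre-aggregated document per period with counters nested inside it (collection and field names here are mine):

# One document per day, created up front with zeroed counters.
Mongoid.default_session[:shipment_stats].insert(
  _id:      '2013-11-20',
  total:    0,
  statuses: { 'delivered' => 0, 'in_transit' => 0, 'exception' => 0 }
)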

Once this is in place you can update your aggregates by simply incrementing the right counter, with something like:
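
Something along these lines (again, the names are placeholders):

# Atomically bump today's counters whenever a tracking status comes in.
Mongoid.default_session[:shipment_stats].
  find(_id: Time.now.utc.strftime('%Y-%m-%d')).
  update('$inc' => { 'total' => 1, 'statuses.delivered' => 1 })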

I haven’t even touched upon the replication and sharding features that Mongodb offers, which I will reserve for another post. To summarise, I feel Mongodb is awesome, and a lot like the kid in class you dismissed because your friends thought he was weird – till you got to know him.

Disclaimer: I don’t claim to be an authority on Mongodb and everything that I have written about is stuff that I’ve learnt while working with Mongodb. I recommend reading the documentation and going through the talks available on the Mongodb website.

Simple Quick SSH Tunneling to expose your web app

I’ve been a localtunnel user for quite some time now and I really love the fact that it’s a free service, quick to install, and an easy way to expose your development app to the world. There are quite a few worthy alternatives now, like showoff.io and Pagekite, that pretty much do the same thing.

But at times it gets annoying (especially in the middle of other work) when I’m unable to access localtunnel because it’s down, or I’ve outrun the free usage limit on Pagekite.

I generally end up using localtunnel when I have to wait for an IPN from Paypal or Authorize.net (relay response) while working on my development environment. So here is a quick way to roll out a basic version of the service for your own needs.

Before we move to the “how to”, here is a quick intro to SSH tunneling.

Now, I assume that you have a staging server (or some server you have ssh access to).

The following terminal command does pretty much the same thing we do with the Ruby code that follows:
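
Something of this shape (user is a placeholder login; you may also need GatewayPorts enabled in the remote sshd for the port to be reachable by everyone):

ssh -R 9999:localhost:3000 user@abc.example.com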

The -R indicates that you’ll be forwarding your local app running on port 3000 to the remote port 9999 on the remote host abc.example.com so that everyone can access it. Now your application running on localhost:3000 is accessible at abc.example.com:9999

We do the same thing now using the Ruby net-ssh library. The following code snippet is customised to my defaults but it’s simple enough to change those settings.
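
A minimal sketch with net-ssh, using the same placeholder host and ports (swap in your own defaults):

require 'rubygems'
require 'net/ssh'

# Forward connections hitting port 9999 on abc.example.com to local port 3000.
Net::SSH.start('abc.example.com', 'user') do |ssh|
  ssh.forward.remote(3000, 'localhost', 9999)
  puts 'Tunnel open: abc.example.com:9999 -> localhost:3000'
  ssh.loop { true }  # keep the session, and hence the tunnel, alive
end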

Hope this helps

Mongoid – Comparing attributes belonging to the same document

I spent an insane amount of time trying to figure out how to elegantly compare attributes belonging to the same document, and stumbled upon a few Stack Overflow questions that helped. The solution still doesn’t appear elegant enough, but it does the job.

scope :expired, any_of({:completed_date.ne => nil},
                       {:expiry_date.lte => Date.today.to_time},
                       {"$where" => "this.usage_count >= this.max_usage_count"})

Certainly not noteworthy but this post is to help me document this bit.

Goal

To build a named scope to allow me to query expired documents. The model has the following fields


	field :code, :type => String
	field :allowed_domain, :type => Integer, :default => DOMAIN_ALL_SITES
	field :description, :type => String
	field :discount_type, :type => Integer, :default => PERCENTAGE_DISCOUNT
	field :dollar_discount_cents, :type => Integer, :default => 0 
	field :percentage_discount, :type => Integer, :default => 0
	field :max_usage_count, :type => Integer, :default => 1
	field :usage_count, :type => Integer, :default => 0
	field :expiry_date, :type => DateTime, :default => (DateTime.now + 1.week)
	field :completed_date, :type => DateTime	

	index :code
	index :expiry_date
	index :allowed_domain

A record counts as expired when its completed_date is set (not nil), its expiry_date is earlier than the present date, or its usage_count has reached the max_usage_count.

While the above block of code does that, what I hate is mixing raw Mongodb/JavaScript fragments with Ruby. While Mongoid for most purposes does a good job of making life easier (as every ORM should), there are occasions where I’d prefer it let me write pure Mongodb queries or use the Ruby wrappers entirely.

Other Resources
http://stackoverflow.com/questions/3795044/mongoid-named-scope-comparing-two-time-fields-in-the-same-document

Further bulletins as events warrant.

My Favorite Ted Talks

I’ve been going through some of my favorite Ted Talks, here’s the list.

1. Sir Ken Robinson on Creativity.

2. Simon Sinek, People don’t buy what you do, they buy why you do it

I love this talk just for that one line which just resonates with everything I believe in.

3. Dan Ariely

Lastly, to my favorite speaker Dan Ariely. I’ve been a big fan of his books and talks. I strongly recommend you read Predictably Irrational. Here are his talks

I can’t believe I missed this the first time around.

4. Derek Sivers – How to start a movement.

Async Responses in Thin and a Simple Long Polling Server

I’ve been playing with Thin, the Ruby web server which packages the awesomeness of Rack, Mongrel and EventMachine (\m/). However, the thing that blew me away completely was James Tucker’s async Rack support that is integrated into Thin. The combination opens up a whole new world of realtime responses, something that I’ve been constantly switching to NodeJS and SocketIO for.

Async Rack

A simple rack application would look like this


# my_rack_app.ru

class MyRackApp
  def call(env)
    [200, {'Content-Type' => 'text/html'}, ["Hello World"]]
  end
end

run MyRackApp.new

# thin start -R my_rack_app.ru -D -p3000

This simple Rack app, on hitting port 3000, responds with a status code (200), the content type and a body. Nothing special. With async Rack we’d be able to send the response head back while we build the rest of the response. Once the response is built we send the body and close the connection.

A quick glance at Patrick’s Blog (apologies for not being able to find the last name of the author of this blog) should give you an excellent understanding of what I’m trying to explain

A simple async app


# my_async_app.ru

AsyncResponse = [-1, {}, []].freeze

class MyAsyncApp
  def call(env)
    Thread.new do
      sleep(10)
      body = [200, {"Content-Type" => "text/html"}, ["<em>This is Async</em>"]]
      env['async.callback'].call(body)
    end

    AsyncResponse
  end
end

run MyAsyncApp.new


AsyncResponse is a constant with a -1 status code, which tells the server that the response will be coming later, asynchronously. It is what the examples provided with Thin refer to as an “async template”.

On a request the code immediately returns AsyncResponse while the thread waits on the sleep call. Once the thread wakes up again, it builds the full response and sends it back through env['async.callback'].

An async app where we build the response and close the connection

From here on, it would really help to have the Thin code open in your text editor. We will be looking at request.rb, response.rb and connection.rb from the /thin folder.


require 'rubygems'

class DeferrableBody
	include EventMachine::Deferrable

	def each &blk
		puts "Blocks: #{blk.inspect}"
		@callback_blk = blk
	end

	def append(data)
		puts " -- appending data --"
		data.each do |data_chunk|
			puts " -- triggering callback --"
			puts @callback_blk.call(data_chunk)
		end
	end	
end

class AsyncLife
	AsyncResponse = [-1, {}, []].freeze
	
	def call(env)
		body = DeferrableBody.new

		EM.next_tick {
			puts "Next tick before async"
			sleep(3)
			puts "#{env['async.callback']}"
			env['async.callback'].call([200, {'Content-Type' => 'text/plain'}, body])
			sleep(5)
		}

		puts "-- activated 5 second timer --"
		EM.add_timer(5) {
			puts "--5 second timer done --"
			body.append(['Mary had a little lamb'])

			puts "-- activated 2 second timer --"
			EM.add_timer(2){
				puts "--7 second timer done --"
				body.append(['This is the house that jack built'])
				puts "-- succeed called -- "
				body.succeed
			}

		}


		AsyncResponse
	end
end

run AsyncLife.new

This example uses EventMachine’s Deferrable, which allows us to signal when we’re done building our response. The tricky part I struggled to get my head around was the strange-looking each method and how it works with the append method to build our responses.

Walking through the code

When the request comes in from the browser, the application returns the AsyncResponse constant, which tells the server that the response body is going to be built over time. On each request EventMachine’s post_init method (check out thin/connection.rb) creates a @request and @response object. This triggers the post_process method, which on the first call returns without setting the header, status or body, and prepares for the asynchronous response.

On next_tick we begin to create our header. We initialize an EM::Deferrable object which is assigned as the response body, and this ensures the header is sent ahead of the body (because there is nothing yet to iterate over in the each block where the response is sent). The env['async.callback'] is a closure over the post_process method, created by method(:post_process); check out the pre_process method in thin/connection.rb.

The each method in our Deferrable class overrides the default each implementation defined by the response object. Our each block is saved in an instance variable @callback_blk, which is called when we call the append method. So essentially we end up calling send_data on each of the data chunks we send back through append.

Once that’s done we call succeed, which tells EventMachine to trigger the deferrable’s callback to denote that we’re done building the body. That ensures the request/response connection is closed.

The default each implementation


# Send the response
@response.each do |chunk|
  trace { chunk }
  send_data chunk
end

That’s pretty much what I gathered by going through the code on async responses with Thin and Rack. Another useful module is the thin_async bit written by the creator of Thin, @macournoyer, available here. It pretty much abstracts away the trouble of overriding the each block.

Here’s an example of a simple long-polling server I built using Thin, available here.

Hope this is helpful

Tagging with Redis

It’s been a long time since my last post, and this one is about Redis. I’ve been working on this project for a bit now, and I had a story to implement where I had to associate posts with other posts identified as similar based on a tagging system that already existed. The trouble was that the existing tagging system was closely tied to a lot of other functionality and couldn’t be easily (quickly) rewritten. The feature needed to be rolled out quickly for several reasons which I won’t get into.

The Problem

The tags associated with each post were poorly organized: one record in the Tag model would hold all the tags associated with the post as a whitespace-separated string (ouch!)


posts.tags.first.name # business money finance investment

So finding posts that had 2 tags in common, from a database of approximately 2000 posts with at least 4 tags each, took a significant amount of time. Here is a basic benchmark of just finding the matches on each request.


Benchmark.measure do
 similar_posts = []
 Post.tagged_posts.each do |post|
   similar_tags = (post.tags.first.name.split & current_post.tags.first.name.split)
   similar_posts << post if similar_tags.size >= 2
  end
end

Here is what benchmark returns


 => #<Benchmark::Tms:0x111fd8cb8 @cstime=0.0, @total=2.25,
    @cutime=0.0, @label="", @stime=0.19, @real=4.79089307785034, @utime=2.06>

Not Great.

So the next option was to pre-populate the related posts for every post and store them in the database as similar_posts. All that was then required was to fetch the post with its similar_posts at runtime. This seemed like an acceptable solution considering the tags were not changed on a regular basis, but any change would require the similar_posts table to be rebuilt (which took a long time). Here is the benchmark for fetching the pre-populated data from the database.


Benchmark.measure { p.similar_posts }
=> #<Benchmark::Tms:0x104d153f8 @cstime=0.0, @total=0.0100000000000007,
   @cutime=0.0, @label="", @stime=0.0, @real=0.0333230495452881, @utime=0.0100000000000007>

Nice! But this came at the cost of having to rebuild similar_posts every time tags or posts changed.

Redis

Redis is this awesome in-memory key-value data-structure store which is really fast. Though it would be wrong to think of it as a silver bullet, it does a lot of things which are really awesome: the ability to store data structures and perform operations on them, PubSub, and a lot of other cool stuff. It even allows you to persist the information, either using a snapshotting technique which takes periodic snapshots of the dataset, or an “append only file” where every operation is appended to a file and used to recreate the dataset in case of failure.

In my case I didn’t need to persist the data, just maintain it in memory. So, assuming some basic understanding of Redis and the redis gem, I’ll describe the approach.

We created a SET in Redis with each tag name as the key, so that every tag holds the set of post_ids of all posts that have that tag. In order to identify posts having two tags in common, all that was needed was the intersection of the tag sets, and Redis provides built-in methods for these operations. That’s it!

 
{"communication" => ['12','219', .. , '1027']}  #sample SET
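
Populating those sets is a single pass over the posts, something like this (REDIS_API being the same redis-rb client used below):

# Build one Redis set per tag, holding the ids of every post carrying it.
Post.tagged_posts.each do |post|
  post.tags.first.name.split.each do |tag|
    REDIS_API.sadd(tag, post.id.to_s)
  end
end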

Fetching the similar posts


def find_similar_posts
  similar_posts = []
  if self.tags.present? && self.tags.first.present?
    tags = self.tags.first.name.split
    tags_clone = tags.clone
    tags_clone.shift
    tags.each do |tag|
      tags_clone.each do |tag_clone|
        # SINTER returns the ids of posts carrying both tags
        similar_posts.concat(REDIS_API.sinter(tag.to_s, tag_clone.to_s))
      end
      tags_clone.shift
    end
  else
    puts "No matching posts."
  end
  similar_posts = similar_posts.uniq - [self.id.to_s]
  Post.find(similar_posts)
end

Benchmark


 >> Benchmark.measure { p.find_similar_posts }
=> #<Benchmark::Tms:0x1100636f8 @cstime=0.0, @total=0.0100000000000007, @cutime=0.0, @label="",
   @stime=0.0, @real=0.163993120193481, @utime=0.0100000000000007>
 

Which is pretty good considering that we do all the computation within this request and nothing is pre-populated.

MongoDB with Mysql and ActiveRecord.

I’ve been playing with MongoDB for a little while now; here are some basic issues that I faced, and some setup help.

Setting up Mongo is really straightforward on OS X. You may either download it from here or use Homebrew to install it. If you’re using brew,

brew install mongodb

should do it.

Skip over to the bin folder and launch it with

./mongod

The server is now up and running. The Mongo console can be launched with

./mongo

and you should see an irb-like shell.

Using MongoDB with Mysql while continuing to use Mysql as the primary datastore.

I have been looking into Mongo purely because I was told it’s highly performant, but I still need my primary datastore to be MySQL because I see the need for transactions in the future. I decided to go with Mongoid (recommended because it’s maintained and unlikely to vanish soon), which only requires you to add the gems to the Gemfile.

gem "mongoid", "~> 2.3"
gem "bson_ext", "~> 1.4"

# Run bundle install

If you decide to use Mongoid with MongoDB as your primary datastore, you would have to follow these steps completely. However, since I still needed to retain my MySQL configuration, I only ran this step:

 rails g mongoid:config

This should generate a mongoid.yml file, similar to your database.yml file, in the config folder. A very well written blog post on how to set up users on MongoDB and manage them with Mongoid is available here, and I strongly recommend reading it.

Running the generator now makes Mongo your default datastore, and using the model generator would build models with Mongoid and not ActiveRecord. To ensure that ActiveRecord remains your primary ORM, add the following config to your application.rb:

config.generators do |gen|
  gen.orm :active_record
end

Source: http://stackoverflow.com/questions/6372626/using-active-record-generators-after-mongoid-installation

Now you have ActiveRecord as your primary ORM and you may generate models with mongoid when you need it.

Errors when installing MongoDb on Ubuntu

As before, you can find the tarball on the MongoDB downloads page, but on installing and running it, I encountered this error:

 exception in initAndListen std::exception: dbpath (/data/db/) does not exist, terminating
  dbexit:
 	 shutdown: going to close listening sockets...
 	 shutdown: going to flush oplog...
 	 shutdown: going to close sockets...
 	 shutdown: waiting for fs preallocator...
 	 shutdown: closing all files...
     closeAllFiles() finished
  dbexit: really exiting now

The error says that it can’t find the /data/db folder. Just create it with the right user permissions and you should be good to go.
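
Something like this does it (adjust the owner to whichever user runs mongod):

sudo mkdir -p /data/db
sudo chown `whoami` /data/db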

Source: http://ronaldbradford.com/blog/mongodb-experience-getting-started-2010-06-09/

Improving on the original AMQP chat application

Continuing from where my earlier post on AMQP left off, this post describes the changes introduced to make the application more object-oriented, the shift from a fanout exchange to direct exchanges, and new problems with long polling.

My initial example of the amqp_server would make most Ruby developers cringe. A quick glance at the AMQP documentation (which is awesome) revealed a simple approach to structuring the code. The complete source is available here.

Here is the revised amqp_server

require 'rubygems'
require 'amqp'
require 'mongo'
require 'em-websocket'
require 'json'
require 'evma_httpserver'
require File.expand_path('../message_parser.rb', __FILE__)
require File.expand_path('../producer.rb', __FILE__)
require File.expand_path('../worker.rb', __FILE__)
require File.expand_path('../consumer.rb', __FILE__)
require File.expand_path('../http_server.rb', __FILE__)
require File.expand_path('../setup.rb', __FILE__)
require File.expand_path('../socket_manager', __FILE__)


# start the run loop
EventMachine.run do
  connection = AMQP.connect(:host => '127.0.0.1', :port => 5672)
  channel = AMQP::Channel.new(connection)

  socket_manager = SocketManager.new
  EventMachine.start_server('127.0.0.1', 8082, Setup, [socket_manager, channel])
# EventMachine.start_server('127.0.0.1', 8081, HttpServer, channel)

  puts "---- Server started on 8081 -----"


   EventMachine::WebSocket.start(:host => '127.0.0.1', :port => 9000) do |ws|

    ws.onopen do
      puts "Established......"
      ws.send('Connection open')

      puts ">>>>>>>> #{ws.request["query"]} <<<<<<<<"
      # register this socket against the room it was opened for
      roomname = ws.request["query"]["roomname"]
      socket_manager.add_socket(roomname, ws)
    end

    ws.onclose do
      puts " socket connection closed."
      roomname = ws.request["query"]["roomname"]
      username = ws.request["query"]["username"]
      SocketManager.new().remove_socket(roomname, ws)
    end
  end
end

Switching from Fanout to Direct Exchanges

The most significant change is switching from fanout exchanges to direct exchanges. Each room now has a unique queue, bound to the exchange with routing_key = room_name. All new messages are published via the exchange with the same routing key. This works like a charm with websockets, but isn’t a wise approach when using long polling or simple polling.

Why this works with sockets but is not the best design for polling

In this post we consider each room to have a queue, with all messages directed to the room available from that queue. With websockets it’s easy to keep track of the users connected to a room, and on arrival of a message it’s easy to broadcast it to all the websockets connected to that room.

Keeping track of websockets for a room. socket_manager.rb

class SocketAPI
  def self.api
    @sockets ||= {}
  end
end

class SocketManager
  attr_accessor :sockets

  def initialize
    @sockets = SocketAPI.api
  end

  def add_socket(roomname, sock)
    puts "=#{roomname}" * 50
    puts  "IN ADD SOCKET"
    puts "=" * 50
    puts "SOCKETS #{SocketAPI.api.inspect}"
    @sockets = SocketAPI.api
    if @sockets["#{roomname}"]
      puts "=" * 50
      puts "SOCKET HASH Exists"
      puts "=" * 50
      socket_array = @sockets["#{roomname}"]
      socket_array.push(sock)
    else
      #puts "=#{roomname.blank?}" * 50
      puts "SOCKET HASH DOES NOT Exists"
      puts "=" * 50
      @sockets[roomname.to_s] = []
      socket_array = @sockets["#{roomname}"]
      socket_array.push(sock)
    end
  end

  def remove_socket(roomname, sock)
    sockets = SocketAPI.api
    sockets["#{roomname}"].delete sock
  end
end

So all that is needed is to identify the associated sockets for a room and push messages from the consumer to the browser. With polling, however, we would need to use the ‘pull API’ for queues. Here’s an example:


require "rubygems"
require "amqp"

EventMachine.run do
  connection = AMQP.connect(:host => '127.0.0.1')
  puts "Connected to AMQP broker. Running #{AMQP::VERSION} version of the gem..."

  channel  = AMQP::Channel.new(connection)
  queue    = channel.queue("c", :auto_delete => true)
  exchange = channel.direct("cexchange")

#  queue.subscribe do |payload|
#    puts "Received a message: #{payload}. Disconnecting..."
#    connection.close { EventMachine.stop }
#  end

  queue.bind(exchange, :routing_key => "cratos")


  exchange.publish "Hello, world!", :routing_key => "cratos"
  exchange.publish "Goodbye world", :routing_key => "cratos"

  exchange.publish "Goodbye world", :routing_key => queue.name


  q = channel.queue("c", :auto_delete => true)


  q.status do |message_count, consumer_count|
    messages = message_count
    consumers = consumer_count

    if messages > 0
      0.upto(messages - 1) do
        q.pop { |m, p| puts "#{m} Payload #{p}" }
      end
    end
  end
end

So we pop messages off the queue one by one, but in this case we have no information about the logged-in users. The messages are consumed by the first poll that arrives from any member of the room, which means the queue is wiped clean: the first poll updates one user’s browser window with the latest messages while the others keep seeing the old ones.

What are the alternatives

Right now it seems like “topic exchanges” would work. Every user could have a queue for each room, so the routing key could be something like “sid.harry_potter”, where the period separates the username and the room name. Maybe this is not the best alternative, but it’s the first one I could think of.
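
A rough sketch of how that could look with the amqp gem (the exchange, queue and routing key names are illustrative):

exchange = channel.topic('chat')

# One queue per user per room; the name encodes "username.roomname".
sids_queue = channel.queue('sid.harry_potter', :auto_delete => true)

# Bind with a wildcard on the username part so this queue receives every
# message addressed to the room, regardless of who sent it.
sids_queue.bind(exchange, :routing_key => '*.harry_potter')

# Publishing a message to the room as user 'sid':
exchange.publish('Hello room!', :routing_key => 'sid.harry_potter')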

Ugly hack

One part of the code which seems like a horribly ugly hack to me is the Setup class which creates our queues. Setup.rb simply defines a basic EventMachine server which listens on port 8082 for incoming requests. Such a request arrives when the user creates a new room (in our Rails app).

The incoming request triggers a call to worker.rb, which creates the queues. The other changes aren’t that significant, other than moving chunks into their respective classes.

I still continue to use websockets, but using them in production wouldn’t be wise considering their limited adoption. Hence I dabbled with SocketIO and simple XHR polling.

Though my current link to the source does not contain my experiments with long polling, I would like to briefly go into the problems I faced.

One of the basic issues is violating the same-origin policy, and the easiest solution I could come up with was Apache proxying.



<VirtualHost *:80>
   ServerName localhost
   DocumentRoot /etc/apache2/www/trackertalk/public/
   ErrorLog "/private/var/log/apache2/dummy-host.example.com-error_log"
   CustomLog "/private/var/log/apache2/dummy-host.example.com-access_log" common
   ProxyPass /poll http://a.localhost:8081
   ProxyPassReverse /poll http://a.localhost:8081
   RailsEnv development
</VirtualHost>

<VirtualHost *:80>
   ServerName a.localhost
   ProxyRequests Off
   ProxyPass / http://localhost:8081
   ProxyPassReverse / http://localhost:8081
</VirtualHost>

All that is needed is to forward every request that arrives at localhost/poll to a.localhost:8081, where our http_server lives. The http_server is another server that uses EM::HttpServer on port 8082 to handle incoming requests. It needs to handle things like a poll checking for messages in the relevant queue (and forwarding them back to the client), or a new message (adding it to the correct queue), and provide the required response.

If you know of or have built applications using AMQP and Ruby, I would love to know more about how production-ready applications are structured and built. Any suggestions, corrections or feedback on my code would be awesome.