The state of tech, hiring and leadership.

I generally tend to dismiss posts on hiring because they are largely regurgitated content, but lately I’ve noticed an interesting trend that is oddly comforting and needed to be shared.

One of the things that I look for in an organization is maturity in leadership. I immediately develop a sense of respect for a team and a company when I see it being led by someone with the ability to make sound decisions. However, this does not imply that younger leaders are less adept at leading organizations efficiently. In fact, what I’ve observed is quite the opposite.

The way interviews are conducted at these younger organizations is so much more welcoming and in tune with how we develop products and software in the real world. Here, however, is what I often see at traditional organizations.

Pressure Tests

These are by far the most pointless ways to judge whether a person is qualified to work in your organization. If your organization is constantly under pressure, that is clearly a failure in leadership.

Proving Mathematical Formulae

I’m terrible at math, but I’m also not too bad at designing good software systems. Nine times out of ten I’d never be able to prove a mathematical formula on a whiteboard.

Modeling vague business problems

These are the hardest to justify, as the interviewer just makes up the problem as you go along. I’ve never really managed to figure out the point behind these kinds of questions and what they aim to evaluate. If you’re lucky you’ll end up saying what the interviewer wants to hear.


To check if you’re Batman?


I used to be extremely cynical about the new generation of engineers and their leading organizations, but they’re in fact way more confident, fearless and open to trying new things. A lot of the time, new ideas are brushed aside in the name of simplicity and trusted technology. I’ve had senior engineers and leaders tell me that they aren’t easily moved by the coolness of new tech, which is a fair comment as long as it isn’t used in self-defense.

One of the most important aspects of good leadership is the ability to wear your knowledge lightly. Truly smart people are rarely rude, standoffish or insecure about others, and this usually comes across during interviews. It’s in fact the exact opposite with them: you come out enlightened after a one-hour session. On the flip side, I’ve had experiences where the entire focus of the interview was to show me that I didn’t know enough to be part of the company, which makes me wonder whether it would’ve been easier to cut the session short and ask me to leave.

Interviews at younger organizations.

They’re more focused on getting to know you and understanding whether you’d be a cool person to hang out with: learning about the things you worked on and the decisions you took. Evaluating people based on what they know, and the circumstances in which they took certain decisions, is a lot different from evaluating them based on what you know, or casting judgement on a decision in isolation.

There is a lot of room for this new breed of leaders in India and I’m happy to see more organizations going this way. Understanding a potential hire has certainly emerged as an essential step in hiring, and is a huge step towards building the businesses of the future. One of the things I hear quite often is that investors invest in people; however, that idea hasn’t fully translated across entire organizations, and very few leaders are investing in the people they hire. The Indian tech scene still has way too many middle managers competing with potential hires instead of evaluating them for complementary skill sets that allow an organization to level up faster, but I expect this to change very soon.

Things are definitely looking up.

Software consulting – Why project managers and product managers are so important

The company that I work for has been hiring engineers, and has been pretty aggressive at it. We’ve spared no cost or effort in finding talented and interested people. Unlike a lot of companies, we value enthusiasm and attitude over sheer brilliance. The ‘why’ (do we prefer enthusiasm over brilliance) has been covered several times; my reasons align with most such articles and would be tangential to the point of this post.

I had been oblivious to the stigma associated with software consulting firms, especially with the sudden boom in product startups in all major cities. During one of the interviews I was told that an engineer we were interested in speaking with wasn’t keen on working at a services company because he didn’t want to be tied down checking off tickets assigned to him. While this isn’t uncommon at consulting firms, the way he put it brought back a lot of repressed memories from one of my first jobs, at an MNC.

This also brought me to a more recent turn of events where I chose to give up on an assignment because it stopped feeling right. I had recently stepped off one of my longest engagements (and one of my personal favourites) to take up a new project which, after two months, still seemed odd. It felt a lot like my first job. In fact my first job had projects which kept me going and interested, but this had brought me to a point where it seemed wiser to step down.

I feel it all boils down to the product/project managers. 

As a project/product manager it’s vital to bring your teammates and engineers into the vision of your product or project. It’s important to ensure there is no disconnect between what you envision as the goal of the product and what your teammates’ ideas of it are.

So what made one project a personal favourite while I couldn’t wait to get off the other?

The better product manager ensured that I was always involved in the day-to-day of the project. Keeping me in the loop on customer feedback, the issues the accounting team faced and what the company aimed to accomplish gave me a sense of purpose. It defined my role in the success of the company and made my work seem purposeful. Every commit and fix seemed to validate my contribution to the success of the business.

The other just made sure that I knew about the issues that needed completion and how the module had to work. This was, very literally, checking off tickets. I had very little vision of how it fit into the grander scheme of things, and there always seemed to be a level of disconnect that I couldn’t get around. I wasn’t aware of what the project was planning next, or why, so my interests were solely aligned with getting my features and tests working. I was rarely aware of when my features were merged and was generally notified only when there was an issue.

While some may argue that it’s impractical to have everyone involved in the big picture, I feel it’s up to the people managers in each team to carefully abstract the information to the right level, so that everyone feels their contributions are purposeful.

So what if the information was already abstracted to the level that I needed to know?

This is where great product managers shine through. If you’ve made it this far, it’s worth getting into this one.

As part of my first job (at the MNC) I was part of a four-person team responsible for automating any and all kinds of tests that needed to be run on the company’s QA boxes. At this point I was a clueless fresh grad who wasn’t sure how I could be of any use to this project.

My project manager (one of the most experienced engineers at the company) told me that I had to look into a set of tools that would convert any kind of user activity on any web page into a set of reproducible scripts. He then went on to let the other guys know what they would be working on. It seemed like a fairly straightforward task, given that the tool did most of the work and I pretty much just had to follow the documentation to get things running.

At the end of the day, when we had all dug into the nuggets of information he had distributed, he came in to tell us how each of the modules the four of us had been working on would interact with the others to ensure that all the tests were run and reports generated. There have been very few moments in my life where I was hit with such clarity of thought. The beauty of it all was that if he had shared this bit of information at the onset, it would’ve seemed a lot more daunting. This way he broadened the scope of our work and vision at the right time.

So, my point is that being a project/product manager is hard. Being able to understand the motivations and passions of each of your teammates is tough but important. If the vision and goals of upper management are not conveyed correctly down the chain, you’re pretty much running on zombie legs.

Mongodb – Understanding how journal mode works with j:true

One of the things that has recently confused me about Mongodb, while using Mongoid with the Moped driver, is the use of journal mode and the j:1 write concern.

Journal Mode

Journal mode ensures that your writes are recorded in a redo/write-ahead log after they are written to memory but before they are pushed to the data files on disk. It provides a level of durability: if the server were to crash before certain data is written to disk, on restart this data is read from the journal logs and written to disk, making your data more resilient.

Journaling is enabled by default on all 64-bit installations since version 2.0, but not on 32-bit ones.

What kind of stuff does journaling record?

Journaling takes care of:

  • Document create/update operations
  • Index creation
  • Namespace changes

Journal Commit Interval

The journal commit interval (journalCommitInterval) is the time between subsequent writes to the journal files. It defaults to 100 ms when the journal and data files are on the same device, and 30 ms when they aren’t.

This means you can lose data that has not yet been journaled in the last 100 ms. You can set journalCommitInterval in your mongodb.conf file to anywhere between 2 ms and 300 ms. A lower value increases the frequency with which data is journaled, but at the cost of disk performance.
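For instance, the relevant lines of a 2.x-era mongodb.conf might look like this (the 50 ms value is an illustration, not a recommendation):

```
# mongodb.conf — journaling options
# journaling is on by default on 64-bit builds
journal = true
# flush writes to the journal every 50 ms instead of the default 100 ms
journalCommitInterval = 50
```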

To force Mongodb to commit to the journal sooner, without updating the journalCommitInterval setting, one could use the j:1 write concern. As per the docs:

When a write operation with j:true is pending, mongod will reduce journalCommitInterval to a third of the set value.

So a low journalCommitInterval (whatever works best for you), combined with saves called with j:1, would help ensure your writes are durable.

To verify that your data was written to the journal, you could issue your write and then call getLastError with the j option:

db.someModel.update({'_id':ObjectId('52d688e92559df237900000b')}, {price_cents:14375}, {multi: false})
db.runCommand({ getLastError: 1, j: true })


Most drivers run this for you so this isn’t something that you need to worry about.


The mongoid documentation for versions 3.0 and above does not talk about the j:1 write concern for some reason, but here is what I found based on this discussion.

myobject =   # Product is a placeholder for any Mongoid model
myobject.price = 55


This is equivalent to calling it with

myobject =   # placeholder model
myobject.price = 55
myobject.with(safe: {w:1}).save


This implies that the write will be acknowledged once the data is written to the primary’s memory, but not necessarily flushed to disk. If you want to trigger a journal write as well, you would have to do it with:

myobject =   # placeholder model
myobject.price = 55
myobject.with(safe: {w:1, j:1}).save

Most of this data was collected and posted as an answer on Stack Overflow by mnemosyn, and is available in the Mongodb docs. My goal was to create a quick reference.



Mongodb – not evil, just misunderstood

Lately I’ve been reading a lot of posts about Mongodb dissuading you from ever using it. Some of these articles are seriously outrageous and make me wonder what got those teams to start using Mongodb in the first place. Sarah Mei’s recent article was one such post that upset me a lot, especially since the title was so inflammatory.

My post, however, aims to highlight the areas where Mongodb works and how it performed brilliantly for us. As someone leading the engineering efforts at a shipping and logistics company, I wasn’t too happy initially to see Mongodb being used as the primary datastore, but after two years I’m more than sure that this was the right datastore for us. I’ve outlined areas that confused me when I first encountered them, only to learn that they were actually invaluable features available to me.

“No migrations” – is that all you have?
The advantages of schemaless documents are priceless, and not having to migrate is just one of the perks. Our schema was largely of the form Orders (having many) Shipments (going_from) ShipPoint (to) ShipPoint.

We rarely used most of these entities without the others, and it served us extremely well to manage them as self-contained documents embedding the others.
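To make that concrete, here is a made-up sketch of what one such self-contained order document could look like (field names are illustrative, not our actual schema):

```json
{
  "_id": "...",
  "status": "in_transit",
  "shipments": [
    {
      "tracking_id": "...",
      "carrier": "UPS",
      "from": { "city": "San Francisco", "zip": "94107" },
      "to":   { "city": "New York",      "zip": "10001" }
    }
  ]
}
```

Everything you need to render or update an order travels together in one document, which is exactly why we rarely missed joins.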

Mongodb writes are fire and forget? WTF?
This doesn’t always have to be the case, though it significantly contributes to Mongodb’s fast writes. Mongodb’s write concern configuration lets you specify the precise level of persistence that must be achieved before a write is considered successful, so if the write fails, you know it failed. The fact that you can know whether your write has propagated to replicas or has been journaled is a pretty neat feature.

How can the default writes be fire and forget?
(Version 2.4.8 changes this; however, the following is valid up to version 2.2.6.)
Given all the information needed to configure writes the way you prefer, the fast default just made sense for us. We add a lot of Notes to each shipment as it gets reviewed at different levels by the sales, accounts and other teams. These notes generally serve as a reminder, or a single line indicating that something has been viewed, and they don’t critically affect the business workflows of the application. It just seemed logical that these were fire-and-forget operations and could be stored as quickly as possible.

Another place where this is extremely handy for us is tracking. We track several hundred shipments each day, logging every tracking status, location and time while the shipment is in transit. This information is handy for customers keeping an eye on where their shipment has reached. Chances are some of the information won’t be saved on the first fetch, but we expect it to be picked up during a second fetch 30 minutes later. The default write concern works brilliantly there.

Read locks and write locks – don’t they slow you down?
They do, but since most of the data is memory mapped this doesn’t affect you in a major way. However, I did notice people always querying the primary of a replica set and never the secondaries, for fear of inconsistent data. If you have sufficient memory your replication lag should be pretty small, and if you don’t need the data to be consistent at every instant, querying a secondary is a sensible option to take load off your primary. Which brings me to the PrimaryPreferred read preference: it allows you to query a secondary in your replica set when your primary is not available. It’s a fairly safe choice in my opinion.

We began querying secondaries for ShipPoints, which didn’t change that often.

All the memory usage is killing me!
This is one of the things that took me time to accept. Mongodb expects your working set, along with your indexes, to fit in RAM. Your working set is the data that is frequently queried and updated. Since Mongodb works with memory-mapped files, most of your working set is mapped into memory. When data is not available in memory, a page fault occurs and it has to be fetched from disk. This comes with a performance penalty, but as long as you have some swap space the data can be safely loaded back in.

While our working set was fairly small, our reporting application needed access to the entire set of shipment records to generate reports. This resulted in Mongo running out of memory and spitting OperationFailure errors on a regular basis.

Our initial approach was naive: we started using Redis (another datastore that’s pure gold) to store snapshots of information, but we soon realised we could just use Mongodb to make it work.

So can I never generate reports without having my dataset fit in memory?
Rollups to the rescue. Rollups are pre-aggregated statistical snapshots that speed up your aggregation process. They make life significantly easier, as report queries only touch short time ranges to generate micro-reports.

Here is a simplified snapshot of how we generated daily and monthly aggregates with mapreduce.

class AggregatesGenerator
  def self.generate_daily_aggregates(date = nil)
    map = <<-MAP
      function() {
        var key = {date: new Date(this.record_created_at.getFullYear(),
                                  this.record_created_at.getMonth(),
                                  this.record_created_at.getDate())};
        var data = {}; =;
        data['zone_' +] = 1;
        data.service_type_string = this.service_type.toString().toLowerCase();
        data[this.service_type.toString().toLowerCase()] = 1;
        data.total_price_cents = parseInt(this.price_cents);
        data.shipment = this.tracking_id;
        if (this.carrier == 'FEDEX') {
          data.FEDEX = 1;
          data.UPS = 0;
        } else {
          data.UPS = 1;
          data.FEDEX = 0;
        }
        emit(key, data);
      }
    MAP

    reduce = <<-REDUCE
      function(key, values) {
        var results = {'next_day_air': 0,
                       'second_day': 0,
                       'three_day_select': 0,
                       'ground': 0,
                       'fedex_2_day': 0,
                       'zone_2': 0, 'zone_3': 0, 'zone_4': 0, 'zone_5': 0,
                       'zone_6': 0, 'zone_7': 0, 'zone_8': 0, 'zone_9': 0,
                       'zone_10': 0, 'zone_14': 0, 'zone_17': 0, 'zone_22': 0,
                       'zone_23': 0, 'zone_25': 0, 'zone_92': 0, 'zone_96': 0,
                       'total_price_cents': 0,
                       'UPS': 0,
                       'FEDEX': 0};
        values.forEach(function(v) {
          results.total_price_cents += parseInt(v.total_price_cents);
          results[v.service_type_string] += v[v.service_type_string];
          results['zone_' +] += v['zone_' +];
          results.FEDEX += v.FEDEX;
          results.UPS += v.UPS;
        });
        var today = new Date();
        results.updated_at = new Date(today.getFullYear(),
                                      today.getMonth(),
                                      today.getDate());
        return results;
      }
    REDUCE

    finalize = <<-FINALIZE
      function(key, results) {
        // field list inferred from the reduce step: zero out any
        // counters that are missing from a result
        var fields = ['next_day_air', 'second_day', 'three_day_select',
                      'ground', 'fedex_2_day', 'total_price_cents',
                      'UPS', 'FEDEX'];
        fields.forEach(function(f) {
          if (!results[f]) results[f] = 0;
        });
        results.shipments = [];
        return results;
      }
    FINALIZE

    if date.nil?
      ReportingShipment.order_by([:record_created_at, :desc]).
        map_reduce(map, reduce).out(merge: 'daily_aggregates').finalize(finalize).find
    else
      ReportingShipment.where(:record_created_at.gte => date.beginning_of_day,
                              :record_created_at.lte => date.end_of_day).
        map_reduce(map, reduce).out(merge: 'daily_aggregates').finalize(finalize).find
    end
  end

  def self.generate_monthly_aggregates
    map = <<-MAP
      function() {
        var objDate =;
        var key = {
          year: objDate.getFullYear(),
          month: objDate.getMonth()
        };
        emit(key, this.value);
      }
    MAP

    reduce = <<-REDUCE
      function(key, values) {
        var result = {next_day_air: 0, second_day: 0, three_day_select: 0,
                      ground: 0, fedex_ground: 0, total_price_cents: 0,
                      zone_2: 0, zone_3: 0, zone_4: 0, zone_5: 0, zone_6: 0,
                      zone_7: 0, zone_8: 0, zone_9: 0, zone_10: 0, zone_14: 0,
                      zone_17: 0, zone_22: 0, zone_23: 0, zone_25: 0,
                      zone_92: 0, zone_96: 0};
        values.forEach(function(v) {
          result.next_day_air += v.next_day_air;
          result.second_day += v.second_day;
          result.three_day_select += v.three_day_select;
          result.ground += v.ground;
          result.fedex_ground += v.fedex_ground;
          result.total_price_cents += parseInt(v.total_price_cents);
          result.zone_2 += parseInt(v.zone_2);
          result.zone_3 += parseInt(v.zone_3);
          result.zone_4 += parseInt(v.zone_4);
          result.zone_5 += parseInt(v.zone_5);
          result.zone_6 += parseInt(v.zone_6);
          result.zone_7 += parseInt(v.zone_7);
          result.zone_8 += parseInt(v.zone_8);
          result.zone_9 += parseInt(v.zone_9);
          result.zone_10 += parseInt(v.zone_10);
          result.zone_14 += parseInt(v.zone_14);
          result.zone_17 += parseInt(v.zone_17);
          result.zone_22 += parseInt(v.zone_22);
          result.zone_23 += parseInt(v.zone_23);
          result.zone_25 += parseInt(v.zone_25);
          result.zone_92 += parseInt(v.zone_92);
          result.zone_96 += parseInt(v.zone_96);
        });
        return result;
      }
    REDUCE

    DailyAggregate.order_by([:_id, :desc]).map_reduce(map, reduce).
      out(merge: 'monthly_aggregates').find
  end
end


So you mean this can’t be realtime?
Yes it can – through atomic updates. Just as we generated rollups to speed up reporting, we can maintain pre-aggregated snapshots of the same information, like this:

{ "_id" : { "year" : 2013, "month" : 3 },
  "value" : { "next_day_air" : 30,
              "second_day" : 83,
              "three_day_select" : 7,
              "ground" : 27,
              "fedex_ground" : 51,
              "priority_overnight" : 0,
              "fedex_express_saver" : 0,
              "fedex_2_day" : 0,
              "total_price_cents" : 0,
              "zone_2" : 10,
              "zone_3" : 10,
              "zone_4" : 12,
              "zone_5" : 16,
              "zone_6" : 158,
              "zone_7" : 1,
              "zone_8" : 0,
              "zone_9" : 0,
              "zone_10" : 0,
              "zone_14" : 0,
              "zone_17" : 0,
              "zone_22" : 0,
              "zone_23" : 0,
              "zone_25" : 0,
              "zone_92" : 0,
              "zone_96" : 0 } }

Once this is in place you can update your aggregates by simply incrementing the right counter, e.g. with Mongoid’s inc: monthly_aggregate.inc("value.next_day_air", 1), where monthly_aggregate is the document for the current month. This issues an atomic $inc on the server.

I haven’t even touched upon the replication and sharding features that Mongodb offers, which I’ll reserve for another post. To summarise, I feel Mongodb is awesome; it’s a lot like the kid in class you dismissed because your friends thought he was weird – till you got to know him.

Disclaimer: I don’t claim to be an authority on Mongodb; everything I have written about is stuff I’ve learnt while working with it. I recommend reading the documentation and going through the talks available on the Mongodb website.

Simple Quick SSH Tunneling to expose your web app

I’ve been a localtunnel user for quite some time now and I really love that it’s a free service, quick to install, and an easy way to expose your development app to the world. There are quite a few worthy alternatives now, like Pagekite, that pretty much do the same thing.

But at times it gets annoying (especially in the middle of other work) when I’m unable to access localtunnel because it’s down, or I’ve outrun the free usage limit on Pagekite.

I generally end up using localtunnel when I have to wait for an IPN from Paypal or an relay response while working in my development environment. So here is a quick way to roll out a basic version of the service for your own needs.

Before we move to the “how to”, here is a quick intro to SSH tunneling.

Now, I assume that you have a staging server (or some server you have ssh access to).

The following terminal command does pretty much the same thing we do with the Ruby code that follows:

ssh -R0.0.0.0:9999:localhost:3000

The -R flag indicates that your local app running on port 3000 will be forwarded to port 9999 on the remote host, bound on so that everyone can access it. Now your application running on localhost:3000 is accessible at
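One caveat worth knowing (an OpenSSH server setting, not something the client command controls): by default sshd binds remote-forwarded ports to the server’s loopback interface only, so for the binding above to be reachable from outside, the server’s sshd_config needs:

```
# /etc/ssh/sshd_config on the remote server
GatewayPorts yes
```

Without it, the tunnel still works but only for connections originating on the remote server itself.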

We do the same thing now using the ruby net-ssh library. The following code snippet is customised to my defaults, but it’s simple enough to change those settings.

#! /usr/bin/env ruby
require 'rubygems'
require 'net/ssh'

# Terminal equivalent:
# ssh -R0.0.0.0:9999:localhost:3000

puts "Enter the remote server name you wish to forward your local port to: ("
remote_host = gets.chomp
puts "Enter the remote port you wish your local app to be available on – on the remote server (ex: 9999)"
remote_port = gets.chomp
puts "Enter remote server user to ssh with"
remote_user = gets.chomp
puts "Enter local port to forward"
local_port = gets.chomp

remote_host = "" if remote_host.empty?
remote_user = "sid" if remote_user.empty?
remote_port = remote_port.empty? ? 9999 : remote_port.to_i
local_port  = local_port.empty?  ? 3000 : local_port.to_i

puts "Forwarding localhost:#{local_port} to #{remote_host}:#{remote_port}"

Net::SSH.start(remote_host, remote_user) do |ssh|
  puts "Connecting…"
  ssh.logger.sev_threshold = Logger::Severity::DEBUG
  # Ask the server to listen on remote_port (bound on and
  # tunnel connections back to our local app on local_port.
  ssh.forward.remote(local_port, 'localhost', remote_port, '')
  ssh.loop { true }
end

Hope this helps

Mongoid – Comparing attributes belonging to the same document

I spent an insane amount of time trying to figure out how to elegantly compare attributes belonging to the same document, and stumbled upon a few Stack Overflow questions that helped. The solution still doesn’t appear elegant, but it does the job.

scope :expired, any_of({:completed_date.ne => nil},
                       {:expiry_date.lte =>},
                       {"$where" => "this.usage_count >= this.max_usage_count"})

Certainly not noteworthy but this post is to help me document this bit.


The goal was to build a named scope that lets me query expired documents. The model has the following fields:

	field :code, :type => String
	field :allowed_domain, :type => Integer, :default => DOMAIN_ALL_SITES
	field :description, :type => String
	field :discount_type, :type => Integer, :default => PERCENTAGE_DISCOUNT
	field :dollar_discount_cents, :type => Integer, :default => 0 
	field :percentage_discount, :type => Integer, :default => 0
	field :max_usage_count, :type => Integer, :default => 1
	field :usage_count, :type => Integer, :default => 0
	field :expiry_date, :type => DateTime, :default => ( + 1.week)
	field :completed_date, :type => DateTime	

	index :code
	index :expiry_date
	index :allowed_domain

The precondition for an expired record is that its completed_date is set, its expiry_date is earlier than the present date, or its usage_count has reached max_usage_count.

While the above block of code does that, what I hate is the mixing of JSON and Ruby. While mongoid for most purposes does a good job of making life easier (as every ORM should), there are occasions where I’d prefer it let me write pure Mongodb queries, or use the Ruby wrappers entirely.

Other Resources

Further bulletins as events warrant.

My Favorite Ted Talks

I’ve been going through some of my favorite Ted Talks; here’s the list.

1. Sir Ken Robinson on Creativity.

2. Simon Sinek, People don’t buy what you do, they buy why you do it

I love this talk just for that one line which just resonates with everything I believe in.

3. Dan Ariely

Lastly, my favorite speaker: Dan Ariely. I’ve been a big fan of his books and talks, and I strongly recommend you read Predictably Irrational. Here are his talks.

I can’t believe I missed this the first time around.

4. Derek Sivers – How to start a movement.

Async Responses in Thin and a Simple Long Polling Server

I’ve been playing with Thin, the Ruby web server that packages the awesomeness of Rack, Mongrel and Eventmachine (\m/). However, the thing that blew me away completely was James Tucker’s async Rack support that is integrated into Thin. The combination opens up a whole new world of realtime responses, something that I’ve been constantly switching to NodeJS and SocketIO for.

Async Rack

A simple rack application would look like this:

class MyRackApp
  def call(env)
    [200, {'Content-Type' => 'text/html'}, ["Hello World"]]
  end
end

# thin start -R -D -p3000

This simple rack app, on a request to port 3000, responds with a status code (200), the content type and a body. Nothing special. With async Rack we’d be able to send the response head back while we build the response; once the response is built, we send the body and close the connection.

A quick glance at Patrick’s blog (apologies for not being able to find the author’s last name) should give you an excellent understanding of what I’m trying to explain.

A simple async app


AsyncResponse = [-1, {}, []].freeze

class MyAsyncApp
  def call(env) do
      sleep 5  # simulate a slow operation
      body = [200, {"Content-Type" => "text/html"}, ["<em> This is Async </em>"]]
      env['async.callback'].call(body)
    end
    AsyncResponse
  end
end
AsyncResponse is a constant that returns a -1 status, which tells the server that the response will arrive later, asynchronously. It is, as defined in the examples provided with thin, an “Async Template”.

On a request, the code initially returns AsyncResponse while the thread waits on the sleep call. Once the thread is active again, it builds the response and sends it, keeping the connection alive.

An async app where we build the response and close the connection

From here on it would really help to have the thin code open in your text editor. We’ll be looking at request.rb, response.rb and connection.rb from the /thin folder.

require 'rubygems'
require 'eventmachine'

class DeferrableBody
	include EventMachine::Deferrable

	def each(&blk)
		puts "Blocks: #{blk.inspect}"
		# Rack calls each to stream the body; save the block so chunks
		# can be pushed through it later via append.
		@callback_blk = blk
	end

	def append(data)
		puts " -- appending data --"
		data.each do |data_chunk|
			puts " -- triggering callback --"
		end
	end
end

class AsyncLife
	AsyncResponse = [-1, {}, []].freeze

	def call(env)
		body =

		EM.next_tick {
			puts "Next tick before async"
			puts "#{env['async.callback']}"
			# Send the headers right away; the body is streamed later.
			env['async.callback'].call([200, {'Content-Type' => 'text/plain'}, body])
		}

		puts "-- activated 5 second timer --"
		EM.add_timer(5) {
			puts "--5 second timer done --"
			body.append(['Mary had a little lamb'])

			puts "-- activated 2 second timer --"
			EM.add_timer(2) {
				puts "--7 second timer done --"
				body.append(['This is the house that jack built'])
				puts "-- succeed called -- "
				body.succeed
			}
		}

		AsyncResponse
	end
end

This example uses Eventmachine’s Deferrable, which allows us to signal when we’re done building our response. The tricky part I struggled to get my head around was the strange-looking each method and how it uses the append method to build our responses.

Walking through the code

When the request comes in from the browser, the application returns the AsyncResponse constant, which tells the server that the response body is going to be built over time. On a request, Eventmachine’s post_init method (check out thin/connection.rb) creates a @request and @response object. This triggers a post_process method which, on the first call, returns without setting the header, status or body, and prepares for the asynchronous response.

On next_tick we begin to create our header. We initialize an EM::Deferrable object which is assigned as the response body; this ensures the header is sent ahead of the body (because we don’t have anything yet to iterate over in the each block where the response is sent). The env['async.callback'] is a closure over the method post_process, created by method(:post_process) – check out the pre_process method in thin/connection.rb.

The each method in our Deferrable class overrides the default each implementation defined by the response object. Our block is saved in an instance variable @callback_blk, which is called whenever we call the append method. So essentially we are calling send_data on each of the data chunks we’re sending back when append is called.

Once that’s done we call succeed, which tells Eventmachine to trigger the callback to denote we’re done building the body, and ensures the request–response connection is closed.

The default each implementation:

# Send the response
@response.each do |chunk|
  trace { chunk }
  send_data chunk
end

That’s pretty much what I gathered by going through the code on async responses with thin and rack. Another useful module is the thin_async bit written by the creator of thin, @macournoyer, available here. It pretty much abstracts away the trouble of overriding the each block.

Here’s an example of a simple long polling server I built using thin available here

Hope this is helpful

Tagging with Redis

It’s been a long time since my last post, and this one is about Redis. I’ve been working on a project where I had a story to implement: associate posts with other posts identified as similar, based on a tagging system that already existed. The trouble was that the existing tagging system was closely tied to a lot of other functionality and couldn’t be easily (quickly) rewritten, and the feature needed to be rolled out quickly for several reasons which I won’t get into.

The Problem

The tags associated with each post were poorly organized: one record in the Tag model would hold all the tags associated with the post as a whitespace-separated string (ouch!), e.g. business money finance investment.

So finding posts that had 2 tags in common, from a database of approximately 2000 posts with at least 4 tags each, took a significant amount of time. Here is a basic benchmark of just finding the matches on each request:

Benchmark.measure do
  similar_posts = []
  # p is the post we're finding matches for
  Post.tagged_posts.each do |post|
    similar_tags = ( &
    similar_posts << post if similar_tags.size >= 2
  end
end
Here is what the benchmark returns:

 => #<Benchmark::Tms:0x111fd8cb8 @cstime=0.0, @total=2.25,
@cutime=0.0, @label="", @stime=0.19, @real=4.79089307785034, @utime=2.06>

Not great.

So the next option was to pre-populate the related posts for every post and store them in the database as similar_posts. All that was required at runtime was to fetch the post along with its similar_posts. This seemed like an acceptable solution considering the tags weren’t changed on a regular basis, but any change would require the similar_posts table to be rebuilt (which took a long time). Here is the benchmark for fetching the pre-populated data from the database:

Benchmark.measure { p.similar_posts }
=> #<Benchmark::Tms:0x104d153f8 @cstime=0.0, @total=0.0100000000000007,
@cutime=0.0, @label="", @stime=0.0, @real=0.0333230495452881, @utime=0.0100000000000007>

Nice! But this came at the cost of having to rebuild the similar_posts table every time tags or posts changed.


Redis is an awesome in-memory key-value and data-structure store that is really fast. While it would be wrong to think of it as a silver bullet, it does a lot of things really well. One is the ability to store data structures and perform operations on them; Pub/Sub and a lot of other cool stuff are there too. It even lets you persist the data, either through snapshotting, which takes periodic snapshots of the dataset, or through an “append only file”, which logs every write operation and replays the log to recreate the dataset in case of failure.

In my case I didn’t need to persist the data, just maintain a snapshot of it in memory. So, assuming some basic understanding of Redis and the redis gem, I’ll describe the approach.

We created a SET in Redis keyed by each tag name, so every tag holds a set of the post_ids of all posts carrying that tag. To identify posts with two tags in common, all that was needed was the intersection of the tag sets, and Redis provides a built-in command (SINTER) for exactly this operation. That’s it!

{"communication" => ['12','219', .. , '1027']}  #sample SET
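To illustrate the layout, here’s a plain-Ruby model of the same idea using Set (the real implementation issues Redis commands via the redis gem instead — SADD when indexing a post and SINTER for the query; the helper name below is made up):

```ruby
require 'set'

# Model of the Redis layout: one set of post ids per tag name.
# With redis-rb this would be REDIS_API.sadd(tag, post_id) per tag,
# and REDIS_API.sinter(tag_a, tag_b) for the query.
tag_sets = Hash.new { |h, k| h[k] = Set.new }

# Hypothetical indexing helper: add a post's id under each of its tags.
def index_post(tag_sets, post_id, tags)
  tags.each { |t| tag_sets[t] << post_id.to_s }
end

index_post(tag_sets, 12,   %w[business money finance])
index_post(tag_sets, 219,  %w[communication finance investment])
index_post(tag_sets, 1027, %w[business finance])

# Posts sharing both "business" and "finance" = set intersection (SINTER)
common = tag_sets["business"] & tag_sets["finance"]  # ids 12 and 1027
```

Because the intersection runs inside the store (or here, in memory), no pre-populated table is needed.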

Fetching the similar posts

def find_similar_posts
  similar_posts = []
  if self.tags.present? && self.tags.first.present?
    tags = self.tags.first.name.split  # split the whitespace-separated tag string (attribute name assumed)
    tags_clone = tags.clone
    tags.each do |tag|
      tags_clone.each do |tag_clone|
        next if tag == tag_clone  # skip self-intersection, which would match posts sharing only one tag
        similar_posts << REDIS_API.sinter(tag.to_s, tag_clone.to_s)
      end
    end
  else
    puts "No matching posts."
  end
  similar_posts.flatten.uniq - [self.id.to_s]  # drop the post itself (final line reconstructed)
end


 >> Benchmark.measure { p.find_similar_posts }
=> #<Benchmark::Tms:0x1100636f8 @cstime=0.0, @total=0.0100000000000007, @cutime=0.0, @label="",
@stime=0.0, @real=0.163993120193481, @utime=0.0100000000000007>

Which is pretty good, considering that we do all the computation within the request and nothing is pre-populated.

MongoDB with MySQL and ActiveRecord

I’ve been playing with MongoDB for a little while now, and here are some basic issues I faced along with some setup help.

Setting up Mongo is really straightforward on OS X. You may either download it from here or use Homebrew to install it. If you’re using brew,

brew install mongodb

should do it.

Skip over to the bin folder and launch the server with

 ./mongod

You’ll now see that the server is running. The console can then be launched with

 ./mongo

and you should see an irb-like shell.

Using MongoDB with MySQL while continuing to use MySQL as the primary datastore

I have been looking into Mongo purely because I was told it’s highly performant, but I still need my primary datastore to be MySQL because I see a need for transactions in the future. I decided to go with Mongoid (recommended because it’s actively maintained and unlikely to vanish soon), which only requires adding the gems to your Gemfile:

gem "mongoid", "~> 2.3"
gem "bson_ext", "~> 1.4"

# Run bundle install

If you decide to use Mongoid with MongoDB as your primary datastore, you would have to follow these steps completely. However, since I still needed to retain my MySQL configuration, I only ran this step:

 rails g mongoid:config

This should generate a mongoid.yml file, similar to your database.yml, in the config folder. A very well-written blog post on how to set up users on MongoDB and manage them with Mongoid is provided here, and I strongly recommend reading it.

Running the generator makes Mongo your default datastore, so the model generator would now build models using Mongoid and not ActiveRecord. To ensure that ActiveRecord remains your primary ORM, add the following config to your application.rb:

config.generators do |gen|
  gen.orm :active_record
end

Source :

Now you have ActiveRecord as your primary ORM, and you may still generate models with Mongoid when you need them.
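For instance (a sketch; the model names are made up, and the --orm flag spelling should be double-checked against your Rails version):

```shell
# With the config above, the default generator builds an ActiveRecord model:
rails g model Post title:string

# Opt into Mongoid for a specific model via the generator's --orm option:
rails g model Activity name:string --orm mongoid
```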

Errors when installing MongoDB on Ubuntu

As before, you can find the tarball on the MongoDB downloads page, but on installing and running mongo I encountered this error:

 exception in initAndListen std::exception: dbpath (/data/db/) does not exist, terminating
 	 shutdown: going to close listening sockets...
 	 shutdown: going to flush oplog...
 	 shutdown: going to close sockets...
 	 shutdown: waiting for fs preallocator...
 	 shutdown: closing all files...
     closeAllFiles() finished
  dbexit: really exiting now

The error says that it can’t find the /data/db folder. Just create it with the right user permissions and you should be good to go.
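A minimal fix (assuming the default dbpath and a standard Ubuntu sudo setup):

```shell
# Create MongoDB's default data directory and hand it to your user
sudo mkdir -p /data/db
sudo chown "$USER" /data/db

# Alternatively, point mongod at a directory you already own:
# ./mongod --dbpath ~/mongo-data
```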

Source :