Software consulting – Why project managers and product managers are so important

The company that I work for has been hiring engineers and has been pretty aggressive about it. We’ve spared no cost or effort in finding talented and interested people. Unlike a lot of companies, we value enthusiasm and attitude over sheer brilliance. The ‘why’ (why we prefer enthusiasm over brilliance) has been covered several times elsewhere; my reasons align with most of those articles and would be tangential to the point of this post.

I had been oblivious to the stigma associated with software consulting firms, especially with the sudden boom in product startups in all major cities. During one of our interviews I was told that an engineer we were interested in speaking with didn’t want to work at a services company because he didn’t want to be tied down, checking off tickets assigned to him. While this isn’t uncommon at consulting firms, the way he put it brought back a lot of repressed memories of my time at one of my first jobs, at an MNC.

This also brought me to a more recent turn of events where I chose to give up an assignment because it stopped feeling right. I had recently stepped off one of my longest engagements (and one of my personal favourites) to take up a new project which, after two months, still felt odd. It felt a lot like my first job. In fact my first job had projects which kept me going and interested, but this one had brought me to a point where it seemed wiser to step down.

I feel it all boils down to the product/project managers. 

As a project/product manager it’s vital to bring your teammates/engineers into the vision of your product/project. It’s important to ensure there is no disconnect between what you envision as the goal for your product and what your teammates’ ideas are.

So what made one project a personal favourite while I couldn’t wait to get off the other?

The better product manager ensured that I was always involved in the day-to-day of the project. Keeping me in the loop on customer feedback, the issues the accounting team faced and what the company aimed to accomplish gave me a sense of purpose. It defined my role in the success of the company and made my work seem purposeful. Every commit/fix seemed to validate my contribution to the success of the business.

The other just ensured that I knew about the issues that needed completion and how the module had to work. This was, very literally, checking off tickets. I had very little vision of how it fit into the grander scheme of things, and there was always a level of disconnect that I couldn’t seem to get around. I wasn’t aware of what the project was planning next, or why. So my interests were solely aligned with getting my features and tests working. I was rarely aware of when my features were merged and was generally notified only when there was an issue.

While some may argue that it’s impractical to have everyone involved in the big picture, I feel it’s up to the people managers on each team to carefully abstract the information to the right level so that everyone feels their contributions are purposeful.

So what if the information was already abstracted to the level that I needed to know?

This is where great product managers shine through. If you’ve made it this far, it’s worth getting into this one.

As part of my first job (at the MNC) I was part of a four-person team responsible for automating any and all kinds of tests that needed to be run on the company’s QA boxes. At this point I was a clueless fresh grad who wasn’t sure how I could be of any use to this project.

My project manager (one of the most experienced engineers at the company) told me that I had to look into a set of tools that would convert any kind of user activity on any web page into a set of reproducible scripts. He then went on to let the other guys know what they would be working on. It seemed like a fairly straightforward task given that the tool did most of the stuff and I had to pretty much just follow the documentation to get things working.

At the end of the day, when we had all dug into the nuggets of information that he had distributed, he came in to tell us how each of the modules that the four of us had been working on would interact with the others to ensure that all the tests were run and reports generated. There have been very few moments in my life where I was hit with such clarity of thought. The beauty of it all was that if he had shared this bit of information at the onset, it would’ve seemed a lot more daunting. This way he broadened the scope of our work and vision at the right time.

So, my point is that being a project/product manager is hard. Being able to understand the motivations and passions of each of your teammates is tough but important. If the vision and goals of upper management are not conveyed correctly down the chain, you’re pretty much running on zombie legs.


Mongodb – Understanding how journal mode works with j:true

One of the things that has recently confused me about Mongodb, while using Mongoid with the Moped driver, is how journal mode interacts with the j:1 write concern.

Journal Mode

The journal is a redo/write-ahead log that your writes are recorded in after they are written to memory but before they are flushed to the data files on disk. It provides a level of durability: if the server were to crash before certain data is written to disk, on restart that data is read from the journal and applied to the data files, making your data more resilient.

Journaling has been enabled by default on all 64-bit installations since version 2.0, but not on 32-bit ones.

What kind of stuff does journaling record

Journaling takes care of

  • Create/update/delete operations
  • Index creation
  • Namespace changes

Journal Commit Interval

The journal commit interval (journalCommitInterval) is the time interval between subsequent writes to the journal files and is set to 100 ms by default. That default applies when your journal and data files live on the same block device; it drops to 30 ms when they are on separate devices.

This means you can lose data that has not yet been journaled in the last 100 ms. You can update the journalCommitInterval setting in your mongodb.conf file to anywhere between 2 ms and 300 ms. A lower value increases the frequency with which data is journaled, but at the cost of disk performance.
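For instance, with the 2.x-style configuration file the setting looks something like this (the path and the chosen value are placeholders, a sketch rather than our production config):

# /etc/mongodb.conf
dbpath = /var/lib/mongodb
journal = true
# commit to the journal every 50 ms instead of the default 100 ms
journalCommitInterval = 50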

To force mongodb to commit to the journal sooner without updating the journalCommitInterval setting, one could use the j:1 write concern. As per the docs:

When a write operation with j:true is pending, mongod will reduce journalCommitInterval to a third of the set value.

So a low journalCommitInterval (whatever works best for you), combined with saves called with j:1, keeps the window for losing unjournaled writes small.

To verify that your data was written to the journal you could call the getLastError command with j set to true.
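Here is a rough sketch of what that looks like through Moped (the host and database name are placeholders):

require 'moped'

# Ask the server to confirm that the last write on this connection
# has been committed to the journal.
session = Moped::Session.new(["127.0.0.1:27017"])
session.use "my_app_development"
session.command(getlasterror: 1, j: true)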

Most drivers run this for you so this isn’t something that you need to worry about.

Mongoid

The Mongoid documentation for versions 3.0 and above does not talk about the j:1 write concern for some reason, but here is what I found based on this discussion.
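Suppose you persist a document with the defaults (Order is just a placeholder model here; the snippet is my reconstruction, not the original one from the post):

Order.create(:code => "SUMMER")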

This is equivalent to calling it with
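(The safe-options hash below is my reconstruction of Moped’s getLastError arguments passed through Mongoid’s with, not the original snippet.)

Order.with(:safe => { :w => 1 }).create(:code => "SUMMER")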

This implies that the write will be acknowledged once the data is written to the primary’s memory but not to disk. If you want to trigger a journal write as well then you would have to do it with
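(Again a sketch based on that discussion; the j key in the safe hash is what asks for the journal acknowledgement.)

Order.with(:safe => { :w => 1, :j => true }).create(:code => "SUMMER")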

Most of this information was collected and posted as an answer on Stack Overflow by mnemosyn and is available in the Mongodb docs. My goal was to create a quick reference.

References:

  1. http://docs.mongodb.org/manual/reference/write-concern/
  2. http://docs.mongodb.org/manual/core/journaling/
  3. http://docs.mongodb.org/manual/tutorial/manage-journaling/
  4. http://docs.mongodb.org/manual/reference/configuration-options/#journalCommitInterval
  5. http://docs.mongodb.org/manual/core/write-concern/#write-concern
  6. http://rsmith.co/2012/11/05/mongodb-gotchas-and-how-to-avoid-them/
  7. https://groups.google.com/forum/#!msg/mongoid/Bybccfc0gLI/Aei0VkMAZqoJ

Mongodb – not evil, just misunderstood

Lately I’ve been reading a lot about Mongodb, and a lot of posts dissuading you from ever using it. Some of these articles are seriously outrageous and make me wonder what got those teams to start using Mongodb in the first place. Sarah Mei’s recent article was one that upset me a lot, especially since the title was so inflammatory.

My post, however, aims at highlighting the areas where Mongodb works and how it performed brilliantly for us. As someone leading the engineering efforts for a shipping and logistics company, I wasn’t too happy initially to see Mongodb being used as the primary datastore, but after two years I’m more than sure that this was the datastore for us. I’ve outlined areas that confused me when I first encountered them, only to learn that they were actually invaluable features available to me.

“No migrations” is that all you have?
The advantages of schemaless documents are priceless. Not having to migrate is just one of the perks. Our schemas were largely in the form of Orders (having many) Shipments (going_from) ShipPoint (to) ShipPoint

We rarely used most of these entities without the others, and it served us extremely well to manage them as self-contained documents embedding one another, roughly like the sketch below.
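These class definitions are my reconstruction for this post, not our actual models (the association and field names are made up):

class Order
  include Mongoid::Document
  embeds_many :shipments
end

class Shipment
  include Mongoid::Document
  embedded_in :order
  embeds_many :ship_points   # origin and destination
end

class ShipPoint
  include Mongoid::Document
  embedded_in :shipment
  field :name, :type => String
end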

Mongodb writes are fire and forget? WTF?
This doesn’t always have to be the case, though it significantly contributes to Mongodb’s fast writes. Mongodb’s write concern configuration allows you to specify the precise level of persistence that needs to be achieved for a write to count as successful, so if the write fails you know it’s failed. The fact that you can know whether your write has propagated to replicas or has been journaled is a pretty neat feature.

How can the default writes be fire and forget?
(Version 2.4.8 changes this; however, it is valid up to version 2.2.6.)
It just made sense: given all the information needed to configure writes the way you prefer, I would always go with this approach. We add a lot of Notes to each shipment as it gets reviewed at different levels by the sales, accounts and other teams. These notes generally serve as a reminder, or a single line indicating that a shipment has been viewed, and they don’t critically affect the business workflows of the application. It just seemed logical that these were fire-and-forget operations that could be stored as quickly as possible.

Another place where this is extremely handy for us is tracking. We track several hundred shipments each day, logging every tracking status, location and time while the shipment is in transit. This information is handy for customers keeping an eye on where their shipment has reached. Chances are that, when fetching this information, some of it doesn’t get saved the first time, but we expect it to be picked up during a second fetch 30 minutes later. The default write concern works brilliantly there.

Read locks and write locks – don’t they slow you down?
They do, but since most of the data is memory mapped this doesn’t affect you in a major way. However, I did notice people always working against the primary of a replica set and never querying the secondaries, for fear of inconsistent data. If you have sufficient memory your replication lag should be pretty small, and besides, if you don’t need the data to be consistent at every instant, querying a secondary is a sensible option to take load off your primary. Which brings me to the primaryPreferred read preference. This allows you to query a secondary in your replica set when your primary is not available. It’s a fairly safe choice in my opinion.

We began querying secondaries for ShipPoints which didn’t change that often.

All the memory usage is killing me!
This is one of the things that took me time to accept. Mongodb expects your working set to fit into RAM along with your database’s indexes. Your working set is the data that is frequently queried and updated. Since Mongodb works with memory-mapped files, most of your working set is mapped into memory. When data is not available in memory, a page fault occurs and it has to be fetched from disk. This results in a performance penalty, but as long as you have some swap space you can safely load the data back in.

While our working set was fairly small, our reporting application needed access to the entire set of shipment records to generate reports. This resulted in Mongo running out of memory and spitting OperationFailure errors on a regular basis.

Our initial approach was naive: we started using Redis (another datastore that’s pure gold) to store snapshots of information, but we soon realised we could just use Mongodb to make it work.

So can I never generate reports without having my dataset fit in memory?
Rollups to the rescue. Rollups are pre-aggregated statistics that speed up your aggregation process. They make life significantly easier, as you only need to query short time ranges to generate micro-reports.

Here is a simplified sketch of how we generated daily (and, similarly, monthly) aggregates with map-reduce.
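The snippet below is a reconstruction of the idea rather than our production code; the collection, the created_at timestamp and the output collection name are assumptions:

map = %Q{
  function() {
    var d   = this.created_at;
    var day = d.getFullYear() + "-" + (d.getMonth() + 1) + "-" + d.getDate();
    emit(day, { shipments: this.shipments ? this.shipments.length : 0 });
  }
}

reduce = %Q{
  function(key, values) {
    var total = 0;
    values.forEach(function(v) { total += v.shipments; });
    return { shipments: total };
  }
}

# The map-reduce runs when the results are iterated; the rollup lands in its
# own collection that reports can query cheaply.
Order.all.map_reduce(map, reduce).out(:replace => "daily_shipment_rollups").each { |doc| }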

So you mean this can’t be realtime?
Yes it can, through atomic updates. Just as we generated rollups to speed up reporting, we can maintain pre-aggregated snapshots of the same information, shaped something like this.
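The field names here are made up for illustration; the idea is one small document per day that reports can read directly:

# Hypothetical shape of a pre-aggregated daily stats document
{
  "_id"      => "2013-12-01",
  "total"    => 120,
  "statuses" => { "delivered" => 80, "in_transit" => 30, "exception" => 10 }
}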

Once this is in place you can update your aggregates in realtime by simply incrementing the right counter with something like the following.
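The collection and field names follow the made-up document above; with Mongoid 3 you can drop down to the Moped session for the raw $inc:

# Atomically bump the day's counters every time a shipment is delivered;
# upsert creates the day's document on the first write.
Mongoid.default_session[:daily_shipment_stats].
  find("_id" => "2013-12-01").
  upsert("$inc" => { "total" => 1, "statuses.delivered" => 1 })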

I haven’t even touched upon the replication and sharding features that Mongodb offers, which I will reserve for another post. To summarise, I feel Mongodb is awesome, and it’s a lot like the kid in class whom you dismissed because your friends thought he was weird – till you got to know him.

Disclaimer: I don’t claim to be an authority on Mongodb and everything that I have written about is stuff that I’ve learnt while working with Mongodb. I recommend reading the documentation and going through the talks available on the Mongodb website.

Simple Quick SSH Tunneling to expose your web app

I’ve been a localtunnel user for quite some time now and I really love the fact that it’s a free service, quick to install, and makes it easy to expose your development app to the world. There are quite a few worthy alternatives now, like showoff.io and Pagekite, that pretty much do the same thing.

But at times it gets annoying (especially when in the middle of other work) that I’m unable to access localtunnel because it’s down, or I’ve outrun the free usage limit on Pagekite.

I generally end up using localtunnel when I have to wait for an IPN from Paypal or Authorize.net (relay response) while working on my development environment. So here is a quick way to roll out a basic version of the service for your own needs.

Before we move to the “how to”, here is a quick intro to SSH tunneling: a reverse tunnel asks a remote machine to listen on a port and carry whatever arrives there back over the SSH connection to a port on your local machine.

Now, I assume that you have a staging server (or some server you have ssh access to).

The following terminal command does pretty much the same thing we do with the Ruby code that follows:
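Something along these lines (the user name is a placeholder, and binding to 0.0.0.0 so that everyone can reach the port assumes the server’s sshd has GatewayPorts enabled):

# forward remote port 9999 on abc.example.com to port 3000 on this machine
ssh -N -R 0.0.0.0:9999:localhost:3000 deploy@abc.example.com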

The -R indicates that you’ll be forwarding your local app running on port 3000 to the remote port 9999 on the remote host abc.example.com so that everyone can access it. Now your application running on localhost:3000 is accessible at abc.example.com:9999

We now do the same thing using the Ruby net-ssh library. The following code snippet is customised to my defaults, but it’s simple enough to change those settings.
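This is a rough sketch rather than the original snippet; the host, user and ports mirror the terminal example above, and the same GatewayPorts caveat applies:

require 'rubygems'
require 'net/ssh'

# Open an SSH session and ask the remote sshd to listen on port 9999,
# forwarding whatever it receives back to localhost:3000 on this machine.
Net::SSH.start("abc.example.com", "deploy") do |ssh|
  ssh.forward.remote(3000, "localhost", 9999, "0.0.0.0")
  puts "Tunnel up: http://abc.example.com:9999 -> localhost:3000"
  ssh.loop { true }
end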

Hope this helps

Mongoid – Comparing attributes belonging to the same document

I spent an insane amount of time trying to figure out how to elegantly compare attributes belonging to the same document and stumbled upon a few Stack Overflow questions that helped. The solution still doesn’t appear elegant, but it does the job.

# $where lets us compare two fields of the same document
scope :expired, lambda {
  any_of({:completed_date.ne => nil},
         {:expiry_date.lte => Date.today.to_time},
         {"$where" => "this.usage_count >= this.max_usage_count"})
}

Certainly not noteworthy but this post is to help me document this bit.

Goal

To build a named scope to allow me to query expired documents. The model has the following fields


	field :code, :type => String
	field :allowed_domain, :type => Integer, :default => DOMAIN_ALL_SITES
	field :description, :type => String
	field :discount_type, :type => Integer, :default => PERCENTAGE_DISCOUNT
	field :dollar_discount_cents, :type => Integer, :default => 0 
	field :percentage_discount, :type => Integer, :default => 0
	field :max_usage_count, :type => Integer, :default => 1
	field :usage_count, :type => Integer, :default => 0
	field :expiry_date, :type => DateTime, :default => (DateTime.now + 1.week)
	field :completed_date, :type => DateTime	

	index :code
	index :expiry_date
	index :allowed_domain

A record counts as expired when any of the following holds: its completed_date has been set, its expiry_date is on or before the present date, or its usage_count has reached the max_usage_count.

While the above block of code does that, what I hate is mixing raw Mongodb/JavaScript fragments with Ruby. While Mongoid for most purposes does a good job of making life easier (as every ORM should), there are occasions where I’d prefer it let me write pure Mongodb queries or use the Ruby wrappers entirely.

Other Resources
http://stackoverflow.com/questions/3795044/mongoid-named-scope-comparing-two-time-fields-in-the-same-document

Further bulletins as events warrant.

My Favorite Ted Talks

I’ve been going through some of my favorite Ted Talks, here’s the list.

1. Sir Ken Robinson on Creativity.

2. Simon Sinek, People don’t buy what you do, they buy why you do it

I love this talk just for that one line, which resonates with everything I believe in.

3. Dan Ariely

Lastly, to my favorite speaker, Dan Ariely. I’ve been a big fan of his books and talks. I strongly recommend you read Predictably Irrational. Here are his talks.

I can’t believe I missed this the first time around.

4. Derek Sivers – How to start a movement.

Async Responses in Thin and a Simple Long Polling Server

I’ve been playing with Thin, the Ruby web server which packages the awesomeness of Rack, Mongrel and Eventmachine (\m/). However, the thing that blew me away completely was James Tucker’s Async Rack that is integrated into Thin. The combination opens up a whole new world of realtime responses, something that I’ve been constantly switching to NodeJS and SocketIO for.

Async Rack

A simple rack application would look like this


# my_rack_app.ru

class MyRackApp
  def call(env)
    [200, {'Content-Type' => 'text/html'}, ["Hello World"]]
  end
end

run MyRackApp.new

# thin start -R my_rack_app.ru -D -p3000

When you hit port 3000, this simple Rack app responds with a status code (200), the content type and a body. Nothing special. With async Rack we’d be able to send the response head back while we build the body; once the body is built we send it and close the connection.

A quick glance at Patrick’s Blog (apologies for not being able to find the last name of the author of this blog) should give you an excellent understanding of what I’m trying to explain

A simple async app


# my_async_app.ru

AsyncResponse = [-1, {}, []].freeze

class MyAsyncApp
  def call(env)
    Thread.new do
      sleep(10)
      response = [200, {"Content-Type" => "text/html"}, ["<em>This is Async</em>"]]
      env['async.callback'].call(response)
    end

    AsyncResponse
  end
end

run MyAsyncApp.new


AsyncResponse is a frozen constant with a -1 status, which tells the server that the response will be coming later, asynchronously. It’s what the examples provided with Thin refer to as an “async template”.

On a request, the call method immediately returns AsyncResponse while the spawned thread waits on the sleep call. Once the thread wakes up it builds the response and sends it through env['async.callback'], the connection having been kept alive in the meantime.

An async app where we build the response and close the connection

From here on, it would really help to have the Thin code open in your text editor. We will be looking at request.rb, response.rb and connection.rb from the lib/thin folder.


require 'rubygems'

class DeferrableBody
	include EventMachine::Deferrable

	def each &blk
		puts "Blocks: #{blk.inspect}"
		@callback_blk = blk
	end

	def append(data)
		puts " -- appending data --"
		data.each do |data_chunk|
			puts " -- triggering callback --"
			puts @callback_blk.call(data_chunk)
		end
	end	
end

class AsyncLife
	AsyncResponse = [-1, {}, []].freeze
	
	def call(env)
		body = DeferrableBody.new

		EM.next_tick {
			puts "Next tick before async"
			sleep(3)
			puts "#{env['async.callback']}"
			env['async.callback'].call([200, {'Content-Type' => 'text/plain'}, body])
			sleep(5)
		}

		puts "-- activated 5 second timer --"
		EM.add_timer(5) {
			puts "--5 second timer done --"
			body.append(['Mary had a little lamb'])

			puts "-- activated 2 second timer --"
			EM.add_timer(2){
				puts "--7 second timer done --"
				body.append(['This is the house that jack built'])
				puts "-- succeed called -- "
				body.succeed
			}

		}


		AsyncResponse
	end
end

run AsyncLife.new

This example uses Eventmachine’s Deferrable, which allows us to signal when we’re done building our response. But the tricky part I struggled to get my head around was the strange-looking each method and how it uses the append method to build our responses.

Walking through the code

When the request comes in from the browser, the application returns the AsyncResponse constant, which tells the server that the response body is going to be built over time. On a request, Eventmachine’s post_init method (check out thin/connection.rb) creates a @request and @response object. This triggers the post_process method, which on the first call returns without setting the header, status or body, and prepares for the asynchronous response.

On next_tick we begin to create our header. We initialize an EM::Deferrable object which is assigned as the response body; this ensures the header is sent ahead of the body, because there is nothing yet to iterate over in the each block where the response is sent. env['async.callback'] is a reference to the post_process method, created via method(:post_process); check out the pre_process method in thin/connection.rb.

The each method in our Deferrable class overrides the default each implementation used by the response object. Our each block is saved in the instance variable @callback_blk, which is called when we invoke the append method. So essentially we are calling send_data on each of the data chunks we send back whenever we call append.

Once that’s done we call succeed, which tells Eventmachine to fire the deferrable’s callback to denote that we’re done building the body. That ensures the request/response connection is closed.

The default each implementation


# Send the response
@response.each do |chunk|
  trace { chunk }
  send_data chunk
end

That’s pretty much what I gathered by going through the code on async responses with Thin and Rack. Another useful module is the thin_async bit written by the creator of Thin, @macournoyer, available here. It pretty much abstracts away the trouble of overriding the each block.

Here’s an example of a simple long polling server I built using Thin, available here.

Hope this is helpful