Sohan's Blog

Things I'm Learning

MicroOptimization Trap

Yesterday, couple of my friends and I were discussing about a CoffeeScript topic. In short, given the following code in CoffeeScript:

1
2
3
class Cat
  type: -> 'cat'
  meow: -> 'meow meow mee..ow'

The compiler produces this in JavaScript:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
var Cat;

Cat = (function() {

  function Cat() {}

  Cat.prototype.type = function() {
    return 'cat';
  };

  Cat.prototype.meow = function() {
    return 'meow meow mee..ow';
  };

  return Cat;

})();

As you see here, the prototype method definitions use the prefix Cat.prototype repeatedly. Which could be compressed using closure as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
var Cat;

Cat = (function() {

  function Cat() {}

  var p = Cat.prototype;

  p.type = function() {
    return 'cat';
  };

  p.meow = function() {
    return 'meow meow mee..ow';
  };

  return Cat;

})();

This closure form is for sure less bloated than the one produced by CoffeeScript. And on IE7, each Cat.prototype lookup takes about a microsecond, so, the closure form quite literally yields a micro-second-optimization! When one zooms into such micro-optimizations, other bigger fishes are easily missed.

For example, even though the output is a little bloated from the CoffeeScript compiler, the CS source code is rediculously simpler/shorter even for this tiny example. But that’s not the only reason why one would use CS. If for nothing else, CS produces lint JavaScript for free. I’m not saying one has to use CoffeeScript, but this serves as an example scenario. When making a decision, a tunnel-vision into the micro-optimization may as well be a trap.

Unconventional Conventions

JavaScript has been reborn, thanks to jQuery to begin this epic and wonderful march. These days, we have a huge influx of micro to macro frameworks, all providing some well curated set of features. Every now and then I try to get a glimpse of the new and hot stuff so when time comes, I can make an informed decision on what to use and why.

This post is about my recent attempt at learning AngularJS and EmberJS. Client-side MVC and/or single page apps need some core features as follows:

  1. URL routing
  2. Rendering views
  3. Read/write data from/to the backend server

Both AngularJS and EmberJS offer these and some additional features, most notably, two-way data binding, so a change in the JavaScript object reflects on the rendered view and vice versa. But, to be honest, I think they overstretched the framework N miles to the north of where it should be. Let me explain with an example:

Conventions are good when they offer an easy mental model. For example using the following router mapping:

1
2
3
App.Router.map(function() {
  this.route('favorites');
});

it would match favorites with a App.FavoritesRoute based on the names. It’s an easy enough pattern to recognize and remember. However, I find it’s too far fetched in the following example:

1
2
3
4
5
6
App.FavoritesRoute = Ember.Route.extend({
  model: function() {
    // the model is an Array of all of the posts
    return App.Post.find();
  }
});

My mental model is challenged in a few ways here:

  1. Route has a model hook - which, by conventional usage, I can’t seem to recognize as a pattern.
  2. App.Post.find() entails an Ember.ArrayController, one provided by the framework. Here, a router is producing a Controller in disguise from the model hook.

I’m sure with repeated usage, I’d be able to use these conventions without much of a problem. But on a pure API design/Architecture perspective, I think it’d make more sense to remove such unconventional conventions.

To be honest, I’ve attempted the EmberJS guides for the 3rd time now, in less than a month, and I feel lost in so many conventions. I have had similar experience with learning AngularJS as well.

With AngularJS, I was first a bit skeptical about all the custom tags that you’d introduce when using AngularJS. But after giving it a go, it sort of made sense as a trade-off between less code and manageable clutters. But I got really uncomfortable with the likes of the following:

1
2
3
4
5
<ul class="phones">
  <li ng-repeat="phone in phones | filter:query">

  </li>
</ul>

While it offers a snappy filtering experience and some declarative code, it feels too far stretched again. The convention around $scope, and a few other oddities as you’ll see in this EggHead.io $scope vs scope tutorial can be quite a stress on the brain.

To conclude, I personally like both frameworks and think they offer some great features over their counterparts, but it’d make sense in a later version to pick conventions where it’s truly conventional over an imposed one with lots of foreign concepts.

Call Me Sohan and Ask-Me-Not About S M

I have heard this question n times, where n tends to the number of days since June 2006, as I started working with the people from North America:

Should I call you S M?

Well, I’m from Bangladesh and even today, we don’t really have a first and last name, instead what we have is, a full name and a nick name. My full name is, S M Sohan, and my nick name is Sohan. So, call me Sohan.

I don’t have a first or last name like many people in Bangladesh. But since almost everywhere, from the online forms to the call center agents, they ask me about these info, I tell them, my first name is, S M, and last name is, Sohan. At times I get funny reactions to this, for example:

S M? Just S M?

Better yet, some offer me spelling suggestions:

You can spell it ESEM, because SM is hard to pronounce!

You people are funny! Well, S M is actually a short form for Sheikh Mohammed. But it’s so common in Bangladesh that, almost nobody uses the full version in their official documents. As a bonus, you save all those inks and bytes required to spell these extra characters. Honestly, of all these years, this is probably just the 4th time I’ve written the full form, the other 3 times were for US visa applications. Also, it’d be equally weird to me, had I used Sheikh as my first name and people called me by that!

This first/last name convention also causes funny consequences to my friends. For example, two of my friends have names differing only in their middle names and they were rommies at a shared residence in Calgary. Guess what, their credit reports got all confused about who’s who and mixed up the scores!

Lot of my muslim friends go by “Md Xxxx Yyyy”, where Md is a short form for Mohammed. People that aren’t too familiar with it, at times think of them as doctors, which can be quite funny, and dangerous!

If you aren’t convinced yet, take these ones for example. AAMS Arefin Siddique current VC of the University of Dhaka, A. K. Fazlul Huq a renowned leader in the history, A P J Abdul Kalam an ex-president and a nuclear scientist from India.

I hope this makes you comfortable in calling me by the name Sohan :)

Implementation Challenges With a Multi-Tenant/SaaS Database

climbing

This 2006 MSDN article points out some key aspects of designing a multi-tenant database for SaaS applications. As you can read in the article, SaaS databases need to pick one of the following three configurations:

  1. separate databases
  2. shared database, separate schema and
  3. shared database, shared schema.

A number of factors including economic, security, skillset, etc. contribute to the selection of the best suitable configuration. In this post, from my experience, I’m sharing the following practical requirements that introduce additional implementation challenges:

  1. Each account needs to have a maximum allowed space on the database (economic).
  2. Data from one account should never be accessible to other accounts (security).
  3. However, for backend usage, we need the ability to run queries across all accounts.

Size limiting is quite hard. It almost forces the use of separate database/schema per account. Even then, most databases of today don’t have a clean mechanism to exert such a hard limit.

Separate databases reduce the chance of cross-account data leaks. But backend tasks suffer for this. For example, your monthly billing processor needs to generate bill for all accounts. With one database/account, it cannot do one simple query to a single database anymore.

Also, most ORM libraries don’t support separate databases for a single type. For example, to fetch the orders from the database, the ORM library needs to connect to database A for account X, but to database B for account Y and so on. At this point, if possible, you’ll need to tweak the ORMs a lot or fall back to your own ORM, which as I wrote in the past, is almost never a good idea.

Connection pooling is another challenge. It’s generally a good practice to use connection pooling, to save the overhead of establishing a connection before every query. With separate databases, and hundreds, if not thousands, of accounts being served from an app server, the connection pool would either have too many or too few connections in it to be useful.

I don’t know about a clean architecture that’d address these requirements while not introducing the dev challenges. Please comment if you’ve any suggestion.

Interesting User Interaction Data on Dashboard Widgets

At SourceFire, among other things, we build a cloud based Advanced Malware Protection software. As security products go, we collect a huge amount of data about events that happen on the user devices so we can identify and potentially quarantine malicuous events. Security in itself is a big specialized domain, but this post is about a dashboard that we serve as the main point of entry into the big data that we collect from enterprises and consumers.

We were debating about what data to put on the dashboard. The target is to offer a very high level, but an actionable view of the whole enterprise data. In the end we settled down with a list of 7 types of data and decided to use a layout similar to the following.

Layout

I was looking at the metrics this morning to see if our prioritization of the layout tiles made sense to the users and found that

the number of clicks on each widget is highly correlated to its relative order in the layout.

I am not sure how to interpret this result. It may mean one of:

  1. The ordering of the tiles matches the actual priority of our users.
  2. Irrespective of the contents, users are more likely to click on data in this order.
  3. The results are totally random and it’s more of a coincidence over a correlation.

At this point, I feel we need to shuffle the tiles to see if the position has a strong bias on the results. When I get a chance to do this, will post again with my findings.

MvcMailer and Open Source Happiness

MvcMailer makes me happy as a developer.

I find an immense amount of happiness everytime I check the Nuget page of MvcMailer. It’s so refreshing to see the download count going up (42,427 total downloads as I write), receiving feedback, praises and even the complains.

MvcMailer Tweets

The idea of MvcMailer came from trying to bring some of the amazing Ruby on Rails ActionMailer features to ASP.NET MVC developers. It was well accepted since the very beginning and I hope is still helping people. As a side-effect, MvcMailer gives me an opportunity to keep in touch with the developments in ASP.NET MVC, C#, the .NET framework and the community around these. During our Sunday coffee and code seessions, I usually pair up with Tyler Mercier, and MvcMailer is often the project we’d work on, or at least talk about. It also gives me a code to share on my github profile, which I expect a good future employer to care about :)

Yesterday, we met for Coffee and Code and released a new version to better work with .NET 4.5. This has been the most requested feature in the last couple months and required a small code change. After we released it, I drove home in about 20 minutes. Checked the counts, and it had already seen 4 downloads. This felt just amazing. Now after about 20 hours, it feels even better to see 100+ downloads of the new release. It feels so good.

Open source is truly rewarding, both to its contributors and to its consumers.

Happy open sourcing.

Solution Architecting Using Queues?

When building a bunch of applications that need to interact with each other, Queues or Message Oriented Middleware services offer some very useful features. To name a few, you get features like a) Guaranteed message delivery, b) routing, c) throttling, etc., for free. Tools like ActiveMQ, RabbitMQ, MSMQ, JMS servers are tuned and time tested to handle such requirements robustly. I’ve had good experience with these tools for the happy paths. However, when dealing with unhappy paths, irrespective of the tools you use, it can be quite challenging to design the failovers.

Here’s a list of the things to consider when designing a solution using Queues:

Speed of Consumption

Queues are essentially a producer/consumer or pub/sub system. This means a slower consumption rate results into a pileup of messages. In a high transaction system, millions of records can pile up in the matter of hours - forcing the producer to slow down or messaages being dropped. While architecting a solution, the rates should be measured and tested. Load tests are very useful to avoid such production issues. These issues are generally hard to fix once rolled out.

Error Paths

Even when a consumer is keeping up with the producer, chances are some messages cannnot be processed due to errors. If it’s a network error or some other error external to the system, a simple redelivery from the queue may fix it. But in real life, chances are, this is an application defect. So, a simple retry won’t make any difference. The application defect needs to be fixed and then, the message needs to be replayed. The solution needs to be designed considering the effect/resolution of the error paths.

Automated Monitoring

Automated monitoring is crucial to any Queue in the production system. Without monitoring, the queue may be flooded with unconsumed messages.

Human Intervention

When a message cannot be processed by a consumer, chances are, a human intervention will be required to complete/undo the intended operation. For example, if your payment processing application sends out a message on complete, and the shipping processor fails to process it - a human interaction may be required to fulfill the shipping or undo the order. When designing a solution, such edge cases should be identified.

Deployments

Typically queues are good in terms of holding the unconsumed messages for disconnected consumers. However, if a consumer has significant downtime between deployments, this can stretch the limits of the queue.

I’m sure this is not all. These are from my real-life experience only. If you have experiences to share, please use the comments below.

Random Data in Tests

… are evil if you expect the randomizer to give you unique values on each call.

Here’s an example:

1
2
3
4
5
6
7
before(:each) do
  @user_1 = User.create!(email: "email-#{Random.rand(1000)}@example.com")

  # the following will fail intermittently if you have a unique validation on User#email

  @user_2 = User.create!(email: "email-#{Random.rand(1000)}@example.com")
end

It’s a timebomb, use it, it will haunt you later. Be sane, stop using random numbers in tests like this.

Practical API Design Challenges for Client Side MVC

This following list is probably a subset of a bigger one, but I’ve composed it based on my experience using BackboneJS + Rails on the current project at work. So, here goes the list:

Designing Granularity of the API

Using RESTful API, it makes sense to have separate API methods to repesent each type of resource. However, to construct a meaninful UI for the end user, often time we need to represent multiple resources on the same screen. Let’s explain this with an example requirement:

As a user I want to see the emails using a saved custom filter

In this case, we would want the UI to show the description of the selected filter, and also the emails matching the filter. Conceptually, CustomFilter and Email are two resources or models, having distinct properties and behaviors. It’d make sense to have two separate API’s as follows:

Filter API: GET http://fancy_domain.com/filters/emails_from_boss_about_release

1
2
3
4
5
6
{
  id: 10,
  filter_name: emails_from_boss_about_release,
  subject_including: 'release',
  from: 'boss@the_company.com'
}

Email API: GET http://fancy_domain.com/filters/10/emails

1
2
3
4
5
6
7
8
9
[
  {
    subject: 'Release note',
    from: "boss@the_company.com",
    to: "sohan@the_company.com",
    body: "<html>.. </html>",
    sent_at: "20130102040511"
  }
]

Although this granularity makes good sense as an API, it starts causing issues for typical client side MVC use case.

Unless the data from these two API methods are combined on the server side, the client must make two requests.

In a real world use, I’ve seen it’s more common for the UI to require multiple resources than not. Since, we want our users to get faster response, we try to create API’s that suit the UI’s need. So, APIs start getting bloated payloads to match the UI’s need. This typically results into something like the following:

Filter Email API: GET http://fancy_domain.com/filters/emails_from_boss_about_release?include_emails=true

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
{
  id: 10,
  filter_name: 'emails_from_boss_about_release',
  subject_including: 'release',
  from: 'boss@the_company.com',
  emails:
  [
    {
      subject: 'Release note',
      from: 'boss@the_company.com',
      to: 'sohan@the_company.com',
      body: '<html>.. </html>',
      sent_at: 20130102040511
    }
  ]
}

This is an example with just two resources. I’ve seen UI’s that need more than two resources, and its quite common as well.

Short/Medium/Long form of Models/Resources

The view usually renders a subset of all possible attributes associated with a model.

For example, when rendering the WeatherForecast model, in the list view only the minimum and maximum temperature for the day with a short summary of Sunny/Cloudy/Rainy/Snowy are shown. However, when you want to see more, you’d want to show the hourly variation of the weather with additional data of interest.

This requires the RESTful API to prepare multiple respresentations of a resource to suit the specific UI needs. This can complicate the logic on both the server and client sides, since the code must accommodate different representations for the same model.

Are ActiveRecord Callbacks Any Good?

Wanted to share my thoughts on ActiveRecord Callbacks. I’d like to know your thoughts if you disagree. Please use the comments.

  • Only use them when the behavior is must have under all situations, including your tests. For example, we know email addresses are case-insensitive. So no matter what, we may always want to store a lower cased version in the db.
user.rb
1
2
3
4
5
6
7
8
9
10
#Good
class User < ActiveRecord::Base

  before_save :downcase_email

  def downcase_email
    email.downcase!
  end

end
  • Never use them otherwise. A couple of classic examples that I consider as bad:
user.rb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
#Bad: Sends email, probably not required under all situations such as when creating via migrations, tests etc.
class User < ActiveRecord::Base

  after_create :welcome

  def welcome
    WelcomeMailer.welcome(self).deliver
  end

end

#Bad: Interacts with external components
class User < ActiveRecord::Base

  after_create :setup_orientation

  def setup_orientation
    OrientationMessageQueue.enque(self)
  end

end
  • A common problem with the callbacks is, there are some handy methods that skip the callbacks. For example, consider the following:
User.rb
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
#Careful

class User < ActiveRecord::Base

  before_save :update_coordinates

  def update_coordinates
    self.coordinates = Geo.find_coordinates(zip_code)
  end

end

#works fine
sohan = User.find_by_name('Sohan')
sohan.zip_code = '33333'
sohan.save!

#doesn't fire the callback
User.update_all({zip_code: '33333'}, {zip_code: '55555'})

If you’re using the callbacks, I’d emphasize again, only use when you are absolutely sure the desired behavior applies under all contexts.