Sohan's Blog

Things I'm Learning

The Year 2014

2013 was an amazing year in many ways for my personal life. To recap, the top story of 2013 was of course the birth of our son, Shopoth. And then, we bought our first house, paid off our car-loan and ended the year with a much needed vacation, that too to Bangladesh where we spent some quality time with the families.

2013 was my year of learning Haskell. I can’t claim to be a seasoned Haskell programmer yet. But I think I learned a lot of new concepts that you only see once you’re left in the uncharted territory of a functional programming language. So, in 2014, Haskell remains the language I want to get better at.

I’d like to change a few things in 2014 in terms of my career. I’ve been developing web apps on the job for almost 8 years. It’s been exciting and I feel overwhelmed when I look back at how far the industry has matured over the years. However, at the same time, in 2014, I’d like to focus on my soft skills, especially on my negotiation skills. The target is to practice negotiation before I need to negotiate with others, so I’m well prepared to share my opinion about a topic. Afterwards, I’ll do a retrospective to find room for improvement.

In 2013, I also got a PhD admission at the University of Calgary. So far, I’ve enrolled in only a single course. But the target is to publish one paper on API evolution by the end of this summer.

2014 is the first year of my thirties. I guess I should start checking off items from my bucket list from now on. Still debating which one to target in 2014: a) learn to fly an airplane, b) start my own side project. Will keep you posted when I’ve made up my mind. Happy 2014 till then.

Oh, before I conclude, here are some stats about my open source contributions in 2013:

  1. 40K+ downloads of MvcMailer
  2. 400+ downloads of TextHelper
  3. 1.5K+ downloads of streamy_csv
  4. 17K+ visits to my blog

In 2014, I’d like to continue my open source contribution, preferably to some well established projects.

Configure Me Not

Configuration provides a way to build systems that can adapt to different needs. For example, if a website’s language and date/currency formats are configurable, then it can be set up to support multiple languages and regional formats. Configuration makes it possible to deliver such features without needing a lot of change in the application source code.

However, this notion of flexibility that configuration provides can be a trap at times. I’ve a definition of configurable as follows:

A configurable must have at least two configurations.

This is another way of saying YAGNI. But I find this to be more specific than YAGNI, as it puts a number on it: a configurable with fewer than two configurations isn’t one.

Here are a few examples to illustrate my definition.

  1. Custom interfaces with a single implementation.

    Interfaces are often thought of as configurable components, as a new implementation can be used in place of an old one without changing the code that uses it.

    Except, if your interface only ever has one implementation, it provides a false notion of flexibility. In practice, I’ve seen that for most custom interfaces, a new implementation almost always needs a change in the original interface, which means it wasn’t really configurable to begin with.

  2. Default arguments in methods that are never passed a non-default value.

    Default arguments are great, as they often simplify the common case. However, if a method with a default argument is never called with a non-default value, it’s simply not worth using a default argument. Use a local variable instead.

  3. Configuration key value pairs where there’s only one value.

    Since magic numbers and hardcoded strings are bad, it’s tempting to use the configuration file to hold such values. However, if there’s only one such value, it’s probably a constant and not a configurable object.

  4. Exhaustively validating method parameters against all possible but unused values.

    If you’re writing a method that’s only gonna be called from another method in your project, you probably know what you’re passing to the method. Validating for different negative inputs to such methods provides a sense of robustness without really adding any value.

I hope the definition makes sense. I’d love to hear your opinions and examples of configure me not.
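As a rough illustration of examples 2 and 3, here’s a small Ruby sketch. The method and constant names are made up for this post and don’t come from any real project; the point is only that a value nobody ever overrides reads better as a plain constant (or a local variable).

```ruby
# Hypothetical example: all names below are invented for illustration.

# Before: a default argument that no caller ever overrides.
def fetch_with_default(url, attempts = 3)
  "GET #{url} (up to #{attempts} attempts)"
end

# After: the value was never configurable in practice, so say so.
MAX_ATTEMPTS = 3

def fetch(url)
  "GET #{url} (up to #{MAX_ATTEMPTS} attempts)"
end

puts fetch('http://example.com')
```

If a second value ever shows up, turning the constant back into an argument is a small change.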

Introducing Asset Pipeline to Older Apps?

Introducing Asset Pipeline to an old project is quite hard. Most pre-asset pipeline projects used small JavaScript/CSS files that are often scoped to a single page or a part of the application. A typical example is as follows:

login.js
$(function(){

  $('#login').on('click', function(){
    var isLoginValid = hasValue($('#user_name')) && hasValue($('#user_password'));

    if(isLoginValid){
      $('#login_errors').hide();
      $('#login_form').submit();
    } else {
      $('#login_errors').show();
      return false;
    }
  });

});

Now, within the scope of the login page this code executes just fine. However, with asset pipeline, if this file is included in the application manifest, then all pages that include the manifest will execute this code on load. This is wasteful and more importantly, may result in unexpected behaviors and conflicts.

To work around this problem, when introducing asset pipeline, the code needs to be wrapped in some method that can be called to initialize it only from the login page. Here’s an example of the wrapper method:

login.js
  // make sure the App namespace exists before adding to it
  window.App = window.App || {};

  App.validateLogin = function(){

    $('#login').on('click', function(){
      var isLoginValid = hasValue($('#user_name')) && hasValue($('#user_password'));

      if(isLoginValid){
        $('#login_errors').hide();
        $('#login_form').submit();
      } else {
        $('#login_errors').show();
        return false;
      }
    });
  };

Now that the logic is wrapped inside a method, it can be included in all the pages without causing any wasteful execution and risking unexpected outcomes or conflicts. This method can be called from within the login page as shown in the following example:

login.html.erb
<script type="text/javascript">
  $(function(){
    App.validateLogin();
  });
</script>

This is of course only a minimum change approach that’ll get the asset pipeline working for an existing app. I’d recommend refactoring the code to make it testable and adding unit tests as you go.

We have a 4 year old Ruby on Rails project, now running 3.2 with the asset pipeline. It has only one manifest file. We used this simple approach to convert all existing JS code and it worked great. Hope it helps when you start upgrading your assets to use the pipeline.

Career Choices: Should I Take This Job?

I get intimidated by this question, “Should I take this job?”

Growing up, my dad had a huge influence on my upbringing. He had a government job and it was the only job he ever had (retired in 2012). However, my generation Y seems somewhat lucky to be able to pick and choose a job, once every couple years or so. And every time I feel like it’s time to start looking around for that next job, I get intimidated by this question.

I think this is because I put a lot of passion into the work I do. I also feel proud of my work. So, when it gets boring and the learning opportunities flatten out, it rings that bell - “what’s out there?”

We get comfortable around the people at work, build relationships and know the businesses inside out. The thought of leaving the comfort zone intimidates me. At the same time, a new job, working with new people and just a lot of learning opportunity gets me excited. So, every time I need to solve this trade-off, it becomes a huge mental load for me. Last time, when I left ThoughtWorks, the mental load was at its peak, and here’s how I made up my mind:

I focused on the worst case and best case outcomes. For me, the worst case was if the new job I was taking would leave me jobless, right when Shopoth, my son, was due to be born. I was pretty confident that wasn’t gonna happen unless something really bad happened. The next in line was if work and life at the new place turned out to be so bad that I’d have to find a new job right after. If that happened, I timeboxed myself to stick it out for at least 8 months, by when our newborn baby would be 4 months old and we’d possibly be somewhat settled. With these two items checked, I felt somewhat comfortable with the decision to move.

The best case for me was the opportunity to lead some innovations while making good income. Both seemed to be in place. So, I made the decision. It has worked out pretty well so far. I hope the Best/Worst case outcome thing helps you in making your decision.

Will You Put That Cell Phone Away?

I know it’s your phone and you can do anything you want. But maybe it’s worth holding back at times. There’s more to life than our screens.

I’m leaving it off the hand, off the table and off the bed, in silent mode, all day long. I’m asking for permission to use it if I have to while I’m with others. Instead of spending the time with the screens, it’s been very rewarding to play with my little son, with all the attention on him. My little guy sure senses the attention and you can tell it just from looking at his eyes.

Algorithms Need Better UI

Since 2006, when I became a professional software engineer, I’ve spent a fair amount of time reading books and online articles about software. Most topics were around Object Orientation, Web technologies, Software design, Testing, Automation, Agile and Lean, etc. Last year, I decided to take a break from such topics on purpose. So, I cleared all my RSS subscriptions.

Instead, I thought I would revisit some of the algorithms that I learned during my undergraduate courses. It’s been a few years since I studied algorithms and I thought I was more seasoned to appreciate and solve some of the algorithm problems than in the past. So, I started with dynamic programming problems such as Longest Common Subsequence (LCS) and the Knapsack Problem, and found this solution on Wikipedia:

Source code for finding the length of the Longest Common Subsequence
function LCSLength(X[1..m], Y[1..n])
    C = array(0..m, 0..n)
    for i := 0..m
       C[i,0] = 0
    for j := 0..n
       C[0,j] = 0
    for i := 1..m
        for j := 1..n
            if X[i] = Y[j]
                C[i,j] := C[i-1,j-1] + 1
            else
                C[i,j] := max(C[i,j-1], C[i-1,j])
    return C[m,n]

As a software developer, when I’m reading an article, if there is accompanying source code, my eyes automatically scroll to it, skipping any textual blurb. It was the same in this case, and I must say, I noticed a few things here:

  1. The single-letter variable names need some love.
  2. The code can use some logical grouping and naming to easily communicate what’s happening here.

I found this to be a UI problem. The algorithm itself is quite complicated for an average person to understand. However, we can probably reduce some noise with better naming/grouping. Here’s an alternate version of the same code.

Modified source code, with descriptive names
function LCSLength(sequence1[1..sequence1Size], sequence2[1..sequence2Size])

    table = GetTableWithZerosInFirstRowAndColumn(sequence1Size, sequence2Size)

    for sequence1Index := 1..sequence1Size
        for sequence2Index := 1..sequence2Size

            if sequence1[sequence1Index] = sequence2[sequence2Index]
              IncrementLength(table, sequence1Index, sequence2Index)
            else
              UseCurrentLength(table, sequence1Index, sequence2Index)

    return table[sequence1Size, sequence2Size]

function GetTableWithZerosInFirstRowAndColumn(columns, rows)
  table = array(0..columns, 0..rows)

  InitializeFirstRowWithZeros(table)
  InitializeFirstColumnWithZeros(table)

  return table

function InitializeFirstRowWithZeros(table[columns x rows])
  for columnIndex := 0..columns
       table[columnIndex, 0] = 0

function InitializeFirstColumnWithZeros(table[columns x rows])
  for rowIndex := 1..rows
       table[0, rowIndex] = 0

function IncrementLength(table, columnIndex, rowIndex)
    table[columnIndex,rowIndex] := table[columnIndex-1,rowIndex-1] + 1

function UseCurrentLength(table, columnIndex, rowIndex)
    leftCell = table[columnIndex-1,rowIndex]
    topCell = table[columnIndex,rowIndex-1]
    table[columnIndex,rowIndex] := max(leftCell, topCell)

I know this modified code is verbose, but I find it self-explanatory. Using descriptive names is not a new concept in software engineering at all. I wish we had our algorithm books with annotated source code like this, where it’s readable by humans.

Let’s use uglifiers and minifiers to do the machinification for us.
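If you’d like to run the idea, here’s a rough Ruby translation of the descriptive version. It’s only a sketch: the method names are my own for this example, and it assumes the two sequences are strings. Ruby’s `Array.new` fills the whole table with zeros, so the separate first row/column initializers aren’t needed.

```ruby
# Sketch only: a Ruby take on the pseudocode above, with illustrative names.
def lcs_length(sequence1, sequence2)
  # Row 0 and column 0 stay zero: an empty prefix has nothing in common.
  table = Array.new(sequence1.size + 1) { Array.new(sequence2.size + 1, 0) }

  (1..sequence1.size).each do |index1|
    (1..sequence2.size).each do |index2|
      table[index1][index2] =
        if sequence1[index1 - 1] == sequence2[index2 - 1]
          extend_match(table, index1, index2)
        else
          best_without_match(table, index1, index2)
        end
    end
  end

  table[sequence1.size][sequence2.size]
end

# Matching items extend the LCS of the two shorter prefixes by one.
def extend_match(table, index1, index2)
  table[index1 - 1][index2 - 1] + 1
end

# Otherwise keep the better result of dropping the last item of either sequence.
def best_without_match(table, index1, index2)
  [table[index1 - 1][index2], table[index1][index2 - 1]].max
end

puts lcs_length('AGGTAB', 'GXTXAYB') # prints 4 (the LCS is "GTAB")
```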

AngularJS Is Very Productive, and Cool Too!

It has a very steep learning curve, but yields a superb productivity boost once you’ve learned it. Check out my demo of the wizard that we’ll discuss next.

AngularJS works by extending HTML to produce declarative UI code and eliminating the need for a lot of boilerplate code. For example, the mental model of a wizard can be expressed using the following HTML:

<wizard title="Flight Search">

  <step title="Search">

  </step>


  <step title="Select a flight">

  </step>


  <step title="Select a return flight">

  </step>


  <step title="Checkout">

  </step>

  <step title="Confirm purchase">

  </step>

  <step title="Receipt">

  </step>

</wizard>

With AngularJS, one can write exactly this markup with the help of two custom directives, wizard and step. This declarative UI code makes it very easy to read. In addition to this, the two-way data binding capabilities of AngularJS make it very productive, as we don’t need to write a bunch of references to the DOM nodes and re-render the nodes as the data changes. For a working example, check the source code of the demo and if you are like me, you’ll love to see how simple it is.

Released Streamy_csv Gem

Following the previous post, I decided to spin off a little ruby gem for you folks. Get streamy_csv, and write only your application code while it’ll do the boilerplate work for you.

In a nutshell, with this gem in your application, all you need to do is this:

class ExportsController < ApplicationController

  def index

    stream_csv('data.csv', MyModel.header_row) do |rows|
      MyModel.find_each do |my_model|
        rows << my_model.to_csv_row
      end
    end

  end
end

Find more at https://github.com/smsohan/streamy_csv

Generating and Streaming Potentially Large CSV Files Using Ruby on Rails

Most applications I’ve worked on at some point required that ‘Export’ feature so people would be able to play with the data using the familiar Excel interface. I’m sharing some code here from a recent work that did the following:

Generate a CSV file for download with up to 100,000 rows in it. Since the contents of the file depend on some dynamic parameters, and the underlying data is changing all the time, the file must be generated live. Generating a large file takes time and the load balancer will drop the connection if it takes more than 1 minute. In fact, as a consumer I myself would be frustrated had it taken even 1 minute to see something happening. This problem naturally requires a streaming solution.

For a familiar example, let’s say we are downloading a CSV file containing transactions on an online store for the accounting folks. Let’s say the URL is as follows:

http://transactions.com/transactions.csv?start=2013-01-01&end=2013-04-30&type=CreditCard&min_amount=400

So, this would download a file containing the transactions from January to April of 2013, where a CreditCard was used for a purchase over $400. Here goes the code example with inline comments describing interesting parts.

app/models/transaction.rb
require 'csv'

class Transaction < ActiveRecord::Base
  belongs_to :store
  attr_accessible :time, :amount

  def self.csv_header
    #Using ruby's built-in CSV::Row class
    #true - means its a header
    CSV::Row.new([:time, :store, :amount], ['Time', 'Store', 'Amount'], true)
  end

  def to_csv_row
    #CSV::Row.new takes the headers first, then the matching field values
    CSV::Row.new([:time, :store, :amount], [time, store.name, amount])
  end

  #note: this shadows ActiveRecord's own find_in_batches class method
  def self.find_in_batches(filters, batch_size = 1000, &block)
    #find_each will batch the results instead of getting all in one go
    where(filters).find_each(batch_size: batch_size) do |transaction|
      yield transaction
    end
  end

end

Given this Transaction model, the controller can call the methods and set appropriate http headers to stream the rows as they are generated instead of waiting for the whole file to be generated. Here’s the example controller code:

app/controllers/transactions_controller.rb
class TransactionsController < ApplicationController

  def index

    respond_to do |format|

      #pass a block so render_csv only runs for csv requests
      format.csv { render_csv }

    end

  end

  private

  def render_csv
    set_file_headers
    set_streaming_headers

    response.status = 200

    #setting the body to an enumerator, rails will iterate this enumerator
    self.response_body = csv_lines
  end


  def set_file_headers
    file_name = "transactions.csv"
    headers["Content-Type"] = "text/csv"
    headers["Content-disposition"] = "attachment; filename=\"#{file_name}\""
  end


  def set_streaming_headers
    #nginx doc: Setting this to "no" will allow unbuffered responses suitable for Comet and HTTP streaming applications
    headers['X-Accel-Buffering'] = 'no'

    headers["Cache-Control"] ||= "no-cache"
    headers.delete("Content-Length")
  end

  def csv_lines

    Enumerator.new do |y|
      y << Transaction.csv_header.to_s

      #ideally you'd validate the params, skipping here for brevity
      #1000 is the batch size, same as the find_each default
      Transaction.find_in_batches(params, 1000){ |transaction| y << transaction.to_csv_row.to_s }
    end

  end

end

As you see in this example, it’s pretty straightforward once you put the pieces together. These streaming headers work under most servers including Passenger, Unicorn, etc. but WEBrick doesn’t support streaming responses. It took me some time to figure out the headers and the enumerator thing, but since then it’s been working beautifully for us. Hope it will help someone with a similar need.
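To see the enumerator part on its own, here’s a small sketch that runs outside Rails. The rows are a hypothetical in-memory array standing in for the Transaction records, and it uses `CSV.generate_line` from Ruby’s standard library in place of the model methods. Each line is produced only when the consumer asks for it, which is what lets the server send rows as they are generated.

```ruby
require 'csv'

# Hypothetical rows standing in for Transaction records.
TRANSACTIONS = [
  ['2013-01-05', 'Downtown', 450],
  ['2013-02-11', 'Airport', 900]
]

# Returns an enumerator that yields one CSV line at a time,
# header first, instead of building the whole file in memory.
def csv_lines(rows)
  Enumerator.new do |yielder|
    yielder << CSV.generate_line(%w[Time Store Amount])
    rows.each { |row| yielder << CSV.generate_line(row) }
  end
end

# Whatever consumes the enumerator pulls lines one by one.
csv_lines(TRANSACTIONS).each { |line| print line }
```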

Simplicity and Client-Side MVC

After spending about 6 months on this new project using BackboneJS, and spending some hours learning AngularJS and EmberJS, my realization at this point is:

Use Client-Side MVC very Selectively.

Sometimes on a single page of your app, you need to offer a lot of interactions, each scoped to a small part of the page only. In such cases Client-Side MVC offers some neat features. I’ll try to share my perspective with some concrete examples where I’d say yes/no to Client-Side MVC.

  1. Build a Calendar page - Yes.
  2. Build a Master/Detail view - No.
  3. Build a Credit Card Payment Form - No.
  4. Build a Story Wall like Trello - Yes.
  5. Build an Airport Departures/Arrivals display - No.
  6. Build a Search form - No.

As you see here, I suggest using it only when a lot of Client-Side interactions can happen, with little server side data requests.