Sohan's Blog

Things I'm Learning

On ActiveRecord Query Enhancers

The question is, should we use the third-party ActiveRecord Query Enhancers like SearchLogic/Squeel/MetaSearch?

Quoting from Squeel’s Github README page,

Squeel lets you rewrite...

Article.where ['created_at >= ?', 2.weeks.ago]
...as...

Article.where{created_at >= 2.weeks.ago}
This is a good thing. If you don't agree, Squeel might not be for you.

At work, we are migrating a Rails 3.0 project to Rails 3.2. We used MetaSearch quite extensively in the project and are now discussing whether Squeel (the successor of MetaSearch for newer Rails versions) would be a good choice. I recognized there are good points on both sides of the debate and wanted to capture the list here for reference.

You’d want to use such enhancers because:

  1. They extend the basic AR API to provide a DSL. For example, Squeel provides an API so you can write the following: User.where{ (country != "USA") & (drives_truck == true) } instead of this ActiveRecord query: User.where('country <> ? AND drives_truck = ?', 'USA', true)
  2. They write complex join statements, including outer joins and joining multiple tables, for you using a shorthand. e.g. User.where{company_name_eq 'Coders'}
  3. They support negative logic (not equal, not in) and OR SQL queries that would require raw String queries using the ActiveRecord API.
  4. They provide fancy operations such as User.where{name_or_address_contains 'scott'} that would require some raw String when using AR directly.

You’d avoid using these enhancers because:

  1. You think that a raw String is no worse than a Hash with hardcoded symbols anyway.
  2. As new versions of AR are released, there’s little guarantee the third-party API will still be compatible.
  3. You are concerned about adding another pile of abstractions and magic on top of ActiveRecord.

Please share if you prefer one over another and if you do, please let us know why.

MongoDB Is Abusing JSON!

I find the MongoDB API is abusing JSON in a really bad way. JSON is probably a good format for storing the documents in MongoDB, but using JSON for its weird API is simply a terrible idea. Here’s an example from the SQL to Aggregation Framework Mapping Chart:

MongoDB Example
  db.orders.aggregate( [
     { $group: { _id: { cust_id: "$cust_id",
                        ord_date: "$ord_date" },
                 total: { $sum: "$price" } } },
     { $match: { total: { $gt: 250 } } }
  ] )

I find the UI of this query to be distasteful at best. Here’s an SQL example of the same:

SQL Equivalent
  SELECT cust_id,
         ord_date,
         SUM(price) AS total
  FROM orders
  GROUP BY cust_id, ord_date
  HAVING total > 250

I find this SQL example to be orders of magnitude more readable than the JSON one. The JSON query in this example is full of hacks. Every time I see a “$” sign, I’m totally confused about what it means. For example, consider this fragment, total: { $sum: "$price" }, and compare it to SUM(price) AS total.

I think MongoDB has its strengths. But for heaven’s sake, if they can’t find anything better, they should drop this JSON ugliness in favor of SQL as a query interface to MongoDB. What do you think?

Hybrid Persistence

Often the business requirements demand database features that aren’t easily achievable with a single type of database. And now we have quite a few options to choose from, for example key-value stores, document databases, relational databases etc., each providing some features that the others lack. So, it can be tempting to introduce multiple databases to reap the benefits of each.

But, it comes with a few gotchas that are worth knowing. Here’s a short list from my recent experience on a project:

  1. No simple way to join data from multiple databases without multiple round trips.
  2. Deployment is tricky as it adds more infrastructure.
  3. If data is duplicated, migrations need to address multiple sources.

Let’s explain this with an example. Say we need to store data for an e-commerce site. Document stores seem to be a good candidate for storing the product info, since different products have different data associated with them.

products.json
  {
    id: '4wqrqw4890ipoip',
    name: 'iPhone 5',
    color: 'Black',
    price: '699.99'
  }


  {
    id: '4wqrqw4890ipoiq',
    name: 'Fine Sheet',
    thread_count: 400,
    size: 'Queen',
    price: '699.99'
  }

We also need to store transactional data, and relational databases have a long track record with transactions. So, a simple schema may look like this:

  Sales(id, store_id, product_id, quantity)

With a hybrid persistence approach like this, just beware that implementing simple features such as the following will need more round trips:

  1. Show the invoice for sale with product name, quantity and price.
  2. Show all ‘Fine Sheets’ that are sold in the ‘Brentwood’ store.
  3. List all products sold today, sorted by name.
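To make the extra round trips concrete, here’s a rough sketch in plain Ruby. The two data sources are hypothetical in-memory stand-ins (a Hash for the document store, an Array for the Sales table), and the method names are made up for illustration. The point is only that the join and the sort end up in application code:

```ruby
# Round trip 1: ask the relational side for the sale rows of one store.
def sales_for_store(sales_table, store_id)
  sales_table.select { |sale| sale[:store_id] == store_id }
end

# Round trip 2: fetch the matching product documents by id.
def products_by_ids(product_docs, ids)
  product_docs.select { |id, _doc| ids.include?(id) }
end

# The "join" and the sort by name happen in the application, not in a database.
def invoice_lines(sales_table, product_docs, store_id)
  sales = sales_for_store(sales_table, store_id)
  products = products_by_ids(product_docs, sales.map { |sale| sale[:product_id] })

  lines = sales.map do |sale|
    product = products.fetch(sale[:product_id])
    { name: product[:name], quantity: sale[:quantity], price: product[:price] }
  end
  lines.sort_by { |line| line[:name] }
end

# Hypothetical sample data standing in for the two databases.
product_docs = {
  '4wqrqw4890ipoip' => { name: 'iPhone 5',   price: '699.99' },
  '4wqrqw4890ipoiq' => { name: 'Fine Sheet', price: '699.99' }
}
sales_table = [
  { id: 1, store_id: 10, product_id: '4wqrqw4890ipoip', quantity: 1 },
  { id: 2, store_id: 10, product_id: '4wqrqw4890ipoiq', quantity: 2 }
]

lines = invoice_lines(sales_table, product_docs, 10)
lines.each { |line| puts "#{line[:quantity]} x #{line[:name]} @ #{line[:price]}" }
```

With a single relational database, the same invoice would be one query with a JOIN and an ORDER BY.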

Hybrid persistence works fine for caching. But whenever your data is split across multiple sources in a way that forces you to combine the parts from each, it’s not gonna be a fun time. Just so you know.

How Much to Validate?

Input validation is often required to safeguard against inappropriate use. For example, consider the following API:

TransfersController.rb
class TransfersController < ApplicationController

  def create
    from_account_id, to_account_id, amount = params[:from_account_id], params[:to_account_id], params[:amount]

    from_account = Account.find(from_account_id)
    to_account = Account.find(to_account_id)

    # create! takes an attributes hash rather than positional arguments
    @transfer = Transfer.create!(from_account: from_account, to_account: to_account, amount: amount)
    redirect_to @transfer
  end

end

In this case, the API expects a from_account_id, which we know can be easily exploited unless a validation is performed on the server. A simple validation would look like this:

TransfersController.rb
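class TransfersController < ApplicationController

  def create
    #...
    if from_account.owner == current_user
      @transfer = Transfer.create!(from_account: from_account, to_account: to_account, amount: amount)
      #...
    else
      # respond with an empty 401 instead of rendering a template
      head :unauthorized
    end

  end

end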

This is an absolutely necessary validation to be performed on the server.

However, the validation requirements can be a little relaxed at times. For example, if you know the only client of your API is your own app, then you may not need to handle all possible validations on both the server and the client side and come up with fancy errors. When the request is non-destructive, an invalid one can just be shown the general purpose error page, since only bad users trying to hack the API calls will ever see it.

For example, let’s say we have the following code:

AppointmentsController.rb
class AppointmentsController
  def show
    user_id, date = params[:user_id], params[:date]

    user = User.find(user_id)

    @appointments = user.appointments.on(date)
  end
end

Say, by business logic, everyone can see the appointments for any user. Now, the above code will surely fail if user_id or date is missing or contains bad data. So, it’s possible to add the validation logic. But if I know I’m writing the only client that’s supposed to hit this API, and that client validates the request before sending it, I’d just skip the input validation. This is not destructive anyway.

I think it’s a trade-off between paranoid programming and pragmatic design. As long as I’m providing expected behavior in the app without opening up a security hole, some validations may be skipped on purpose.

What Programming Language Should I Learn in 2013?

I like the idea of learning one new programming language every year. This is also recommended in The Pragmatic Programmer. I’m now looking for a new language to learn.

In 2011, I learned F#, my first functional programming language experience. It’s a really nice language, especially if you’re familiar with the .NET class library and the Windows development ecosystem in general. This was really fun, and I learned the basic concepts and building blocks of functional languages.

The direct impact of F# is that, even when I’m not using F#, I’m consciously using its concepts in other languages. For example, Ruby comes with a handful of functional style methods in its Array/Enumerable API, and so does UnderscoreJS. Not just that, I’m always watching for opportunities to reduce side effects, use delayed execution and pass around blocks when it makes sense. Learning F# has improved my Ruby/JavaScript/C# coding as a direct result. One disclaimer here: I’ve failed to grasp the true concepts of Category Theory and Monads, although I understand how to use them.

I also learned Objective-C in 2011 while working on a project with ThoughtWorks. This was fun, apart from the frustrations with the IDE. As a language, I actually liked it quite a lot; the way it handles dynamic message passing while still giving you compile errors looked really innovative to me. I believe there’s a lot of room for me to learn more about Objective-C and the Cocoa framework in general. Please let me know if there’s a good open source project where I can see professional level Objective-C code.

2012 was my year of learning JavaScript. Don’t get me wrong, I’ve been writing JavaScript for some time now, but I think I really learned the language in 2012, thanks to the influx of micro libraries, as well as their coolness and seriousness. CoffeeScript made me immensely happy, too. I started writing a JavaScript book with Isa Goksu; too bad we didn’t get far with it :(

I’m yet to pick a new language for 2013. Any suggestions? Or, what are you learning in 2013?

The Myth of One Assert Per Test

TL;DR: It’s not one assert per test, rather one logical path per test.

I find this to be a classic example of how an inappropriate choice of terminology leads to huge confusion. When I tried to find the original source of the “one assertion per test” quote, Google came back only with a bunch of confused blog entries :(

Without much ranting, let’s see a code example to start with:

example code
def eligible_credit_card_types(customer)

  annual_income = customer.annual_income

  if annual_income > 100_000
    [CreditCard.new(type: 'platinum', limit: 10_000),
     CreditCard.new(type: 'gold', limit: 8_000),
     CreditCard.new(type: 'cashback', limit: 5_000) ]

  elsif annual_income > 50_000
    [CreditCard.new(type: 'gold', limit: 6_000),
    CreditCard.new(type: 'cashback', limit: 3_000)]

  elsif annual_income > 30_000
    [CreditCard.new(type: 'cashback', limit: 1_500)]

  else
    []
  end

end

By saying one logical path per test, I mean I’d write a total of 4 tests for this method, each covering a logical path. But I really don’t care about how many assert calls you need in each logical path to express the desired behavior. For example, this is totally fine:

example test
context '#eligible_credit_card_types' do
  it 'returns platinum, gold and cashback for people making over 100K annually' do
    #setup customer, call the method
    cards = credit_card_authorizer.eligible_credit_card_types(customer)

    cards.size.should == 3
    cards[0].type.should == 'platinum'
    cards[0].limit.should == 10_000
    #same for cards[1], cards[2]
  end
end

Of course, this can indeed be converted into a single assertion if you already have an equality method overridden for the credit cards. But I’d probably skip adding that code if it’s just for the sake of writing tests.
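For the curious, here’s roughly what the single-assertion version relies on. This is only a sketch: the Struct below is a hypothetical stand-in for the real CreditCard class, used because Struct instances compare field by field with ==, and the cards array stands in for the method’s return value:

```ruby
# Hypothetical stand-in for the CreditCard class.
# Struct instances are equal when all their fields are equal.
CreditCard = Struct.new(:type, :limit)

# Stand-in for the value returned by eligible_credit_card_types(customer).
cards = [
  CreditCard.new('platinum', 10_000),
  CreditCard.new('gold', 8_000),
  CreditCard.new('cashback', 5_000)
]

expected = [
  CreditCard.new('platinum', 10_000),
  CreditCard.new('gold', 8_000),
  CreditCard.new('cashback', 5_000)
]

# One comparison covers the size, the order, and the type and limit of each card.
same = (cards == expected)
puts same
```

In the spec above, that comparison would become a single cards.should == expected line.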

Image taken from mchenrybowl

The mechanical thought of “one assertion per test” is lame.

  1. If nothing else, splitting every assertion into its own test repeats the setup each time and can multiply the running time of your test suite several times over.
  2. The extra tests don’t provide any additional coverage or safety.
  3. As long as each of your tests covers a unique logical path, there’s only one logical reason why it’d fail, irrespective of how many assertions you put in there.

But, there’s a but! If you need many asserts per test, the code is probably asking for some refactoring. It indicates that one logical path in your code is touching too many things, and it is worse when that path calls many methods that belong to other objects. Testing such a logical path is hard and usually requires a big setup and an even bigger assertion. In such cases, breaking the test down into multiple tests may yield some superficial readability, but it only works around the actual problem in the code without fixing it. Almost always, this indicates long procedural methods, and I’d suggest taking a second look at them to refactor into more object-oriented code.

Happy Friday!

Readable Unit Tests

From my experience with writing/reading unit tests for years, here are a few guidelines to keep ‘em readable:

  1. A unit test needs to fit entirely (including setups) on a screen without scrolling.
  2. It is OK to have a little bit of duplication in unit tests for readability.
  3. Avoid nesting contexts beyond 2 levels. Instead, use methods to set up and flatten.
  4. Do not use if/else/loops in a unit test.
  5. If your test needs too much setup/mocking/stubbing, it’s time to refactor the code.

A simple readable unit test is only achievable when the code itself is simple. Adhering to these guidelines will probably make the code simpler as a direct result!

Unit Testing and Sleep

If your unit test code needs a sleep, it’s time to refactor it. Ideally, you’d only need to stub/mock the asynchronous call instead of introducing arbitrary sleeps in the test code, because a sleep-based test will eventually fail when the timing shifts. Moreover, until it fails, it will slow down your test runner.
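Here’s a minimal sketch of one way to do that, assuming you can inject whatever runs the asynchronous work. All the class names below are hypothetical. The production code would get an executor that spawns a thread, while the test swaps in one that runs the block right away, so there’s nothing to wait for:

```ruby
# Hypothetical mailer that hands its work to an injectable executor.
class WelcomeMailer
  attr_reader :sent

  def initialize(executor)
    @executor = executor
    @sent = []
  end

  def deliver_later(email)
    @executor.run { @sent << email }
  end
end

# Production-style executor: runs the block on a background thread.
class ThreadExecutor
  def run(&work)
    Thread.new(&work)
  end
end

# Test executor: runs the block immediately on the calling thread.
class InlineExecutor
  def run
    yield
  end
end

mailer = WelcomeMailer.new(InlineExecutor.new)
mailer.deliver_later('new.user@example.com')

# No sleep needed: the work has already finished by this line.
delivered = mailer.sent
puts delivered.inspect
```

The threaded executor still deserves its own small test, but that test can join the returned thread instead of sleeping.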

Seed Data in a Rails Project

Most applications I’ve worked on required some kind of seed data. For example, I’ve used seed data to create a super admin user so the site can actually be used upon a fresh install. Other common cases include things like a list of credit card types, names of countries/states etc. This post is about some ideas to manage such seed data in a Ruby on Rails project.

Where to put the seed data?

I think you have 2 choices to pick from.

  1. db/seeds.rb
  2. db/migrate/2342423424_some_migration.rb

I prefer having it in the seeds.rb whenever possible since this is the obvious place for it. But I’ve often put them in migrations as well. There are trade-offs associated with both.

Migrations only run once. This means “double/multiple seeding” is not a concern at all. So, you can write a migration with some seeding command as follows:

Migration CreateUsers
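  class CreateUsers < ActiveRecord::Migration

    def self.up
      # each column needs a type; these are all strings
      create_table :users do |u|
        u.string :name
        u.string :email
        u.string :encrypted_password
        u.string :password_salt
      end

      create_super_admin_user
    end

    def self.create_super_admin_user
      User.create!(name: 'SuperAdmin', email: 'super@mysite.com', ...)
    end

  end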

This works great. It creates the super user and by virtue of migrations, it only ever creates one super admin.

The caveat with this approach is, if you change certain things later, then this migration will fail to run. For example, say we added a required field to user as follows:

User
  class User < ActiveRecord::Base
    validates :age, presence: true
  end

Now, if you run the migration on a fresh checkout of the code, it will fail due to this validation. One workaround is to fix the code in this migration. But for people who have already run the migrations, and your production environment likely has, this has no effect. At this point, it starts getting complicated and some band-aid solutions are usually implemented.

On the other hand, if you are using db/seeds.rb for seed data, you’d have to make sure it’s idempotent. So, a rerun should not create a 2nd copy of the objects. As a start, your code could be like this:

db/seeds.rb
  if User.where(email: 'super@mysite.com').blank?
    User.create!(name: 'SuperAdmin', email: 'super@mysite.com', ...)
  end

With this in place, you can easily incorporate the change for the new validation as follows:

db/seeds.rb
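  super_user = User.where(email: 'super@mysite.com').first
  if super_user.nil?
    # fresh install: create the super admin with the new required field
    User.create!(age: 20, name: 'SuperAdmin', email: 'super@mysite.com', ...)
  elsif super_user.age.blank?
    # existing install: backfill the new field only
    super_user.update_attributes(age: 20)
  end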

This would work for people who already had a super user and also for people that just got a fresh copy of the code.

So, I prefer putting data in seeds.rb with idempotency.

In addition to handling such cases, it also makes it easy for anyone to take a glance at the seed data in one place.

How to handle a large amount of seed data?

When you have more than a handful of data to seed, I’d recommend tidying up your db/seeds.rb into a more manageable structure. Here’s how I’d do it:

File organization
  |- app
  |- db
    - seeds.rb
    |- seeds
      - users.rb
      - credit_card_types.rb

With this structure in place, you can turn your seeds.rb into a simple manifest file as follows:

seeds.rb
  # db/ is not on the load path, so resolve the files relative to seeds.rb
  require_relative 'seeds/users'
  require_relative 'seeds/credit_card_types'

This will help you keep things organized and hopefully give you a happier solution to the seed data problem.
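If listing every file by hand gets tedious, one possible variation is to load whatever is in the seeds folder. This is just a sketch, and load_seed_files is a made-up helper name. Note that it loads files in alphabetical order, so if the order matters you’d need to name the files accordingly (for example with a numeric prefix):

```ruby
# Loads every Ruby file in the given directory, in alphabetical order.
# `load` re-runs a file on every call (unlike `require`), so each seed
# file still has to be idempotent.
def load_seed_files(seeds_dir)
  Dir[File.join(seeds_dir, '*.rb')].sort.each do |seed_file|
    load seed_file
  end
end

# In db/seeds.rb this could be called as:
#   load_seed_files(File.expand_path('../seeds', __FILE__))
```

The explicit manifest is still the better fit when you want the load order to be visible at a glance.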