August 2017

Why?

Whatever craft we are pursuing, we all need to build something to gain experience and mastery. Software development doesn’t escape this reality. Theory is important, but using that knowledge in actual projects and solving real problems is what deepens our understanding of the subject.

Side projects always seemed amazing to me. Building something during my free time to solve a problem I have, or a problem others may have is a big reason why I started to code. The more I learn about programming, the more I realise its powers.

Like many developers, I started a lot of side projects. Every time, the pattern was the same. I was excited for a few days, perhaps a week. Then I started to encounter more difficult problems, or I realised the project would take a very long time, or life happened and I stopped working on it for a few days…

My GitHub account is a graveyard of potential projects that I just couldn’t bring to life. Not the mature, fully developed, married with two kids in a big mansion kind of life. A huge proportion of these projects didn’t even get past the embryo stage. Every mistake is a lesson. Which brings me to this post.

A possible solution

One of the problems was that I did my thing alone. Nobody knew about it, and I totally dismissed the power of accountability. This time, you will know about it. I’ll show you my progress every week. I’ll just need enough discipline to write a blog post about it once a week, during the weekend. I’ll mention the problems I faced, the solutions I chose, and how I managed ( or not ) to spend time on the side project I am working on.

My rules

A lot of people came up with similar “frameworks” for learning and side projects. Here are the rules that I chose. These are totally arbitrary:

  • Choose a project. No restrictions on languages, libraries or frameworks.
  • Spend one month on this project.
  • Just build an MVP ( Minimum Viable Product ). Don’t drown in the details. Get the big picture.
  • Code will be on GitHub. Every blog post will link to the repository.
  • After a month, deploy the application, whatever state it is in. From there, either keep working on it for another month, or switch to a new project.
  • If the project is finished in less than a month, start a new month with a new project.
  • Report your progress, problems and failures each week in a blog post.

Given the time I have, one month seems long enough to create something decent. One week would be too short. If the project is going well and I’m enjoying it, I could spend another month on it, keeping the same rules but updating the requirements of the project. Reporting my progress will probably help me keep things in order and avoid going in several directions at once.

First Side Project

For the first project, I chose to build an application related to Hearthstone, the Blizzard video game. Players build decks with cards based on the Warcraft world. The application should allow players to track their records and their progress with their different decks:

  • Users can access a catalog with all the cards and build decks.
  • Users can export their decks to the game.
  • Users can add the outcome of their games.
  • Users can see how their decks perform against other classes.
  • Users can log in with Twitter or GitHub.

That should be more than enough to get me started. Let me know what you think and I’ll see you in a week!


Introduction

Data analysis is an important part of many applications. MongoDB provides several tools to run analytics natively using the aggregation framework.

The aggregation framework is based on the concept of a pipeline. You take input from a MongoDB collection, and the documents from this collection go through a series of stages. Each stage performs an operation on its input. Each stage’s input is the output of the previous stage. All inputs and outputs are documents.

If you are familiar with JavaScript, the concept is the same as chaining array methods. You may have encountered something like this:

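const myArray = [1, 2, 3, 4]
const myData = myArray.map((item) => {
    // do something special: here, double each item
    return item * 2
}).filter((item) => {
    // we filter in this stage: keep items greater than 2
    return item > 2
}).reduce((total, item) => {
    // I need to reduce in this stage: add up what is left
    return total + item
}, 0)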

In the example above, we have three stages: a map(), a filter() and a reduce(). The first stage ( map ) takes myArray as input. The output of this method becomes the input of the next method ( filter ), and so on, until the data needs no further transformation and I have what I need.

But in MongoDB, every input and output will be documents. At the end, we have access to the output, the same way we would after a query.

So, in short, the aggregation framework works with pipelines. Pipelines work with a MongoDB collection. Pipelines are composed of stages. Each stage executes a data processing task on its input and produces documents as its output. This output is passed to the next stage.
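To make that summary concrete, here is a minimal sketch in plain JavaScript. It is not how MongoDB works internally: runPipeline is a hypothetical helper I made up, the sample documents are invented, and everything runs in memory on an array. It only illustrates the idea that each stage receives documents and hands documents to the next stage.

```javascript
// Hypothetical helper (not a MongoDB API): run documents through a list of stages.
// Each stage is a function that takes an array of documents and returns a new one.
const runPipeline = (documents, stages) =>
    stages.reduce((input, stage) => stage(input), documents)

// Made-up sample documents, shaped like the restaurants used later in this post
const sampleDocs = [
    { name: 'A', cuisine: 'Russian', stars: 1.2 },
    { name: 'B', cuisine: 'Pizza', stars: 4.5 },
    { name: 'C', cuisine: 'Russian', stars: 3.5 }
]

const pipelineResult = runPipeline(sampleDocs, [
    // first stage: keep only some documents
    (docs) => docs.filter((doc) => doc.cuisine === 'Russian'),
    // second stage: reshape what the first stage produced
    (docs) => docs.map((doc) => ({ name: doc.name, stars: doc.stars }))
])

console.log(pipelineResult)
```

The second stage never sees restaurant B, because the first stage already dropped it. That is the whole point of a pipeline.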

So, now that we know what it is, how do we use it?

Getting started with familiar operations

We will use a restaurants collection provided by MongoDB. You can download it here. Save it to a file.

Next, you need to import data into the collection. We will use mongoimport to achieve this. Open a terminal window and enter this command:

mongoimport --db test --collection restaurants --drop --file PATH/TO/JSON/FILE

Make sure you already have a mongod instance running. This command will import our data into the test database, inside the restaurants collection. Feel free to rename those if you want.

We have our data in place. We will start with some basic operations like the match, project, sort, skip and limit stages.

How our documents are organized

In our restaurants collection, documents have the following fields:

  • _id ( mandatory in all MongoDB documents )
  • name ( String )
  • cuisine ( String )
  • stars ( Number, can be a decimal such as 1.2 )
  • address ( embedded document ) with the following fields:
    • street ( String )
    • city ( String )
    • state ( String )
    • zipcode ( String )
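Put together, a single document can be pictured as the plain JavaScript object below. The values are copied from the first result of the $match example in this post. I left _id out on purpose, because ObjectId only exists inside the mongo shell or a driver, not in plain JavaScript.

```javascript
// One restaurant document written as a plain JavaScript object.
// _id is omitted here: ObjectId is not available outside the shell or a driver.
const restaurant = {
    name: 'Abe Fisher',
    cuisine: 'Russian',
    stars: 1.2,
    address: {
        street: '828 Jaduv Boulevard',
        city: 'Hiburu',
        state: 'TX',
        zipcode: '00004' // a String, which keeps the leading zeros
    }
}

// Fields of the embedded document are reached through the address field
console.log(restaurant.address.city)
```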

$match

Let’s start with a filter to look for all restaurants serving Russian cuisine:

> db.restaurants.aggregate([
    { $match: { cuisine: 'Russian' }}
])

{ "_id" : ObjectId("5977aaed3abbae8aef525fdc"), "name" : "Abe Fisher", "cuisine" : "Russian", "stars" : 1.2, "address" : { "street" : "828 Jaduv Boulevard", "city" : "Hiburu", "state" : "TX", "zipcode" : "00004" } }

{ "_id" : ObjectId("5977aaef3abbae8aef53e606"), "name" : "Abe Fisher", "cuisine" : "Russian", "stars" : 2.4, "address" : { "street" : "463 Zujew Street", "city" : "Peamazut", "state" : "NH", "zipcode" : "00304" } }

{ "_id" : ObjectId("5977aadc3abbae8aef4715f1"), "name" : "Abe Fisher", "cuisine" : "Russian", "stars" : 4.5, "address" : { "street" : "105 Buwza Heights", "city" : "Asovikufu", "state" : "MA", "zipcode" : "00488" } }

{ "_id" : ObjectId("5977aada3abbae8aef45bf79"), "name" : "Abe Fisher", "cuisine" : "Russian", "stars" : 3.5, "address" : { "street" : "1774 Emcek Key", "city" : "Mossofnat", "state" : "NY", "zipcode" : "02312" } }

{ "_id" : ObjectId("5977aadd3abbae8aef47bf64"), "name" : "Abe Fisher", "cuisine" : "Russian", "stars" : 0.4, "address" : { "street" : "1172 Kohis Road", "city" : "Jihnahac", "state" : "MS", "zipcode" : "02585" } }

{ "_id" : ObjectId("5977aae33abbae8aef4b8cbf"), "name" : "Abe Fisher", "cuisine" : "Russian", "stars" : 4.2, "address" : { "street" : "1582 Ogujab Pass", "city" : "Vihjuani", "state" : "MN", "zipcode" : "03000" } }

{ "_id" : ObjectId("5977aadc3abbae8aef46e0e3"), "name" : "Abe Fisher", "cuisine" : "Russian", "stars" : 3.6, "address" : { "street" : "1076 Cugvu Glen", "city" : "Ittupa", "state" : "NH", "zipcode" : "03726" } }

This is the equivalent of running this query:

db.restaurants.find({cuisine: 'Russian'})

Notice that we use the aggregate method. This is the method used to run an aggregation query. In order to aggregate, we need to pass it an aggregation pipeline. A pipeline is an array of documents. Each document specifies one stage, that is, one process we want our data to go through.

Let’s add a second stage in our aggregation query.

$project

The $project stage allows us to control the output. We can tell MongoDB which fields we would like to see and which ones we don’t.

> db.restaurants.aggregate([
    { $match: { cuisine: 'Russian' }},
    { $project: {
        _id: 0,
        name: 1,
        stars: 1
    }}
])

{ "name" : "Abe Fisher", "stars" : 1.2 }
{ "name" : "Abe Fisher", "stars" : 2.4 }
{ "name" : "Abe Fisher", "stars" : 4.5 }
{ "name" : "Abe Fisher", "stars" : 3.5 }
{ "name" : "Abe Fisher", "stars" : 0.4 }
{ "name" : "Abe Fisher", "stars" : 4.2 }
{ "name" : "Abe Fisher", "stars" : 3.6 }

...

My second stage tells MongoDB to output only two fields, name and stars. I also explicitly excluded _id. By default, the _id field will be displayed if you don’t set it to 0.

The $match stage filters against the collection and passes the resulting documents to the $project stage one at a time. $project performs its operation by reshaping the fields accordingly and passes the output out of the pipeline and back to me.
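If the _id rule feels arbitrary, this small in-memory sketch may help. The project function is a hypothetical stand-in I wrote for illustration, not MongoDB’s implementation, and it only understands inclusion flags ( 1 ) and _id: 0. The _id value is a made-up string.

```javascript
// Hypothetical in-memory imitation of a $project stage.
// It only supports inclusion flags (1) and the special case _id: 0.
const project = (spec) => (docs) => docs.map((doc) => {
    const out = {}
    // _id is kept unless it is explicitly set to 0
    if (spec._id !== 0 && '_id' in doc) out._id = doc._id
    for (const [field, flag] of Object.entries(spec)) {
        if (field !== '_id' && flag === 1 && field in doc) out[field] = doc[field]
    }
    return out
})

// What a previous $match stage could have passed along (the _id is made up)
const matched = [{ _id: 'id-1', name: 'Abe Fisher', cuisine: 'Russian', stars: 1.2 }]

const withoutId = project({ _id: 0, name: 1, stars: 1 })(matched)
const withId = project({ name: 1, stars: 1 })(matched)

console.log(withoutId) // name and stars only
console.log(withId)    // _id is still there, because we never set it to 0
```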

Let’s add an additional stage.

$limit

The $limit stage will limit the number of results returned by the pipeline. Like so:

> db.restaurants.aggregate([
    { $match: {cuisine: 'Russian'}},
    { $limit: 3 },
    { $project: {_id: 0, name: 1, stars: 1}} 
    ])

{ "name" : "Abe Fisher", "stars" : 1.2 }
{ "name" : "Abe Fisher", "stars" : 2.4 }
{ "name" : "Abe Fisher", "stars" : 4.5 }

With $limit, I only return the first three documents.

Note: I put the $limit stage before the $project stage. If I put the $limit stage last, the $project stage would have to process documents that I would not return anyway. By putting my stages in this order, the $project stage only processes three documents.
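Here is a rough way to see that reasoning with numbers. This is a plain JavaScript model with invented data and hypothetical projectStage and limitStage functions. It processes whole arrays at once, and the real server may well optimise the pipeline on its own, so treat it as an illustration of the argument and not as a benchmark.

```javascript
// 1000 made-up documents
const manyDocs = Array.from({ length: 1000 }, (_, i) => ({
    name: `Restaurant ${i}`,
    cuisine: 'Russian',
    stars: i % 6
}))

// Count every document the "project" step has to touch
let projectedCount = 0
const projectStage = (docs) => docs.map((doc) => {
    projectedCount += 1
    return { name: doc.name, stars: doc.stars }
})
const limitStage = (n) => (docs) => docs.slice(0, n)

// Order used in this post: limit first, then project
projectedCount = 0
const limitFirst = projectStage(limitStage(3)(manyDocs))
const touchedLimitFirst = projectedCount

// Opposite order: project everything, then keep three
projectedCount = 0
const projectFirst = limitStage(3)(projectStage(manyDocs))
const touchedProjectFirst = projectedCount

console.log(touchedLimitFirst, touchedProjectFirst)
```

Both orders return the same three documents. Only the amount of work done along the way differs.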

Some more examples

Here are a few random examples of tasks that you can accomplish:

  • In this example, I want to rename the stars field to ratings. I also want the restaurant’s name in uppercase, and I want the documents sorted by ratings, highest first.
> db.restaurants.aggregate([ 
    {$project: {restaurant_name: {$toUpper: "$name"}, _id: 0, ratings: "$stars"}}, 
    { $sort: { ratings : -1}}, 
    { $limit: 5 } ])

{ "restaurant_name" : "MODERN LOVE", "ratings" : 5 }
{ "restaurant_name" : "LAUBERGE CHEZ FRANCOIS", "ratings" : 5 }
{ "restaurant_name" : "DAI DUE", "ratings" : 5 }
{ "restaurant_name" : "FRANKLINVILLE INN", "ratings" : 5 }
{ "restaurant_name" : "LAUREL", "ratings" : 5 }

Two fields are returned: restaurant_name in uppercase, and ratings, sorted in descending order. I used the $toUpper operator to transform the names. The $project stage created a new field called restaurant_name and gave it the uppercase value of name. The same goes for ratings, which takes its value from the stars field. Notice that we prefix the field name with a $ to tell MongoDB that we are talking about another field’s value.
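If you think in array methods, the same three stages can be written like this in plain JavaScript. It is only an analogy running on a tiny in-memory array, with names and stars borrowed from the outputs shown in this post.

```javascript
// A few documents, with values borrowed from earlier outputs in this post
const rated = [
    { name: 'Abe Fisher', stars: 1.2 },
    { name: 'Laurel', stars: 5 },
    { name: 'Abe Fisher', stars: 3.5 }
]

const topRated = rated
    // like $project: build new fields from the values of existing ones
    .map((doc) => ({ restaurant_name: doc.name.toUpperCase(), ratings: doc.stars }))
    // like $sort: { ratings: -1 }, highest first
    .sort((a, b) => b.ratings - a.ratings)
    // like $limit: 2
    .slice(0, 2)

console.log(topRated)
```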

Finally, let’s use the $group stage. In this example, we will group the restaurants by their cuisine types. Every time we encounter a certain cuisine, we will increase a counter by 1. At the end, we will know the most frequent cuisine types in our collection:

> db.restaurants.aggregate([ 
    { $project: { _id: 0, cuisine: 1}}, 
    { $group: { _id: "$cuisine", count: { $sum: 1}} }, 
    { $sort: { count: -1}}, 
    { $limit: 5} ])

{ "_id" : "American", "count" : 6183 }
{ "_id" : "Chinese", "count" : 2418 }
{ "_id" : "Café/Coffee/Tea", "count" : 1214 }
{ "_id" : "Pizza", "count" : 1163 }
{ "_id" : "Italian", "count" : 1069 }

In the project stage, we only keep what we need for our next stages, the cuisine field. Then, we group these results in documents where the _id field will be the particular cuisine and the count field will calculate the number of times we encounter that cuisine.

Next, we sort those results to have the highest count first and limit the results to 5. In our collection, we mostly have American cuisine ( whatever that is ) with 6183 documents, then Chinese with 2418 documents…
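The counter idea behind $group and $sum: 1 can be sketched in plain JavaScript too. The six documents below are made up, and the Map is just my stand-in for whatever bookkeeping the server really does.

```javascript
// Made-up documents that only carry the cuisine field, as after the $project stage
const cuisineDocs = ['American', 'Chinese', 'American', 'Pizza', 'American', 'Chinese']
    .map((cuisine) => ({ cuisine }))

// Like $group with $sum: 1: one entry per distinct cuisine,
// and we add 1 every time we meet that cuisine again
const counters = new Map()
for (const doc of cuisineDocs) {
    counters.set(doc.cuisine, (counters.get(doc.cuisine) || 0) + 1)
}

const mostFrequent = [...counters]
    .map(([cuisine, count]) => ({ _id: cuisine, count })) // the group key ends up in _id
    .sort((a, b) => b.count - a.count)                    // like $sort: { count: -1 }
    .slice(0, 2)                                          // like $limit: 2

console.log(mostFrequent)
```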

This is a rather quick introduction to what the aggregation framework allows you to do. There are of course a lot more possibilities. If you are curious about it, you can read more in the MongoDB docs.

Feel free to share and comment.

Have a nice day!
