Posts

Reading List

Blogs I have recently read that are worth bookmarking for future reference:

Dropbox sync engine rewrite
How Query Optimizer works?
Encodings
Grep Commands
Erasure Coding - Save storage space and increase fault tolerance.
How Netflix Uses Kinesis Streams to Monitor Applications and Analyze Billions of Traffic Flows
jsoup - Java HTML Parser
Logging with Python - A great blog from DataDog on best practices for logging with Python.
Lambda with CloudWatch Logs Insights - Leverage Logs Insights to dive deep into logs using a SQL-like syntax.
Primitive Data Types C# vs Java

Recent posts

Code Review - Best Practices

Code review is a great learning and knowledge-sharing tool, not only for new team members but also for long-time company veterans. Having a code review process in place dramatically improves code quality and helps detect bugs at an early stage. Much of what I wanted to write here has already been captured in this great post from SmartBear. Another great blog on sa However, here are the important bits:

Author:

Keep the CR small. For a bigger change, break the review down into meaningful chunks. Add TODO comments for future CRs. Keep your changes under a feature flag or in a separate feature branch to allow smaller incremental changes that are not yet ready to be released to production. Check this engineering blog from Google on small CLs.

Provide detailed context for the change. I prefer documenting context in the commit message and recommend following a format similar to the one used by the Linux project [ link ]

Provide details of tests performed to verif

Book Review - The Phoenix Project

Here at Skybox Labs, we hold regular lunch-and-learn sessions where a colleague presents on topics ranging from clean code, continuous integration, game development, and machine learning to almost any area of reasonable interest. A recent lunch-and-learn series I attended focused on DevOps, which got me interested in learning more about the topic. I was looking for recommended books, and the lead presenter highly recommended that I start with 'The Phoenix Project' by Gene Kim. Inspired by the stellar reviews on Amazon, I decided to get a copy and read it over the weekend. It is a fantastic book that contains a wealth of information and delivers it in an intelligent and interesting way: as a story. The book successfully captures the events and struggles of most people who work in IT Operations and gives a very good explanation of why these problems exist and how you can solve them. It portrays a very effective way of thinking in

Why using XOR might not be a good hash code implementation?

Using XOR to compute hash codes works well in most cases, especially when the order of computation does not matter. It also has the following benefits:

XOR has the best bit-shuffling properties of all bitwise operations and provides a better distribution of hash values.
It is a quick, single-cycle operation on most computers.
Order of computation does not matter, i.e. a^b = b^a.

However, if the ordering of elements matters, XOR is often not a good choice.

Example

For simplicity, consider a class with two string properties named Prop1 and Prop2 whose GetHashCode returns the XOR of their hash codes. It works fine in most cases, except when the same values are assigned to different properties. In that case it generates the same hash code, i.e. a collision, as can be seen in the example below. However, using the modified approach recommended by Joshua Bloch in Effective Java, which uses prime multiplication and hash chaining, provides more unif
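The collision described in this teaser can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the post: `Pair`, `string_hash`, `xor_hash` and `chained_hash` are hypothetical names, the string hash is a simple deterministic stand-in for a runtime's own string hash, and the constants 17 and 31 are just commonly used primes for this pattern.

```python
from dataclasses import dataclass

MASK32 = 0xFFFFFFFF  # keep results within 32 bits, like an int hash code


def string_hash(s: str) -> int:
    """Deterministic 31-multiplier string hash (a stand-in, assumed for the demo)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & MASK32
    return h


@dataclass(frozen=True)
class Pair:
    prop1: str
    prop2: str

    def xor_hash(self) -> int:
        # XOR is symmetric, so swapping prop1 and prop2 gives the same value.
        return string_hash(self.prop1) ^ string_hash(self.prop2)

    def chained_hash(self) -> int:
        # Prime multiplication with chaining: the field order changes the result.
        h = 17
        h = (31 * h + string_hash(self.prop1)) & MASK32
        h = (31 * h + string_hash(self.prop2)) & MASK32
        return h


a = Pair("foo", "bar")
b = Pair("bar", "foo")
print(a.xor_hash() == b.xor_hash())          # True: the XOR hashes collide
print(a.chained_hash() == b.chained_hash())  # False: the chained hashes differ
```

The two objects are different, yet the XOR variant cannot tell them apart, while the chained variant can because each field is mixed in at a different position.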

Interesting bug - Line endings and Hash Code

I recently came across an interesting bug that highlights how different line-ending formats can break your custom equality implementation if you do not consider them carefully.

Context

We have an application that periodically updates its local assets with the latest resources. In a nutshell, it makes a Web API call to get the latest set of metadata and compares it against a locally stored metadata file. If they differ, we update the locally stored metadata file and download the new or updated resources.

Bug

For one particular asset, the associated metadata file was always being updated, even though the revision history showed no visible changes.

Investigation

My obvious suspect was the code responsible for the equality check between the local metadata and the metadata received from the Web API. To verify this, I set up a conditional breakpoint that would be hit when the equality check returned false. After the debugger hit the breakpoint, I looked into all the
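The teaser stops before the fix, so the following is only a sketch of one common way to make such a comparison insensitive to line endings, not the approach the post necessarily took. The function names and the sample metadata strings are made up for illustration.

```python
import hashlib


def normalize_newlines(text: str) -> str:
    """Convert CRLF and lone CR line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def content_hash(text: str) -> str:
    """Hash the text only after its line endings have been normalized."""
    normalized = normalize_newlines(text)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def metadata_equal(local: str, remote: str) -> bool:
    # Hypothetical equality check: compares content, ignoring line-ending style.
    return content_hash(local) == content_hash(remote)


# Same content, different line endings (sample data, assumed for the demo).
local = "name: asset\r\nrevision: 7\r\n"
remote = "name: asset\nrevision: 7\n"

print(local == remote)                # False: a raw comparison sees a difference
print(metadata_equal(local, remote))  # True: normalized content matches
```

Normalizing before both hashing and comparing keeps the two operations consistent, so two values that compare equal also produce the same hash.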

SQL Performance improvement for User defined table types

Recently, I dealt with an interesting performance issue in one of my SQL queries and thought I would share the experience here.

Context: We had a legacy stored procedure responsible for saving large amounts of Excel row data to our database tables. It took a User-Defined Table Type as one of its parameters to receive the list of rows from Excel. However, the stored procedure was taking a very long time to save a large data set.

Root Cause: After quite a bit of investigation using the execution plan in SSMS, I was able to narrow the performance issue down to the following:

Joining with the User-Defined Table Type was taking more than 90 percent of the time.
A custom hash function, used multiple times as a join criterion, was also quite expensive to compute.

After additional research on Stack Overflow, I was able to figure out that the primary reason for the poor performance when doing a JOIN on Table-Valued Parameters is that it does not keep statistics and a