Organizing and analyzing massive stores of unstructured data can be a daunting challenge, and it raises difficult questions: How much will a solution cost? Where do we store the data? How do we analyze it efficiently? Can our relational databases effectively sort and query it?

The Big Data Challenge

Our client, a leader in the transportation and logistics domain, was facing this big data predicament. Combined, their trucks travel roughly 8 million miles per day to deliver their cargo. The Client needed a way to analyze truck travel patterns effectively, to understand a myriad of issues including how many “empty miles” accrued on routes, and then adjust for more efficient deliveries. Through their in-house logistics tracking software, the Client had been temporarily storing log files used to analyze and debug issues related to the optimizer’s “selection process”. Because of the massive volume of data being pushed into these files, the data was retained for only a short duration. Additionally, since the data was unstructured, developers had to manually extract, parse, and search it every time they needed to perform an analysis.

Big Data Business Case

A solution was needed to add structure to these data logs, provide the ability to run ad-hoc queries when issues occurred, and perform analytics against the data to improve trucking route efficiency. A traditional relational database system would have been too resource-intensive given the data’s volume and velocity. The Client needed a big data solution.
Requirements:

  • Processing and storage of high volume/velocity data
  • Ability to run ad-hoc queries and analytics against the data
  • Sustainable data indexing and organization
  • Data visualization for business users

Aptude Consulting Big Data Solution – The Hadoop Implementation

[Infographic: Hadoop big data architecture]

After gathering information through our discovery and requirements process, we architected a big data solution built on Hadoop in conjunction with other key open-source components that harness its full potential. The result is the MapReduce architecture illustrated in the infographic above.

Our solution pre-processes and prepares the data for consumption, splitting each log into a “solution” file and a “problem” file. These files are then aggregated and distributed: the log data is sent to Solr for indexing and the “solution” data to HDFS. Along the way, each stream passes through a sink that processes and loads it into the appropriate Hadoop component, landing in SolrCloud and HDFS, respectively. The end result is structured data available in multiple formats, with the flexibility of low-latency queries through Cloudera Impala and data visualization through OBIEE connectivity.
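To make the pre-processing step concrete, below is a minimal sketch of a map-only Hadoop job that routes each optimizer log line into a “solution” or “problem” output file. It is illustrative only: the SOLUTION/PROBLEM markers, class names, and paths are hypothetical stand-ins, since the Client’s actual log format and parsing rules were specific to their optimizer.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

    public class LogSplitter {

        // Map-only job: no aggregation is needed at this stage, so each
        // log line is simply classified and written to the matching output.
        public static class SplitMapper
                extends Mapper<LongWritable, Text, NullWritable, Text> {

            private MultipleOutputs<NullWritable, Text> out;

            @Override
            protected void setup(Context context) {
                out = new MultipleOutputs<>(context);
            }

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Hypothetical markers; real parsing rules would depend on
                // the optimizer's log format.
                String line = value.toString();
                if (line.contains("SOLUTION")) {
                    out.write("solution", NullWritable.get(), value);
                } else if (line.contains("PROBLEM")) {
                    out.write("problem", NullWritable.get(), value);
                }
            }

            @Override
            protected void cleanup(Context context)
                    throws IOException, InterruptedException {
                out.close();
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "log-splitter");
            job.setJarByClass(LogSplitter.class);
            job.setMapperClass(SplitMapper.class);
            job.setNumReduceTasks(0); // map-only
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);

            // Register the two named outputs; LazyOutputFormat suppresses
            // empty default part files.
            MultipleOutputs.addNamedOutput(job, "solution",
                    TextOutputFormat.class, NullWritable.class, Text.class);
            MultipleOutputs.addNamedOutput(job, "problem",
                    TextOutputFormat.class, NullWritable.class, Text.class);
            LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);

            FileInputFormat.addInputPath(job, new Path(args[0]));   // raw logs
            FileOutputFormat.setOutputPath(job, new Path(args[1])); // split output
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Once the “solution” records land in HDFS and tables are defined over them, ad-hoc questions such as how many empty miles accrued on a given route become ordinary low-latency SQL queries in Impala, with OBIEE dashboards layered on top for business users.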

End Results

With minimal hardware resources and a collection of open-source software requiring no licensing fees, we realized the Client’s big data solution at a fraction of the cost a traditional relational database solution would have required. The Hadoop implementation delivered cost and time savings, along with the productivity boost the Client will gain from their new analytical assets.

Conclusion

Could your organization benefit from an open-source big data solution built on Hadoop? If you handle large data sets that require analytical insights, look no further. Aptude brings 14 years of IT consulting to the table, with expertise in big data implementations as well as both Microsoft and Oracle business intelligence solutions.