Building a Big Data Architecture that Scales

Nessim Btesh
3 min read · May 5, 2017

A few months ago I started building an AI marketing analytics tool at my new job. One of the premises of building an AI is that you need to store a lot of data to train the models. In my case the data I needed could not be aggregated, so I had to store millions and millions of records, and they had to be accessible in near real time.

Building a Big Data infrastructure is expensive. Given that we are a bootstrapped startup, I could not build my ideal infrastructure, which combines Hadoop, Spark, and Elasticsearch. Running that infrastructure has a huge cost associated with it. I needed an infrastructure that I could have up and running in a couple of weeks without much maintenance work. Anyone who has worked with Hadoop understands that managing a highly distributed Hadoop system is hard and that keeping it up and running takes a lot of time. My second option was using something like MongoDB for a micro beta, but let's face it, that wasn't going to last more than a week. The good news is that we live in the 21st century and AWS exists. Using AWS services is great because I don’t have to worry much about server maintenance.

My final architecture of choice looks like this:

I chose this infrastructure because it scales, and it scales really fast. I don’t have to worry much about scalability…
