
Designing Datalakes in AWS

Diwakar Dayalan

2 min read · Apr 20, 2021

This article discusses what a datalake is and how it could be implemented in AWS.

Journey of Datalake

At a high level, you can divide an environment into two regions:

Application (Presentation Layer)
Data (Backend Layer)

In the data region, data has been stored in many different ways over time. The journey progressed as follows:

File System → Hierarchical Database → RDBMS → Datawarehouse

Over time, the datawarehouse became hugely complicated and troublesome to maintain.

DW → Data Mart → DataLake

Need for Datalake

Storing vast amounts of data (structured and unstructured) in a structured DW/DM resulted in data loss. In short, the data layer should be data agnostic: it should never be restricted to storing only certain data formats.

To cater to large-scale, real-time analytics, and to keep pace with the new data being ingested into the DW/DM from new platforms such as IoT devices and mobile apps, the data layer needs to be schema agnostic.
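The data loss that a fixed schema can cause is easier to see in a small sketch. The example below is only an illustration under assumed names: the column list, the two helper functions, and the sample IoT event are all hypothetical and do not come from any particular AWS service. It contrasts loading a record into a fixed set of columns with landing the raw record unchanged.

```python
import json

# Hypothetical fixed warehouse schema: only these columns exist.
WAREHOUSE_COLUMNS = ("device_id", "temperature")


def load_into_warehouse(record):
    """Schema-on-write: keep only the columns the fixed schema knows about."""
    return {col: record.get(col) for col in WAREHOUSE_COLUMNS}


def land_in_lake(record):
    """Schema agnostic: store the record as-is; structure is applied at read time."""
    return json.dumps(record)


# Hypothetical IoT event with a field the warehouse schema never anticipated.
event = {"device_id": "sensor-1", "temperature": 21.5, "firmware": "1.0.3"}

warehouse_row = load_into_warehouse(event)
lake_object = land_in_lake(event)

print(warehouse_row)            # the "firmware" field has been dropped
print(json.loads(lake_object))  # the full original event is still available
```

In this sketch the warehouse row silently loses the unexpected field, while the raw copy keeps everything, so a structured view can still be derived from it later.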

The architecture happened to evolve in the order DW → DM → DL, whereas the ideal data flow architecture runs in the reverse order:

Data sources (DB / IoT / mobile apps) → DL → DM → DW

In one line, the need for a datalake can be defined as follows:
