Lambda Architecture in Big Data

Instructor: David Gloag

David has over 40 years of industry experience in software development and information technology and a bachelor of computer science.

Big Data is becoming a focal point for many companies these days. In this lesson, we'll take a look at Big Data and at a proposed architecture, called Lambda, that could simplify working with it. At the end of the lesson, you should have a good understanding of both topics.

Quenching Our Thirst

We have a thirst for knowledge. Like our favorite confection, it is something we feel an incessant need for. As that longing continues to grow, it multiplies in depth and breadth. We want the 24-hour weather forecast the day before, we want our income tax refunds back the next day, and we want to know where an earthquake will occur as far in advance as possible. All of this requires data, a lot of data. In fact, in many instances, it requires more data than we can handle by traditional means. It makes sense, then, that new methods must be created to handle these situations. Work is currently underway to address the challenges of Big Data, and to create architectures like Lambda to work with it.

What is Big Data?

As the name suggests, Big Data is the area that deals with large information sets. By large, we mean information sets that can't be handled in the usual fashion. Typically, we would rely on packages like Microsoft Access and Excel, or similar offerings from other vendors, to perform the manipulations we need. But that is becoming more and more difficult as the size of information sets pushes the boundaries of what those tools can handle. For example, just think about the amount of information the Internal Revenue Service (US) or Revenue Canada must process at tax time. The need for memory, processing power, and storage is constantly increasing, and will continue to do so for the foreseeable future.

What is the Lambda Architecture?

The Lambda Architecture is a generic template or model, created by Nathan Marz, which is meant to provide a way to think about Big Data and the applications built on it. It has four main characteristics:

  • Fault Tolerant - recovers from both hardware and software failures.
  • Use-Case Support - supports a wide range of applications and workloads.
  • Scalable - computing resources can be added easily as data or load grows.
  • Easily Extended - new features and capabilities can be added with ease.

In terms of structure, the architecture consists of three layers: batch, serving, and speed, and two input sources: new data and queries. From an overview perspective, it looks like this:

[Figure: The Lambda Architecture]

Here is a description of each piece:

  • Batch Layer - holds the master information set, which is unchangeable except for adding information, and periodically precomputes views from it.
  • Serving Layer - indexes the views produced by the batch layer so that they can be queried quickly.
  • Speed Layer - processes only the most recent data as it arrives, covering the gap until the batch layer catches up.
  • New Data - input data that is appended to the batch layer and also fed to the speed layer.
  • Queries - information requests from the users of the system, answered by combining results from the serving and speed layers.
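
To make the pieces above more concrete, here is a minimal sketch in Python of how they might fit together. It is an illustration only, not Nathan Marz's implementation: the class name LambdaSketch, its method names, and the page-view counting example are all hypothetical. It assumes the common formulation of the architecture, in which new data goes to both the batch and speed layers, and a query merges the precomputed batch view with whatever the speed layer has seen since the last batch run.

```python
from collections import Counter


class LambdaSketch:
    """Toy page-view counter laid out along Lambda Architecture lines.

    All names here are hypothetical and chosen for illustration only.
    """

    def __init__(self):
        # Batch layer: append-only master dataset of raw events.
        self.master_dataset = []
        # Serving layer: precomputed batch view, indexed by page name.
        self.batch_view = Counter()
        # Speed layer: counts for events not yet covered by the batch view.
        self.realtime_view = Counter()

    def new_data(self, page):
        """New data is sent to both the batch layer and the speed layer."""
        self.master_dataset.append(page)  # appended, never edited in place
        self.realtime_view[page] += 1     # incremental update, visible at once

    def run_batch(self):
        """Recompute the batch view from the entire master dataset."""
        self.batch_view = Counter(self.master_dataset)
        # Those events are now covered by the batch view, so the speed
        # layer can discard them.
        self.realtime_view.clear()

    def query(self, page):
        """Answer a query by merging the batch and real-time views."""
        return self.batch_view[page] + self.realtime_view[page]


system = LambdaSketch()
for page in ["home", "about", "home"]:
    system.new_data(page)
print(system.query("home"))  # 2, answered by the speed layer alone

system.run_batch()           # batch view now covers the first three events
system.new_data("home")
print(system.query("home"))  # 3: 2 from the batch view plus 1 from the speed layer
```

In a real system, each layer would be a separate distributed component and the batch run would take minutes or hours, which is why the speed layer is needed at all. The sketch keeps everything in one process so that the flow of new data and queries is easy to follow.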
