What is a Data Lake

Written by Teknita Team



August 19, 2022



James Dixon described the data lake as follows:

If you think of a data mart as a store of bottled water—cleansed and packaged and structured for easy consumption—the data lake is a large body of water in a more natural state. The contents of the data lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in, or take samples.

A data lake is essentially a single data repository that holds all your data until it is ready for analysis, or possibly only the data that doesn’t fit into your data warehouse. Typically, a data lake stores data in its native file format, but the data may be transformed to another format to make analysis more efficient. The goal of having a data lake is to extract business or other analytic value from the data.

Data lakes can host binary data, such as images and video; unstructured data, such as PDF documents; semi-structured data, such as CSV and JSON files; and structured data, typically from relational databases. Structured data is the easiest to analyze directly, but semi-structured data can usually be imported into a structured form with little effort. Unstructured data can often be converted to structured data using intelligent automation.
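As a rough illustration of importing semi-structured data into a structured form, here is a minimal Python sketch. The sample records, the dotted column names, and the helpers flatten and to_table are all hypothetical and chosen only for this example; real pipelines typically rely on dedicated tools rather than hand-written code like this.

```python
import json

# Hypothetical semi-structured records, as they might sit in a data lake.
# The second record is missing fields that the first one has.
RAW_JSON_LINES = """\
{"user": {"id": 1, "name": "Ana"}, "event": "click", "ts": "2022-08-19T10:00:00"}
{"user": {"id": 2}, "event": "view"}
"""


def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def to_table(json_lines):
    """Return (columns, rows) with one fixed column set; gaps become None."""
    records = [
        flatten(json.loads(line))
        for line in json_lines.splitlines()
        if line.strip()
    ]
    # The structured form needs every row to share the same columns.
    columns = sorted({key for rec in records for key in rec})
    rows = [tuple(rec.get(col) for col in columns) for rec in records]
    return columns, rows


columns, rows = to_table(RAW_JSON_LINES)
print(columns)
for row in rows:
    print(row)
```

Each nested field becomes its own column, and records that lack a field get None in that position, which is the kind of table a relational database or analysis tool can work with.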

Data lake vs data warehouse

The major differences between data lakes and data warehouses:

Data sources: Typical sources of data for data lakes include log files, clickstream data, social media posts, and data from internet-connected devices. Data warehouses typically store data extracted from transactional databases, line-of-business applications, and operational databases for analysis.
Schema strategy: The database schema for a data lake is usually applied at analysis time, which is called schema-on-read. The database schema for an enterprise data warehouse is usually designed before the data store is created and applied to the data as it is imported. This is called schema-on-write.
Storage infrastructure: Data warehouses often have significant amounts of expensive RAM and SSDs in order to provide query results quickly. Data lakes often use cheap spinning disks on clusters of commodity computers. Both data warehouses and data lakes use massively parallel processing (MPP) to speed up SQL queries.
Raw vs curated data: The data in a data warehouse is supposed to be curated to the point where the data warehouse can be treated as the “single source of truth” for an organization. Data in a data lake may or may not be curated: data lakes typically start with raw data, which can later be filtered and transformed for analysis.
Who uses it: Data warehouse users are usually business analysts. Data lake users are more often data scientists or data engineers, at least initially. Business analysts get access to the data once it has been curated.
Type of analytics: Typical analysis for data warehouses includes business intelligence, batch reporting, and visualizations. For data lakes, typical analysis includes machine learning, predictive analytics, data discovery, and data profiling.
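The schema strategy difference above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample events, the orders table, and the functions schema_on_write and schema_on_read are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import json
import sqlite3

# Hypothetical raw events; the third one does not fit the expected shape.
RAW_EVENTS = [
    '{"order_id": 1, "amount": "19.99"}',
    '{"order_id": 2, "amount": "5.00", "coupon": "SPRING"}',
    '{"order_id": "n/a"}',
]


def schema_on_write(raw_events):
    """Warehouse style: enforce a fixed schema while loading the data."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER NOT NULL, amount REAL NOT NULL)"
    )
    rejected = 0
    for line in raw_events:
        event = json.loads(line)
        try:
            row = (int(event["order_id"]), float(event["amount"]))
        except (KeyError, ValueError):
            rejected += 1  # the record never reaches the store
            continue
        conn.execute("INSERT INTO orders VALUES (?, ?)", row)
    total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
    conn.close()
    return total, rejected


def schema_on_read(raw_events):
    """Lake style: keep raw lines as they are and interpret them per query."""
    stored = list(raw_events)  # nothing is rejected at load time
    total = 0.0
    for line in stored:
        event = json.loads(line)
        amount = event.get("amount")  # structure is applied here, at read time
        if amount is not None:
            total += float(amount)
    return total, len(stored)


write_total, rejected = schema_on_write(RAW_EVENTS)
read_total, kept = schema_on_read(RAW_EVENTS)
print(round(write_total, 2), rejected)
print(round(read_total, 2), kept)
```

In the schema-on-write path the malformed record and the extra coupon field are dropped at load time, while the schema-on-read path keeps all three raw records and leaves it to each query to decide which fields matter.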

