Big Data

Big Data | News, how-tos, features, reviews, and videos

What is Apache Spark? The big data platform that crushed Hadoop

Fast, flexible, and developer-friendly, Apache Spark is the leading platform for large-scale SQL, batch processing, stream processing, and machine learning.

AWS simplifies data management, analytics with new services

A major theme at re:Invent 2022 was Amazon's efforts to ease data management, as AWS announced new ETL capabilities and features for collaboration, searching and cataloging.

Amazon Omics aims to optimize biological data analysis at scale

The bioinformatics service, made generally available at AWS re:Invent, is designed to help researchers and scientists store and accelerate analysis of genomic and other related biological data types for precision medicine.

AWS Glue upgrades Spark engines, backs Ray framework

Serverless data integration service in the Amazon cloud also adds support for built-in Pandas APIs and the Apache Hudi, Apache Iceberg, and Delta Lake formats.

Starburst Galaxy gets data discoverability updates

At AWS re:Invent 2022, the company also announced support for AWS Lake Formation via Starburst Enterprise suite to help joint customers implement data mesh architecture.

When is enough data enough?

Maybe we don’t need more data; we just need people who understand the data we already have and its value in a business context.

Dremio Cloud review: A fast and flexible data lakehouse on AWS

Dremio Cloud leaps big data in a single bound with a fast SQL engine and optimizations that can accelerate queries dramatically. Plus it lets you use other engines on the same data.

Why Apache Iceberg will rule data in the cloud

Apache Iceberg is an open table format that offers scalability, usability, and performance advantages for very large data sets. Here are five reasons Iceberg is optimal for cloud data workloads.

Databricks adds data governance, marketplace features

The data marketplace and other features are expected to accelerate data engineering tasks with an option for data monetization down the road, Databricks said.

Databricks open sources its Delta Lake data lakehouse

Databricks is open sourcing Delta Lake to counter criticism from rivals and take on Apache Iceberg as well as data warehouse products from Snowflake, Starburst, Dremio, Google Cloud, AWS, Oracle and HPE.

12 programming tricks to cut your cloud bill

Cutting cloud costs is a team effort, and that includes developers. Here are 12 tricks for developing software that is cheaper to run in the cloud.

What is TensorFlow? The machine learning library explained

TensorFlow is a Python-friendly open source library for numerical computation that makes machine learning and developing neural networks faster and easier.

What is a data lake? Massively scalable storage for big data analytics

Dive into data lakes: what they are, how they're used, and how they both differ from and complement data warehouses.

Where AI has made real progress

Better data infrastructure has provided a big boost to AI’s growth, but some things still require a human.

Working with Azure Managed Instance for Cassandra

Use open-source tools to build big data systems that bridge on premises and cloud.

Machine learning megaguide: Amazon, Microsoft, Databricks, Google, HPE, IBM

Download InfoWorld's massive roundup of Amazon, Microsoft, Databricks, Google, HPE, and IBM machine learning toolkits

Public cloud megaguide: Amazon, Microsoft, Google, IBM, and Joyent compared

The top five public clouds pile on the services and options, while adding unique twists

Quick guide: Learn to crunch big data with R

Get started using the open source R programming language to do statistical computing and graphics on large data sets