What I Learned After One Year of Building a Data Platform From Scratch
These tactics interact. Sometimes the very act of merging multiple datasets adds substantial value. Joining data correctly is hard! Other non-glamorous ways to add value include quality control, labelling and mapping, deduping, provenancing, and imposing data hygiene.
Abraham Thomas • The Economics of Data Businesses
The ETL process for getting data out of transactional systems and into an EDW was always a bit burdensome, but with massive volumes of high-velocity data it becomes a real problem.
Thomas H. Davenport • Big Data at Work: Dispelling the Myths, Uncovering the Opportunities
The last core data stack tool is the orchestrator. It’s used quickly as a data orchestrator to model dependencies between tasks in complex heterogeneous cloud environments end-to-end. It is integrated with above-mentioned open data stack tools. They are especially effective if you have some glue code that needs to be run on a certain cadence, trigg...
Data Engineering • The Open Data Stack Distilled into Four Core Tools
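As a rough illustration of what "modelling dependencies between tasks" can mean in the orchestrator quote above, here is a minimal sketch using Python's standard-library graphlib module. The task names and the run_order helper are hypothetical, and this is not how any particular orchestrator is implemented; it only shows the core idea of deriving a valid execution order from declared dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_orders_customers": {"extract_orders", "extract_customers"},
    "build_report": {"join_orders_customers"},
}


def run_order(deps):
    """Return one valid order in which every task comes after its dependencies."""
    # static_order() raises graphlib.CycleError if the dependencies form a cycle.
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
for task in order:
    # A real orchestrator would execute the task here; this sketch only prints it.
    print(task)
```

The two extract tasks have no dependency on each other, so their relative order is not guaranteed; only the constraint that each task follows its dependencies is.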
