From the Blogosphere
Data Warehouse and Data Lake | @ExpoDX @Schmarzo #BigData #Analytics #DataLake #AI #ArtificialIntelligence
Maybe the best way to understand today’s role of the data warehouse is with a bit of history
By: William Schmarzo
May. 15, 2019 05:00 AM
This blog was written with the thoughtful assistance of David Leibowitz, Dell EMC Director of Business Intelligence, Analytics & Big Data
So data warehousing may not be cool anymore, you say? It’s yesterday’s technology (or 1990’s technology if you’re as old as me) that served yesterday’s business needs. And while it’s true that recent big data and data science technologies, architectures and methodologies seems to have rendered data warehousing to the back burner, it is entirely false that there is not a critical role for the data warehouse and Business Intelligence in digitally transformed organizations.
Maybe the best way to understand today’s role of the data warehouse is with a bit of history. And please excuse us if we take a bit of liberty with history (since we were there for most of this!).
Phase 1: The Data Warehouse Era
Figure 1: The Data Warehouse Era
The data warehouse served as a central integration point; collecting, cleansing and aggregating a variety of data sources from AS/400, relational and file based (such as EDI). For the first time, data from supply chain, warehouse management, AP/AR, HR, point of sale was available in a “single version of the truth.”
Using extraction-transform-load (ETL) processing wasn’t always quick, and could require a degree of technical gymnastics to bring together all of these disparate data sources. At one point, the “enterprise service bus” entered the playing field to lighten the load on ETL maintenance, but routines quickly went from proprietary data sources, to proprietary (and sometimes arcane) middleware business logic code (anyone remember Monk?).
The data warehouse supported reports and interactive dashboards that enabled business management to have a full grasp on the state of the business. That said, report authoring was static and not really enabled for democratizing data. Typically, the nascent concept of self-service BI was limited to cloning a subset of the data warehouse to smaller data marts, and extracts to Excel for business analysis purposes. This proliferation of additional data silos created reporting environments that were out of sync (remember the heated sales meetings where teams couldn’t agree as to which report figures were correct?) and the analysis paralysis caused by spreadmarts meant that more time was spent working the data rather than driving insight. But we all dealt with it, as it was agreed that some information (no matter the effort it took to acquire) was more important that no data.
Phase 2: Optimize the Data Warehouse
Then Hadoop was born out of the ultra-cool and hip labs of Yahoo. Hadoop provided a low-cost data management platform that leveraged commodity hardware and open sources software that was an estimated to be 20x to 100x cheaper than proprietary data warehouses.
Man soon realized the financial and operational benefits afforded by a commodity-based, natively parallel, open source Hadoop platform to provide an Operational Data Store (now that’s really going old school!) to off-load those nasty Extract Load and Transform (ETL) processes off the expensive data warehouse (see Figure 2).
Figure 2: Optimize the Data Warehouse
The Hadoop-based Operational Data Store was deemed very good as it helped IT Man to decrease spending on the data warehouse (guess not so good if you were a vendor of those proprietary data warehouse solutions…and you know who you are T-man!). Since it’s estimated that ETL consumes 60% to 90% of the data warehouse processing cycles, and since some vendors licensed their products based upon those cycles – this concept of “ETL Offload” could provide substantial cost reductions. So in an environment limited by Service Level Agreements (because outside of Doc Brown’s DeLorean equipped with a flux capacitor, there’s still only 24 hours in a day in which to do all the ETL work), Hadoop provided a low-cost, high-performance environment for dramatically slowing the investment in proprietary data warehouse platforms.
Things were getting better, but still weren’t perfect. While IT Man could shave costs, he couldn’t make the tools easy to use by simple data consumers (like Executive Man). And while Hadoop was great for storing unstructured and semi-structured data, it couldn’t always keep up to the speed relied upon for relational or cube based reporting from traditional transactional systems.
See blog “The Data Warehouse Modernization Act” for more details on the role of the Hadoop-based Operational Data Store and how it has helped to “modernize” today’s existing data warehouse environment.
Phase 3: Introducing Data Science
Figure 3: Introducing Data Science
The characteristics of a data science “sandbox” couldn’t be more different than the characteristics of a data warehouse:
Finance Man tried desperately to combine these two environments but the audiences, responsibilities and business outcomes were just too varying to create an cost-effectively business reporting and predictive analytics in single bubble.
Ultimately, the analytic sandbox became one of the drivers for the creation of the data lake that could support both the data science and data warehousing (Operational Data Store) needs.
Data access was getting better for the data scientists but we again were moving towards proprietary process and a technical skill reserved for the elite. Still, things were good as IT Man, Finance Man and Marketing Man could work through the data scientists to drive innovation. But they soon wanted more.
See the following blogs for more details on the complementary nature of the data warehouse and the data lake:
Phase 4: Creating Actionable Dashboards
The reports and dashboards created to support executive and front-line management in Stage 1 were the natural channel for rendering the predictive and prescriptive insights, effectively closing the loop between the data warehouse and the data lake. With data visualization tools like Tableau and Power BI, IT Man could finally deliver on the promise of self-service BI by providing interactive descriptive and predictive dashboards that even Executive Man could operate (see Figure 4).
Figure 4: Closing the Analytics Loop
See the blog “Creating Actionable Dashboards” for more details on how to convert existing reports and dashboards into actionable reports and dashboards!
And Man was happy (until the advent of Terminator robots began making decisions for us).
The post Data Warehouse and Data Lake Analytics Collaboration appeared first on InFocus Blog | Dell EMC Services.
At CloudEXPO Silicon Valley, June 24-26, 2019, Digital Transformation (DX) is a major focus with expanded DevOpsSUMMIT and FinTechEXPO programs within the DXWorldEXPO agenda. Successful transformation requires a laser focus on being data-driven and on using all the tools available that enable transformation if they plan to survive over the long term. A total of 88% of Fortune 500 companies from a generation ago are now out of business. Only 12% still survive. Similar percentages are found throughout enterprises of all sizes.
Register Today and SAVE ▸ Here
Speaking Opportunities ▸ Here
Sponsorship & Exhibit Opportunities ▸ Here
Silicon Valley Faculty ▸ Here
Silicon Valley Schedule ▸ Here
CloudEXPO Has Been the M&A Capital For Cloud Companies
CloudEXPO has been the M&A capital for Cloud companies for more than a decade with memorable acquisition news stories which came out of CloudEXPO expo floor. DevOpsSUMMIT New York faculty member Greg Bledsoe shared his views on IBM's Red Hat acquisition live from NASDAQ floor. Acquisition news was announced during CloudEXPO New York which took place November 12-13, 2019 in New York City.
Our Silicon Valley 2019 schedule will showcase 200 keynotes, sessions, general sessions, power panels, and hands on tutorials presented by 150 rockstar speakers in 10 hottest conference tracks of 2019:
CloudEXPO Silicon Valley 2019 Show Prospectus ▸ HERE
Prospectus At-a-Glance ▸ HERE
CloudEXPO is the single event where technology buyers and vendors meet to experience and discus cloud computing and all that it entails. For more than a decade, sponsors and exhibitors of CloudEXPO benefit from unmatched branding, profile building and lead generation opportunities through our following unique tools. For more information on sponsorship, exhibit, and keynote opportunities call us at 954 242-0444 or contact us ▸ Here
FinTech and Blockchain Are Now Part of CloudEXPO 2019 Program
Financial enterprises in New York City, London, Singapore, and other world financial capitals are embracing a new generation of smart, automated FinTech that eliminates many cumbersome, slow, and expensive intermediate processes from their businesses.
Accordingly, attendees at the upcoming 23rd CloudEXPO, June 24-26, 2019 at Santa Clara Convention Center in Santa Clara, CA will find fresh new content in full new FinTech & Enterprise Blockchain track.
ServerlessSUMMIT & Kubernetes at CloudEXPO Silicon Valley
As you know, enterprise IT conversation over the past year have often centered upon the open-source Kubernetes container orchestration system. In fact, Kubernetes has emerged as the key technology -- and even primary platform -- of cloud migrations for a wide variety of organizations. Kubernetes is critical to forward-looking enterprises that continue to push their IT infrastructures toward maximum functionality, scalability, and flexibility.
Kubernetes is critical to forward-looking enterprises that continue to push their IT infrastructures toward maximum functionality, scalability, and flexibility.
Serverless and Kubernetes are great examples of continuous, rapid pace of change in enterprise IT. They also raise a number of critical issues and questions about employee training, development processes, and operational metrics.
DevOpsSUMMIT at CloudEXPO Celebrates Its 12th Event in Six Years
Cloud-Native thinking and Serverless Computing are now the norm in financial services, manufacturing, telco, healthcare, transportation, energy, media, entertainment, retail and other consumer industries, as well as the public sector.
The widespread success of cloud computing is driving the DevOps revolution in enterprise IT. Now as never before, development teams must communicate and collaborate in a dynamic, 24/7/365 environment. There is no time to wait for long development cycles that produce software that is obsolete at launch. DevOps may be disruptive, but it is essential.
ServerlessSUMMIT and DevOpsSUMMIT at CloudEXPO expands the DevOps community, enable a wide sharing of knowledge, and educate delegates and technology providers alike.
There's a real need for serious conversations about Serverless and Kubernetes among the people who are doing this work and managing it.
DXWorldEXPO Showcases Cutting-Edge IoT, Artificial Intelligence, Machine Learning, and Digital Transformation
Now is the time for a truly global DX event, to bring together the leading minds from the technology world in a conversation about Digital Transformation. DX encompasses the continuing technology revolution, and is addressing society's most important issues throughout the entire $78 trillion 21st-century global economy.
DXWorldEXPO® has organized these issues along 10 tracks, 22 keynotes and general sessions, and a faculty of 222 of the world's top speakers.
DXWorldEXPO® has three major themes on its conference agenda:
Technology - The Revolution Continues
Global 2000 companies have more than US$40 trillion in annual revenue - more than 50% of the world's entire GDP. The Global 2000 spends a total of US$2.4 trillion annually on enterprise IT. The average Global 2000 company has US$11 billion in annual revenue. The average Global 2000 company spends more than $600 million annually on enterprise IT. Governments throughout the world spend another US$500 billion on IT - much of it dedicated to new Smart City initiatives.
For the past 10 years CloudEXPO® helped drive the migration to modern enterprise IT infrastructures, built upon the foundation of cloud computing. Today's hybrid, multiple cloud IT infrastructures integrate Big Data, analytics, blockchain, the IoT, mobile devices, and the latest in cryptography and enterprise-grade security.
Digital Transformation is the key issue driving the global enterprise IT business. DX is most prominent among Global 2000 enterprises and government institutions.
About DXWorldEXPO LLC
DXWorldEXPO LLC is a Lighthouse Point, Florida-based trade show company and the creator of DXWorldEXPO - Digital Transformation Conference & Expo. The company produces and presents the world's most influential technology events including CloudEXPO, DevOpsSUMMIT, and FinTechEXPO.
Silicon Valley Faculty Highlight
Digital Transformation Blogs
CloudEXPO TV Power Panels