NTT DATA FORMAT

Page 1 (0s)

NTT – Luiss Guess Experience. 16/11/2023. BUSINESS AND MARKETING ANALYTICS.

Page 2 (10s)

Canio Mancaniello. Executive Manager with experience in defining EDWH and in complex transformation projects towards the Cloud Data Platform. Computer Engineer with a specialization in artificial intelligence; MBA obtained in 2014.

Page 3 (34s)

Antonio Sisbarra. Advanced Data Engineer with experience in defining complex solutions and transformations in projects towards the Cloud Data Platform. Master's degree in Computer Science with a specialization in Data Analytics, obtained in 2019.

Page 4 (52s)

Agenda. NTT Data Group. Data Intelligence. Use Case Experience.

Page 5 (1m 0s)

[image] Slide graphic; the only recoverable text reads "The Original Emoji by Shigetaka".

Page 6 (1m 8s)

NTT Group. Billions in annual revenues. $500 million of investments in a startup fund for innovative technologies.

Page 7 (1m 37s)

NTT DATA Group. [image]

Page 8 (2m 4s)

NTT DATA Italy. We anticipate the future with intelligence, in a collaborative environment where we continue to grow together, as a community and a society. We support the digital growth of the country and of the companies we work with, in Italy and abroad, generating more and more synergies and collaborations to face bigger challenges and deliver more innovative, inclusive and sustainable projects.

Page 9 (2m 29s)

NTT DATA Italy matrix organization.
Sectors: Telco & Media; Financial Services; Utilities & Energy; Industry; Public Sector.
Capabilities: AIS (Application & Infrastructure Services); SES (SAP & Enterprise Solutions); DST (Digital Strategy & Technology).

Page 10 (2m 40s)

DST Italy Internal organization. Business Advisory.

Page 11 (2m 57s)

Agenda. NTT Data Group. Data Intelligence. Use Case Experience.

Page 12 (3m 5s)

The House of D&I. COMMUNITY. D&I Strategy; Data Consulting; Data Governance; Data Analysis; Data PM / Agile.

Page 13 (3m 21s)

Data Intelligence Italy. Data Strategy & Consulting.

Page 14 (3m 47s)

Data Intelligence Italy. xOps; Data Science & Artificial Intelligence; Data Experience; Cloud & Hybrid Data Platform; Data & Analytics Governance; Data Strategy.

Page 21 (4m 56s)

E2E Capabilities. Solution Design; Solution Implementation; Analysis & Feasibility; Change Management.
Standard definition; Best Practice; Auditing and Monitoring.
Data Model design; Function Model design; Data Management design; Interfaces design; Regulatory rules design.
Implementation and management processes; Business rules implementation; Regulatory (Privacy, SOX, etc.) rules implementation; Interfaces to access reporting/analysis (channels) implementation; User test; System deployment; Documentation and deliverables; DevOps.
Training; Organizational change management support; Technological evolution support; Regulatory (privacy, security, etc.) processes definition support.
Strategy evolution architecture; Cloud approach; Governance and project management support; Methodological approach; Business assessment; Analysis model definition; Information source discovery; Roles and responsibility matrix; Objectives definition support; KPIs definition support; Bus Matrix; Standard & Best Practice RollOut; Conceptual Buz model definition.
Strategy, Consulting and Governance; Requirements definition; Application Maintenance support solutions service.
PoC definition approach (“Try&Buy”); Feasibility study; Technical and business match analysis; Roadmap definition.
Products installation and maintenance; AMS support; Environmental management.

Page 22 (5m 35s)

Agenda. NTT Data Group. Data Intelligence. Use Case Experience.

Page 23 (5m 43s)

Real-world use case: from DB to analytics in a large business company.

Page 24 (5m 55s)

Main Goal (1/2). Unified star schema of tables in Databricks Delta Lake:
- Build a library to download data from Synapse
- Optimize the notebooks that extract and load data into Delta Lake
- Optimize the tables in Delta Lake for stakeholders
- Schedule the notebooks to extract and load data
- Switch to Synapse only and drop the Teradata sources

Page 26 (6m 19s)

[image] Screenshot with text and a brand logo.

Page 27 (6m 28s)

Intro to Databricks Cloud platform: What is Databricks?

Page 28 (6m 48s)

What is a notebook?
- A set of cells containing code or descriptive text
- Each cell can be executed separately from the others, using a different language (SQL, Python, shell commands)
- Each cell can:
  - Apply one or more transformations to tables, variables, data structures
  - Read data from many sources (batch, streaming, databases)
  - Perform data analysis and apply data science algorithms
- Each notebook can be scheduled in a workflow, with dependencies on other notebooks

Page 29 (7m 12s)

What is Delta Lake DB?
- Open format: built on the open-source Apache Parquet format and fully compatible with Apache Spark
- ACID transactions: Delta Lake enables ACID (atomicity, consistency, isolation, durability) transactions
- Time travel: Delta Lake's transaction log provides a master record of every change made to the data, which makes it possible to recreate the exact state of a data set at any point in time. Data versioning makes data analyses and experiments completely reproducible
- Schema enforcement and schema evolution
- Merge, update, delete: Delta Lake supports data manipulation language (DML) operations, including merge, update, and delete commands
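
The time-travel idea can be illustrated with a toy sketch in plain Python. This is not Delta Lake's actual implementation or API: the class `ToyVersionedTable`, its `commit` and `snapshot` methods, and the sample rows are all hypothetical. It only shows how replaying an append-only log up to a chosen version rebuilds the table as it was at that point.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVersionedTable:
    """Toy append-only commit log; NOT Delta Lake's real implementation."""

    log: list = field(default_factory=list)  # one entry per committed change

    def commit(self, op, key, row=None):
        """Record a change and return the version number it created."""
        self.log.append((op, key, row))
        return len(self.log) - 1

    def snapshot(self, version=None):
        """Replay the log up to `version` (inclusive) to rebuild the state."""
        last = len(self.log) - 1 if version is None else version
        state = {}
        for op, key, row in self.log[: last + 1]:
            if op == "upsert":
                state[key] = row
            elif op == "delete":
                state.pop(key, None)
        return state


table = ToyVersionedTable()
v0 = table.commit("upsert", "c1", {"segment": "retail"})
v1 = table.commit("upsert", "c1", {"segment": "business"})
v2 = table.commit("delete", "c1")

old_state = table.snapshot(version=v0)  # state as of the first commit
latest_state = table.snapshot()         # state after all commits
```

Because nothing in the log is ever overwritten, any earlier version stays reproducible, which is the property the slide attributes to data versioning.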

Page 30 (7m 40s)

[image] Logo graphic with text.

Page 31 (7m 53s)

ASIS: “BA” database in Databricks.

Page 32 (8m 0s)

Weak points in the BA process.
- Teradata library to download data: lots of manual settings to make
- No optimization on BA tables (no partitioning, no caching)
- Manual run of each notebook to populate the star schema (more than 50 runs each month by hand :O )
- No parametrization of code
- No comments or descriptions on the main parts of notebooks
- Lots of “unofficial” notebooks in the official path of the Business Analytics team
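
One way to address the missing parametrization is sketched below, under stated assumptions: the helper `build_extract_query` and the table names are hypothetical, and ANNOMESE is assumed to be a year-month integer in YYYYMM form. The point is only that one parametrized function plus a loop can stand in for many hand-edited runs.

```python
def build_extract_query(table, annomese, columns=("*",)):
    """Return a SELECT for one table and one ANNOMESE (assumed YYYYMM integer)."""
    year, month = divmod(annomese, 100)
    if not (1 <= month <= 12) or year < 1900:
        raise ValueError(f"invalid ANNOMESE: {annomese}")
    # Accept only plain identifiers so a parameter cannot inject SQL.
    if not table.replace("_", "").isalnum():
        raise ValueError(f"unexpected table name: {table!r}")
    return f"SELECT {', '.join(columns)} FROM {table} WHERE ANNOMESE = {annomese}"


# Hypothetical table names; one loop replaces one manual run per table.
TABLES = ["dim_client", "fact_sales"]
queries = [build_extract_query(t, 202310) for t in TABLES]
```

Validating the parameters up front means a mistyped month fails immediately instead of silently loading an empty slice.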

Page 33 (8m 20s)

TOBE: The new «CDM» DB from Synapse (Cloud). [image] Screenshot of Microsoft Azure Databricks Workflows showing the job "Caricamento CDM" (CDM load).

Page 34 (8m 33s)

TOBE: Extracting data from Synapse. [image] Screenshot.

Page 35 (8m 45s)

Improvements in Databricks Delta Lake.
- Many of the data stakeholders' queries target the latest ANNOMESE
- Tables partitioned by ANNOMESE
- From minutes to seconds to read portions of data
- Temporary views on Databricks, better than complex queries on the Synapse data source
- Spark lazy approach
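
Why partitioning by ANNOMESE and lazy evaluation help can be shown with a toy model in plain Python. This is not Spark or Delta Lake code: `PartitionedTable` and its `rows_scanned` counter are hypothetical, and ANNOMESE is assumed to be a YYYYMM integer. A filtered read touches a single partition, and no row is scanned until the result is actually consumed.

```python
from collections import defaultdict


class PartitionedTable:
    """Toy table that stores rows in one bucket per ANNOMESE value."""

    def __init__(self):
        self.partitions = defaultdict(list)
        self.rows_scanned = 0  # counter that makes the cost of a read visible

    def append(self, row):
        self.partitions[row["ANNOMESE"]].append(row)

    def read(self, annomese=None):
        """Lazily yield rows; with a filter, only one partition is touched."""
        if annomese is None:
            keys = list(self.partitions)
        else:
            keys = [annomese] if annomese in self.partitions else []
        for key in keys:
            for row in self.partitions[key]:
                self.rows_scanned += 1
                yield row


table = PartitionedTable()
for month in (202308, 202309, 202310):
    for client_id in range(1000):
        table.append({"ANNOMESE": month, "client_id": client_id})

plan = table.read(annomese=202310)  # nothing is scanned yet (lazy generator)
scanned_before = table.rows_scanned
latest = list(plan)                 # the scan happens only here
scanned_after = table.rows_scanned
```

In this toy, reading the latest month scans one third of the stored rows. The same reasoning is what makes month-based partitioning pay off when most queries target the latest ANNOMESE.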

Page 36 (9m 14s)

TOBE: Better notebooks organization in Databricks.

Page 37 (9m 59s)

TOBE: Notebooks workflow scheduling and temporal window.
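
A minimal sketch of the two ideas on this slide, assuming a hypothetical set of notebooks and dependencies (the real workflow is not given in the slides) and assuming ANNOMESE is a YYYYMM integer. The standard-library `graphlib.TopologicalSorter` derives a run order that respects dependencies, and `temporal_window` (a hypothetical helper) lists the most recent months to reload.

```python
from graphlib import TopologicalSorter


def temporal_window(last_annomese, months):
    """Return the `months` most recent ANNOMESE values (YYYYMM), oldest first."""
    year, month = divmod(last_annomese, 100)
    window = []
    for _ in range(months):
        window.append(year * 100 + month)
        month -= 1
        if month == 0:  # roll back from January to December of the previous year
            year, month = year - 1, 12
    return window[::-1]


# Hypothetical notebooks: each key runs after the notebooks in its set.
DEPENDENCIES = {
    "load_dim_client": set(),
    "load_dim_product": set(),
    "load_fact_sales": {"load_dim_client", "load_dim_product"},
    "build_report_views": {"load_fact_sales"},
}

run_order = list(TopologicalSorter(DEPENDENCIES).static_order())
window = temporal_window(202302, 4)
```

In Databricks the dependency graph would normally be declared in the job definition itself. The sketch only makes the ordering rule explicit.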

Page 38 (10m 25s)

Future works: improve Analytics approach (1/5).
- As of now, clients are categorized only by handwritten rules
- Current state of analytics: notebooks produce statistics, but no predictions or automatic smart classification
- Many correlations and data to explore and discover, with a modern and smart approach

Page 40 (11m 35s)

Future works: improve Analytics approach (3/5).
- Categorization by clustering
- What is the best product to advertise? Which one best fits the needs of the specific client?
- Which model to adopt? Neural networks, decision trees, …
- Maximize the probability of selling the product
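
As an illustration of categorization by clustering, here is a small k-means sketch in plain Python. The slides leave the choice of model open, so k-means is only one possible option and not the presenters' stated method. The client features (monthly spend, number of active products) and their values are made up.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return labels, centroids


# Made-up clients: (monthly spend, number of active products).
clients = [(10, 1), (12, 1), (11, 2), (95, 6), (100, 7), (105, 6)]
labels, centroids = kmeans(clients, centroids=[clients[0], clients[3]])
```

With these sample points the low-spend and high-spend clients end up in separate groups, replacing a handwritten threshold rule with groups learned from the data.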

Page 43 (12m 15s)

Conclusions.
- More than 30 notebooks migrated to CDM (Synapse data source)
- Built a scheduling logic and workflow for the new DB
- Improved performance in reading tables for analysis
- Reusable code, cleaner notebooks, descriptive cells
- Working in SQL and Python
- Easier ingestion with the Synapse library, no more manual credentials
- Simple dual load with SQL and Python
- ML models to train in the future business context
- Data science in a smarter way

Page 44 (12m 35s)

Restricted. NTT DATA Italia S.p.A.