THE 5 MOST IMPORTANT THINGS IN DATA SCIENCE
Kirk Borne [@KirkDBorne]
Principal Data Scientist and Executive Advisor, Booz Allen Innovation Center, Washington DC
Presented for Metis, Demystify Data Science Conference, September 27, 2017
http://www.boozallen.com/datascience
SUMMARY
I will highlight the five most important things in data science, providing a short illustrative (hopefully enlightening and informative) example from my own experience for each of these: The Data, The Science, Data Storytelling, Data Ethics, and Data Literacy. Since the primary focus of data science is discovery (new insights, better decisions, and value-added innovations), I will include an overview of the different flavors of machine learning for discovery in big data, plus a summary of the different levels of analytics maturity and what they mean for real-world data science applications. I will finish with a review of the top characteristics of leading candidates for data scientist positions within my organization.
1. THE DATA
[Figure: one example of one type of data in the world; legend: each icon = 1 zettabyte]
BIG DATA VOLUME IS BIG, BUT BIG DATA VARIETY IS THE BIGGEST ENABLER OF DISCOVERY (Welcome to the IoT of Big Discovery!!)
The 3+1 V's of Big Data:
Volume = the most annoying V
Velocity = the most challenging V
Variety = the richest V for discovery
Value = the most important V
$y \sim x!$ or $y \sim x^x$: combinatorial growth! (all possible interconnections, linkages, and interactions: high variety for discovery!)
$y \sim 2^x$: exponential growth
$y \sim 2x$: linear growth
https://www.linkedin.com/pulse/exponential-growth-isnt-cool-combinatorial-tor-bair
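To make the three growth laws on this slide concrete, the short sketch below tabulates linear ($2x$), exponential ($2^x$), and combinatorial ($x!$) growth for small $x$. Reading $x$ as a count of data sources or features is an interpretive assumption made here for illustration, not something the slide specifies.

```python
import math
from typing import Tuple


def growth(x: int) -> Tuple[int, int, int]:
    """Return (linear, exponential, combinatorial) growth values at x."""
    return 2 * x, 2 ** x, math.factorial(x)


# Print a small table so the three curves can be compared side by side.
for x in range(1, 11):
    linear, exponential, combinatorial = growth(x)
    print(f"x={x:2d}  2x={linear:3d}  2^x={exponential:5d}  x!={combinatorial:8d}")
```

The factorial column overtakes the exponential one at x = 4 (24 versus 16) and then pulls away quickly: at x = 10 the three values are 20, 1024, and 3,628,800.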
BIG DATA USE CASE IN ENVIRONMENTAL SCIENCE:
From Data to Information to Knowledge to Understanding: the science of X-informatics is born!
BIG DATA ANSWERS OUR BIG QUESTIONS: WHO, WHAT, WHEN, WHERE, HOW, AND MAYBE WHY
If we collect a thorough set of data (high-dimensional, with many attributes & features) for a complete set of items within our domain of study, then we would have a perfect statistical model for that domain. In other words, Big Data becomes the descriptive, predictive, and explanatory model for a domain X => X-informatics:
- Environmental, Climate, Bio-, Geo-, Astro-, Urban, Health, Security, Biomedical informatics, and more.
Anything we want to know about that domain is specified and encoded within the data. The goal of Data Science is to decode all of the signals and discover the knowledge that the data represent, using the Scientific Method.
2. THE SCIENCE
[Figure: a decision boundary separating "Class A" from "Not Class A", with regions labeled True Positives, False Positives, True Negatives, and False Negatives]
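The decision-boundary figure can be reproduced in miniature. The sketch below assumes a single numeric feature x and a hypothetical rule "predict Class A when x ≥ boundary"; the points and labels are made-up illustration values, and the function simply counts how many points land in each of the four outcome regions.

```python
from typing import Dict, List


def confusion_counts(xs: List[float], is_class_a: List[bool], boundary: float) -> Dict[str, int]:
    """Count TP/FP/TN/FN for the rule 'predict Class A when x >= boundary'."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for x, actual in zip(xs, is_class_a):
        predicted = x >= boundary
        if predicted and actual:
            counts["TP"] += 1      # correctly called Class A
        elif predicted and not actual:
            counts["FP"] += 1      # called Class A, but is not
        elif not predicted and not actual:
            counts["TN"] += 1      # correctly called Not Class A
        else:
            counts["FN"] += 1      # missed a true Class A item
    return counts


# Hypothetical points and true labels, for illustration only.
xs = [-12.0, -6.0, -1.0, 2.0, 4.0, 9.0, 13.0, -3.0]
labels = [False, False, True, False, True, True, True, False]
print(confusion_counts(xs, labels, boundary=0.0))
```

With the boundary at 0 this prints {'TP': 3, 'FP': 1, 'TN': 3, 'FN': 1}. Sliding the boundary left or right trades false negatives against false positives, which is the error characterization and minimization step of the scientific method.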
DATA SCIENCE FOLLOWS THE SCIENTIFIC METHOD CYCLE
1. Data Collection: observation and characterization
2. Formulation of a hypothesis: diagnosis & classification
3. Deduction: formulation of a predictive test
4. Experimental design and testing
5. Evaluation: error characterization and minimization
6. Review results: validate or revise the hypothesis
https://www.oreilly.com/ideas/10-signs-of-data-science-maturity
http://www.boozallen.com/datascience
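The six steps above can be walked through on a toy problem. This sketch is an illustration only, not the author's method: it assumes the hypothesis "the treatment group has a higher mean than the control group", uses an exact one-sided permutation test (a standard technique swapped in here) as the predictive test, and uses hypothetical data and a hypothetical significance level.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """One-sided exact permutation test of H1: mean(group_b) > mean(group_a).

    Every relabeling of the pooled data into groups of the original sizes is
    enumerated, so this is only practical for small samples.
    """
    # Step 3 (deduction): if H1 is false, the observed gap should look typical
    # of the gaps produced by random relabelings.
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    total = 0
    # Step 4 (testing): enumerate all relabelings.
    for b_idx in combinations(range(len(pooled)), len(group_b)):
        chosen = set(b_idx)
        b = [pooled[i] for i in b_idx]
        a = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if mean(b) - mean(a) >= observed:
            at_least_as_extreme += 1
    # Step 5 (evaluation): fraction of relabelings at least as extreme.
    return at_least_as_extreme / total


# Step 1 (data collection): hypothetical measurements.
control = [1.0, 2.0, 3.0]
treatment = [4.0, 5.0, 6.0]
# Step 2 (hypothesis): the treatment mean exceeds the control mean.
p = exact_permutation_p_value(control, treatment)

# Step 6 (review): keep or revise the hypothesis at a chosen significance level.
alpha = 0.10
print(p, "keep hypothesis" if p <= alpha else "revise hypothesis")
```

Only 1 of the 20 possible relabelings is as extreme as the observed split, so p = 0.05. Enumerating relabelings keeps the example exact and deterministic; with real data sizes a random sample of relabelings would be used instead.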
DATA SCIENCE MATURITY
Booz Allen Hamilton
Data Science:
1) is curiosity in action that creates & organizes knowledge
2) embraces & personifies a culture of experimentation
3) follows rigorous scientific methodology (i.e., experimental, disciplined, …)
4) systematically explores the world through observation & experiment
5) relentlessly asks the right questions, and searches for the next one
6) is testable and repeatable
7) adopts a fast-fail collaborative culture (knowing that errors are informative)
8) attracts & retains diverse participants (granting them freedom to explore)
9) is a way of doing things, not a thing to do
10) presents insights by illustrating and telling the data's story.
3. DATA STORYTELLING
KNOWING THE KNOWABLE THROUGH DATA SCIENCE
Don't just explain to us how you used Machine Learning (= algorithms that learn from experience), but tell us what you discovered, why you did it, and what it now means!
1) Class Discovery: Find the categories of objects (population segments), events, and behaviors in your data, and learn the rules that constrain the class boundaries (that uniquely distinguish them).
2) Correlation (Predictive and Prescriptive Power) Discovery: Find trends, patterns, and dependencies in data, which reveal new governing principles or behavioral patterns (the "DNA").
3) Novelty (Surprise!) Discovery: Find new, rare, one-in-a-[million / billion / trillion] objects, events, and behaviors.
4) Association (or Link) Discovery (Graph and Network Analytics): Find the unusual (interesting) co-occurring associations / links / connections.
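Each of the four discovery types can be illustrated with one deliberately tiny, standard technique. The pairings below are choices made here for illustration, not the author's methods: a two-cluster 1-D k-means for class discovery, the Pearson coefficient for correlation discovery, a z-score rule for novelty discovery, and pairwise lift for association discovery. All data values are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import mean, pstdev


# 1) Class discovery: 1-D k-means with k = 2, seeded at the min and max values.
def two_means_1d(values, iterations=10):
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            # The bool index is 1 when v is strictly closer to the second center.
            groups[abs(v - centers[0]) > abs(v - centers[1])].append(v)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


# 2) Correlation discovery: Pearson correlation coefficient.
def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# 3) Novelty discovery: flag values more than `threshold` population standard
#    deviations away from the mean.
def novelties(values, threshold=3.0):
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


# 4) Association discovery: lift of each co-occurring item pair.
#    Lift > 1 means the pair co-occurs more often than independence predicts.
def pair_lift(baskets):
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(set(basket)), 2)
    )
    return {
        (a, b): (count / n) / ((item_counts[a] / n) * (item_counts[b] / n))
        for (a, b), count in pair_counts.items()
    }


print(two_means_1d([1.0, 1.2, 0.8, 10.0, 10.5, 9.5]))
print(round(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]), 6))
print(novelties([10.0] * 20 + [50.0]))
print(pair_lift([["bread", "butter"], ["bread", "butter"], ["bread", "milk"], ["milk"]]))
```

The first call recovers two class centers (1.0 and 10.0), the second finds a perfect linear dependency (r = 1), the third flags the single surprising value 50.0, and the last shows bread and butter co-occurring more than chance (lift above 1) while bread and milk co-occur less (lift below 1). Real applications would use library implementations and far larger data; the point here is only what each kind of discovery computes.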
THE 5 LEVELS OF ANALYTICS MATURITY
Explain the level of analytics maturity that your Data Science is attempting to achieve.
1) Descriptive Analytics: Hindsight (What happened?)
2) Diagnostic Analytics: Oversight (real time: What is happening? Why did it happen?)
3) Predictive Analytics: Foresight (What will happen?)
4) Prescriptive Analytics: Insight (How can we optimize what happens?) (Follow the dots / connections in the graph!)
5) Cognitive Analytics: Right Sight (the 360° view: what is the right question to ask for this set of data in this context? = the game of Jeopardy!). It finds the right insight, the right action, the right decision, right now (= Next Best Action!), and it moves beyond simply providing answers to generating new questions and hypotheses.
As data scientists, we must not only Walk The Talk, but we must also Talk The Walk.
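The first four maturity levels can be contrasted on one small dataset. The sketch below assumes a hypothetical weekly sales series, uses a least-squares linear trend as the predictive model, and uses a hypothetical cost model (each unsold unit costs 1, each missed sale costs 3) for the prescriptive step; none of these choices come from the slide.

```python
from statistics import mean

sales = [10.0, 12.0, 14.0, 16.0]  # hypothetical weekly sales

# 1) Descriptive (hindsight): what happened?
average = mean(sales)

# 2) Diagnostic (oversight): what changed from week to week?
changes = [b - a for a, b in zip(sales, sales[1:])]


# 3) Predictive (foresight): least-squares linear trend, one step ahead.
def linear_forecast(ys):
    xs = list(range(len(ys)))
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    intercept = my - slope * mx
    return intercept + slope * len(ys)


forecast = linear_forecast(sales)


# 4) Prescriptive (insight): choose the stock level with the lowest cost
#    under the hypothetical cost model described above.
def best_stock_level(demand, options):
    def cost(stock):
        return max(stock - demand, 0) * 1.0 + max(demand - stock, 0) * 3.0
    return min(options, key=cost)


decision = best_stock_level(forecast, options=[14, 16, 18, 20])
print(average, changes, forecast, decision)
```

Each level consumes the output of the one before it: the description feeds the diagnosis, the trend feeds the forecast, and the forecast feeds the decision. Cognitive analytics, which asks which question to pose in the first place, has no one-line analogue here.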
4. DATA ETHICS
https://weaponsofmathdestructionbook.com/
Quote attributed to H.G. Wells (1903; writer):
"Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write."
Well, that day is here now! Statistical & Data Literacy Matters!
Quote from Ronald Coase (economist):
"If you torture your data long enough, it will confess to anything."
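One common way data get "tortured" is to run many tests and report only the ones that come out significant. The sketch below assumes independent tests in which every null hypothesis is true, and computes the chance of at least one false positive (the family-wise error rate); the Bonferroni correction is shown as one standard remedy, named here as an illustration rather than taken from the slide.

```python
def family_wise_error_rate(alpha: float, num_tests: int) -> float:
    """Chance of at least one false positive among independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_alpha(alpha: float, num_tests: int) -> float:
    """Per-test significance level that keeps the family-wise rate at or below alpha."""
    return alpha / num_tests


m = 20  # number of hypotheses tried on the same data
uncorrected = family_wise_error_rate(0.05, m)
corrected = family_wise_error_rate(bonferroni_alpha(0.05, m), m)
print(round(uncorrected, 3), round(corrected, 3))
```

Trying 20 hypotheses at the usual 0.05 level gives roughly a 64% chance of at least one spurious "confession"; dividing the per-test level by 20 brings that back just under 5%.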
Quote from somebody (?):
"It is now beyond any doubt that cigarettes are the biggest cause of statistics."
5. DATA LITERACY
[Cartoon: https://cdn.andertoons.com/img/toons/cartoon6517t.png]
Quote from a famous politician in the 1990s:
"I am shocked that half the students in this country score below average on our standardized tests."
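The joke works because "average" usually means the middle of the distribution, so about half of any group falls below it by construction. The sketch below uses hypothetical numbers to show that half the values fall below the median, while the share below the mean depends on the shape of the distribution: one large outlier is enough to put five of six values below the mean.

```python
from statistics import mean, median


def fraction_below(values, reference):
    """Fraction of values strictly below a reference point."""
    return sum(v < reference for v in values) / len(values)


symmetric_scores = [55, 60, 65, 70, 75, 80]   # hypothetical test scores
skewed_incomes = [20, 25, 30, 35, 40, 250]    # hypothetical, with one large outlier

for name, data in [("scores", symmetric_scores), ("incomes", skewed_incomes)]:
    print(
        name,
        "below mean:", round(fraction_below(data, mean(data)), 3),
        "below median:", round(fraction_below(data, median(data)), 3),
    )
```

For the symmetric scores, mean and median coincide and exactly half are below both; for the skewed incomes, half are below the median but about 83% are below the mean.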
Data Literacy in 2 parts: Data Science and Data Ethics (http://www.kirkborne.net/cds151/)
1) How to use data
2) How to use data correctly
http://dilbert.com/strip/2000-11-13
http://dilbert.com/strip/2008-05-07
Data Literacy For All: A Reading List
http://rocketdatascience.org/?p=356
My journey from Astrophysics into Data Science was motivated significantly by a strong desire to build Data Literacy in the next-generation workforce! Learn about my journey here: https://youtu.be/w19sguvx7lw
THE 6 MOST IMPORTANT THINGS IN DATA SCIENCE
6. (BONUS) THE DATA SCIENTIST
http://www.marketingdistillery.com/2014/11/29/is-data-science-a-buzzword-modern-data-scientist-defined/
THE MOST IMPORTANT V OF BIG DATA = VALUE!
https://twitter.com/dez_blanchfield/status/645139875440668672
SAILING THE 7 SEAS OF DATA: THE INDIVIDUAL'S JOURNEY TO DATA SCIENCE MATURITY
The Seven Seas (C's) of Data Scientists:
1) Cognitively Curious (ask questions, the right questions!)
2) Creative (design thinker)
3) Courageous problem-solver (rocks the culture, willingness to fail)
4) Cool under pressure (tolerance for ambiguity)
5) Continuous life-long learner (hackathons, online classes, …)
6) Communicator (data storyteller)
7) Collaborative ("data science is a team sport")
+ 3 more:
8) Critical Thinker
9) Computational
10) Consultative
DATA SCIENTISTS ARE EXPLORERS, EXPLORING VAST AND ENDLESS SEAS OF DATA!
https://www.pinterest.com/pin/377106168772298092/
"If you want to build a ship, don't drum up people to gather wood and don't assign them tasks and work, but rather teach them to yearn for the vast and endless sea." (Antoine de Saint-Exupéry)
THANK YOU! LET US EXPLORE & BUILD A BETTER WORLD WITH DATA SCIENCE!
Booz Allen Hamilton
LISTEN: @KirkDBorne, @BoozDataScience
READ, BUILD, and EXPLORE: www.boozallen.com/datascience
Tips for Building a Data Science Capability
The Mathematical Corporation
10 Signs of Data Science Maturity
The Field Guide to Data Science
The Data and Analytics Catalyst
EXPLORE: sailfish.boozallen.com (Machine Intelligence): learn how AI and Machine Intelligence empower The Mathematical Corporation
PARTICIPATE: datasciencebowl.com
These slides: http://www.kirkborne.net/demystifyds2017/