Nand Kishor Contributor

Nand Kishor is the Product Manager of House of Bots. After finishing his studies in computer science, he ideated and re-launched a Real Estate Business Intelligence Tool, creating one of the leading business intelligence tools for property price analysis in 2012. He also writes, researches, and shares knowledge about Artificial Intelligence (AI), Machine Learning (ML), Data Science, Big Data, Python, and related topics.



By Nand Kishor |Email | Jun 21, 2017 | 9669 Views

Leaders sometimes ask questions that get in the way of solving the problem that really matters to them. We can learn a lot from a real-life example of two business titans.

During the 1960s, Big Blue had the opportunity to buy or license Xerox's new reprographic photo process (think: copies). IBM hired a consulting firm to answer the following question: "If a more reliable, cheaper and faster processing firm were available, how many more copies from the originals would people make in a given year?" However important that inquiry may have been, IBM asked the wrong question -- by miles. According to Paul Schoemaker and Steven Krupp's article in the MIT Sloan Management Review, "The Art of Asking Pivotal Questions" (MIT Sloan Mgm't Rev., Winter 2016), IBM "ignored a new segment of the market that turned out to be many times larger; namely, copies of copies of copies." This was a huge overlooked opportunity. What if IBM had asked instead:

"How might the new Xerox process change when and how people make copies, and what might this grow to in total number of copies made in future years?"

Id. The answer to the right question might well have resulted in IBM owning this new technology. Xerox may never have become a verb. We'll never know. What we do know, however, is that IBM paid dearly for asking the wrong question, and that enterprises should take the time they need to craft the correct questions to guide their initiatives. Spend considerable time aiming and little time firing.

Why this example? In the recent past, we have observed that data science challenges that ask the right questions up front have produced extreme-value outcomes that their sponsors could never have conceived of. The Harvard Medical School example below demonstrates how asking the right question can produce extraordinary results.

World-class data science challenges have three important traits: (1) they atomize the problem into its component parts (often called decomposition, after the scientific process known as systems decomposition); (2) they extract the competition's most difficult questions to serve as a precondition for proceeding; and (3) they abstract those extracted questions from the domain. Collectively, these processes are known as DEA. They allow hundreds of thousands of the world's best problem-solvers to analyze and solve problems without even knowing what challenge is being posed -- e.g., helping Harvard Medical School ("HMS") tackle DNA sequencing, or aligning the tethers (longerons) that hold in place the International Space Station's solar arrays in order to maximize energy capture within tightly prescribed parameters. In the latter case, abstraction from the domain allowed a teacher in Italy to produce the winning solution.

By atomizing a challenge, a crowdsourcing community breaks it down into small component parts. This reaps a number of benefits for its clients. First, community members self-select to compete in those sub-challenges where they feel they have a comparative advantage and can win. Second, atomization allows for parallel development (as opposed to sequential progress). With so many community members choosing to compete in specific challenges, one may find 143 contestants working on challenge (A), 110 on (B), and 79 on (C). When each of those challenges finishes, it can be resynthesized with the others to bring together a whole, e.g., (A)--(C). The alternative is sequential development, which requires that (A) be finished before (B), and then on to (C) and so forth. This leaves development vulnerable to weak links in the chain, and can be painfully slow. Atomization also allows for more efficient participation. It's more likely for a contestant to be an expert at just improving the algorithm used in a program than to be an expert at both improving the algorithm and the user interface for the program serving that algorithm. Likewise, this also improves the self-perceived probability of winning, and therefore the contestant's participation in the atomized contest.

Data science also abstracts a problem from its domain (e.g., genetics, space, or predictive toxicology) into the common denominator that unifies crowdsourcing communities: mathematics. This has to be done very carefully, keeping in mind that every domain carries tacit or implicit assumptions among its practitioners, which need to be carried into the abstracted problem as explicit constraints. While this is hard work, the positives from abstraction can far outweigh the effort needed to do it correctly. Rather than restrict the competition to scientists who specialize in a single field, abstraction encourages new perspectives by drawing participation from a much larger community with cross-domain experience. The paradigm shifts that this strategy effects are the beauty of data science and systems decomposition.

At HMS, DEA resulted in precisely such an extreme-value outcome. Consider the statistics. The medical school wanted to minimize the distance between strings in order to increase the efficacy of its work in genomics. A previous attempt had processed 100,000 sequences in 15,622 seconds (260.4 minutes). HMS first looked inward and dedicated a full-time resource (salary: $120,000) to the challenge. The developer lowered the processing time to 2,845 seconds (47.6 minutes), a significant (but still unsatisfactory) result. Harvard Catalyst, a university-wide clinical science center with the backing of the university and situated at HMS, wanted to see if crowdsourcing could be applied alongside a traditional academic science community. The partners first atomized the problem in order to encourage subject-matter experts within the community to self-select and answer those questions where they held a comparative advantage. The challenge had to be devoid of highly domain-specific immunogenomics concepts in order to attract non-life-science participants. See Karim R. Lakhani, Prize-based Contests Can Provide Solutions to Computational Biology Problems 109 (Nature Biotechnology Feb. 2015).
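"Minimizing the distance between strings" is, abstracted from immunogenomics, the classic edit distance problem. The winning entries used far more optimized approaches than this, but a minimal dynamic-programming sketch (the standard Wagner-Fischer algorithm, shown here for illustration) captures what contestants were asked to speed up:

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn string a into string b,
    computed row by row to use O(len(b)) memory."""
    # prev[j] = distance from the current prefix of a to b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

print(edit_distance("GATTACA", "GCATGCU"))  # 4
print(edit_distance("kitten", "sitting"))   # 3
```

This naive version runs in O(len(a) * len(b)) time; the performance gap between it and the contest's 16-second winner is exactly the kind of headroom a well-abstracted challenge exposes.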

By abstracting the sequence alignment problem as one involving string matching, the challenge was made accessible to a much larger audience of contestants from diverse fields. HMS devised a scoring metric that supported the medical school's "goal of achieving both improved accuracy and computational efficiency (speed)." Id. That metric was disclosed to contestants and was the only measure used to award prizes. HMS's contest ran for two weeks and offered only $6,000 in prize money, with top-ranking contestants receiving cash prizes of up to $500 each week.
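The article does not disclose HMS's actual formula. Purely as a hypothetical illustration of how a single disclosed metric can reward both accuracy and speed, a composite score might weight a correctness measure against time remaining under a fixed budget:

```python
def challenge_score(accuracy: float, runtime_s: float,
                    time_budget_s: float = 60.0) -> float:
    """Hypothetical composite contest score (NOT the metric HMS
    actually used): accuracy in [0, 1] dominates, with a smaller
    bonus for finishing well under a fixed time budget."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be in [0, 1]")
    # Fraction of the time budget left unused; never negative.
    speed_bonus = max(0.0, 1.0 - runtime_s / time_budget_s)
    return 0.8 * accuracy + 0.2 * speed_bonus

print(challenge_score(1.0, 0.0))    # 1.0 (perfect and instant)
print(challenge_score(0.5, 60.0))   # 0.4 (half right, budget exhausted)
```

The design point such a metric illustrates: because it is the only measure used to award prizes, it must be disclosed up front and computable without domain knowledge.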

HMS's edit distance challenge drew 733 participants, of whom 122 submitted algorithms. The submitters represented 69 countries. According to Harvard Business School Professor Karim Lakhani, who helped oversee the contest in his role as the Principal Investigator of the Harvard-NASA Tournament Lab at the Harvard Institute for Quantitative Social Science, "none [of the contestants] were academic or industrial computational biologists and only five described themselves as coming from other R&D or life sciences in any capacity." Id. Eighty-nine (89) completely different methods were explored and used in the 122 submitted algorithms. It is hard to conceive of any research effort that could so inexpensively and easily achieve an equivalent scale of effort on a specific problem in such a short duration (two weeks). HMS maintained a narrow focus and asked the right question: How can we dramatically minimize the time it takes to compute the edit distance between genomic strings? That focus never wavered, even when employing crowdsourcing for the first time, and it bore fruit.

How did HMS fare in its first foray into crowdsourcing? The results were astounding. With a winning solution of 16 seconds (976 times faster than the first attempt), several solutions were "very near the theoretical maximum for the data set." Id. at 110. This extreme-value outcome represents a shift not only in how HMS approached this complex immunogenomics issue, but also in the future of its innovation initiatives.


We learn to ask the right questions at an early age. At an intersection, for example, a child might ask a parent: "Does red mean we must stop, or just that we should stop?" The validity of the question will be confirmed by the answer in that case. Years later we ask questions about all aspects of our lives -- jobs, finances, relationships and so on. We hope to ask the right questions at the right time. In the business world, the example above of IBM's misguided question resulted in an entire industry not shifting as it might have. The implications for IBM were profound, just as they were for Xerox.

When we take on complex scientific problems using data science, asking the right questions at each step is critical to the process. Failure to do so may make the difference between frustration and profound innovation. Aim carefully and with proper consideration in order to sculpt the right question. You may not get a second chance.

Source: Forbes