
Data Science Training: Emergent Discipline or Big Ed Boondoggle?

Vincent Granville tells it like it is.

Books, certificates and graduate degrees in data science are spreading like mushrooms after the rain. Unfortunately, many are just a mirage: some old guys taking advantage of the new paradigm to quickly re-package some very old material (statistics, R programming) with the new label. . .

Is a degree in Data Science a valuable career commodity or just Big Ed exploiting Big Data buzz?

The Bellwether Book

There was a time when textbooks enjoyed a luxurious life cycle. A first edition might be introduced into a curriculum and survive for 3-5 years before the publisher released a second edition. This was especially true for science and math texts, whose core curriculum did not waver for decades.

Old Books, Old School?

Things have changed, even for Basic Big Science. Today a virtual Organic Chemistry text is hosted at Michigan State. It was revised regularly from 1999 to 2010. WikiBooks hosts an organic chemistry text that was last revised only two weeks before it was added to a list for this report. The Daley & Daley online text is not only free (freemium), but uses a different teaching method – organizing its content by mechanisms rather than functional groups.

If this trend applies to relatively static content like introductory organic chemistry, what does it mean for an emerging discipline still struggling to list what topics should be learned?

M.S. or “Mess”?

Agreement about a common curriculum for data science is some way off. Yet even though the discipline is still in a state of rapid growth and flux, colleges and universities clearly aren’t waiting for the dust to settle. On the contrary, some are setting tuition at levels guaranteed to stress-test the federal government’s college loan program.

Data Science education is offered by a dizzying array of institutions. While some are certificate programs aimed at practitioners, others are master’s or even PhD-level programs. A representative sample of the programs available: Predictive Analytics (Northwestern, DePaul), Information Knowledge Strategy (Columbia), Data Science (Elmhurst, Syracuse, George Mason, NYU, Washington, University of Charleston, USC, Illinois Institute of Technology, Northern Kentucky, Chapman, Indiana), Master of Information and Data Science (UC Berkeley – online), Data Mining (Stanford, UCSD, Central Connecticut), Analytics (NC State, USF, Rutgers, Louisiana State, Michigan State, University of Chicago, Texas A&M, Bowling Green), Business Intelligence and Analytics (CMU), Data Analytics (CUNY, Maryland, Cornell), Business Analytics (UT-Austin, Iowa, Drexel, Michigan-Dearborn, Michigan State, Fordham, Tennessee, ASU) and Strategic Analytics (Brandeis).

As tax subsidies for public universities have decreased, and as health care benefits have risen for faculty, clearly data science – or whatever name is given to the program – is seen as a cash cow. For example, the infrastructure-light UC Berkeley program will set you back a sweet $60K. The one-year revenue for UT-Austin’s Business Analytics program is forecast to be $1.7M.

There are also non-academic offerings, whose objectives may align better with what some practitioners need. For example, Zipfian Academy’s 12-week, $14,000 program does not suffer from the publish-or-perish, tenure, and instruction-by-proxy (i.e., grad student) pitfalls that afflict some academic settings.

Then again, some programs may lack the rigor needed for the most critical applications.

Learn from the Peddler

Practitioner-oriented offerings also include those by software vendors, not all of whom educate only to peddle their wares.

For example, Hadoop Training and Certification from Hortonworks does promote the company’s Data Platform, but may identify more real world scenarios than the ones found in the ivory tower of your choosing.

Vendor-Sponsored Learning: Fastest ROI?

“Just Get Me Started”

The practically minded may wish to try “Intro to Hadoop and MapReduce: How to Process Big Data” at Udacity, taught in collaboration with Cloudera. Udacity also offers “Exploratory Data Analysis using R.” This course includes instructors drawn from data science teams at Facebook.

Learn QlikView, Syncsort, Tableau or StreamInsight and some data science will come your way along with a bit of agile.

More than Books and Fees

Zipfian’s cofounder Ryan Orban claims that there’s more to learn than how to use R and whether there is any usable Big Data software outside the Apache tent. As Orban told Information Week:

At the end of this project, not only do you have a full data science project under your belt, but we also require you to write up and present your results, because your insight is only as good as how effectively you can communicate it. If you’re talking about complicated predictive models, you need to not only understand how they work, but also be able to explain how they work in layman’s terms so that people can understand them and get on board.

In a post titled “The Curse of Big Data,” Granville reminds readers “that when you search for patterns in very, very large data sets with billions or trillions of data points and thousands of metrics, you are bound to identify coincidences that have no predictive power . . . The question is: how do you discriminate between a real and accidental signal in vast amounts of data?”
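Granville's point can be seen in a few lines of code. The sketch below is not from his post; it is a minimal illustration, assuming purely random "metrics" with no real relationship between them, and the function names (`pearson`, `strongest_chance_correlation`) are made up for this example. It generates independent random series and reports the strongest correlation found between any pair. As the number of metrics grows, the best "pattern" found by chance alone tends to look more and more convincing.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_metrics, n_points, seed=0):
    """Largest |correlation| between any two independent random metrics.

    Every metric is pure noise (hypothetical data), so any strong
    correlation reported here is a coincidence by construction.
    """
    rng = random.Random(seed)
    metrics = [
        [rng.gauss(0, 1) for _ in range(n_points)]
        for _ in range(n_metrics)
    ]
    return max(abs(pearson(a, b)) for a, b in combinations(metrics, 2))


if __name__ == "__main__":
    # More metrics means more pairs to compare, and more room for flukes.
    for n_metrics in (5, 50, 200):
        best = strongest_chance_correlation(n_metrics, n_points=20)
        print(f"{n_metrics:>4} metrics: strongest chance correlation = {best:.3f}")
```

Because the seed is fixed, the larger runs contain the smaller runs' metrics plus more noise, so the strongest correlation can only stay the same or grow as metrics are added. Telling a real signal from an accidental one therefore takes more than spotting a high correlation, for example testing the pattern on data that was held back from the search.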

If you believe the answer to these and other questions is “education,” you may want to turn to Thomas Davenport’s Big Data @ Work (2014). Take his Big Data Readiness Survey, which includes these signposts:

• Our data scientists and analytics professionals act as trusted consultants to our senior executives on key decisions and data-driven innovation.

• We have programs (either internal or in partnership with external organizations) to develop data science and analytical skills in our employees.

Well, no.

Apart from an elite few, not many organizations are ready for what Davenport calls Analytics 3.0. Instead the existing workforce will need education from some source or other — because so far there is no evidence of a much-heralded binge of analyst hiring.
