Monday, October 12, 2015

Geomorphology, meteorology and Big Data

Edit: I found a glaring quantitative error, and corrected it.


I'd like to open with an apology to my long-suffering and patient loyal readers.

To put it politely, my recent writing has been sparse at best. The reason is that I've been working a lot: I haven't had a "real" day off in three weeks. I think this is my third weekend without a whole day off. I took Saturday evening off and will be back in the office on Sunday. And what, one might ask, does the author do for a living that requires so much work?

I'm a programmer, working in direct mail (a.k.a. "junk mail"). And what causes overtime in this field?

To keep it simple, there are really only two factors: workload and workforce. Here in the United States of America, October is when senior citizens (folks who are 65 years of age or older) make some choices regarding their prescription drug benefit provider. I'm not an expert on this, but the bottom line for my company is that in September and October we experience a huge spike in business from these clients. This year, however, workforce came into play. Our plant has a fairly close relationship with another manufacturing plant, which is currently shorthanded (as is ours) and also has a client running a similar ad campaign. The deadlines have been tight, and everyone's resources- human and physical- have been severely stressed. My role in this has been pretty much support- but it's been for both plants. Or, wL > wF.

Having said that, I wanted to take a bit of a light-hearted look at Big Data.

There have been a few topics that I've been wanting to write about, but some recent Twitter activity led me here. Just for the record, I'm currently listening to Steely Dan's "Midnite Cruiser", and thinking about data.

Data. Steely Dan. Yeah, not much of a connection there.

I'm not sure if the average reader realizes that data geeks even enjoy music.  To be blunt, we do.

But... back to data. Geomorphology is a real word. I was introduced to this term by my wife, who has a geology degree. Geology one-liner: she has rocks in her head. She said so. Anyway, I find the big data landscape falling somewhat messily onto this collision of mismatched terminologies.

As I am not a true "data" person, I often laugh at data terminology and enjoy extending it to its ridiculous, but plausible, limits.

Point: "data lakes".

Everyone pretty much understands (more or less) what "big data" is. Pretty much like everyone understands what "crime" is. Or "pornography". Alles klar?

In other words, aside from I.T. insiders and those who follow big data, no one really knows what big data is- or how pivotal it can be.

So, I suppose this is a call to action: how do you define your data?

I do not have a lot of data, relatively speaking. "Relatively speaking", of course, is a HUGE qualifier.

When I think about my personal data, I think in terms of things that matter to me- in the "real world", these things have little value. In the real world, I tend to generate lots of data which has no value to me personally. For example, I've been on Twitter for around three and a half years, and in that time have posted nearly 2900 tweets. That sounds like a lot of tweeting, but in reality it's fewer than three tweets per day. What would be interesting to me would be a breakdown of my top hashtags.
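For the curious, here is a minimal Python sketch of that arithmetic, along with one way a hashtag breakdown might be tallied. The function names and the sample tweets are made up for illustration (they are not my real tweets), and a year is assumed to be 365.25 days.

```python
import re
from collections import Counter


def tweets_per_day(total_tweets, years):
    """Average tweets per day, assuming 365.25 days per year."""
    return total_tweets / (years * 365.25)


def top_hashtags(tweets, n=3):
    """Return the n most common hashtags (lowercased) with their counts."""
    tags = []
    for text in tweets:
        # A hashtag here is '#' followed by letters, digits or underscores.
        tags.extend(tag.lower() for tag in re.findall(r"#(\w+)", text))
    return Counter(tags).most_common(n)


# Hypothetical sample tweets, invented for this example only.
sample_tweets = [
    "Learning #Python for #BigData",
    "More #bigdata reading",
    "#python tips",
    "Sorting #Lego with #Python",
]

print(f"{tweets_per_day(2900, 3.5):.2f} tweets per day")
print(top_hashtags(sample_tweets, 2))
```

With roughly 2900 tweets over three and a half years, the average works out to a bit under 2.3 tweets per day.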

But, as usual, I have digressed.

The personal data that I track is only in a few categories. I use data to catalog stuff, for the most part: books, videos, music and Legos. I also keep a pedometer log.

Most- if not all- of this data is useless to pretty much anyone except me. But, here we get a peek into the actual application of data science IRL. All data is data, but of all that data, which is most relevant to you? Does Lego care how many 3001 blue elements I own? I think not. They probably do care, however, about my age, where I purchase Lego products, and how much I spend on Lego in a month or year.

This is truly the science and ART of data science. Much of what I tweet on the subject of data science and data analysis is somewhat technical, focused on languages, algorithms and "sciencey" stuff... but business and ethics are also huge, and seem to be marginalized.

"What is the greatest Rock 'N' Roll song of all time?" A valid question. Of course, it is a question that cannot be answered- at least, not with data. Usually, there seem to be three contenders: "Hey Jude" (The Beatles), "Stairway To Heaven" (Led Zeppelin) and "Free Bird" (Lynyrd Skynyrd).

Likewise, a data scientist must be in tune with business: what is your best product/service? Data science should not only answer that question, but give stakeholders the answers to the five great press questions: Who? What? When? Why? and How? When a data scientist returns valid, data-based answers to these questions and communicates them clearly, the stakeholder has a valid representation of their business based on science and art.

Sorry- I never got around to the humor of Big Data... maybe another time.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.
