Sunday, October 25, 2015

So, how was your internet day, dear?

I'm guessing that most readers of this blog spend a fair amount of time connected (at work, on the internet, in the cloud, streaming music, messages, videos, driving a late model car, or interacting in some other way with data or the IoT). Even when I am at work, I am contributing to the data glut as I wait for certain processes to finish doing their... processing.

Creating certain types of files and compiling programs are two time-intensive tasks I normally perform. As these tasks are also resource heavy, rather than stare at the progress bars, I turn to my phone and play a round of Bejeweled Blitz or Angry Birds or check my Twitter feed... all of which add to the data glut. Mind you, these activities are innocuous- I'm filling time rather than wasting it staring at progress bars. I'm guessing that the activities I choose to fill the time might be considered a waste of time, but as I have a feel for how long certain processes require, Angry Birds & Co. keep me alert and sharp.

The odd thing is that quite often my "decompress" time after work is still internet time. My employer uses a "netnanny" so there are many sites which one simply cannot access at work. This doesn't bother me greatly, as I don't need additional distractions. So, when I get home, I generally log on to two computers- one to play "Rail Nation" and the other for blogging, tweeting and everything else.

In essence, I guess I'm actually decompressing from my work e-framework by logging in to another network or two.

For those who are not constantly connected, this may seem to be quite the conundrum. However, even though I do not consider myself to be in the "continuously connected" category, I try to keep my social media interactions in the "constantly interacted with" category. And this, I find, is my greatest challenge.

Five years. It's a good measure of time and goals...I think the Soviet Union was famous for these. So, going back 5 years...

The Soviet Union did not exist. I had a Facebook, Google, and Yahoo account. At some point, a Pinterest account was created. Apart from these, I have LinkedIn, Amazon, Newegg and "memberships" on a couple dozen other sites which I infrequently visit.

I'm most active these days on Twitter, then Facebook and finally LinkedIn. Now, this is not to say that I derive social media value from these services in the order in which they are presented; rather, I derive social media value where it exists.

For example, even though the vast majority of my internet social interaction may be generated on Twitter and reposted on Facebook, it does not mean that the main focus of my internet time is on Twitter. Like many folks, I shamelessly self promote- one has to in order to be heard above the static and background noise which is the Internet.

Every so often- generally once every month or so- I post a disclaimer on Twitter indicating that I am NOT a Data Scientist, related data professional, analyst or statistician, partly because I have a coworker who initially said, "You're a complete fraud!" To which I replied, "No, I'm completely honest." He still says it, but mostly in jest these days, largely because I've shown him comments, retweets and favorites of things I've tweeted or RT'ed. My "thing" is this: although I have no formal background, I have a serious amateur interest in stats, big data, analytics, tech, IoT, and STEM/STEAM, and am a huge proponent of them.

I also love the maker scene. Although I'm not hugely into it right now, our family has three Arduino Unos, a pair of Raspberry Pi model Bs, a couple of Linux (Ubuntu) PCs and lots of Legos, plus an original Gilbert (pre-Meccano) Erector set and one or two Meccano sets. Don't ask me how many PCs we have- I literally have no idea.

So, to close... I haven't done this in some time, but here are some previous posts you might enjoy. And, in keeping with the spirit of the 5 year theme mentioned earlier- here is a selection of 5 older posts, which are currently #6 through #10 in all-time popularity. FWIW, this is post #185... I'm planning something special for #200, which may occur near Christmas (hopefully!).




I hope you are able to enjoy one or more of these. They're far from perfect, but they're honest, and I hope you have at least a bit of fun reading them- I really enjoy writing them!

As always, I am hochspeyer, blogging data analysis and management so you don't have to.
 





Friday, October 23, 2015

Shopping at the local Euromart

I'm not certain that I have stated this categorically before, but Jennifer and I do a great deal of our grocery shopping at a store downtown which is part of a local chain of stores called Shop N Save. The basic selling point of these stores is that they sell all sorts of imported foodstuffs and household consumables. Jennifer and I spent the formative years of our marriage in Europe, far away from the "normal" support system of family and friends that most young couples enjoy... our "early" years were spent in West Germany and Germany (we were there through the Wiedervereinigung, so even though our two oldest children were born in the same hospital, they have different countries of birth on record). Therefore, Shop N Save is a fun and money-saving place for us to shop, partly because we very carefully research our food purchases, and partly because this chain has such a neat selection of stuff. And partly because it takes us back to some familiar foods we enjoyed in Europe.

The products pictured below are herbal teas from Poland.  Whilst we were in Europe, herbal teas were not a part of normal conversation, but we've learned through keen observation that there's an herbal fix, remedy or aid for most common ailments. Generally speaking, these products have English translation stickers- I didn't look at the translation for the first one, but I'm guessing it has something to do with one's solid or liquid waste, and the regulation thereof.



                                              

The next one is an aid for varicose veins. I showed this to a coworker (who is married to an OTB [off-the-boat, born in Poland] woman), and he said that in no way would a woman that had those legs have varicose veins. I guess truth in advertising is abused everywhere. The last one shows a young mother breastfeeding. I'm guessing by the brown flower or leaf decorations at the bottom left and right that this tea, when taken an hour before feeding, will allow the newborn or infant to enjoy chocolatey mother's milk. What will they think of next?

Oh, well, enough of herbal tea. Let's talk about data.

As some readers are probably aware, I work for a company that produces large volumes of "high quality" direct mail. That's "junk mail" to most readers. We print stuff for a wide variety of clients- insurance, video services, sweepstakes, nonprofits and Federal agencies, to name a few. As a programmer, I look at LOTS of data. Some is good. Some is very good.

And some is atrocious. 

The sad part about the last comment is that in some cases, the client actually OWNS the data- it doesn't come from a rented list. In other words, the person who is receiving the mailpiece has actually done business with our client before.

In the wonderful world of programming, we have the ability to at least skirt some bad data. For example, if the first name and last name are mere initials, we can throw some logic in to substitute "Dear Neighbor" in a salutation or "Current Plutonium Purchaser" in the address block. We often alert clients to these situations, and the clients themselves are often aware that these situations may arise, and they give us instructions to cover these situations.
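For the curious, that sort of fallback logic can be sketched in a few lines of Python. This is only an illustration, not our production code: the function names, the "mere initial" test and the default wording are all assumptions I made up for the example.

```python
def is_initial(name: str) -> bool:
    """Treat an empty field or a single letter (with or without a period) as a mere initial."""
    return len(name.strip().rstrip(".")) <= 1


def salutation(first: str, last: str, fallback: str = "Dear Neighbor") -> str:
    """Build a salutation, substituting the fallback when both names are mere initials."""
    if is_initial(first) and is_initial(last):
        return fallback
    return f"Dear {first.strip()} {last.strip()}"


print(salutation("J.", "S"))        # Dear Neighbor
print(salutation("Jane", "Smith"))  # Dear Jane Smith
```

In practice the client's instructions decide what counts as "bad" and what gets substituted, so the test and the fallback text would both be parameters of the job rather than fixed values.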

Sometimes, though, an unforeseen situation arises, and despite all safeguards, the best option available to the programmer is to warn the press that the imaging is not a mistake- the data is messed up.

I was recently QCing a job prior to printing, and came across an interesting record. Normally, when we pull signoff records for a client, we include a "longest name", and often a "shortest name". The longest name shows the client that a name will fit where it needs to fit. The shortest name shows them what a short name looks like. The problem is this: the shortest name is often an initial, and rather than spending a few extra bucks on programming or data cleanup, the client often lets the data go to press "as is". This particular client had a special field for a "first+middle name". The problem I ran into was that the middle name was not properly cased in 100% of the records I looked at. So, I alerted the press to the situation so that they would not stop running- which can be expensive.
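Here is a rough Python sketch of those two QC steps: pulling the longest and shortest names for signoff, and flagging a "first+middle name" field whose casing looks off. The function names and the casing heuristic (any word of two or more letters that is all lowercase or all uppercase) are my own assumptions for illustration, and the heuristic will certainly miss some cases.

```python
def pick_signoff_samples(names):
    """Return the (longest, shortest) names in the list, by character count."""
    return max(names, key=len), min(names, key=len)


def looks_miscased(name: str) -> bool:
    """Flag a name if any word longer than one letter is all lowercase or all uppercase."""
    for word in name.split():
        letters = word.strip(".")
        if len(letters) > 1 and (letters.islower() or letters.isupper()):
            return True
    return False


records = ["Al", "Christopher James", "Mary ANN", "john paul", "Mary A."]
print(pick_signoff_samples(records))
print([name for name in records if looks_miscased(name)])
```

A check like this only tells you which records to eyeball; whether to fix the data or just warn the press is still a call for the client.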

That's all for now- next up, an update on overtime.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.








Monday, October 12, 2015

Geomorphology, meteorology and Big Data

Edit: I found a glaring quantitative error, and corrected it.


I'd like to open with an apology to my long-suffering and patient loyal readers.

To be polite, my recent writing has been sparse at best. The reason is that I've been working a lot- I haven't had a "real" day off in three weeks. I think- this is my third weekend without a whole day off. I took Saturday evening off and will be back in the office on Sunday. And what, one might ask, does the author do for a living that requires so much work?

I'm a programmer, working in direct mail (a.k.a. "junk mail"). And what causes overtime in this field?

To keep it simple, there are really only two factors: workload and workforce. Here in the United States of America, the month of October is the time for senior citizens (folks who are 65 years of age or older) to make some choices regarding their prescription drug benefit provider. I'm not an expert on this, but the bottom line for my company is that in September and October we experience a huge spike in this business from these clients. This year, however, workforce came into play. There is another manufacturing plant that our plant has a fairly close relationship with, and they are currently shorthanded (as is our plant) and they also have a client that is doing a similar ad campaign. The deadlines have been tight, and everyone's resources- human and physical- have been severely stressed. My role in this has been pretty much support- but it's been for both plants. Or, wL > wF.

Having said that, I wanted to take a bit of a light-hearted look at Big Data.

There have been a few topics that I've been wanting to write about, but some recent Twitter activity led me here. Just for the record, I'm currently listening to Steely Dan's "Midnite Cruiser", and thinking about data.

Data. Steely Dan. Yeah, not much of a connection there.

I'm not sure if the average reader realizes that data geeks even enjoy music.  To be blunt, we do.

But... back to data. Geomorphology is a real word. I was introduced to this term by my wife, who has a geology degree. Geology one-liner: she has rocks in her head. She said so. Anyway, I find the big data landscape falling somewhat messily onto this collision of mismatched terminologies.

As I am not a true "data" person, I often laugh at data terminology and enjoy extending it to its ridiculous, but plausible limits.

Point: "data lakes".

Everyone pretty much understands (more or less) what "big data" is. Pretty much like everyone understands what "crime" is. Or "pornography". Alles klar?

In other words, aside from I.T. insiders and those who follow big data, no one really knows what big data is- or how pivotal it can be.

So, I suppose this is a call to action: how do you define your data?

I do not have a lot of data, relatively speaking. "Relatively speaking", of course, is a HUGE qualifier.

When I think about my personal data, I think in terms of things that matter to me- in the "real world", these things have little value. In the real world, I tend to generate lots of data which has no value to me personally. For example, I've been on Twitter for around three and a half years, and in that time have posted nearly 2900 tweets. That sounds like a lot of tweeting, but in reality it's fewer than three tweets per day. What would be interesting to me would be a breakdown of my top hashtags.
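Both of those are easy to sketch in Python. The arithmetic uses the round numbers above (2900 tweets, 3.5 years of 365 days), and the hashtag counter assumes the tweets are available as a plain list of strings; the sample tweets are made up for illustration.

```python
import re
from collections import Counter


def tweets_per_day(total_tweets: int, years: float) -> float:
    """Average tweets per day, assuming 365-day years."""
    return total_tweets / (years * 365)


def top_hashtags(tweets, n=5):
    """Count hashtags case-insensitively and return the n most common as (tag, count) pairs."""
    counts = Counter()
    for text in tweets:
        counts.update(re.findall(r"#(\w+)", text.lower()))
    return counts.most_common(n)


print(round(tweets_per_day(2900, 3.5), 2))  # 2.27

sample = ["Reading about #BigData and #IoT", "More #bigdata links", "#Arduino weekend"]
print(top_hashtags(sample, 1))  # [('bigdata', 2)]
```

So the real figure works out to a bit over two tweets a day.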

But, as usual, I have digressed.

The personal data that I track is only in a few categories. I use data to catalog stuff, for the most part: books, videos, music and Legos. I also keep a pedometer log.

Most- if not all- of this data is useless to pretty much anyone except me. But, here we get a peek into the actual application of data science IRL. All data is data, but of all that data, which is most relevant to you? Does Lego care how many 3001 blue elements I own? I think not. They probably do care, however, about my age, where I purchase Lego products, and how much I spend on Lego in a month or year.

This is truly the science and ART of data science. Much of what I tweet on the subject of data science and data analysis is somewhat technical, focused on languages, algorithms and "sciencey" stuff... but business and ethics are also huge, and seem to be marginalized.

"What is the greatest Rock 'N' Roll song of all time?" A valid question. Of course, it is a question that cannot be answered- at least, not with data. Usually, there seem to be three contenders: "Hey Jude" (The Beatles), "Stairway To Heaven" (Led Zeppelin) and "Free Bird" (Lynyrd Skynyrd).

Likewise, a data scientist must be in tune with business: what is your best product/service? Data science should not only answer that question, but give stakeholders the answers to the five great press questions: Who? What? When? Why? and How? When a data scientist returns valid, data-based answers to these questions and communicates them clearly, the stakeholder has a valid representation of their business based on science and art.

Sorry- I never got around to the humor of Big Data... maybe another time.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.

Monday, October 5, 2015

"How was your weekend?"

A typical Monday at work will often kick off with the "How was your weekend?" question. It's pretty safe, really, and it's pretty much a soft question- there's almost NO way that this ice breaker can fail. Please note- I said ALMOST.

"Weekend" means the end of the workweek: no work, no school. Downtime. Sometimes, occasionally, it means that despite prior commitments, stuff that needs to get done at home, or an astronomical event that last occurred in 1982 and won't happen again until 2033, you have to work.

I'm trying really hard not to sound like I'm complaining, but the plain fact is this: I do not relish working on Sundays. Sunday has always been the quieter day of the week for me- relax, maybe catch a sporting event on TV, and do a bit of home-related stuff (grass mowing, etc.).

Not this weekend, though. This weekend was nonexistent for me.

I rolled into the office around 2230 on Friday night. It looked like a fairly busy night from the start; I had no way of knowing exactly HOW busy it would be. Busy ended up being NINE hours of overtime on Friday, and another SEVEN on Sunday. The downside to all of this was no weekend. The upside is overtime pay. We just had a few medical expenses (optometrist, veterinarian), so the extra buckazoids are much appreciated. Still, my "weekend" is shot. Gone. Nonexistent.

It's now Monday, approximately 1000. I had stuff I wanted to do that won't get done today because my sleep is totally hosed. Yes, the money is great- and needed- but the time lost to work cannot be recovered. Friday ended up being a 15 hour day, and Sunday was another seven hours (hmm, that sounds familiar!).

It is now Tuesday. We're so busy that I already have 21.5 hours on the clock, and only eight of these are "straight" time- the majority are overtime. And this week shows no promise of slowing down... I predict major overtime for at least the next three days.

Wednesday update- sixteen hours of straight time, and nineteen of overtime. Crazy night.

Thursday- I think the O.T. now is ~22 hours. And my supervisor has even more.

Friday- The week has officially ended with 29.5 hours of overtime. I am going in to work (tomorrow) for the second Sunday in a row.

So, here I sit on Saturday afternoon, decompressing. I'm in the Secret Underground Lair, trying to finish this blog on one computer and playing Rail Nation on another PC. I'm tired beyond words, but not exactly sleepy... I still have things on my mind.

Sunday- One is not required to work overtime in my company, but as the reader can see, we've been busy. My supervisor has put in as much overtime as I have (and possibly more), so I went in to the office today. I had anticipated a "light" day- boy, oh boy, was I ever mistaken. I ended up working on three projects in addition to the one I was there for. Total time on Sunday? 13.5 hours. Adding insult to injury, Jennifer had a lovely roast for dinner which I missed out on. I had also planned on a nap- which obviously never happened. Lastly, as I had anticipated a "light", short day, I didn't bring a lunch. Fortunately, I am prepared for situations such as this, and grabbed a canned pasta meal from my emergency office supply.

The last day I had off was two Sundays ago. Not bragging. Not complaining- I get paid to do this. I'm just stating a fact.

That's all from the SUL (Secret Underground Lair) for now. No data news to report, no Arduino progress report, and no Raspberry Pi update.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.