Monday, April 18, 2016

My Sunday, 10K, and not-so-big data

After 200+ blog posts, my addlepated brain isn't even going to request a fact-check on this one: I don't like running. Never have, never will. Our church has, in recent years, supported World Vision, which (among many other things) helps Africans obtain clean drinking water. One of their fundraisers is the team in the Chicago Marathon. It's a wonderful cause, and many folks from our church, including committed couch potatoes, have stepped up and taken up running just to be able to help out. It truly is a great cause, but it's not me. Any number of quotes could sum up my position here, but the theme song from Donny and Marie Osmond's television series of many years ago sums it up pretty well: "She's a little bit country, and he's a little bit rock 'n' roll". Clint Eastwood's "Dirty Harry" was even more spot-on: "A man's gotta know his limitations".

Yeah. "Don't like" doesn't even begin to describe my feelings about running.

When I was in the Air Force, running was a thing we had to do. The area I was stationed at was quite hilly, and I constantly had shin splints. We used to get together a few times a week after class (it was a training installation) to run. Yuck.

I've long since given it up, but at the time I smoked.

Sooo... just before the command was given to start the run, someone would yell, "SMOKERS TO THE REAR!" and about a dozen of us would quickly move to the rear with Cirque du Soleil precision, and the run would commence. We would start off as soon as we had lit up. I kid you not: we smoked while running.

So yeah, even though I've quit smoking, I still hate running. I respect those who do run, for whatever reason, but it ain't me, babe. No, no, no, it ain't me, babe.

So... 10K: it's not running. It's steps.

I've worn a pedometer on a regular basis for the past few years. I've considered other devices, such as a Fitbit, but in the end I've always decided in favor of my Omron pedometer. As much as I'm a fan of big data, there are just some things I don't want to put in the cloud, and my exercise activity is one of them. I download data from my pedometer to my PC, and I can view it in a nice frontend there. Or, I can download a .csv and use my own analytical tools.

10K steps is a good benchmark. There seems to be a consensus on the internet that 10K is a good daily goal for those of us living the sedentary Western routine. As I have a desk job (AND work nights!), 10K is generally a challenge. So today was a good day.
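Since the pedometer can export a .csv, checking how often I actually hit 10K is a quick job with my own tools. Here's a minimal Python sketch, assuming a two-column export with hypothetical `date` and `steps` headers (I'm not claiming that's Omron's actual layout) and a few made-up sample rows:

```python
import csv
import io

# Hypothetical export layout: the "date" and "steps" headers are assumptions,
# and these rows are made-up sample values, not real pedometer data.
SAMPLE_CSV = """date,steps
2016-01-01,7421
2016-01-02,9876
2016-01-03,10342
2016-01-04,10518
"""

GOAL = 10_000  # the common 10K-steps-per-day benchmark


def days_meeting_goal(csv_text: str, goal: int = GOAL) -> list:
    """Return the dates whose step count reached or exceeded the goal."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["date"] for row in reader if int(row["steps"]) >= goal]


if __name__ == "__main__":
    hits = days_meeting_goal(SAMPLE_CSV)
    print(f"{len(hits)} day(s) at or above {GOAL:,} steps: {hits}")
```

Reading the file with `open(path, newline="")` instead of the in-memory string would work the same way against a real download.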

As always, I am hochspeyer, blogging data analysis and management so you don't have to.

Tuesday, April 12, 2016

Data Intelligence... and a "slight" oops

I want to start off by stating for the record (whatever that is) that although I'm okay with doing it, data entry is NOT my forte, and it is not something I normally enjoy doing. Mind you, I'm not bad at 10-key entry; I learned it through trial and error, and I'm still not bad at it. Even so, it isn't one of my favorite data activities. And yet, in the course of building Forty-Two (my database), I find myself doing data entry at every turn.

So, ... I do data entry... because I must.

I'm trying pretty hard to keep the blog from becoming a somewhat witty changelog for my database, so I'll throw out a few speeds and feeds and then move on blog-wise.

The other day, I had just posted something on Twitter, and Twitter's AI recommended a few connections. One of them turned out to be a long-lost relative. I'm not going to go into the details here, but I'm kinda excited, as this is the first time I've run into a relative online based on a machine learning suggestion.

WAIT FOR IT...

I accidentally pressed "Publish" instead of "Save" the other day, and I think that's a first for me. For those who read the unedited, incomplete blog post, I apologize. In any event...

It's Saturday night in my world. My Friday at work was pretty quiet, and for the first time in several weeks I got home at a decent hour. I got a decent amount of sleep and had a great time of worship at The Bridge, and then Jennifer and I took Meerkat over to our local Walmart Neighborhood Market, where (amongst other things) we picked up a couple of their most excellent take-and-bake pizzas. As it's late, I'll just finish this with these few updates and call it a night.

My last task for the night is some Windows Media Player updates. By "updates", I mean I'm ripping several CDs and deleting a few from the library.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.

Wednesday, April 6, 2016

Happy birthday, Meerkat! (*and: my Bacon number!)

I haven't mentioned Meerkat, our trusty Subaru Outback, in some time. Now's a good time, though, as Meerkat's 2nd birthday was the 2nd of April. Granted, she's a bit older than that... but not much; that's the day we brought her home. And for those who may be wondering why I refer to our car with a feminine pronoun, I'm merely borrowing from the naval tradition of referring to ships that way. Jennifer reminded me that, much like that day two years ago, it snowed. So, Happy Birthday, Meerkat!



Oh, and we just turned 11,000 miles (about 17,700 km) on the odometer. Now, that doesn't sound like a lot, and in truth it really isn't. According to http://project.wnyc.org/, the average commute for my ZIP (postal) code is around 28 minutes, which is above the national average of 25.4 minutes. My average commute is under eight minutes; even on a bad day, when I get caught by a train, my commute is still under twelve minutes.
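For the curious, both numbers in that paragraph are simple arithmetic. A small Python sketch, using the international mile (defined as exactly 1.609344 km) and the commute figures quoted above:

```python
MILES_TO_KM = 1.609344  # the international mile is exactly 1,609.344 meters


def miles_to_km(miles: float) -> float:
    """Convert statute miles to kilometers."""
    return miles * MILES_TO_KM


odometer_km = miles_to_km(11_000)
print(f"11,000 mi = {odometer_km:,.0f} km")

# Commute averages quoted above, in minutes
zip_avg, national_avg = 28.0, 25.4
print(f"ZIP average exceeds the national average by {zip_avg - national_avg:.1f} min")
```

Rounding the mile to 1.6 km is handy for mental math, but it understates 11,000 miles by about 100 km.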

In an effort to keep this from becoming a database changelog, I'm just going to say that work on the Lego dataset is proceeding nicely. I haven't done anything with Forty-Two lately, but that's okay, as my time has gone into the dataset.

Lastly, I'd like to introduce you to a term I use quite often IRL: "my good e-buddy". I use it most often when I'm referring to someone I've never met except through online contact, or someone I do know IRL but who is more of an acquaintance than anything else. The reason I mention this is that my good e-buddy @lindaregber had an interesting tweet the other day, which I liked well enough to RT. She posted a study from Research at Facebook which found an average separation of 3.57 degrees between Facebook users; in other words, on average you are separated from ANY other Facebook user by fewer than four people. That's pretty cool; I'd like to see something along the lines of The Oracle of Bacon for "ordinary folks". Although I'm not a famous person, and I have never acted, I have a Bacon number of 3, which means I'm closer to Kevin Bacon than I am, on average, to a random Facebook user! How? Alex Trebek was in Dying Young with Vincent D'Onofrio, who was in JFK with Kevin Bacon. So, how do I get the 3? Alex has a Bacon number of 2, and I met Alex in Frankfurt a.M. after trying out for Jeopardy! at a Stars & Stripes-sponsored event.
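Finding a Bacon number is a shortest-path problem, which a breadth-first search handles nicely. Here's a minimal Python sketch over just the chain described above. The graph is a tiny, hypothetical one, and the last link (meeting Alex in person) is my own informal rule; a real Bacon number only counts shared screen credits.

```python
from collections import deque

# Undirected "linked to" graph built only from the chain described above.
# The final edge (meeting in person) is an informal rule, not a screen credit.
LINKS = [
    ("Kevin Bacon", "Vincent D'Onofrio"),  # JFK
    ("Vincent D'Onofrio", "Alex Trebek"),  # Dying Young
    ("Alex Trebek", "hochspeyer"),         # met in Frankfurt a.M.
]


def build_graph(links):
    """Turn a list of pairs into an adjacency map (edges go both ways)."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def bacon_number(graph, person, center="Kevin Bacon"):
    """Breadth-first search: links on the shortest path to center, or None."""
    if person == center:
        return 0
    seen = {center}
    queue = deque([(center, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor == person:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # no path: person is not connected to center


graph = build_graph(LINKS)
for name in ("Alex Trebek", "hochspeyer"):
    print(name, bacon_number(graph, name))
```

BFS explores people in order of distance from Kevin Bacon, so the first time it reaches someone is guaranteed to be via a shortest path.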

As always, I am hochspeyer, blogging data analysis and management so you don't have to.

Saturday, April 2, 2016

This IS normal(ized) in my world

I've mentioned once or twice (I think) that I'm not a professional DBA; I do this stuff for fun! (??? ORLY?)

Yup, 'tis true: it's something of a hobby, or pastime, or possibly a fairly intense interest. In any event, this interest has made me money before, and hopefully will do so again. This particular post is especially ironic from a few angles: for starters, I've also mentioned that I try to do as much normalization as possible right from the start, and if you read the last post AND/OR you're a DBA or otherwise involved with databases, you'll note that I've pretty much thrown out all of the normalization conventions for this flat database. The reason is pretty simple: I know my client (me!), and I know what data WILL be needed and what data MIGHT be needed. Also, in the end, this Excel workbook is still data.

So, to recap, I have a .txt file that I have converted into something resembling a flat database. Three of the current four columns have repetitive (if not identical) data. Why?

The answer is simple in my context. But if I were applying a fix to an existing problem for, say, a paying client, and had to explain that I'd created repetitive data to make the resulting database entries more efficient... well, that wouldn't necessarily be so simple.

I was talking earlier this week with one of my coworkers, who is a data guy at heart. A month or so ago, he had taken a stab at extracting those target part numbers from the text file, with disappointing results. The other day, I showed him the new and improved .xlsx data file and warned him that it was something akin to anti-normalization. When he looked at the file, he cringed in agreement. However, once I explained exactly why I had created columns with duplicate or near-duplicate data, he agreed with my process. And therein lies the proverbial "rub": even though I've formatted this .xlsx file (which came to me as a .txt file, which I'm guessing was extracted from some other format) as a flat database, it isn't a database, not even a flat one.

It's a dataset. Period. And as it is a dataset, it needs cleansing, not normalizing.
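For anyone wondering what "cleansing" looks like in practice, here's a minimal Python sketch. The field names and sample values are hypothetical stand-ins, not the actual columns of my Lego dataset; the idea is simply to standardize each field (whitespace, capitalization) and then drop rows that turn out to be exact duplicates.

```python
import re

# Hypothetical rows: field names and values are illustrative assumptions,
# not the real columns of the dataset described above.
RAW_ROWS = [
    {"part": " 3001 ", "name": "Brick  2 x 4", "color": "red"},
    {"part": "3001", "name": "Brick 2 x 4", "color": "Red "},
    {"part": "3003", "name": "brick 2 x 2", "color": "Blue"},
]


def clean_text(value: str) -> str:
    """Trim the ends and collapse runs of internal whitespace to one space."""
    return re.sub(r"\s+", " ", value).strip()


def cleanse(rows):
    """Standardize each field, then drop rows that became exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        fixed = {
            "part": clean_text(row["part"]),
            "name": clean_text(row["name"]).capitalize(),
            "color": clean_text(row["color"]).title(),
        }
        key = (fixed["part"], fixed["name"], fixed["color"])
        if key not in seen:  # keep only the first copy of each cleaned row
            seen.add(key)
            cleaned.append(fixed)
    return cleaned


for row in cleanse(RAW_ROWS):
    print(row)
```

Note that nothing here splits the data into separate tables; the structure stays flat, and only the values get tidied up.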

Now, having said that, the casual reader is probably wondering how this dataset will be utilized. Well, THIS is where it gets interesting (ORLY??). Before I can import ANY data into the database, it needs to be usable. In my world, this means I need to be able to count pieces by size, color, family and type; in other words, I need to make this usable by Forty-Two (the master database). So, once I clean up the data, my goal is to import all of it into Access as a part of the inventory.

That day is still far away.
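When that day does come, the counting itself should be the easy part. A minimal Python sketch, assuming hypothetical `size`, `color`, `family`, `type` and `qty` fields on each cleaned row (my real layout may well differ), that totals pieces for any one of those fields:

```python
from collections import Counter

# Hypothetical cleaned inventory rows; field names and quantities are
# illustrative assumptions, not real inventory data.
PIECES = [
    {"size": "2 x 4", "color": "Red", "family": "Brick", "type": "Standard", "qty": 6},
    {"size": "2 x 2", "color": "Red", "family": "Brick", "type": "Standard", "qty": 4},
    {"size": "1 x 2", "color": "Blue", "family": "Plate", "type": "Standard", "qty": 10},
]


def count_by(rows, field):
    """Total quantity of pieces for each distinct value of `field`."""
    totals = Counter()
    for row in rows:
        totals[row[field]] += row["qty"]
    return totals


for field in ("size", "color", "family", "type"):
    print(field, dict(count_by(PIECES, field)))
```

In Access, the equivalent would be a GROUP BY query summing the quantity column.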

As always, I am hochspeyer, blogging data analysis and management so you don't have to.