Sunday, February 28, 2016

Managing data better

I just discovered a fairly large data loss. As mentioned in a previous post, I rebuilt my laptop due to a failed HDD. Unfortunately (no one is looking, you can raise your hand if this has ever happened to you), I had a fair amount of data on that drive which was not backed up.

To paraphrase Hall and Oates, it's gone. So, to paraphrase the unofficial motto of Chicago ("Vote early, vote often"): save early, save often.

As promised in my previous post, I'm going to spend a bit of time on Lego, because as far as my data is concerned, it is handled a bit differently than the other items I am cataloging.

Lego and I go way back, but I didn't attempt to catalog my Lego collection until fairly recently. I've considered including Lego in the "big database" (Forty-Two), but have found that, at least for this particular application, Excel is the better tool for me. Let me try to unpack that a bit.

Some time ago, I had the opportunity to observe how several different businesses used software to work with data. Some preferred Microsoft Access, but most preferred Microsoft Excel; or, to be a bit more generic, spreadsheets were preferred over databases. In only one business were both used- and even there, independently rather than in a complementary fashion. Generally speaking, the choice of software had little to do with function. Rather, it was more about culture and familiarity. In every case I observed, the results could have been improved not by "thinking outside the box," but merely by thinking.

In my case, based upon my experience, a spreadsheet seems to be the best solution. My primary reason is that Lego is a single thing which does not need to be linked (or, related) to anything else. And even though I may be interested in a bit of analysis of the Lego "population", in the larger scheme of things Lego exists in its own unique bubble: shapes, colors, themes, sizes. I could build tables based upon these and other categories, but once again, they would relate only to Lego.

Forty-Two, on the other hand, illustrates quite well the differences between a flat database (the Lego spreadsheet) and a relational database (Forty-Two). With the Lego spreadsheet, I can see a snapshot of all the facts about my Lego collection. But with Forty-Two I can write an ad hoc query to tell me where all of my Lord of the Rings media is located. It would show where each book, soundtrack, game and video is, the number of copies in each format, and the last time each item was viewed.
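To make the flat-versus-relational distinction a bit more concrete, here is a minimal sketch of the kind of ad hoc query described above. It uses Python's built-in sqlite3 module rather than Access, and the tables (Titles, Formats, Locations, Copies), their columns, and the sample rows are all hypothetical; they are not Forty-Two's actual design.

```python
import sqlite3

# Hypothetical schema: every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE Titles    (TitleID INTEGER PRIMARY KEY, Title TEXT NOT NULL, Franchise TEXT);
CREATE TABLE Formats   (FormatID INTEGER PRIMARY KEY, FormatName TEXT NOT NULL);
CREATE TABLE Locations (LocationID INTEGER PRIMARY KEY, LocationName TEXT NOT NULL);
CREATE TABLE Copies (
    CopyID     INTEGER PRIMARY KEY,
    TitleID    INTEGER NOT NULL REFERENCES Titles(TitleID),
    FormatID   INTEGER NOT NULL REFERENCES Formats(FormatID),
    LocationID INTEGER NOT NULL REFERENCES Locations(LocationID),
    LastViewed TEXT    -- ISO date; NULL if never viewed
);
"""

# Where is every copy of a franchise, how many copies per format and place,
# and when was each last viewed?
QUERY = """
SELECT t.Title, f.FormatName, l.LocationName,
       COUNT(*)          AS Copies,
       MAX(c.LastViewed) AS LastViewed
FROM Copies c
JOIN Titles    t ON t.TitleID    = c.TitleID
JOIN Formats   f ON f.FormatID   = c.FormatID
JOIN Locations l ON l.LocationID = c.LocationID
WHERE t.Franchise = ?
GROUP BY t.Title, f.FormatName, l.LocationName
ORDER BY t.Title, f.FormatName, l.LocationName;
"""

def build_sample_db():
    """In-memory database with a few made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Titles VALUES (?, ?, ?)", [
        (1, "The Fellowship of the Ring", "Lord of the Rings"),
        (2, "The Two Towers", "Lord of the Rings"),
        (3, "Abbey Road", "The Beatles"),
    ])
    conn.executemany("INSERT INTO Formats VALUES (?, ?)",
                     [(1, "Book"), (2, "DVD"), (3, "CD")])
    conn.executemany("INSERT INTO Locations VALUES (?, ?)",
                     [(1, "Living room shelf"), (2, "Basement box 3")])
    conn.executemany("INSERT INTO Copies VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1, 1, "2015-12-25"),   # Fellowship, book, living room
        (2, 1, 1, 1, "2014-06-01"),   # a second copy of the same book
        (3, 1, 2, 2, None),           # Fellowship, DVD, never viewed
        (4, 2, 3, 1, "2016-01-10"),   # Two Towers soundtrack CD
        (5, 3, 3, 1, "2016-02-01"),   # not part of the franchise
    ])
    return conn

def media_by_franchise(conn, franchise):
    return conn.execute(QUERY, (franchise,)).fetchall()

for row in media_by_franchise(build_sample_db(), "Lord of the Rings"):
    print(row)
```

The point is that no single table holds the answer; the query assembles it from several small tables at the moment you ask, which is exactly what a flat spreadsheet can't do without duplicating data.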

There's also one additional aspect of Lego databasing which is unrelated to software, but makes cataloging it so much easier: storage.

I'm an AFOL (Adult Fan of Lego). AFOL is a title; more of a descriptor, actually, as it really doesn't carry any of the clout that, say, CCNA carries. Still, it differentiates me from most adults who play Lego with their kids. And it's somewhat hard to say that without sounding like some sort of pompous jerk, because it sounds like I'm slamming parents who "play" Lego with their kids. Quite the contrary! If you're a parent who engages with your kids over a pile of Lego, kudos to you!

An AFOL, though, uses Lego as their primary creative medium. There are professional Lego artists out there who make a good living by building amazing models for corporate clients. There are also educators who use Lego either in a standard classroom setting or in a program for kids with ASD (autism spectrum disorder). Even architects use Lego for models. But although each of these is an example of adults working with Lego for a living, the typical AFOL is something else.

They're a bit of a subject expert on Lego. They probably have some advanced building skills. But mostly, I think, they are something of a Leonardo da Vinci type. They are probably the precursors of the maker movement, which is another subject entirely.

Back on track: AFOLs need organization, and the English word for this is storage. Over the course of many years, many bricks (elements) will be collected. I've found that the best way (currently) to keep track of these is to put them in plastic bags (connected in groups of 10) with a card inside indicating the count and the date of the count. These, in turn, reside in drawer organizers.
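Because Lego doesn't need to relate to anything else, each count card maps naturally onto one row of a flat table. Here is a minimal Python sketch of that one-row-per-bag layout and two questions it can answer: totals by color, and which bags are due for a recount. The field names, the one-year recount threshold, and the counts are hypothetical, made up for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

@dataclass
class Bag:
    strip: int         # which strip of ten connected bags it belongs to
    slot: int          # position (1-10) within that strip
    element: str       # hypothetical: element name or part number
    color: str
    count: int         # the count written on the card in the bag
    counted_on: date   # the date written on the card

def totals_by_color(bags):
    """Snapshot of how many elements of each color are on hand."""
    totals = Counter()
    for bag in bags:
        totals[bag.color] += bag.count
    return dict(totals)

def stale_counts(bags, as_of, max_age_days=365):
    """Bags whose card is older than max_age_days and may need a recount."""
    return [b for b in bags if (as_of - b.counted_on).days > max_age_days]

# Made-up rows for illustration only.
inventory = [
    Bag(1, 1, "Brick 2x4", "Red", 120, date(2016, 1, 9)),
    Bag(1, 2, "Plate 1x2", "Blue", 85, date(2014, 11, 2)),
    Bag(1, 3, "Brick 2x4", "Blue", 64, date(2016, 2, 20)),
]
print(totals_by_color(inventory))
print([b.element for b in stale_counts(inventory, date(2016, 2, 28))])
```

Every question here is answered from the one table, with no joins, which is why a spreadsheet is a comfortable fit for this kind of data.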

As always, I am hochspeyer, blogging data analysis and management so you don't have to.



Saturday, February 27, 2016

Feeds, Needs and Speeds

Well, it's starting to get real. The database, that is. As I posted earlier on Twitter, 200 rows of data in a single column do not an RDBMS make, but it's a start (the count is currently 224). As Jackie Gleason quipped in one of his signature lines, "And away we go!"

The casual reader may be wondering at this point: there are actually folks out there still building relational databases? Isn't the non-relational NoSQL model more popular in terms of new deployments, versatility and just plain coolness?

I'd like to explain a bit of my personal journey that brought me to Forty-Two, the database.

Long ago, like teenagers everywhere, I faced high school graduation without a plan. Not just without a clear plan, mind you- NO PLAN. Computer Science was in its infancy at the time, and nearly nonexistent in most high schools. I was not one of the cool kids, nor was I a jock or a brain, but I was also not a nerd. However, I knew one or two nerds. And the nerds were highly focused in their scholarly discipline: they were not into math and science; they were into math. They were geometry slingers, wearing leather slide rule holsters on their hips. They had glasses, thick glasses. Below average complexion. Few social skills. Pocket protectors in their left shirt pockets. And behind the pocket protectors... punch cards for their next "program". They were the late '70s analogues of Drs. Sheldon Cooper and Leonard Hofstadter (The Big Bang Theory).

I was not them. Well, sort of not like them. I liked history. I was (probably) the worst kind of history nerd: I was a military history buff. I started out with Avalon Hill games like France 1940 and Panzer Blitz, and progressed to Tobruk and Squad Leader, eventually culminating in the non-Avalon Hill classic Fletcher Pratt's Naval Wargame. The point is this: as the complexity increased, the playability decreased, as did the number of folks willing to take on the rules. But, I digress.

I was accepted to Rosary College (later Dominican University). I declared history as my major, and spent two unremarkable years there. I eventually had five colleges or universities under my belt, with no undergraduate degree to show for all of the buckazoids invested.

Long before this became a mainstream theory in education, I discovered that we do not all learn in the same way, and that higher education was not really the best choice for me.

Fast forward a few decades. I've mentioned this fairly recently- I was working as a data analyst at an "action sports" company. The company designed paintball equipment and had it manufactured overseas. My job as the data analyst was to take Wal-Mart Retail Link data and slice and dice it into something my employer could use.

The problem was that my employer was using either Office 2000 or 2003, which limited Excel to a maximum of 65,536 rows per worksheet. As time went on, my data often exceeded this limitation, and I was forced to use Access just to grab the Monday morning data. Once again, skipping several steps, I became adept at moving data between Excel and Access.
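For the curious, that limit is a power of two: 2^16 = 65,536 rows per worksheet in Excel 97 through 2003, raised to 2^20 = 1,048,576 rows in Excel 2007 and later. Here is a minimal Python sketch of the arithmetic for deciding how many worksheets a large extract needs when each sheet keeps one header row. The function names and the 100,000-row stand-in extract are hypothetical.

```python
# Worksheet row limits: Excel 97 through 2003 (.xls) stop at 2**16 rows;
# Excel 2007 and later (.xlsx) allow 2**20.
XLS_MAX_ROWS = 2 ** 16    # 65,536
XLSX_MAX_ROWS = 2 ** 20   # 1,048,576

def sheets_needed(data_rows, max_rows=XLS_MAX_ROWS, header_rows=1):
    """How many worksheets a data set needs if each sheet repeats its header."""
    per_sheet = max_rows - header_rows
    return -(-data_rows // per_sheet)   # ceiling division without floats

def split_for_excel(rows, max_rows=XLS_MAX_ROWS, header_rows=1):
    """Yield consecutive slices that each fit on one worksheet below the header."""
    per_sheet = max_rows - header_rows
    for start in range(0, len(rows), per_sheet):
        yield rows[start:start + per_sheet]

weekly_rows = list(range(100_000))   # stand-in for a large weekly extract
print(sheets_needed(len(weekly_rows)))
print([len(chunk) for chunk in split_for_excel(weekly_rows)])
```

Splitting works, but every split is one more place for a formula to miss data, which is the practical argument for handing large extracts to a database instead.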

Fast forward one more time to today. I use all sorts of tools to do my job; Access and Excel aren't really among the programs I commonly use at work, but I use them at home... pause for effect.

Yes, although I may have mentioned a bit about Forty-Two before, I don't think I've said much beyond the fact that it was my own little database dev world. Here's where the title comes in: way back when I was an I.T. reseller, we often qualified sales by talking about speeds and feeds- equipment specifications. Needs are also important (besides completing the alliterative trilogy).

So, Forty-Two is an obvious reference to the works of Douglas Adams, and is so named because its goal is to answer that elusive question: what is the meaning of Life, the Universe, and Everything? It is starting out life as an Access database, currently with only one table. Previous iterations have taught me to take it easy with adding tables, so my aim is to get this Titles table mostly complete before adding additional single-column (or very-few-column) tables, and only then to start creating the relationships. The first table is called Titles simply because it holds titles: books, videos, software... if it's media, then its name goes here. Why? Forced normalization: why go through the normalization process later when I can start out with a relatively clean dataset?
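Here is a minimal sketch of that build order, using SQLite through Python's sqlite3 module as a stand-in for Access. The MediaTypes table, the column names, and the sample titles are hypothetical illustrations, not Forty-Two's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Step 1: one table that holds nothing but titles (books, videos, software...).
conn.execute("CREATE TABLE Titles (TitleID INTEGER PRIMARY KEY, Title TEXT NOT NULL UNIQUE)")
conn.executemany("INSERT INTO Titles (Title) VALUES (?)",
                 [("The Hobbit",), ("Casablanca",), ("Dune",)])

# Step 2: once Titles is mostly complete, add a small few-column lookup table.
conn.execute("CREATE TABLE MediaTypes (MediaTypeID INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE)")
conn.executemany("INSERT INTO MediaTypes (Name) VALUES (?)", [("Book",), ("Video",)])

# Step 3: only then create the relationship, as a foreign key column on Titles.
conn.execute("ALTER TABLE Titles ADD COLUMN MediaTypeID INTEGER "
             "REFERENCES MediaTypes(MediaTypeID)")
assignments = {"The Hobbit": "Book", "Casablanca": "Video", "Dune": "Book"}
for title, media in assignments.items():
    conn.execute("UPDATE Titles SET MediaTypeID = "
                 "(SELECT MediaTypeID FROM MediaTypes WHERE Name = ?) WHERE Title = ?",
                 (media, title))

# Each title and each media type is stored exactly once; the join reunites them.
rows = conn.execute("SELECT t.Title, m.Name FROM Titles t "
                    "JOIN MediaTypes m ON m.MediaTypeID = t.MediaTypeID "
                    "ORDER BY t.Title").fetchall()
for title, media in rows:
    print(f"{title}: {media}")
```

Because the word "Book" lives in one row of MediaTypes rather than being typed next to every book title, there is nothing to clean up later: the data starts out normalized.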

This is starting to turn into a wall of words (by my standards, anyway!), so stay tuned... Lego is next!

As always, I am hochspeyer, blogging data analysis and management so you don't have to.

Friday, February 26, 2016

A database against the rules



In my previous post I noted that I had restarted work on my database. In this post I'd like to talk a bit about database design and construction.

I'm guessing that most of my readers who are interested in data will fall into one of a very small number of categories. The first is a professional DBA who is either interested in seeing how other DBAs do things, or possibly here just for a laugh. The next might be someone tasked with throwing together an ad hoc database from a good amount of existing data, and presenting it in an easily digestible format without creating all sorts of dashboards, relationships, tables, etc. I'm certain there are many other scenarios which may pique one's curiosity, but my aim in documenting my journey is to stimulate an appreciation for data and a desire to put it to use.

*For database neophytes: please tread carefully. A long time ago, I read a piece of advice that stuck with me: a great database design starts on paper. So, unless you're like me and can "see" what your database will look like (and/or have built similar databases), design your database first, then start coding.

Every database has a raison d'être. It's highly unlikely that anyone would wake up one day and say, "Hey, a database of all of the dry pasta in the house would be useful!" Obviously not... I think! (For the record, if you ABSOLUTELY need something to keep track of your pasta, a barcode scanner (or smartphone scanner app) and a spreadsheet would probably be a better solution.)

When I was in high school, my appetite for data was whetted by my love of music. Personal computers were just starting to come on the scene, and I didn't even know what a spreadsheet was. Everything I did was done in ink on looseleaf paper. After a while, I graduated from just cataloging my music to staging popularity polls with friends and acquaintances. Quite surprisingly, most of them objected to my methodology, which was simple and leveled the playing field. It worked like this:

A person could vote for up to ten songs and ten albums. I made sure to tell them to rank their choices from most favorite to least favorite, so that the top song on a ballot would receive 10 points, the next 9, and so on. My reasoning was this: just because 10 voters like Hey Jude doesn't mean they all like it equally, and the vote should be weighted accordingly.

For some reason, no one liked that idea.
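The ballot weighting described above is a positional scoring scheme, similar in spirit to a Borda count. Here is a minimal Python sketch of it; the function name and the ballots are made up for illustration. Note how a plain one-vote-per-mention count ties two songs that the weighted count separates.

```python
from collections import Counter

def tally(ballots, max_points=10):
    """Score ranked ballots: 1st place earns max_points, 2nd earns one less, etc."""
    scores = Counter()
    for ballot in ballots:
        if len(ballot) > max_points:
            raise ValueError(f"a ballot may rank at most {max_points} choices")
        for rank, choice in enumerate(ballot):   # rank 0 is the favorite
            scores[choice] += max_points - rank
    return scores.most_common()

# Made-up ballots, each listed from most to least favorite.
ballots = [
    ["Hey Jude", "Let It Be", "Layla"],
    ["Layla", "Hey Jude"],
    ["Hey Jude", "Layla"],
]

plain = Counter(song for ballot in ballots for song in ballot)
print(plain.most_common())   # one vote per mention: Hey Jude and Layla tie at 3
print(tally(ballots))        # weighted: Hey Jude 29, Layla 27, Let It Be 9
```

One design choice worth noticing: a voter who lists only one song still gives it the full 10 points, so short ballots aren't penalized.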

But that, for me, was the start of databases. Stay tuned!

As always, I am hochspeyer, blogging data analysis and management so you don't have to. 



Sunday, February 21, 2016

Time to enter the data

Before the advent of memes on the internet- in fact, long before the internet was commonly available- there were radio and television commercials. Well, there were commercials in the parts of the world where private companies, as opposed to the State, owned broadcasting companies... I'm not sure how State-owned and operated media operate(d). But, where private enterprise ruled, these commercials were the memes of their day. They had memorable taglines that everyone seemed to know.

"Plop, plop, fizz, fizz. Oh what a relief it is."

"I can't believe I ate the whole thing,"

"Where's the beef?"

"Have you any Grey Poupon?"

"Time to make the donuts."

There's a metric s***-tonne more, but the Dunkin' Donuts one resonates most with me when it comes to nostalgia and data entry. There were a few different commercials with this theme, but they all had one thing in common: a haggard, sleep-deprived, middle-aged man in pajamas rousing himself from desperately needed sleep to go to the store to make donuts at an unspeakable hour just so that you, the hungry consumer, could enjoy a fresh pastry for breakfast or your morning coffee break. In forcing himself to provide a (*ahem) "much-needed service" (selling us donuts), the character in the commercial was sympathetic and appreciated.

Mind you- this is an analogy, not a direct comparison. The only thing that the character in the commercial and I have in common is that we have to force ourselves to perform a task. In my case, that task is data entry. My end product, on the other hand, is pretty much purely intended for my own consumption: after all, no one has a desire to know any of the specifics of my database.

So. Even though I'm not known to be a great goal-setter, on Friday I made myself a simple one: set aside one hour for some data entry Saturday evening. I'm happy to report I accomplished this. However, there are caveats.

For starters, although I enjoy working with data, data entry is not my cup of tea. I do not type rapidly- the last time I was clocked I came in at around 32 WPM- but I'm pretty accurate. Jennifer made a wonderful impromptu dinner, and over the course of that dinner (about an hour) I managed to enter an "impressive" thirty-six movie titles into a table! (The curse of owning your data is that you're responsible for your data!) So, where do we go from here?

Well, "Forty-Two" is back up and running- even though it may be the laughing stock of databases on the internet. Currently it's one table with thirty-six entries... but yes, it's a database. Not searchable or relational, but still a nascent database.

I had also intended to start my Lego spreadsheet, but that's another project for another day. My blog spreadsheet is up to date, and with that I think all of my data news is covered.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.


Thursday, February 18, 2016

CC

Or, #200

Before anything else, I'd like to extend a huge "THANK YOU" to anyone and EVERYONE who has taken a bit of time to read this blog!

And, as far as I can tell, at least 20% of my readers reside in countries where the official or predominant language is not English, so I'm dedicating this post to YOU!

I'll be the first to admit that my list of "pet peeves" is long. But, as I've dedicated this post to my non-native-English-speaking readers, I'd like to spend some time on the quirks of the English language (American, not the Queen's... that's a whole different topic!).

For starters, there's vocabulary and usage. I'd like to highlight two of my favorite abused and overused words: tactical, and organic. Both of these words have perfectly reasonable, fairly specific dictionary definitions. And both are horribly abused in pop culture and advertising. Per Google, tactical is "of, relating to, or constituting actions carefully planned to gain a specific military end."

More specifically, tactical refers to the smallest units of battle- fireteams, squads, platoons or even companies. Above that is operational, and at the top of military planning and action is strategic. I often see ads for "tactical flashlights". Hmm- I wonder how large a strategic flashlight might be, and how many car batteries would be required to power it. Tactical also seems to refer to the militarized version of a common item, making it somehow superior. We purchased some shortbread cookies in tin canisters a few years back, and I quite enjoyed telling everyone that once the cookies were finished, these containers would be converted to "tactical snack containers", as they are approximately the same size and shape as the canister of a popular potato snack, except that they are metal rather than cardboard... hence, they are tactical. I also have a repurposed stainless steel water container with a volume of approximately 0.5 liter. It's been repurposed because it leaks- it is now a tactical utensil carrier. It keeps my eating utensils safe while I commute to work in a modern automobile on two- and three-lane paved roads at near-highway speeds.

Organic. Don't even get me started....

Okay, I went there. The opposite of organic is inorganic, as in inorganic chemistry. Definition? "Not consisting of or deriving from living matter."

In other words... inedible. Poison. Stuff that will kill you. There is no such thing as inorganic food. So, it follows that there is ALSO no such thing as organic food, in the sense in which it is advertised: in actuality, ALL food is organic. There is such a thing as non-GMO food, or all-natural, or free-range.

So, basically, tactical is just silly, while organic is offensive.

That's it for a few of my favorite misused words.

My database is close to being brought back from dormancy to project status. I've made a few fundamental design changes- all I need now is data entry!

As always, I am hochspeyer, blogging data analysis and management so you don't have to.



Monday, February 15, 2016

Super Bowl L

I did not watch this year's Super Bowl.

For what it's worth, I have not watched a Super Bowl since the Chicago Bears humiliated, dominated, and destroyed the cheating, the Clam-bakes..., er... New England pre-Deflate-gate Patriots 46-10 in Super Bowl XX.

If the Bears are not there to win, or the Packers are not there to lose, the Super Bowl is unimportant to me.

In a way, I am a true sports fan, though not in the way that most folks would define one. I am a fan and a follower of teams that are MY teams. Da Cubs. Da Bears. Da Hawks. Ferrari. Subaru. I couldn't care less about all other sports or teams. And, if you don't know what sports those teams play, that's okay.

So, you might be wondering how I spent "Super Bowl" Sunday. For starters, Mr. T. and I had a little adventure. We made a pilgrimage to the Central Continental Bakery, where we procured some exquisite pączki. The interesting thing about Central Continental Bakery is this: when you step through the door, a part of you steps into Europe. Mr. T had enjoyed pączki before, but from a different bakery. Once he had experienced "the real deal", though, his taste buds were forever spoiled.

That's all for now- the next post is #200. It should be special.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.

Saturday, February 6, 2016

Happy e Day

This was originally going to be Potpourri #3.

I changed my mind about the title when I found out about e.

Now, I consider myself to be fairly well read as far as English Literature goes (vs the general population). I've read all of the "required" high school reading- Shakespeare, various classic and modern poets, and a few plays, as well as having read a substantial amount of both fiction and nonfiction outside of the classroom. Granted, the "classroom" list might be a bit old, but so are the classics of literature. Math, though... that's another story.

For those of us (and I include myself in this list) who are relatively clueless about the "higher" functions of math, I'd like to introduce a new player- new to me, at least.

Last year, I made a big deal about pi Day, as it was a once-a-century pi Day (3.14.15). This year is not a special e Day, but 2018 will be (2.7.18).

I should back up here a bit and mention that math and I did not get along well in high school. In fact, if a subject could have been personified, math would have been my scholastic bully... for all four years of high school- even though I took math for only two years. In my first year, I studied geometry, and did so poorly at it that I nearly went to summer school.

I survived that, though, and took algebra as a sophomore. Although I didn't come close to being banished to summer school (remedial education) in algebra, I never really quite "got it". At this point in life, I was so turned off by math that I later dropped a philosophy class in college when I figured out that the class was actually (sort of) math: symbolic logic.

Flash forward a few decades to today. I'm integrating much of what I used to hate and misunderstand into everyday life. My current challenge is to embrace what was formerly despised.

Which brings me to today. It's a tradition where I work for 3rd shift to produce a short email detailing what was done overnight.  These emails are fairly short and to the point. Somewhere in the past year I started adding factoids about the current month at the bottom of each email. To be honest, I'm not sure how many folks in my department read these emails, but I try to keep them at least a little interesting. That's how I learned about "e" Day.

So, let me explain the little I know about e. This is Wikipedia's definition. As I've mentioned, my mathematical foundation is shaky at best, but I think I finally have an understanding of irrational numbers! That's my explanation. :)

Another e-day note

Many celebrate e-Day as Euler's number day on February 7 (2/7) in the month/day date format. It is a day when people recognize the significance of the number e, which is approximately 2.71828. Although this is the most commonly recognized e-Day, other events observed around the world have nothing to do with the constant e.

Euro day occurred in European countries that simultaneously adopted the Euro on January 1, 2002. In New Zealand, eDay is a day where people can get rid of e-waste or old electronics such as computers and old appliances, so they can be recycled rather than being placed in a landfill. Engineer’s Day is observed in Paducah, Kentucky on February 21 where many have an egg drop contest, create edible cars and tape people to walls. Eday is also an island in Northern Scotland.
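For anyone who, like me, wants to see where 2.71828 comes from: two standard ways to approximate e are the series 1/0! + 1/1! + 1/2! + 1/3! + ... and the value of (1 + 1/n)^n as n grows. Here is a minimal Python sketch of both, checked against the math module's built-in constant; the function names are my own.

```python
import math

def e_series(terms=20):
    """Approximate e as the sum of 1/k! for k = 0 .. terms-1."""
    total, factorial = 0.0, 1
    for k in range(terms):
        if k > 0:
            factorial *= k          # factorial now holds k!
        total += 1 / factorial
    return total

def e_limit(n):
    """Approximate e as (1 + 1/n)**n, which approaches e as n grows."""
    return (1 + 1 / n) ** n

print(f"{e_series():.5f}")           # 2.71828
print(f"{e_limit(1_000_000):.5f}")   # 2.71828
print(f"{math.e:.3f}")               # 2.718, i.e. 2/7/18
```

The series converges very quickly (twenty terms is already more precision than a float holds), while the (1 + 1/n)^n form needs n in the millions to get five decimal places right.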

And, for those who are interested in sports in the United States, e Day coincides with Super Bowl Sunday. I couldn't care less, as the Packers will not be there to lose, nor will the Bears be there to win.

That's all for now. We will be celebrating e Day with pączki, purchased from Central Continental Bakery, a lovely European-style bakery in the northwest suburbs of Chicago. These pączki are incredible- stepping into this bakery is like stepping into Europe!

As always, I am hochspeyer, blogging data analysis and management so you don't have to.


Wednesday, February 3, 2016

You said this was about data analysis!

Regular readers are familiar with the tagline that ends every one of my posts, "As always, I am hochspeyer, blogging data analysis and management so you don't have to." There's a bit of truth there, as well as a bit of attempted humor.

For starters, I've been using "hochspeyer" as my online persona for years. "Hochspeyer" (uppercase H) is a village in Germany where Jennifer and I lived for a few years; hochspeyer (lowercase h) is my online name. And although the topic of data doesn't come up as frequently as I'd like, I do try to feature it when possible. One problem I have is that I don't do as much analysis as, say, when I was actually employed as an analyst... for 15 USD/hr.

Pregnant pause for effect.

Yes, I once made only $15/hr working as a data analyst (this might be a lot of money in some parts of the world, but in the greater Chicago area, it's not- especially when you're feeding, clothing and housing five people). According to payscale.com, the job should pay (at a minimum) 39K per year. According to an inflation calculator, cumulative USD inflation between then and now is ~17.6%, so that base works out to roughly 33K in the dollars of the day. At $15/hr for a full 40-hour, 52-week year, I made about 31K, so I was roughly 2K under the base salary for an analyst, or roughly 6%. However, at that point in my life, it was the only gig I could score, and the plus side was I learned quite a bit about basic data analysis, Excel, Access, customers, internal and external deadlines, and adding value to both your output and your position.
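Here is the same back-of-the-envelope arithmetic as a short Python sketch. The 40-hour, 52-week year is an assumption; the hourly rate, the Payscale figure and the inflation percentage come from the paragraph above.

```python
HOURLY_RATE = 15.00            # USD/hr, from the post
HOURS_PER_YEAR = 40 * 52       # assumption: full-time, no unpaid weeks
PAYSCALE_BASE_TODAY = 39_000   # USD/yr minimum, from the post
INFLATION = 0.176              # cumulative USD inflation, then -> now, from the post

annual_pay = HOURLY_RATE * HOURS_PER_YEAR          # what I actually earned per year
base_then = PAYSCALE_BASE_TODAY / (1 + INFLATION)  # today's base, deflated to then
shortfall = base_then - annual_pay

print(f"Annual pay:     ${annual_pay:,.0f}")
print(f"Base, deflated: ${base_then:,.0f}")
print(f"Shortfall:      ${shortfall:,.0f} ({shortfall / base_then:.1%})")
```

Deflating today's base (rather than inflating the old wage) keeps both figures in the dollars of the day, so the percentage compares like with like.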

As a temp, I did analysis for $12/hr. My boss there was the plant's Quality Manager. The plant was a contract manufacturer of aluminum die castings for the automotive industry. Although they made parts for all three of the "Big 3" (General Motors, Ford, and Chrysler), their main customer was General Motors (GM). This was when Saturn was an exciting GM product line. The company had made a significant investment in equipment that was specifically designed for small cars, such as the Saturn line. Unfortunately, the Saturn line had disappointing sales for GM, and was ultimately scrapped. I apparently was doing a good job there, however: there were two rounds of layoffs in the company before I was let go. My value add there wasn't special- I integrated Excel data into an existing Access database. I also had the temerity to give reasons when they asked why I couldn't extrapolate results (when one has neither data nor experience, one is up the creek without a paddle in terms of validity!)

With overtime, computer crashes and life in general, I haven't had much time to think about data as of late, much less do any analysis. However, the further I get from 10K blog viewers, the stronger the pull is to do a bit of analysis on blog hits to date.

So, here we go: blog speeds and feeds.

I started writing this blog in December of 2012. As of January 2016, I've published 196 posts to this blog- roughly five posts per month. As I'm no professional writer, I've got to say that this paltry volume is a challenge to maintain.
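A quick check of the posts-per-month figure, counting December 2012 through January 2016 as whole calendar months (an assumption about how to count the partial first and last months). The helper name is my own.

```python
from datetime import date

def months_inclusive(start, end):
    """Calendar months from start's month through end's month, counting both."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

months = months_inclusive(date(2012, 12, 1), date(2016, 1, 1))
posts = 196
print(months, round(posts / months, 1))
```

Thirty-eight months and 196 posts works out to a little over five posts per month.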

I've been tracking numbers from Day One, and they break down like this:

As far as continents go, only Antarctica has zero readers- Antarctica is 7th.

South America comes in 6th. With the limited amount of data I get back from Google (it's free, so I'm not complaining), I can only guess that language may be a factor in the readership here.

In terms of continents, Australia has the fewest countries: one. Predictably, it ranks 5th. However, I think a few more countries could probably be considered a part of Australia for statistical purposes- New Zealand comes to mind immediately, but that's about it. If anyone has any ideas, please comment. The rest of the continents are as follows:

Africa (4th) is actually tied with Australia in terms of total readers, but these readers are distributed among a few countries, so I have to award the spot to Africa.

Europe comes in 3rd, which is slightly surprising because of the widespread use of English there. Twenty-seven countries represent Europe.

Asia is represented by seventeen countries and comes in 2nd, mainly because of Chinese readers.

My top market, again not a surprise, is North America, with a bit over 75% of my global total in readership. 

That's it from here for now. Before signing off, I'd like to extend special e Day (2/7) greetings to Prof. Diego Kuonen!

As always, I am hochspeyer, blogging data analysis and management so you don't have to.