Wednesday, July 13, 2016

Big Data, Small Data, and 42 Data

It's been a while since I've written, so a quick update is in order.

Work hasn't been particularly busy lately; the delay in writing has more to do with software than anything else. I finally had the opportunity to upgrade my laptop from Microsoft Office Pro 2007 to 2010, which has been on my to-do list ever since the hard drive replacement a while back. With 2010 installed, I've FINALLY gotten back to working on the database.

I seem to be unable to stop referring to von Moltke the Elder, and I'm not stopping now. I found an older saved version of the database and decided it wasn't usable, so I'm starting afresh (again!) with Forty-Two, using a much-improved (I hope!) design.

Before I got too deep into this database, I thought I'd do a bit of web searching to see if I could find a "music template" for an Access database. After a fair amount of searching, I discovered that Access templates, generally speaking, do not exist... at least not in the same league as Excel or Word templates. The best explanation I've found for this was on an Access forum, and I paraphrase here: "Access is pretty much a sandbox developer's environment. You won't find many templates. Period."

So, I'm back to doing it the way I've always done it: making it up as I go. Well, LEARNING as I go.

I suppose I should take this opportunity to make one of my periodic disclaimers: I'm not an expert, but I have a deep interest in Big Data, the Internet of Things (sometimes referred to as the Internet of Everything), data analysis, databases, STEM and the Maker Movement. Okay, back to our regularly scheduled program.

Some time ago, not long after I'd discovered the joy of caring for and feeding databases, I ran into a statement that struck me as a bit curious. It was about database design, and the author stated that the best way to design a database is with pencil and paper. I eventually understood his premise and agreed with him up to a point. My personal perspective is that this can be a great starting point, especially if you're completely new to databases, are looking at a completely new database, or are designing with a specific goal in mind. For example, I've built small databases for crunching mostly text data in small projects (fewer than 200 data points). I've used existing databases and added my own queries to produce quality and efficiency reports for ISO 9001:2008 (at least, I THINK that's the spec!). There have been others as well, but my project....

Let me introduce (or reintroduce) everyone to my pet project, Forty-Two. It's called that quite simply because its ultimate function will be to answer the question, "What is the meaning of Life, the Universe and Everything?", which, for those who are not immersed in literary memes, is a reference to Douglas Adams' "The Hitchhiker's Guide to the Galaxy". And this was before I had even heard of Python, and Guido van Rossum's homage to Monty Python's Flying Circus. This is THE truth that I have found: I.T. and literature are strange bedfellows.

So here I sit writing about the database.

Once again, back to von Moltke: even the best plan can fail, and my database is a case in point. I had a grand idea for normalization: make a names table. The names table is as simple as it sounds: first, middle, and last names are all contained in one table. The problem with this theory reared its ugly head almost immediately: music groups do not fit this model. So there is a new, unplanned table: music groups. Although the individual members of those groups may become part of the database at some shining point in the future, for now there is a groups table that just lists the names of groups. It's the only way to make soundtracks and other compilations work.
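For anyone who wants to see the names-versus-groups split in concrete form, here is a rough sketch of the two tables. It is only an approximation under some assumptions: it uses SQLite through Python's built-in sqlite3 module instead of Access, and the table and column names (names, music_groups, first_name, group_name, and so on) plus the sample rows are hypothetical, not the actual Forty-Two design.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Build an in-memory sketch of a person-names table and a music-groups table.

    Hypothetical schema: SQLite stands in for Access, and every table/column
    name here is illustrative only.
    """
    conn = sqlite3.connect(":memory:")

    # One row per person, split into first, middle, and last name.
    conn.execute("""
        CREATE TABLE names (
            name_id     INTEGER PRIMARY KEY,
            first_name  TEXT,
            middle_name TEXT,
            last_name   TEXT NOT NULL
        )
    """)

    # A group name doesn't break into first/middle/last parts,
    # so groups get their own table holding just the name.
    conn.execute("""
        CREATE TABLE music_groups (
            group_id   INTEGER PRIMARY KEY,
            group_name TEXT NOT NULL UNIQUE
        )
    """)

    conn.execute(
        "INSERT INTO names (first_name, middle_name, last_name) VALUES (?, ?, ?)",
        ("Johann", "Sebastian", "Bach"),
    )
    conn.execute("INSERT INTO music_groups (group_name) VALUES (?)", ("The Beatles",))
    conn.commit()
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(db.execute("SELECT first_name, middle_name, last_name FROM names").fetchall())
    print(db.execute("SELECT group_name FROM music_groups").fetchall())
```

A person fits neatly into three name columns; a band doesn't, which is why it ends up as a single row in a table of its own.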

That's it for now; I want to get this entry published.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.
