Showing posts with label #notsobigdata.

Thursday, June 4, 2015

(Around) 2.5 years, or, my personal Interwebz v2.1

Around two and a half years ago, on December 4th, 2012, I posted my first blog entry (it's a pretty short post- even by my current blogging standards!). Even way back then (figuratively speaking) I strove to keep data in the mix as the unifying theme of this blog. And at the end of May, 2015, I'm still blogging about life in general with data as the underlying theme.

It's June of 2015 now, and I've been working on my Twitter account. I consider Twitter serious work primarily because it is an integral element of my online presence strategy- and two and a half years ago, I didn't even know what an online presence strategy was, let alone see the need for one. After all, Twitter is one of the places where this blog's posts appear, but more importantly, it's where I interact with the world of #bigdata and #IoT.

I'm a firm believer in cross-pollination and shameless self-promotion when it comes to creating a larger presence on the internet, so here's an invitation: if you understand any of the following terms, have a personal or professional interest in them, or have simply heard one or more of them before and would like to learn more... here's a sample of the topics I address on my (hochspeyer) account on Twitter:

#bigdata, #notsobigdata, #smalldata, #microdata, #database, #analytics, #stats, #IoT (Internet of Things), #IoE (Internet of Everything), #M2M (machine to machine), #machinelearning and, of course, #hadoop and #python. I touch on a lot of related topics on that account, but if Big Data and the Internet of Things interest you, you may enjoy following it.

My other Twitter account (hochspeyer1) still has a tech focus, but with more of a personal touch, looking at #maker and #programming topics, as well as life in general.

Whew!

The state of the state of my microdata projects: as of today (Jun 4, 2015), the SUL (Secret Underground Lair) is still theoretically being remodeled; by this I mean that while all of the furniture and shelving are in their (probable) final positions, everything that was moved out or brought down from upstairs still needs a home. Forty-two, my database project, is currently on hold. The Lego database (currently an Excel 2007 workbook) is in a data entry phase, and will eventually become part of Forty-two. Even the Raspberry Pi, Arduino and Python projects are on hold pending the completion of the SUL upgrade/remodel. Lastly, I'm building my media library, one song at a time. I'm using Windows Media Player, which I suppose is extremely lazy, but the interface is familiar and it does pretty much all that I want, so for now it suffices.

That's all from the SUL for now. As always, I am hochspeyer, blogging data analysis and management so you don't have to.


Sunday, August 25, 2013

#notsobigdata- an epiphany of sorts

I recall writing recently that I was going to lay off the data-focused posts for a bit and get back to the format of life in and around the Secret Underground Lair. I was even ready to go with a new topic, a music topic that is near and dear to my heart: why did (fill in the name of a popular and talented musical artist- singer or band) record that atrocity? Furthermore, why did it sell, and why did it receive massive airplay? Alas, that will have to wait for another day, because I had one of those moments, and now the Secret Underground Lair's Data Vault will never be the same.

As I briefly mentioned in an earlier post, I had updated a table, which had the effect of blowing out all of my queries. That isn't a big deal right now, as the total amount of data is still under 5 MB(!) and there are still fewer than 900 records in the main data table. For the purpose of updating this table, though, I decided to make a query that would simply list movie titles and formats. So, I sat down with the query wizard and proceeded to build a simple query that would output an alphabetical list of movies only, along with their formats.

Simple, right? Muahaha, I guess the air in the Secret Underground Lair was a bit thin when I tried to do this, because I kept trying the same thing and I kept getting the same error. Suddenly, as I was staring at the Criteria row in Design View, it hit me: it wasn't working because the criterion I kept entering was never going to be recognized. In plain English, I was filtering on what I thought was the data, but to Access it was only a displayed name- the field actually stores something else.

Since there are only two pieces of data in the table, I figured out that Access wanted to see the record's autonumber rather than the name displayed in the target table. I changed the criterion so that Criteria = 9 (where 9 is the record number of the piece of data I actually wanted to use), and it worked. Which means, if I'm feeling adventurous, I'll save a copy of the database, figure out how the relationships are set up, and then correct them so they point to the desired field in the source table. Then I will eliminate the autonumber field and set up the only remaining field as the primary key. This will lighten the database a bit, and make it much easier to write queries, as the criteria will all be in plain English. And going forward, new single-field tables will not get an autonumber.
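
For anyone who'd like to see the same idea spelled out in SQL, the before-and-after looks roughly like this. (The table and field names below are made up for illustration- they aren't necessarily the ones in my database.)

-- What works today: the Format field actually stores the lookup table's autonumber,
-- so the only criterion Access will accept is the raw number.
SELECT Title, Format
FROM Media
WHERE Format = 9;

-- What I'd rather write (possible now with a join, or directly after the planned redesign):
-- filter on the name itself, in plain English.
SELECT Media.Title, Formats.FormatName
FROM Media INNER JOIN Formats ON Media.Format = Formats.FormatID
WHERE Formats.FormatName = "DVD";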

Next time, I'll try to get those songs out, along with a relatively ultralight database idea that's been rattling around in my head. Until then...

As always, I am hochspeyer, blogging data analysis and management so you don't have to.

Wednesday, August 21, 2013

What's in a name? Or, for Bowie fans, Changes.

As I progress through a deeper experience (and hopefully a greater understanding) of not-so-big-data, I've come to realize that Ye Olde Blogge needs a bit of a tech refresh. Nothing really out of my comfort zone, but a few changes to make that old, familiar sweater feel more comfortable in 2013. Therefore, I've made a few minor changes to the look, and updated the title- which now more closely aligns with my philosophy on data.

I'd like to take a few steps back and try to explain why #notsobigdata is vitally important, possibly even more important than Big Data or Megatrends. By the way, I never read John Naisbitt's Megatrends, but I do remember seeing it in bookstores (does anyone remember bookstores?).

#notsobigdata, though, is both important and timely. It does not necessarily appeal to larger corporate consumers, but rather to ordinary folks, SOHOs (small office/home office) and SMBs (small and medium-sized businesses)... a whole lotta acronyms that identify data producers and consumers... folks who, if they knew how to gather or interpret data, could compete more successfully with the big players in their respective industries, or budget their money better.

My #notsobigdata is focused on insurance and entertainment at this particular moment. Insurance and entertainment may seem like strange bedfellows to the casual observer, but have you ever considered how much media you have purchased in the past year? Have you ever considered how you would replace it should something catastrophic happen? In other words, if you have a collection of media that is stolen or destroyed, how would you recoup that loss?

The "cloud" is a consideration, I suppose. You could store all of the data about your collection(s) there. The problem I have with the cloud is that its hardly private- and this is true of all of your web-based email as well. I've never been really comfortable with posting all sorts of photos on the internet, which is the primary reason I don't post many pictures- and the ones that I do post are generally not of persons. I also take steps to control PII (personally identifiable information)- which is why nicknames are often used in this blog.

Not to belabor the point, as I've mentioned this at least a few times- the biggest problem I have with data is actually entering it into the tables or worksheets (I'm currently using both Excel and Access for this project). It's not that I mind doing it, it's just that I'm not particularly quick (~30 wpm). And I don't have enough data in some of the tables yet to justify building a nice-looking front end that could speed up data entry. I do see a future for dashboards, though, and possibly some sort of map-like application. Alas, those things are the toys which may be in the database of my future- but for now, I have to be content with building my yet-to-be-glorious database one cell at a time.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.

Monday, August 12, 2013

#notsobigdata Query fail

As Mel the cook on (the television show) Alice is purported to have said, "The best D fence is a good O fence." If so, that Mel was a man wise beyond his years. I'll bet he also moonlighted as a data analyst when he wasn't flipping greasy burgers.

Once again, I've led off with a comment that is seemingly unconnected to anything- but when it comes to relational databases, in which most everything is related to nearly everything else in some way, shape or form, the administrator, manager or designer needs to be constantly aware of what is going on with the Precious. Er, I mean, the data.

A few days ago I had the opportunity to do a little Show and Tell with a buddy and my database. As the database started out as a catalog of our audio and video collections, it's still very heavy with those sorts of records. My buddy was fairly amazed at the number of movies we have (I'm not sure I've mentioned this before, but I coined the term "videot" with our son Daniel in mind, as he's constantly spouting factoids and sundry other movie trivia... if I didn't know better, I'd think the "D" in IMDB.com stood for "Daniel" rather than "Data"). I've tried very hard from the onset to make the database accessible (no pun intended) and logical, and judging by my buddy's reaction, I think I succeeded. We were looking at all sorts of records, and as we went along I was explaining some of the features and why it had been set up the way it was. Then he said, "Wouldn't it be pretty easy and useful to add another column in here that tells where each item is stored?" That, gentle readers, was an epiphany: in an instant he went from someone who had never seen a database before and barely knew what one could do, to suggesting how to make it more powerful and useful. I told him, "Yes, thanks- I'd been planning on doing that, but it will take one or two more tables to do it properly."
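
In case you're wondering what "one or two more tables" might look like, here's a very rough sketch in Access SQL terms. (The table and field names are hypothetical- this is just to illustrate the idea, not the actual design.)

-- A separate lookup table for locations, so the list can grow and shrink as the SUL evolves.
CREATE TABLE Locations (
    LocationID AUTOINCREMENT PRIMARY KEY,
    LocationName TEXT(50) NOT NULL
);

-- The media table then points at a location instead of storing free-text location names.
ALTER TABLE Media
    ADD COLUMN LocationID LONG
    CONSTRAINT FK_Media_Locations REFERENCES Locations (LocationID);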

Here's where planning and design come into play. I could add a column and have a drop-down list for my locations... IF the number of locations were immutable and fairly small. I've done this in other databases where the dataset itself was finite and would never shrink or grow. In my situation, our organization is sometimes quite good, and at other times nonexistent. Additionally, we're always looking for ways (well, I am, at least) of improving storage... and it does not always work as planned. In fact, a thought just occurred to me that involves the P-Touch. The more I think about this particular idea, the more I feel the need to break out Visio and Excel- I have a rudimentary asset tag system, but it needs a bit of tweaking...

STOP! Find a happy data place, find a happy data place, find a happy data place.

This week's goal is to continue with data entry and to work on the physical area around Jennifer's PC, as I have some stacks of paper that need to go to a better place. Also, ALL of the queries need to be rebuilt. Although I do use ad hoc queries fairly often, I rely quite heavily on permanent (saved) queries for many things, such as deduping and making lists to output to Excel for analysis and counting. And yes, I am aware that in the #notsobigdata world, the analysis and counting could be done in Access itself, but my little black book of best software practices says that one should use the best tool for the job. Why do the queries need rebuilding? I was going to show my buddy how they worked, but I had broken all of them when I added a new lookup table to the primary table, so now they are all gone and need to be reconstructed. Vigilance is the price one must pay for clean data.
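
For the curious, the deduping queries are nothing exotic- here's a rough sketch of the sort of thing I keep saved. (The field names are hypothetical, not necessarily what's in my tables.)

-- List any artist/title pairs that appear more than once in the Media table,
-- along with how many times each one shows up.
SELECT Artist, Title, COUNT(*) AS HowMany
FROM Media
GROUP BY Artist, Title
HAVING COUNT(*) > 1;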

And now, it's time to wrap this up. I'm checking out of the Inn at the Stream of Consciousness and getting back to work. And, ...

As always, I am hochspeyer, blogging data analysis and management so you don't have to.

Postscript- Every so often, I will feature some older posts- here is today's group.

Saturday, August 10, 2013

#notsobigdata

Yes, I did create the Twitter hashtag #notsobigdata in the wee hours of Wednesday, August 7. It is for my Not-so-big-data theme. And even though I realize that Big Data is the future, Not-so-big-data is the here and the now... and the foreseeable future.

Not-so-big-data can still be a lot of data. For example, I once worked as a data analyst for a Wal Mart vendor. This was back in the Excel 2000 days, when Excel had a limitation of 65,536 rows. Even though I reported on only six to eight SKUs on each weekly report, there were weeks where the data exceeded Excel's capabilities, and I had to grab the data in Access, and then slice and dice it into chunks that the Excel of the day could handle. It was also at this time that I became familiar with what is seriously the best mouse ever devised- the Logitech Trackman Marble. I had never had to select 60,000+ rows of Excel data before, but this mouse made that task easy. We own four of them.
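
If you're wondering what "slice and dice" looked like in practice, it was essentially one query per manageable chunk- something along the lines of the sketch below, with invented table and field names, since this was a long time ago.

-- Pull one week's worth of one SKU at a time, so each result set stays
-- under Excel 2000's 65,536-row ceiling.
SELECT *
FROM WeeklySales
WHERE SKU = "1234567" AND WeekEnding = #10/30/1999#;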

Getting back to not-so-big-data...

Not-so-big-data is what we deal with on a daily or weekly or monthly basis. Some of it is eclectic and ad hoc, often never gracing a ledger or spreadsheet... data like fuel economy or utility usage, or maybe bits and pieces of the family or organizational budget. The bottom line is that these dollars and cents (or whatever one's local currency happens to be) are important, at least for the day or week or other short period of time in question. Jennifer is a perfect example of a not-so-big-data consumer.

And not only is she a perfect example of the not-so-big-data consumer, she's also an expert not-so-big-data manager and analyst. As an example, our house has meters for gas, water and electricity. Even though these utilities employ meter readers, we send in our readings every month, because when the utilities don't read the meters, they estimate usage. And sometimes, the meter readers just plain make mistakes. We were on the wrong end of a meter reading error once- a 100 USD mistake. Jennifer caught it, and the mistake was corrected.

Along a similar line of thought, I was doing some data entry in one of my Access tables on Thursday before going in to work, and I ran into some duplicate records, or "dupes" for short. There were at least six sets of dupes. Egad! I was not happy to find them, as I am typically pretty vigilant about my rules. This particular table ("Names") currently has only two columns- the autonumber and the name. Right now, the only purpose the table serves is to force normalization in the current primary ("Media") table. The Names table is the sole source of data for the Artist/Author column in the Media table- I do not allow free data entry into areas that are prone to seeing repetitive data. As I was working with some books, I glanced over at the bookshelf and replaced the dupes with authors who had not yet been entered into the table. Adding insult to injury, I know exactly when and how the errors occurred.

I had been entering data from some compilation music cassettes a few weeks ago, and as each cassette has a number of artists, I got a bit careless and just entered several- instead of checking to see whether there already was a record. And so, for a moment's complacency, I had to spend time fixing rather than creating. I hope to keep this from ever happening again by double-checking that there are no more dupes in the table, and then setting the name field's Indexed property to Yes (No Duplicates).
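
The same rule can also be put in place with a bit of SQL, if you prefer- a minimal sketch, assuming the table and field really are just Names and Name as described above.

-- Once the remaining dupes are cleaned out, let the database enforce the rule for me.
-- (This will fail with an error if any duplicate names are still in the table.)
CREATE UNIQUE INDEX NoDupeNames ON Names ([Name]);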

As always, I am hochspeyer, blogging data analysis and management so you don't have to.