Saturday, August 10, 2013

#notsobigdata

Yes, I did create the Twitter hashtag #notsobigdata in the wee hours of Wednesday, August 7. It is for my Not-so-big-data theme. And even though I realize that Big Data is the future, Not-so-big-data is the here and the now ... and the foreseeable future.

Not-so-big-data can still be a lot of data. For example, I once worked as a data analyst for a Wal-Mart vendor. This was back in the Excel 2000 days, when Excel had a limit of 65,536 rows. Even though I reported on only six to eight SKUs in each weekly report, there were weeks when the data exceeded Excel's capabilities, and I had to grab the data in Access and then slice and dice it into chunks that the Excel of the day could handle. It was also at this time that I became familiar with what is seriously the best mouse ever devised: the Logitech Trackman Marble. I had never had to select 60,000+ rows of Excel data before, but this mouse made that task easy. We own four of them.

Getting back to not-so-big-data...

Not-so-big-data is what we deal with on a daily or weekly or monthly basis. Some of it is eclectic and ad hoc, often never gracing a ledger or spreadsheet... data like fuel economy or utility usage, or maybe bits and pieces of the family or organizational budget. The bottom line is that these dollars and cents (or whatever one's local currency happens to be) are important, at least for the day or week or other short period of time in question. Jennifer is a perfect example of a not-so-big-data consumer.

And not only is she a perfect example of the not-so-big-data consumer, she's also an expert not-so-big-data manager and analyst. As an example, our house has meters for gas, water and electricity. Even though these utilities employ meter readers, we send in our readings every month, because when the utilities don't read the meters, they estimate usage. And sometimes, the meter readers just plain make mistakes. We were on the wrong end of a meter reading error once: a 100 USD mistake. Jennifer caught this, and the mistake was corrected.

Along a similar line of thought, I was doing some data entry in one of my Access tables on Thursday before going in to work, and I ran into some duplicate records, or "dupes" for short. There were at least six sets of dupes. Egad! I was not happy to find them, as I am typically pretty vigilant about my rules. This particular table ("Names") currently has only two columns: the autonumber and the name. Right now, the only purpose the table serves is to force normalization in the current primary ("Media") table. The Names table is the sole source of data for the Artist/Author column in the Media table: I do not allow free data entry into areas that are prone to seeing repetitive data. As I was working with some books, I glanced over at the bookshelf and replaced the dupes with authors who had not yet been entered into this table. Adding insult to injury, I know exactly when and how the errors occurred.
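For anyone who likes to see the shape of that arrangement spelled out, here is a minimal sketch of the two-table idea using Python's built-in sqlite3 module. The table and column names here are just illustrative stand-ins for my Access objects, not an export of them, and sqlite3 is simply filling in for Access.

    import sqlite3

    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    # SQLite leaves foreign keys off by default; turn them on so the lookup is enforced.
    cur.execute("PRAGMA foreign_keys = ON")

    # "Names" holds each artist/author once: an autonumber key plus the name itself.
    cur.execute("""
        CREATE TABLE Names (
            NameID INTEGER PRIMARY KEY,
            Name   TEXT NOT NULL
        )
    """)

    # "Media" stores a NameID rather than free-typed text, which is what forces the
    # normalization: an artist/author has to exist in Names before it can be referenced here.
    cur.execute("""
        CREATE TABLE Media (
            MediaID INTEGER PRIMARY KEY,
            Title   TEXT NOT NULL,
            NameID  INTEGER NOT NULL REFERENCES Names (NameID)
        )
    """)
    con.commit()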

I had been entering data from some compilation music cassettes a few weeks ago, and as each cassette has a number of artists, I got a bit careless and just entered several without checking to see whether a record already existed. And so, for a moment's complacence, I had to spend time fixing rather than creating. I hope to keep this from ever happening again by double-checking that there are no more dupes in the table, and then setting the field's Indexed property to "Yes (No Duplicates)".
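In the same sqlite3 terms used in the sketch above (again with illustrative names, not my actual database), the cleanup and the "no duplicates" guard look roughly like this:

    import sqlite3

    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("CREATE TABLE Names (NameID INTEGER PRIMARY KEY, Name TEXT NOT NULL)")

    # A couple of deliberate dupes to stand in for my careless cassette entries.
    cur.executemany("INSERT INTO Names (Name) VALUES (?)",
                    [("Artist A",), ("Artist B",), ("Artist A",)])

    # Step 1: find any name that appears more than once.
    cur.execute("SELECT Name, COUNT(*) FROM Names GROUP BY Name HAVING COUNT(*) > 1")
    print(cur.fetchall())  # [('Artist A', 2)] until the dupes are cleaned up

    # Step 2: keep the lowest-numbered record for each name and drop the rest.
    cur.execute("DELETE FROM Names WHERE NameID NOT IN "
                "(SELECT MIN(NameID) FROM Names GROUP BY Name)")

    # Step 3: a unique index refuses any future duplicate outright, which is the
    # same effect as setting the Access field's index to disallow duplicates.
    cur.execute("CREATE UNIQUE INDEX idx_names_name ON Names (Name)")
    con.commit()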

As always, I am hochspeyer, blogging data analysis and management so you don't have to.


