A homegrown DBA exploring #notsobigdata: Microdata


Thursday, June 4, 2015

(Around) 2.5 years, or, my personal Interwebz v2.1

Around two and a half years ago, I posted my first blog- December 4th, 2012 (it's a pretty short post- even by my current blogging standards!). Even way back then (figuratively speaking) I strove to keep data in the mix as the unifying theme of this blog. And at the end of May, 2015, I'm still blogging about life in general with data as the underlying theme.

It's June of 2015 now, and I've been working on my Twitter account. I consider Twitter serious work primarily because it is an integral element of my online presence strategy- and two and a half years ago, I didn't even know what an online presence strategy was, let alone see the need for one. After all, Twitter is one of the places where this blog posts, but more importantly, it's where I interact with the world of #bigdata and #IoT.

I'm a firm believer in cross-pollination and shameless self-promotion when it comes to creating a larger presence on the internet, so here's an invitation: if you have an understanding of any of the following terms, have a personal or professional interest in them, or possibly have heard one or more of them before and would like to learn more about them... here's a sample of the topics I address on my (hochspeyer) account on Twitter:

#bigdata, #notsobigdata, #smalldata, #microdata, #database, #analytics, #stats, #IoT (Internet of Things), #IoE (Internet of Everything), #M2M (machine to machine), #machinelearning and, of course, #hadoop and #python. I touch on a lot of related topics on that account, but if Big Data and the Internet of Things are topics which you are interested in, you may enjoy following.

My other Twitter account (hochspeyer1) still has a tech focus, but has more of a personal touch, looking more at #maker and #programming topics, as well as life in general.

Whew!

The state of the state of my microdata projects: as of today (Jun 4, 2015), the SUL (Secret Underground Lair) is still theoretically experiencing remodeling; by this I mean that while all of the furniture and shelving are in their (probable) final positions, everything that came out and came from upstairs is still in need of a home. Forty-two, my database project, is on hold. The Lego database (currently an Excel 2007 workbook) is in a data entry phase, and will eventually be a part of Forty-two. Even the Raspberry Pi, Arduino and Python projects are all on hold pending the completion of the SUL upgrade/remodel. Lastly, I'm building my media library, one song at a time. I'm using Windows Media Player, which I suppose is extremely lazy, but the interface is familiar and it does pretty much all that I want, so for now it suffices.

That's all from the SUL for now. As always, I am hochspeyer, blogging data analysis and management so you don't have to.

Monday, May 18, 2015

Data, defined (part 2)

Right after I hit the "PUBLISH" button on my last blog, I realized that I wasn't done. I know I had the option at that point to pull the piece back and add the other thoughts, but I don't like to throw out a wall of words just because I'm not done... I'd much rather give the reader a break and come back another day, and so here we are today with a continuation of sorts, taking a closer look at microdata.

But first, an update from the home front.

Sunday the 17th was the third Sunday that Jennifer had spent in the Dallas area. Our older son was off at a convention, leaving Mr. T and me a very quiet weekend. That's a good thing, too, as I still managed to rack up a sleep deficit. I've mentioned a few times that I'm a programmer who works nontraditional hours. I refer to my band of coworkers and myself as Nightstalkers. The big drawback of being a Nightstalker is that one often has to stay at work until the job is done, which can sometimes mean a fairly long day, but the big plus is that we are compensated for that time. Saturday ended up being a late day for me- nearly eleven hours, and then a technician was coming over to the house for the Spring air conditioning checkup at noon. At some point before noon I decided that I could not stay awake, so I asked Mr. T to wake me up when the tech arrived. The tech arrived and did his thing. I wrote a check for his service, and then went back to bed, getting up some time around 2030. Looking back, I really don't remember too much of what I did except for a bit of work on the Lego database. I was back in bed ~0430, and up Sunday a little after 1230.

Sunday was warm and the humidity was palpable. I opted for some breathable training attire to cut the grass. I have to say that I am perfectly capable of wearing some pretty nice-looking clothing combos, but fashion has little place in my workout or working outdoors clothing choices. As it was both sunny and windy, I had an Aussie-inspired wide-brimmed hat with a chinstrap. The short-sleeved shirt and shorts were both black sweat-wicking workout attire, and the footwear: orange sneakers. Blood orange red, actually. New Balance all terrain running shoes. Peer reviewed, double blind studies utilizing FLOOS and LRBL have verified that these shoes allow me to cut the grass 19.3% faster than the average suburbanite. You read it on the Internet- it's got to be true!

After cutting the grass, I figured I'd take a walk. One would think I'd have learned my lesson from the last time I did this (two weeks ago, actually). No. No I didn't. I grabbed a fanny pack (these workout shorts don't have pockets) and headed out. Approximately an hour later I walked back into the house, drenched in sweat and carrying an empty half liter water bottle.

All of that is a great segue to microdata. Why? Well, for starters, I have an Omron pedometer. I have the option of publishing my workout data to their website- in which case, my data would be a part of Omron's small data, and quite possibly, fitness big data. My choice, though, is to upload the data to the Omron tracking program on my computer, making it MY microdata. In the FWIW category, I logged 6.2 miles (9.98km) today- my best day in nearly two months of tracking.

The Lego database is growing slowly. I'm using Excel 2007, and having to relearn some things. I'm sometimes asked what someone should learn in Excel to be useful on the job. Well, it depends on the job. Every place where I've used Excel I've needed at least a few things that no one else asked for- and none of these were financial or statistical environments (which tend to be a lot more predictable in terms of desired skills). The Lego counts stand as follows: Basic bricks- 1 part number, 12 colors, 1666 elements. Plates- no counts as yet. Technic- 3 part numbers, 3 colors, 1049 elements. Total elements (pieces)- 2715.
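For anyone who wants to double-check the arithmetic, the counts above can be tallied in a few lines of Python. This is only a sketch: the `Category` class and the `total_elements` function are names invented for illustration and are not part of the actual Excel workbook. Plates are stored as `None` because no count exists yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Category:
    """One row of the running tally (hypothetical layout, not the real workbook)."""
    name: str
    part_numbers: Optional[int]
    colors: Optional[int]
    elements: Optional[int]


# Counts copied from the post; Plates have not been counted yet.
inventory = [
    Category("Basic bricks", part_numbers=1, colors=12, elements=1666),
    Category("Plates", part_numbers=None, colors=None, elements=None),
    Category("Technic", part_numbers=3, colors=3, elements=1049),
]


def total_elements(categories):
    """Sum the element counts, skipping categories that are still uncounted."""
    return sum(c.elements for c in categories if c.elements is not None)


print(total_elements(inventory))  # 2715
```

Keeping the uncounted category in the list, rather than leaving it out, makes it obvious that the total is still incomplete.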

As always, I am hochspeyer, blogging data analysis and management so you don't have to.

Wednesday, May 13, 2015

Data, defined

Although it is not my intent, I am certain that this post has the potential to step on a few toes, possibly bruise an ego or two, or ruffle some feathers. I may even get someone mad. Really e-mad.

For starters, I do not have any letters, diplomas or certifications, and am not currently professionally employed in whatever one might consider the "data community". Whatever that might be. I do not claim to be an expert or have any special expertise or training in the areas of Big Data, the Internet of Things/Everything, Statistics, Analytics or The Cloud. I was once employed as a data analyst working with Small Data for a short time.

Whew!

So, who and what exactly am I?

I'm a guy who tweets (and retweets) primarily on the subjects of Big Data, IoT, programming and related topics. As far back as high school- maybe even earlier- I've been interested in data. It was either my music collection or Fletcher Pratt's Naval Wargame that gave me my start in classifying and quantifying. I remember even attempting to do a few music surveys way back when, and some of the respondents were unhappy because the polls were not simple popularity contests; instead, the answers were weighted based upon their position on the poll. Fast forward to today. I'm currently building a flat database of my Lego collection in Excel 2007 (why 2007? Because that's what I have on the computer nearest to the Legos!). This, in turn, will be added to my master database Forty-Two- so named because it answers the question of Life, the Universe and Everything.
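For the curious, here is roughly what a position-weighted poll like those old music surveys could look like in Python. It is a sketch under assumptions, not a reconstruction of the original surveys: the three-slot ballot, the 3-2-1 point scheme, the `weighted_poll` function and the album names are all made up for illustration.

```python
from collections import Counter


def weighted_poll(ballots, slots=3):
    """Score each choice by where it sits on a ballot.

    Assumed scheme: with 3 slots, first place earns 3 points,
    second place 2 points and third place 1 point.
    """
    scores = Counter()
    for ballot in ballots:
        for position, choice in enumerate(ballot[:slots]):
            scores[choice] += slots - position
    # Highest score first.
    return scores.most_common()


# Hypothetical ballots, each listing favorites in order of preference.
ballots = [
    ["Album A", "Album B", "Album C"],
    ["Album B", "Album A", "Album C"],
    ["Album B", "Album C", "Album A"],
]

print(weighted_poll(ballots))
# [('Album B', 8), ('Album A', 6), ('Album C', 4)]
```

In a simple popularity contest every mention would count the same, and all three albums would tie at three mentions each. Weighting by position is what separates them.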

Having said ALL of that, I'd like to start off by saying that the term "data" may not be as concrete as we are led to believe. In my world, data comes in the following flavors: Big Data, Not-So-Big Data, Small Data, Micro Data, and Statistics. Depending upon the size of the dataset(s) and one's perspective, most- if not all- data can fit into more than one classification. Really? Sure. Case: say there's a hypothetical high school senior who is one of the stars of his basketball team. He's a good defender, doesn't commit many fouls (below the league average), and is about average in scoring- except he leads the league in free throw percentage. Several colleges and universities are interested in him- they've got data on this fellow going back to 6th grade. That's data- to them. To me, a person who couldn't care less about basketball- it's nothing more than a bunch of irrelevant stats. On the other hand, these same scouts would not be impressed by the number of PhDs that follow me on Twitter.

So, how big is a Big Data dataset? I asked a coworker. He wasn't sure, but thought a mail list might qualify. Don't laugh too soon- some of the mail lists I've seen have more than 10 million names. To me, though, I'd put that in the Not-So-Big Data or Small Data categories. The IoT, Amazon, Google, YouTube and Wikipedia definitely fit into the Big Data category, but to the average person, these can be tough to visualize. So, for what I think might be a decent, understandable Big Data dataset, I propose the 2010 U.S. Census. It was a 10 item questionnaire (with a few extra answers possible) that was mailed to 135,000,000 addresses representing approximately 309,000,000 persons.

Small Data could be a database, a website or the phone directory of a small to medium sized city- the lines are pretty fuzzy here.

Lastly, there's microdata. I'm not sure if this term is used anywhere else, but I find it to be a convenient term for personal data- data generated and maintained by one person or one family for their own use and not often formally shared. A cataloged collection of coins, stamps, recipes, exercise/workout logs or Legos- all of these are microdata in my worldview.

Thanks for your patience- I hope you enjoyed this. I generally write a lot less... I'm not a fan of writing or reading walls of words!

As always, I am hochspeyer, blogging data analysis and management so you don't have to.
