Showing posts with label database. Show all posts
Showing posts with label database. Show all posts

Monday, November 19, 2018

The Lego database, reboot

The "reboot" seems to be something of a thing in Hollywood these days, and so it only follows that Life would imitate "art" as I reboot my Lego database.

For those who have been following this blog for some time, I think you may be aware of my Lego database project. It is one component of "42", my personal, ultimate repository of those possessions of mine which I have chosen to catalog.

I've just started using Microsoft Office 365, and the Lego data came from rebrickable.com. Their dataset is very comprehensive, but I don't think I've ever downloaded ANY Lego dataset that was usable by me as downloaded... and this data is no exception.

I mentioned that the data is quite comprehensive, which means it contains things which I don't necessarily need or want. For example, I do not care about any decals, paper or cardboard items, books, or even certain parts of the Lego product line. So, these need to be removed. The data also appears to come from more than one source, and formatting is necessary. Punctuation needs to be removed from many entries, and part names need to be standardized. Some data needs spelling changes- there are cases of the Queen's English being used, so "windscreen" and "tyre" need to be changed to "windshield" and "tire". These are the major changes that need to happen before I can even think about exporting to an Access database. And these are just a few examples.

But, I digress. Here are the numbers as of December 4th.

When I first started out, there were approximately 29,000 records. As of last night, after culling out items which I knew I would not be inventorying, I had 27,164 unique items. When I deduped the group to include ONLY the base part numbers- excluding all decoration variants, I ended up with a working list of 8,747 unique base part numbers. As I copy the part descriptions to the new inventory list, they are being further culled.

I am now at the point where everything must be done by hand. I find myself going back and forth between rebrickable and my flat database, verifying that the part number in the list is a part that I (may) actually own... and want to count. At a certain point, some of the data is subjective, and even though it is valid, I will count a complete assembly rather than, say, a special tile, wheels and tires as separate parts.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.


Sunday, May 15, 2016

Axl Rose, and Data.

It's been busy at work lately. And the rain has been frequent. And... it's Springtime in my little corner of the world. Consequently, our lawn got a "little" out of control.

(Spoiler alert: this post contains references to Guns 'N' Roses and their songs... 
and a few other musical references!)

Jennifer was kind enough to help out and mow the front lawn, which is what everyone passing our home sees. The back... well, that's another story- and my bailiwick.

My time had arrived. It had not rained for over a day. I looked out the back window and saw the grass blowing like Dust in the Wind. As my gaze landed upon the compost bins, I had to rub my eyes because I thought I saw a black tophat sitting on top of one of the compost bins, and a bandana on top of the other. I shook my head, blinked and looked again. Without warning, the yard had gone from broad daylight to a starless night. The yard's verdancy had turned to a monochrome with rough-cut video quality- I kid you not!. Our neighbor's white fence had been replaced by a gaggle of Marshall stacks, spotlights were illuminating the compost, and Slash, with trademark shades, ciggie and Les Paul, and partially obscured by the output of several smoke machines was furiously spewing out riff after riff, while Axl crooned, "You're in the jungle, baby" as he seemed to float over the grass. I covered my eyes, and shook my head. When I looked again, the moment was over. My yard was green, and back.

Seven Marshall stacks- fourteen cabinets and seven heads 
 Sheesh! Less Monster, more cowbell... maybe.

So, I cut the grass. Yes, it was long- long overdue for cutting. I'm not certain how long it took to complete the task, but I had to make several trips to the compost to empty the mower's bag. When that was done, I decided to tackle another yard task: the ivy. And, to be fair to the myriad of horticulturalists, botanists and assorted green thumbs in my audience, I'm not really certain what this plant is. It has dark green leaves, a woody stalk, and will sprout roots along the stalk. It's also quite capable of climbing. So, we've decide that it must go.

I got most of it off of the red mulberry. Now, technically the mulberry should be a bush, as it has multiple trunks. This one, though, is ~30' (over 9 meters) tall, so I call it a tree. Many websites also call it a tree.

Our Mulberry last Winter

I put the mower back in the garage, and grabbed a pair of leather gloves, a few yard waste bags and some clippers, and proceeded to tear into the ivy. For reference, the bags are constructed of a double walled heavy brown paper, and are approximately 16" x 12" x 35" (40.6cm x 30.5cm x 88.9cm). I filled five of them, and I'm probably only about one-third of the way done. This was sometime last week, and since then, we've had rain at least four times, so the next time I tackle this project, the stubble from the first round should be visible, which should make the next phase a bit easier.

Finally- data! (Sorry, no data pictures!)

On Saturday morning (May 14th), I FINALLY completed the 1st phase of the Lego Peeron data normalization! Speeds and feeds are appropriate here, so here we go!

The original Peeron database I'm using is from March 19, 2012. It contains 18,510 parts (rows). After the first passes through the data- in which the data started out as a text file, and then was converted to an .xlxs file, and then the data had its initial cleansing where a "base" part number was created- the data ended up totaling 16,218 parts. It should be noted that this not a fixed number, as there are more parts that need to go to the "Stickers etc" worksheet. I also want to emphasize that I am cleaning data and not normalizing; after all, all of this data still resides in an Excel file, so it is still a flat file. My next task is to impose order onto this file, so that will require the creation of a unique part number. This part number will be the primary key once it is imported into Access.  

Before any of this happens, I need to come up with a standardized format for my unique part number- currently called the "DB_Tracking_Number". Ugly, but good enough for now. I can't really automate this, once again because of the lack of standardization in the Peeron data, and my attempt to impose my own personal spin on Lego organization.

So, for now, I have ~ 16,000+ part numbers to create.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.



Wednesday, March 30, 2016

Data cleansing, or Purge #4

I had about 50% of what passes for a normal length post written when I decided to scrap it and start afresh. I think I've mentioned that I truly hate rewriting anything, but the post just wasn't going anywhere. More importantly, I had no logical transition to a data breakthrough.

For the past few weeks I've been working on my Peeron-based database (Working with someone else's data and Meanwhile, back in the Secret Underground Lair...). It had been slow going, and I asked a coworker about it. My shy coworker suggested that my solution might be found in Excel.

I thought about that for a bit. I'm not bad with Excel (and that statement alone could be the subject of SEVERAL blog posts!), so I did a bit of exploring. In Excel 2007, on the Data tab, resides "text to columns". As I recall, this feature has been available for some time, but this has been the first time that I've used it. Recapping my challenge: I've got a text file (the Peeron data) which has the part numbers I need, plus much other ancillary data which I may or may not need.  Important part: my source data is a .txt file.

So, I copied the master file to create a new working file. In the old master file, I executed text to columns; in the new (working), and then copied column A to the working file. This accomplished a few things: it preserved Peeron's numbering, as well as their descriptions.

Skipping a few steps, what I have now are four columns: Column D is unmodified Peeron data, column C is a size column, column B is called "Base w/mods" and column A is the "Base" column. I plan to add at least one more column at some point which will be a custom "internal" part number.

But... for now, I'm happy with my data, which is more than I've honestly been able to say about it for a few years! Seriously, this is one of those rubber meets the road blogs,where the plan actually comes together and progress is made! As promised, I've added another sheet to the workbook, and have begun extracting the extraneous data entries from the main spreadsheet, such as stickers, obsolete and moved part numbers.

Most all of the remainder of the process is going to be manual, but the text to columns wizard probably saved me upwards of sixty+  hours of work, assuming 300 rows per hour with the old cut and paste method I had been using. And I say upwards, because I think I only achieved 300 rows per hour once; generally, I've been at 200 or fewer rows per hour.

I'm desperately trying to get this published. I've said on more than one occasion that sometimes blogs just seem to write themselves. Well, this is not one of those occasions.

I'm going to close with a few speeds and feeds. As best I can tell, there were 18,512 rows of data in the original Peeron file. With the new and improved format, I've culled out close to 900 rows of data which are of no use to me. I still have quite a way to go, but for the first time in some while I'm seeing that there is a pot of gold at the end of this particular rainbow.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.


Friday, February 26, 2016

A database against the rules



In my previous blog I noted that I had restarted work on my database. In this post I'd like to talk a bit about database design and construction.

I'm guessing that most of my readers who are interested in data will fall into one of a very small number of categories. The first is a professional DBA who is either interested in seeing how other DBA's do things, or possibly here just for a laugh. The next might be someone tasked with throwing together an ad hoc database with a good amount of existing data, and presenting it in an easily digestible format without creating all sorts of dashboards,relationships, tables, etc. I'm certain there are many other scenarios which may pique one's curiosity, but my aim in documenting my journey is to stimulate an appreciation for and a desire for the application of data.

*For database neophytes: please tread carefully. A long time ago, I read something someone wrote that stuck with me: they said that a great database design starts on paper. So, unless you're like me and can "see" what your database will look like (and/or have built similar databases), design your database first, then start coding. 

Every database has a raison d'être. It's highly unlikely that anyone would wake up one day and say, "Hey, a database of all of the dry pasta in the house would be useful!" Obviously not... I think! (For the record, if you ABSOLUTELY need something to keep track of your pasta, a barcode scanner (or smartphone scanner app) and a spreadsheet would probably be a better solution.

When I was in high school, my appetite for data was whetted by my love of music. Personal computers were just starting to come on the scene, and I didn't even know what a spreadsheet was. Everything I did was done in ink on looseleaf paper. After a while, I graduated from just cataloging my music to staging popularity polls with friends and acquaintances. Quite surprisingly, most of them objected to my methodology, which was quite simple, and leveled the playing field. It worked like this:

A person could vote for up to ten songs and ten albums.I made sure to tell them to rank their choices by most favorite, then next most favorite, etc., so that the top song on a ballot would receive 10 points, the next 9, etc. My reasoning was this: just because 10 voters like Hey Jude, not all like it equally, and the vote should be weighted accordingly. 

For some reason, no one liked that idea.

But that, for me, was the start of databases. Stay tuned!

As always, I am hochspeyer, blogging data analysis and management so you don't have to. 



Thursday, June 4, 2015

(Around) 2.5 years, or, my personal Interwebz v2.1

Around two and a half years ago, I posted my first blog- December 4th, 2012 (it's a pretty short post- even by my current blogging standards!). Even way back then (figuratively speaking) I strove to keep data in the mix as the unifying theme of this blog. And at the end of May, 2015, I'm still blogging about life in general with data as the underlying theme.

It's June of 2015 now, and I've been working on my Twitter account. I consider Twitter serious work primarily because it is an integral element of my online presence strategy- and two and a half years ago, I didn't even know what an online presence strategy was, let alone seeing the need for one. After all, Twitter is one of the places where this blogs posts, but more importantly, it's where I interact with the world of #bigdata and #IoT.

I'm a firm believer in cross-pollenization and shameless self-promotion when it comes to creating a larger presence on the internet, so here's an invitation: if you have an understanding of any of the following terms, have a personal or professional interest in them, or possibly have heard one or more of them before and would like to learn more about them... here's a sample of the topics I address on my (hochspeyer) account on twitter:

#bigdata, #notsobigdata, #smalldata, #microdata, #database, #analytics, #stats, #IoT (Internet of Things), #IoE (Internet of Everything), #M2M (machine to machine), #machinelearning and, of course, #hadoop and #python. I touch on a lot of related topics on that account, but if Big Data and the Internet of Things are topics which you are interested in, you may enjoy following.

My other twitter account (hochspeyer1) still has a tech focus, but has more of a personal touch, looking more at #maker and #programming topics, as well as life in general.

Whew!

The state of the state of my microdata projects: as of today (Jun 4, 2015), the SUL (Secret Underground Lair) is still theoretically experiencing remodeling; by this I mean that while all of the furniture and shelving are in their (probable) final positions, everything that came out and came from upstairs still is in need of homes. Forty-two. my database project, is currently on hold. The Lego database (currently an Excel 2007 workbook) is currently in a data entry phase, and will eventually be a part of Forty-two. Even the Raspberry Pi, Arduino and Python projects are all on hold pending the completion of the SUL upgrade/remodel.  Lastly, I'm building my media library, one song at a time. I'm using Windows Media Player, which I suppose is extremely lazy, but the interface is familiar and it does pretty much all that I want, so for now it suffices.

That's all from the SUL for now. As always, I am hochspeyer, blogging data analysis and management so you don't have to.


Saturday, April 25, 2015

Data, evolving

Quick- fill in the blank: I do some of my best work in ______________________.

If you filled that blank with "the bathroom" congratulations! You, Freddie Mercury and I think alike. I believe Freddie Mercury claimed to have come up with the idea and melody for "Crazy Little Thing Called Love" in ten minutes while soaking in the bathtub in Műnchen (Munich), Germany, and the same thing happened to me Thursday evening!

Well, sort of... I didn't actually come up with a #1 U.S. Billboard chart single for a supergroup in a hotel in the Black Forest in Germany, but I think I've come up with the way I want to get my Lego data into Access.

A few years back, I had designed a flat database for my Lego collection in Microsoft Excel 2003. It was pretty nice and fit my needs at the time quite nicely. It was composed of several pages where the data was entered, and then a couple of pages which gave some bird's eye level analysis via links to the data pages. The project was scrapped mainly due to my impatience with data entry and my inability to actually get an accurate count of all of those elements- besides, they are much more fun to build with than count.

So, my plan is to go back to that format, but instead of merely having links within the workbook, have links from the Excel 2007 workbook to the Access 2007 database. And why 2007 rather than 2010 or 2013? Expediency: the Legos are close to a machine in the Secret Underground Lair (SUL) which runs 2007. The SUL redo is going very slowly, but if I am successful in liberating space for my laptop in my corner of the SUL, then it will be done on the 2013 versions of Excel and Access.

I'd like to get started on this first- creating the spreadsheet with the data, and then importing the data into an Access table and then (hopefully!) linking each data cell in Excel to its corresponding Access field so that the database can be updated via Excel. Its been some time since I did this, and it was on the "pre-ribbon" interface, so it looks like I'm going to be relearning some Excel tricks.

In other data news, Jennifer and I have been walking more as the weather has become a bit more pleasant. My replacement pedometer (identical to my expired Omron model) is chugging along, but its going to be at least another month before I have daily month-to-month data available.

As to my new glasses: the lady who fitted me for frames was slightly surprised as to how quickly I adjusted to my new spectacles, although I think I'm still breaking them in. She also said something that caught me off-guard: I wear my lenses lower than most folks.

 I've never worn gradient lenses before- I've worn reading glasses since the age of eighteen and I'm now 50+; my new specs have three prescriptions per lens: top for distance, middle for computer work and bottom for reading. Although the glasses really do a good job (especially while driving), they are singularly ineffective in the place where I need them the most: computer. Here's the deal: before I went in for the exam, I had made some careful observations and mental notes about my work environment and how I typically used glasses, as well as my concerns about night driving. The gradient lenses the optometrist came up with (pretty much six prescriptions- three per lens!) are fantastic for general purpose applications (life!). My hope in getting this type of lens would be avoiding a second pair of glasses. To quote "Mad Dog" Tannen, "You thought wrong, dude." (*BLAM*) I was very quickly able to pick up the reading area of the lenses, and the distance area (which I don't really need but seems to be beneficial in low light/nighttime conditions) also "snapped" into place pretty quickly.  I really need the computer part, though, and this is where I'm experiencing issues. Simply put: the lens cannot focus on the whole screen- even when I push the frame up tight to my face for the best focus. In the "good ole" 15" CRT days, these glasses would have been miraculous; with the 23" CRT's I have at work and at home, the lenses are just not up to it. And honestly, at home it really doesn't matter, because I'm generally not looking at the whole screen. At work, though- I do a lot of page layout and composition, and I generally NEED to take in the whole 23" diagonal screen at a glance- and more often than not, at a 90 degree offset. So, after only two days, I was back getting fitted for single vision computer glasses! (Just a hint here: when your livelihood depends upon a durable medical appliance- even if your employer or insurance doesn't pay for it: BUY IT!) The good news is that the second par of specs was massively discounted, and I should have them in time for my return from vacation.   

One final note before I call it a night: my twitter account @CjoelHarrison grew 10% in twenty-four hours!

As always, I an hochspeyer, blogging data analysis and management so you don't have to.


Sunday, December 28, 2014

Another look at resolutions

I don't know how things are in your world, but in my world there seem to be a lot of "gotchas". A "gotcha!" is something like, "the other guy blinked" or, "I should've seen that coming". Today's gotcha was the latter. It happened at The Bridge this evening.

I had experienced a fairly fitful sleep Saturday, which was to be expected, I suppose, after being off for two days and then coming to work for one. This time of the year can be something of a sleep challenge for shift workers such as myself. I normally start work around 2000, and am generally done before 0500 the following morning. Our company gives us Dec 24th and 25th, as well as Dec 31st and Jan 1st as holidays. The problem is that this year the 24th fell on Wednesday, and the 25th on Thursday.

As best as I can figure, for the two day holiday, I actually net about a day and a half-after all, I've gotta sleep sometime!

So there I was, short of sleep on Saturday afternoon. I got up, bathed quickly and had a bit of tea. Then I poured a can of Monster into a plastic bottle, we piled into Meerkat, and were off. We arrived at the Bridge and sat down on one of the padded benches in the vestiblue where we usually wait for the doors to the auditorium to open. Tonight's wait was a bit longer than usual, and there were none of the customary sound checks. I overheard someone saying that some technical difficulties had been exerienced. Not long after, however, the doors opened and we funneled in. As we headed toward our seats, I noticed Karl Clauson with a microphone. It was going to be a great service.

I think I've mentioned Karl once or twice before in this blog. His life can only be described as "amazing".

Here's where everything sort of hits the proverbial fan, though. This blog is entitled, "Another look at resolutions".

I don't really recall if it was Friday or Saturday, but Jennifer and I had briefly discussed resolutions. I've never been a fan of resolutions (see Resolutions and other useless things), and Jennifer and I agreed on this point.

Karl spoke on resolutions. Gotcha!

I don't know what it is about our church, but the "gotchas" seem to happen all of the time! Karl's teaching should be up on The Bridge's website in a few days- click on the "Watch or listen to last week's sermon" link to see/hear it.

No data today. The floor of the Secret Underground Lair is covered in the confetti that is the byproduct of shredding old paperwork.  I took that picture about an hour ago- it looks worse now! I think that shredder manufacturers really need to come up with a more user friendly way of transferring shredded paper into one's recycle or trash. Each and every "consumer/SOHO" shredder I've ever used has had this challenge- and this is only a small part of the mess.

I stopped not long after bag #3 was almost full. Tomorrow is Sunday, and I tentatively have three things on my agenda: to finish the shredding and clean up the mess, to put the new motherboard in Jennifer's computer, and to test my new monitor. I guess that's actually four.

At some point in time, I need to get back to working on my database, and to resume my adventures in programming in Python. I suppose that those are worthy tekkie resolutions for 2015.


As always, I am hochspeyer, blogging data analysis and management so you don't have to.

Wednesday, October 22, 2014

The elusive balance

This is a little bit off of the beaten path for me, but something brought it to the fore last week. I was playing an online game when I got into a bit of a "contest" with a rival player. The rival wrote a very eloquently virulent multi-post letter about how evil my actions were and consequently this player needed to resort to extreme actions in the game. Neither of us was entirely right in our in-game stances and actions, and I did bristle on the first reading. After rereading it, though, I became convinced of something much more important than in-game behavior, and that is that I was spending too much time with this game. Make no mistake- this game is right up my e-alley and is well-designed. The problem is that it is a 24/7 RTS (real time strategy game), and therein lies the rub- it CAN be a major time suck. And I do have an offline life.

So, I made an announcement in-game that I had languishing projects, and have drastically cut back my involvement. I don't know how much time I've reclaimed, but I do know I've resumed work on my database.

The thing I find interesting about balance is that while it is ultimately a positive in one's life creating and maintaining it is certainly work- and not the fun sort of work. I'm certain most everyone experiences some sort of work which they really enjoy... for me, that might be blogging or photography. For Jennifer, it's creating delicious meals in the kitchen. For Mr. T., it's digital modelling... I could go on, but you could probably give an example or two from your own life.

Balance, I think, requires planning, steadfastness and a certain dash of practicality. As I was wrapping up my last job of the day, I had a few options for what to do when I got home. I don't usually go right to bed, but wind down a bit on a PC. I thought about a bit of weight training, which I desperately need to get back in to, but our training area outside of the Secret Underground Lair (SUL) is still not ready for prime time. So, Plan B was implemented: catch up on the aforementioned game, and get a blog entry at least started. I'm also trying to do a bit every day with my database in the SUL- this had been a goal when I first started working on it, but as many things go... well, the plan sounded good on paper!

Today, though, I have a very concrete data entry goal: enter or update twenty-three records in the Media_Title table. Twenty-three? Yes- it's a strange number, but it's the number of slots I have open in a small CD rack, so that is my goal for the day. They are stacked up behind me on a folding table, gently whispering just below the whizzing of three computer fans, calling out with their data entry siren song....

Right. Data entry was, is, and will always be... WORK!

As always, I am hochspeyer, blogging data analysis and management so you don't have to.