Monday, November 19, 2018

The Lego database, reboot

The "reboot" seems to be something of a thing in Hollywood these days, and so it only follows that Life would imitate "art" as I reboot my Lego database.

For those who have been following this blog for some time, I think you may be aware of my Lego database project. It is one component of "42", my personal, ultimate repository of those possessions of mine which I have chosen to catalog.

I've just started using Microsoft Office 365, and the Lego data came from rebrickable.com. Their dataset is very comprehensive, but I don't think I've ever downloaded ANY Lego dataset that was usable by me as downloaded... and this data is no exception.

I mentioned that the data is quite comprehensive, which means it contains things I don't necessarily need or want. For example, I do not care about decals, paper or cardboard items, books, or even certain parts of the Lego product line, so these need to be removed. The data also appears to come from more than one source, so the formatting needs to be made consistent. Punctuation needs to be removed from many entries, and part names need to be standardized. Some data needs spelling changes: there are cases of the Queen's English being used, so "windscreen" and "tyre" need to be changed to "windshield" and "tire". These are just a few examples of the major changes that need to happen before I can even think about exporting to an Access database.
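For anyone who would rather script this kind of cleanup than do it in a spreadsheet, here is a minimal Python sketch of the three steps described above: dropping unwanted categories, changing spellings, and stripping punctuation. The field names (`part_num`, `name`, `category`), the category labels, and the choice of which punctuation to strip are all assumptions made for illustration. They are not the actual layout of the rebrickable download.

```python
import re

# Hypothetical category labels to exclude; the real dataset's labels may differ.
UNWANTED_CATEGORIES = {"Stickers", "Paper", "Cardboard", "Books"}

# British -> American spelling changes mentioned in the post.
SPELLING = {"windscreen": "windshield", "tyre": "tire"}

# Match each British spelling as a whole word, with an optional plural "s".
_SPELLING_RE = re.compile(r"\b(" + "|".join(SPELLING) + r")(s?)\b", re.IGNORECASE)

# Assumption: only these characters count as unwanted punctuation.
# Periods and hyphens are kept so "30.4" and "Trans-Clear" survive.
_PUNCTUATION_RE = re.compile(r"[,;:!?\"'()\[\]]")


def normalize_name(name):
    """Apply spelling changes, strip punctuation, and collapse whitespace."""
    def swap(match):
        word = SPELLING[match.group(1).lower()]
        if match.group(1)[0].isupper():
            word = word.capitalize()  # keep the original capitalization
        return word + match.group(2)

    name = _SPELLING_RE.sub(swap, name)
    name = _PUNCTUATION_RE.sub(" ", name)
    return " ".join(name.split())


def clean_records(records):
    """Drop records in unwanted categories and normalize the remaining names."""
    cleaned = []
    for rec in records:
        if rec["category"] in UNWANTED_CATEGORIES:
            continue
        cleaned.append({**rec, "name": normalize_name(rec["name"])})
    return cleaned


# Made-up sample rows, purely for demonstration.
sample = [
    {"part_num": "3001", "name": "Brick 2 x 4", "category": "Bricks"},
    {"part_num": "x1", "name": "Sticker Sheet", "category": "Stickers"},
    {"part_num": "t1", "name": "Tyre 30.4 x 14, (Offset Tread)", "category": "Wheels"},
]
result = clean_records(sample)
```

Keeping the rules in small lookup tables means each new oddity found in the data becomes a one-line addition rather than another round of find-and-replace.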

But, I digress. Here are the numbers as of December 4th.

When I first started out, there were approximately 29,000 records. As of last night, after culling items which I knew I would not be inventorying, I had 27,164 unique items. When I deduped the group to include ONLY the base part numbers, excluding all decoration variants, I ended up with a working list of 8,747 unique base part numbers. As I copy the part descriptions to the new inventory list, they are being further culled.
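The dedupe down to base part numbers could be sketched in Python roughly as follows. This assumes that a decorated variant is the base number followed by a suffix such as `pr0001` or `pat0002`. That convention is an assumption for illustration, so check it against the actual data before trusting the result.

```python
import re

# Assumption: a decoration variant looks like <base> + "pr"/"pat" + digits,
# e.g. "3001pr0001" is a printed variant of base part "3001".
_DECORATION_RE = re.compile(r"^(.+?)(?:pr|pat)\d+.*$")


def base_part_number(part_num):
    """Return the part number with any decoration suffix removed."""
    match = _DECORATION_RE.match(part_num)
    return match.group(1) if match else part_num


def unique_base_parts(part_nums):
    """Collapse a list of part numbers to a sorted list of unique base numbers."""
    return sorted({base_part_number(p) for p in part_nums})


# Made-up sample values, purely for demonstration.
sample_parts = ["3001", "3001pr0001", "3001pr0002", "3626cpr0001"]
bases = unique_base_parts(sample_parts)
```

Under that assumption, the four sample entries collapse to two base numbers, which is the same kind of reduction as going from 27,164 items to 8,747.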

I am now at the point where everything must be done by hand. I find myself going back and forth between rebrickable and my flat database, verifying that the part number in the list is a part that I (may) actually own... and want to count. At a certain point, some of the decisions are subjective: even though a listing of separate parts is valid, I will count a complete assembly as one item rather than counting, say, a special tile, wheels, and tires separately.

As always, I am hochspeyer, blogging data analysis and management so you don't have to.

