Warning! Tech Content Ahead

 Yeah, I'm a full - though somewhat retired - nerd.  

What's that, you say?  Duh?  I know.  But every so often I feel like I need to do something technical to keep those skills ... less dusty.

And I suppose this might be of interest only to me and three other folks in the world, but here we go.

I am very much missing a number of much older tools.  For many years, my favorite text file editor was a little program called PE - made by the great folks at the WordPerfect Corporation.  WordPerfect is still my favorite all-time word processor.  Word is nowhere near the tool that WP was, even before Windows, but that's because Microsoft doesn't care.  They have no competition for Office, so who the bleep gives a rat's hind end.

I sure as hell do.  

Because I also desperately miss a long-gone tool called "SuperKey".  Made by Borland, that tool was a record-and-playback macro tool that worked in any application.  I would use it to copy data from Lotus123 into a WP document, or into a batch file.  Or automate any terrible operation.  I remember wanting to see if I could produce a graph that made sense to me in Lotus123, and so, to do so, I created a column of 1,000 formulas, each of which would generate two random numbers between 1 and 6 and add them up.  I should probably remind you, dear gentle reader, that this was back in the day when the internet was ... well, it was limited to a few key universities, and a huge number of military assets.  And that was it.  As in before Facebook.  And AOL.  And ... well, the popularity of a modem in a computer.  Yeah.  We're talking 1985.

I distinctly remember creating a "macro" that would copy the thousand results of those little cells and paste them, as values, not as new formulas, at the end of the batch of numbers I'd just completed.  That is, I would copy cells A5:A1005, go over three columns to D, then hit end-down, and at the bottom, add another 1000 results.

And then the formula next to it would do the simple work - and count the number of times each possible total appeared.  That is, it would total up how many times 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, and 12 appeared - individually - and then  I could update a graph to see how the bell curve worked.  Yes, I was taking a stats class in college, and wanted to see if I could use a spreadsheet to replicate the data we might need to analyze.
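Just for fun, here's that same experiment as a short Python sketch instead of a spreadsheet.  It rolls two dice 1,000 times per batch, appends each batch to the pile (the job the macro did), and then tallies how often each total from 2 to 12 came up.  The function names, the fixed seed, and the crude text bar chart are my own stand-ins, not anything Lotus123 had.

```python
import random
from collections import Counter


def roll_totals(n_rolls: int, rng: random.Random) -> list:
    """Roll two six-sided dice n_rolls times and return each total (2..12)."""
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]


def tally(totals: list) -> dict:
    """Count how often each possible total 2..12 appeared, zeros included."""
    counts = Counter(totals)
    return {total: counts.get(total, 0) for total in range(2, 13)}


if __name__ == "__main__":
    rng = random.Random(1985)  # fixed seed so reruns are repeatable
    totals = []
    # Each pass appends another batch of 1,000 results, like the macro did.
    for _ in range(5):
        totals.extend(roll_totals(1000, rng))
    for total, count in tally(totals).items():
        print(f"{total:>2} {'#' * (count // 50)} {count}")
```

Strictly speaking, two dice give a triangle-shaped distribution that peaks at 7 (6 of the 36 possible combinations add up to 7); it takes more dice per roll before the shape starts to look like a true bell curve.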

So yeah.  Anyway, back in the modern day, I found myself deeply wishing I had both of those tools because I'd exported my bookmarks.

Yeah, I still use bookmarks.  Have for ... a very long time.  Back about thirty million jobs or so (okay, it was actually my second really technical job) I found myself jumping from Apple Mac computers to IBM PC computers to ... well, servers, other servers, and etc.  And I found that I wanted some of those bookmarks everywhere.  Which is what gave birth to the idea of my "Portal" home page.  It started as a combined resource where I updated the bookmarks by simply stealing them from where they were.  Back in the dark ages, a bookmark file was typically stored in an HTML file called Bookmarks.html, or possibly just bookmarks.htm - sneaky people, them.  

But I'd combine the results from multiple machines into one single file.  And it's been my home page, and grown, since then.  My current home page is now PortalV22.html, which is roughly the 22nd MAJOR revision, not counting several hundred minor changes.  The current version is basically laid out in a table, with six columns and three rows of cells.  The upper left cell, first row, first column, or in good old Lotus123 nomenclature, A1, is my good old days.  It contains links to all of the original Daynoter web sites, and some of my own heavily used resources, like links to various email sites, search engines, and places like APOD - The Astronomy Picture Of The Day - which I used to visit daily.  Now it's more of a quarterly thing.  B1 is primarily technical resources, and some technical shopping.  C1 is more technical stuff, while D1 is news resources and places like the local zoo's web site.  E1 is my Scouting resource links, and F1 is combined with F2 and F3 to list some camping equipment retailers and resources up top, then links to about half of the Wisconsin State Parks and every Minnesota State Park, all with mileage from - roughly - where I live.  I've moved a few times since starting that, but it's fairly accurate.

A2 is a more complete Daynoters set of links, while B2 is a series of links to some geographic resources for my son, a cell I filled in when he was taking his first (and only) high school AP class in geography.  He didn't much care for it.  C2 is a lot of retailers - some internet-only, many fall into the section I've titled "Bricks-n-Clicks" - both.  D2 is a list of government resources - from national down to local.  E2 is more Cub Scouting resources I found over the years and did not want to lose.

The third row has grown over the years.  Initially A3 was traffic information, including many traffic cams on my various work routes, and job hunt resources as well.  B3 is a huge batch of links from when I was trying to make pens for a living - I couldn't make it work.  C3 is now a very long list of woodwork project plans and web sites, because, well, I like to think I'm a woodworker.  D3 was another series of links of resources I needed to rely on regularly - doctor's offices, pharmacy links, etc.  E3 was where I indulged in my Google Earth obsession, and found and bookmarked a rather long list of places - places I'd lived, where family and friends live, places that had been important to me, or still are.

What's that got to do with technology stuff?  Well, of late, I've fallen into an addiction of the Instructables sort.  It goes back close to a year, now, that I've spent at least a few minutes a day looking at various projects others have done and posted about on the Instructables site.  And when I realized I had more than 1800 projects bookmarked, I thought I needed to do something a little more ... well, serious about organizing them.

And this is where the talk turns to the technical stuff.  I have many tools I've relied on over the years.  I often did my HTML work in Notepad.  Yes, I started putzing around with a bunch of tools, including FrontPage, which was quite impressive for the time, but as usual, Microsoft had a fantastic bit of kit they drove into the ground and then chucked out behind the trash cans for ... well, for the history books to stumble over it forty years from now and wonder just what the hell made them leave it to rot.  Or perhaps it's only me, but it is what it is.  I've learned how to write much cleaner, faster, more readable code than FrontPage ever barfed out, but again, the fact the bear danced the waltz quite smoothly wasn't as remarkable as the fact that the bear chose not to eat his partner.  And, well, Microsoft.  'Nuff said.

Back to those 1800+ instructables I've bookmarked.  I wanted to put them into a collected HTML page so I could see them in an organized fashion.  Yes, I'm pathological about organizing my bookmarks (I have some folders that are 7 layers deep to maintain organization), but I also like to see the list.  And it occurred to me that Firefox, my browser of choice, will vomit out the HTML in a bookmark export file. I can also copy and edit the JSON file that backs up the bookmarks I have (you didn't know?  You probably should back up your bookmarks, and then email yourself a copy.  That way it's stored in a cloud, somewhere - safely).  But I wanted to just take an export, grab the Instructables folder, and all the subfolders, and barf that onto a page.

Except for one problem - when one exports Firefox bookmarks, a hell of a lot of cruft comes along with it.  Each link I'm concerned about in the Bookmarks file includes the URL - that is, the address of the web site for that bit of information - and I'm also interested in the Title - that is, the text in the title bar at the very top of the window, which tells you - or should, if the person has any mental capacity when it comes to putting together a web page of any sort - what the page is about.

Firefox includes some additional bits of ... well, garbage I really don't care about when it exports bookmarks.  The first is the date the bookmark was added, and it also tracks the last time I visited the page.  And, in the most heinous of crimes, it also includes some data which seems to be the representation of the icon.  
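For the curious, here's how "keep the URL and the title, throw away everything else" might look using Python's built-in html.parser module instead of an editor.  It's a sketch under my own assumptions: the class and function names are made up, and the sample line is a shortened, invented example in the export's <DT><A HREF=...> style, not a real exported bookmark.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class BookmarkLinks(HTMLParser):
    """Collect (url, title) pairs from <A> tags; every other attribute is ignored."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; becomes &
        self.links: List[Tuple[str, str]] = []
        self._href: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":  # HTMLParser hands us lowercased tag and attribute names
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_text: str) -> List[Tuple[str, str]]:
    parser = BookmarkLinks()
    parser.feed(html_text)
    parser.close()
    return parser.links


sample = ('<DT><A HREF="https://www.instructables.com/" ADD_DATE="1600000000" '
          'ICON="data:image/png;base64,iVBORw0KGgo=">Instructables</A>')
print(extract_links(sample))  # [('https://www.instructables.com/', 'Instructables')]
```

A parser doesn't care how long the ICON attribute is or where the line breaks fall, which is the main reason to reach for one over search-and-replace.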

HTML Education 206: If you look at a web page and notice the title bar at the top of the screen, the very first item on that title bar is typically a little icon.  In HTML, this is the Favicon.  Back when I was building web pages, this was a very small image you could create with just about any graphics program, including Paint, and convert to the ICO format.  Anyone creating a bookmark to that web page on their computer also downloads this little image, and it appears next to the text in their bookmark list.  What is new - that is, that which has changed since about 2002 - is that the icon is stored in the bookmark file itself, as what looks like a rather long text string (it's the image data, Base64-encoded into a data: URI).
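That long string is the icon image itself, Base64-encoded, which is why it balloons: Base64 spends 4 text characters on every 3 bytes of image.  Here's a small Python sketch of the round trip.  The 3,900-byte "icon" is fake filler behind a real PNG signature, picked only so the result lands near the roughly 5200-character icon strings in my export; the function names are my own.

```python
import base64


def to_data_uri(image_bytes: bytes, mime: str = "image/png") -> str:
    """Embed raw image bytes in a Base64 data: URI."""
    return f"data:{mime};base64," + base64.b64encode(image_bytes).decode("ascii")


def from_data_uri(uri: str) -> bytes:
    """Recover the raw image bytes from a Base64 data: URI."""
    header, _, payload = uri.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("not a Base64 data: URI")
    return base64.b64decode(payload)


# A fake 3,900-byte "icon": the 8-byte PNG signature plus zero filler.
fake_icon = b"\x89PNG\r\n\x1a\n" + bytes(3892)
uri = to_data_uri(fake_icon)
payload_len = len(uri.partition(",")[2])
print(payload_len)  # 5200, since Base64 writes 4 characters per 3 bytes
assert from_data_uri(uri) == fake_icon
```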

And I didn't need that crap in my instructables page.  97% of the icons stored are that little yellow Robot they use.  Some differ slightly.  But the trick then becomes what to do about that trash.  When I pulled the 1800-plus Instructables links out of my bookmarks, the collected file was - you may want to sit down for this part - nearly 13 megabytes.

I know, I know, you need to remember that my first hard drive was a 40 Meg drive.  Back about ... seriously, 26 years or so.  No, I do not still have it.  It's been reduced into ... individual components.

But 13 megabytes is a big file.  And when I have to do semi-professional stuff like editing HTML, and it's a big file, Notepad is out.  I rely on Notepad++ - which is an upgraded editor.  Nowhere near the capabilities of good old PE, but it does let me search and replace using extended characters, such as the CR/LF combo.

What's that?  Oh, you see, in HTML, where a line ends in the source file doesn't decide where it ends on the screen.  The text is usually treated as one long string, and where a line should break, there's a tag that looks like <BR> or </P>.  The first is a line break.  The second closes a paragraph.  The difference between the two?  A paragraph break usually inserts a blank line between the line above and the line below.  A line break simply places the next line right below the current one.

What's that got to do with CR/LF?  You have to remember that computers were, initially, considered to be something of a souped-up replacement for a typewriter.  And as you neared the end of a line on a typewriter, you would hear a "ding" that told you you were nearly at the edge of the page and should start a new line.  Many old manual typewriters had a lever which sort of hung over the keyboard, and you'd get used to slapping that lever.  The first bit of its travel would turn the roller and move the paper up - a Line Feed - and continuing to push until the carriage stopped would send it back to the start of the line - a Carriage Return.
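In computer terms, those two lever motions became two separate ASCII control characters, and different systems picked different combinations to mark the end of a line.  A quick Python illustration (the sample strings are mine):

```python
# The two typewriter motions, as the control codes computers inherited.
CR = "\r"  # Carriage Return, ASCII 13: back to the start of the current line
LF = "\n"  # Line Feed, ASCII 10: advance the paper one line

assert ord(CR) == 13 and ord(LF) == 10

text = "first line" + CR + LF + "second line"  # DOS/Windows ending: CR, then LF
print(text.encode("ascii"))  # b'first line\r\nsecond line'
print(text.splitlines())     # ['first line', 'second line']

# Other systems settled on just one of the two codes.
print("unix\nstyle".splitlines())         # LF only: ['unix', 'style']
print("classic mac\rstyle".splitlines())  # CR only, pre-OS X Macs: ['classic mac', 'style']
```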

And so, if you're still reading at this point, this is the beginning of the whole CR/LF thing.  Back many years ago, printers were strange little machines with a row of pins that were pushed against a ribbon, which pressed itself against the paper - thus depositing the ink from the ribbon onto the paper.  Not all that unlike a typewriter, honestly.  But the CR/LF split became a little trick that some early computer users used to create ... well, art: a carriage return without a line feed let the printer go back over the same line and overprint it.  Many years ago, I downloaded onto a tape - yes, a tape - a collection of a number of graphics known as "ASCII ART" - they were pictures of people, animals, places, and things.  Some were one or two pages of an image, and some were sized to fit entire walls.

Back in my computer system administrator days in college, I remember "confiscating" a particularly gigantic file which printed an image of the surface of the moon - and that image was to be printed on "wide" paper, or something that ran about 11x17 inches, and if the output was lined up right, it formed a picture that was about 12 pages tall by 10 or so wide.  And it would burn out about three printer ribbons.  Because Ink Jet and Laser Printers for computers weren't yet common (or affordable) in a college computer lab.

But what's that got to do with my geek task?  Simple.  I wanted to turn each of the links in my bookmark file into a link on a basic simple HTML page.  And I needed to strip out the garbage that came from the export, which typically included the ADD_DATE="xxxxxxxxxxxx" LAST_VISIT="xxxxxxxxxxx" ICON=" (imagine roughly 5200 characters, ending in) CYII=">.  

If I'd been lucky enough to keep a copy of SuperKey still working today, I could record a macro that would allow me to move my cursor to the next occurrence of ADD_DATE, go to the beginning of that text, back up one more character to get the space before it, then select from there to the CYII="> combination, and replace all of it with one > character to close my HTML tag.  
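For what it's worth, a regular expression can do that macro's job in one pass.  Here's a Python sketch with two patterns: one that mirrors the macro (from the space before ADD_DATE=" through CYII=">), and a more general one that drops ADD_DATE and any quoted attributes after it, up to the > that closes the tag, since not every icon string is guaranteed to end in CYII=.  The pattern names, the function name, and the shortened sample line are my own inventions.  Notepad++'s Replace dialog also has a "Regular expression" search mode, so a pattern along these lines might work there directly, though that's untested.

```python
import re

# Mirrors the macro: from the space before ADD_DATE=" through CYII=">.
# Only works if every icon string happens to end in CYII=.
MACRO_STYLE = re.compile(r' ADD_DATE=".*?CYII=">')

# More general: ADD_DATE plus any quoted UPPERCASE attributes after it,
# up to the ">" that closes the <A ...> tag.
GENERAL = re.compile(r'\s+ADD_DATE="[^"]*"(?:\s+[A-Z_]+="[^"]*")*\s*>')


def strip_bookmark_cruft(html_text: str) -> str:
    """Replace the whole run of unwanted attributes with a single '>'."""
    return GENERAL.sub(">", html_text)


sample = ('<DT><A HREF="https://www.instructables.com/" ADD_DATE="1600000000" '
          'LAST_VISIT="1600000001" ICON="data:image/png;base64,AAAACYII=">'
          'Instructables</A>')
print(strip_bookmark_cruft(sample))
# <DT><A HREF="https://www.instructables.com/">Instructables</A>
```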

But life ain't that easy any more.  Notepad++ will allow me to search for those occurrences, but selecting all of the text betwixt them is...  Well, it's not easy.  It may be possible, but it wasn't working for me when I tried to create a macro within Notepad++ that did it.  Then good old Search and Replace saved my bacon.

I went looking for every occurrence of /" ADD_DATE=" and used the Search/Replace function to replace that string with /">\n\rADD_DATE=" - which told Notepad++ to find that string and stick in a closing angle bracket (">"), a line feed ("\n"), and a carriage return ("\r"), so each ADD_DATE=" run started on a new line.  (Strictly speaking, the Windows CR/LF order is \r\n, carriage return first.)  Then I did the same damned thing with the CYII="> - I replaced that with CYII="\n\r.  When I was done, I had the beginning of my HTML A HREF tag, which is how you create a link to another page, on one line; the next line started with ADD_DATE=" and held all of the crap I didn't want; and the line immediately after the garbage was the title of the web page that went with the link two lines up.
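Here's roughly what those two replacements do, as a Python sketch.  One deliberate difference: it inserts the usual Windows \r\n (carriage return first) rather than \n\r, so that Python's splitlines() sees exactly one line break at each spot.  Also worth noting: the first search string only matches URLs that end in a slash, so a bookmark whose URL ends any other way would slip past it.  The function name is hypothetical, and the sample line is shortened and invented.

```python
def split_out_cruft(html_text: str) -> str:
    """Mimic the two Notepad++ replacements: move each ADD_DATE=" ... CYII="
    run onto its own line, between the <A HREF=...> line and the title."""
    # Step 1: close the tag right after the URL, then break the line.
    step1 = html_text.replace('/" ADD_DATE="', '/">\r\nADD_DATE="')
    # Step 2: drop the original ">" after the icon and break the line again.
    return step1.replace('CYII=">', 'CYII="\r\n')


sample = ('<DT><A HREF="https://www.instructables.com/" ADD_DATE="1600000000" '
          'LAST_VISIT="1600000001" ICON="data:image/png;base64,AAAACYII=">'
          'Instructables</A>')
for line in split_out_cruft(sample).splitlines():
    print(line)
```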

What good did that do me?  Oh, friends, you've forgotten the power of the command line.  And our dear old friend FIND.  

Huh?  How did we get from HTML to a command line?  You see, I once mocked UNIX for including a command-line calendar program.  But like UNIX, DOS still exists deep inside Windows, and certain command-line tools are still out there, and powerful as hell if needed.

And one of those is the Find tool.  You can type the command TYPE FILE.TXT | FIND "Fred" - and you'll see every line in your FILE.TXT file that contains the characters "Fred".  Find can also just report how many lines contain Fred (the /C switch), and pull several other tricks.
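To make FIND's behavior concrete, here's a rough Python stand-in: keep the lines that contain a string, or, like /V, the lines that don't, and, like /C, just count them.  The function and its keyword names are made up for illustration; it only covers those switches, and it matches case-sensitively the way FIND does without /I.

```python
from typing import List, Union


def find(lines: List[str], needle: str, invert: bool = False,
         count: bool = False) -> Union[List[str], int]:
    """Rough stand-in for FIND "needle": return the lines containing needle,
    or with invert=True (like /V) the lines that don't; with count=True
    (like /C) return just how many lines matched."""
    matches = [line for line in lines if (needle in line) != invert]
    return len(matches) if count else matches


lines = ["Fred was here", "nobody home", "fred again", "Fred, Fred, Fred"]
print(find(lines, "Fred"))               # ['Fred was here', 'Fred, Fred, Fred']
print(find(lines, "Fred", invert=True))  # ['nobody home', 'fred again']
print(find(lines, "Fred", count=True))   # 2
```

Note that the count is of lines, not occurrences: "Fred, Fred, Fred" counts once.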

But what I did, was save the file I'd been working on as INS_LINKS2.HTML - and then I got to the command line in the same folder as the INS_LINKS2.HTML file, and executed the command below:

TYPE INS_LINKS2.HTML | FIND /V "ADD_DATE" > INS_LINKS3.HTML

That dirty little secret told the computer to display, on the screen, the contents of INS_LINKS2.HTML - it's a plain text file, with none of the control codes that make things like Word documents hard to read this way.  But instead of looking at all of that text - some 13 Megs worth of data - I used the pipe symbol - that vertical line, usually the shifted character on the backslash key on most modern keyboards - which sent the output to the FIND program instead of to the screen.  Find was told to look at every line, and display every single line that did not include the ADD_DATE text (that's what /V does).  And rather than get all of that output sent to the screen, I dumped it all into the new INS_LINKS3.HTML file.  How did I do that?  The greater-than symbol is a redirector - it sends the output of the previous command where I told it.  If I'd wanted to combine my output with an existing file, I'd just use two of them, >> - that tells it to put the output in the file, and if the file's already there, add it to the bottom.  If not, create it.
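The same pipeline can be sketched in Python, with the two redirectors mapped onto file modes: "w" truncates or creates the output file like >, and "a" appends to it, or creates it, like >>.  The function name is hypothetical, and it assumes the export is UTF-8 text; newline="" keeps the line endings byte-for-byte as they were.

```python
from pathlib import Path


def filter_file(src: Path, dest: Path, needle: str, append: bool = False) -> int:
    """Write every line of src that does NOT contain needle to dest.

    append=False truncates (or creates) dest, like '>'; append=True adds to
    the end of dest (creating it if needed), like '>>'. Returns how many
    lines were dropped. Assumes UTF-8 text.
    """
    dropped = 0
    mode = "a" if append else "w"
    # newline="" passes line endings through untouched in both directions.
    with src.open(encoding="utf-8", newline="") as fin, \
            dest.open(mode, encoding="utf-8", newline="") as fout:
        for line in fin:
            if needle in line:
                dropped += 1
            else:
                fout.write(line)
    return dropped
```

Called as filter_file(Path("INS_LINKS2.HTML"), Path("INS_LINKS3.HTML"), "ADD_DATE"), it should do roughly what the FIND command above did, and also report how many lines it threw away.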

Why would I do such a silly thing?  Rather than go through the remaining lines by hand, I could automatically eliminate the 1621 lines that started with ADD_DATE.  That left 3301 lines, which I still need to go through to remove the CR/LF combinations I no longer need.  Which should be easy to do.
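That last cleanup could be automated too.  A minimal sketch, assuming that after the FIND step every link is exactly two lines, the <A HREF="..."> line followed by its title line ending in </A>; the function name and sample lines are hypothetical:

```python
from typing import List


def rejoin_links(lines: List[str]) -> List[str]:
    """Join each open '<A HREF=...>' line (one without '</A>') to the title
    line right after it; every other line passes through unchanged."""
    out: List[str] = []
    i = 0
    while i < len(lines):
        line = lines[i]
        upper = line.upper()
        is_open_anchor = "<A HREF=" in upper and "</A>" not in upper
        if is_open_anchor and i + 1 < len(lines):
            out.append(line + lines[i + 1])  # the break between them is gone
            i += 2
        else:
            out.append(line)
            i += 1
    return out


lines = ['<DT><A HREF="https://www.instructables.com/">', "Instructables</A>",
         "<DT><H3>Woodworking</H3>"]
print(rejoin_links(lines))
```

If a link ever spans some other number of lines, this would glue the wrong pair together, so a quick look at the output is still worthwhile.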

So why am I so tickled?  I cut the file size from 12,910,905 bytes (12.31 Meg, because you do remember a kilobyte is 1024 characters, and a megabyte is 1024 kilobytes - or 1,048,576 characters - or a little over a million) down to 385,737 bytes.  So the end result was about 3% of the size of the original file.
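And to double-check the arithmetic, using 1,048,576 bytes to the megabyte:

```python
before = 12_910_905  # bytes, the extracted Instructables file
after = 385_737      # bytes, after stripping the cruft
MEG = 1024 * 1024    # 1,048,576 bytes

print(f"{before / MEG:.2f} Meg -> {after / MEG:.2f} Meg")  # 12.31 Meg -> 0.37 Meg
print(f"{after / before:.1%} of the original")             # 3.0% of the original
```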

That's some pretty serious taking out the trash...  I'm happy with it. 
