Guide to my GitHub projects and uber-geekery

My GitHub repositories

overall project notes and themes

Almost all of these projects are my own pet projects. They aren't paid projects.

There is a theme to some of them: I'm looking for fundamentals and / or high precision. I am looking for "the end of the line." When I could not get the late 2019 edition of the time.gov clock widget to run on my web site, I created my own which may be better than time.gov on a number of counts: clock repo, live clock. (The live clock will work on Chrome on phones, but perhaps not other phone browsers. I had a misunderstanding that led to some incompatibilities.)

Before that, on a similar line, I was trying to answer what should have been a simple question: how to query a time server to figure out my clock's offset, or just to get very precise time easily? I later discovered "chrony," but, before that, I found / created an answer: ntp live, ntp repo.

I now provide an easy source of nano-precision time and the chrony data to validate the time (nanotime web server repo).

Along the lines of the highest precision: in order to provide a nanotime server, I figured out how to access nanotime from PHP. That same PHP extension serves another fundamental / high-precision purpose: it provides a primary (unique) key across multiple threads / cores by getting down to a CPU core's fundamental clock tick and the core / thread number. Otherwise put, it provides a primary key with no semaphore / mutex mechanism needed.

My GMail unread count checker solved a problem in early 2017: my phone was totally unreliable for updating new emails. As of late 2020, that situation is much better with the latest Android and GMail app and such, but I still use my web app all the time to be sure. It still brings a smile to my face, after all the work I did to create it.

Monitoring of my AWS EC2 stats sort of goes in this category of knowing, fundamentally, what's happening: AWS EC2 repo (web, email), AWS EC2 live.

A primary job of the flight simulator scripts was to "score" my landing based on how soft it was. Before that, I was just banging into the ground without much idea of how hard I was hitting. I probably wasn't injuring my sim passengers, but I probably was damaging the landing gear.

I suppose the web server access logs stuff is a search for line-item-level precision. I'm making progress (as of 2020/11/24), but that one hasn't gelled yet.

The timesheet app is a quest for high-precision billable hour keeping if it ever gets anywhere. I'd actually completely forgotten about that one for a while.

2020, November 24

true random number generator

The latest madness is deriving true random numbers from an empty microphone circuit--that is, a microphone input with no microphone attached. With a 32-bit sound sample (4 bytes), I am taking only the lowest byte. Thus, I am picking up electrical noise. When I play the output to speakers, it should and does mostly sound like white noise / static. However, I can occasionally hear radio wave "whistlers," which is a good indication of picking up electrical noise.

"My" random numbers pass "rngtest" and feed rngd quite well. Those are demonstrations of randomness. Otherwise put, with my script / program feeding into rngd, /dev/random outputs quickly and indefinitely. Without a hardware / true source, /dev/random is very slow.

2020, October 20

The latest fun and games has been web server access log analysis. When you visit a site, the web server records your IP address, timestamp, the HTTP command ("GET /index.html"), the HTTP status code ("200 OK", "404 Not Found"), your "user agent" (more on this below), and a few other items.

From something like June 10 to October 15, my site accumulated around 271,000 lines in the access log. Something like 85 - 90% are robots: mostly search engine crawlers like the Googlebot. Another large share of what remains are more subtle robots that make more requests in a few seconds than a human is capable of. If a visitor is a human or a thorough robot, then a page GET comes with a favicon, other images, possibly JavaScript, etc. Those amount to redundant lines in the log--redundant for my purposes, which I explain a bit more below.

So the point is to make sense of all this and figure out how many humans there are. I absolutely will not put Google Analytics on this site, so that option is out. I am tempted to elaborate on that, and one day I will. For now let's just say that Google got rid of "Don't be evil" from its code of conduct in May, 2018, although they'd unofficially gotten rid of it years earlier.

On the other hand, I will register this site with them for search purposes. Even the good guys still use Google Search. Perhaps Google and some of its executives will soon be successfully prosecuted and sued for a number of extremely serious crimes, and all will be well. I see some chance of that happening in the next few years.

Because I'm on the subject, yes, I realize GitHub is owned by Satan's Software Company, but if they want to host my code for free, I'll take their service. GitHub was a center of the universe before the buyout: even the good guys use it. *sigh*

I'm not sure how useful Google Analytics is anyhow, but I'm not going to find out. My father's site has some open source analytics, but I tend not to trust that, either. I suppose the point is that I've been wanting to do this myself for years, and now I'm getting closer.

One fun part of this was using all 12 of my CPU's cores and then MongoDB's "insertMany" to do the initial parse of those lines in something like 4 seconds. With one core and, as I remember, not saving to anything, I think it took about 20 seconds.

While setting up the multi-core stuff, I also managed yet again to create an infinite process fork loop and crash my session. That's always fun, and it feels like 1995 when any infinite loop would crash a session. Perhaps I need to make a PHP pcntl_fork() wrapper that limits the forking. A "static" variable would partially do it. Or better yet a wrapper that counts the CPUs and limits to that.

A static variable would not account for the child processes' forking. I might need something more complicated. I'll think about that.

Anyhow, I'm getting some reasonably well-filtered results into a browser. Then I cleaned up the "user agent." I'd show you a human's user agent, but I'm having trouble finding one with all the flippin' robots, and my latest results heavily modify the user agent. In any event, a user agent shows something to the effect of "SAMSUNG FAB-EXPENSIVE-30000 Android 8.1 Firefox/257.2 ..."

The motivation for this latest madness is that I'm trying to figure out how much it's worth investing in my web site. I have reason to believe that more humans than I would have thought visit it ("this"), and I'm trying to be sure before I invest time.

2020, October 13

I finally do discuss a few more repositories in relation to my new nanosecond-precise time server. Ironically, I haven't put that code in GitHub because it's 3 unique lines of PHP code right now. (Pop quiz: what are those lines?!)

2020, October 3 - high-precision time for primary keys

For the last few days I've been working on high precision timestamps that can be used as unique keys / primary keys. I am nearly certain I have solved the problem I was trying to solve, but it was an interesting path.

discussion of the unique timestamp problem

PHP has offered a microsecond-level timestamp for many years and versions. That has any number of problems for my purpose: it doesn't easily come in an integer form; floating-point equality is tricky, so floats can't be used as unique keys; with a floating point one can only get to 0.0001 seconds or so due to the number of digits of precision; the default string form has a space and a period in it; and there's more. Even if none of that were true, microsecond-level time will not guarantee uniqueness, especially on multi-core machines. I demonstrated a few days ago that even nanosecond-level time won't guarantee uniqueness.

More specifically, I forked processes onto all 12 of my cores and had them all bang away at hrtime(1). I saved the data to an array. Only then did I dump the array into MongoDB. With 12 cores X 1,000 iterations each I always had 3 - 4 collisions.

So then what? I went looking for an assembly language solution that gives both the basic CPU clock tick (like 2 - 3 GHz) and a core number. Or, at least, I was looking for something like that, and that's what I found. It's known as the Time Stamp Counter or __rdtscp() in C/C++.

I've done a great deal of C, mostly decades ago, and a decent bit of C++. If I looked hard, I could probably tell where C becomes C++, but in the case of my extension (mentioned below), I tend to think it's plain C. With that said, I'll just call it C++ from now on.

Anyhow, TSC is all well and good in C++. What about PHP? I was aware that the PHP interpreter and libraries are written in C++, but I didn't know much more than that. I found out how to make my own PHP extension to execute C++. Or, more specifically, call __rdtscp and send the answer back into "PHP space."

I got up over a million calls to TSC with no collisions, using the same collision tester referenced above--the one that flunked hrtime(). When TSC is run in pure C++, it takes 32 clock cycles to run. So, theoretically speaking, a collision seems impossible when the key includes both the clock tick and the core number: a single core can't read the same tick twice.

With all that said, that's only unique for a given machine AND a given boot session. That is, the clock starts over at 0 when the machine reboots. I've been considering a script that starts at boot to keep track of such things, but perhaps it's time to stop this madness.

related timing / timestamp observations

From my experiments I've tentatively concluded that any PHP function takes at least 600ns to run, on a mediocre machine by 2020 standards. I'd imagine that's the time it takes to go from "PHP space" to machine language space. I should probably understand that better. TSC takes 32 cycles (10 - 15ns) to run in pure (compiled, binary) C++ and around 800ns with my extension. hrtime(1) also takes around 800ns.

Calling for a new MongoDB ObjectId ($o = new MongoDB\BSON\ObjectId();) (and PHP doc) also takes something like 700ns and gets you a true UUID--a universally unique ID. That is, it is unique by time, machine, core / process, etc. Any given object id is likely the only one on earth. The point being that I have a lot more respect now for the BSON object id.

There is always more to be said, but this makes for a good first entry, I hope.