• Welcome to TechPowerUp Forums, Guest! Please check out our forum guidelines for info related to our community.

Programmer thoughts

Apr 24, 2020
491 (2.57/day)
I'd like a thread for low-effort to medium-effort programming thoughts. If your thought doesn't deserve a topic, feel free to post it here! Any subject, any language, any skill level.


To start off the topic, I'm studying "constraint programming" for a Puyo Puyo AI. Its an obscure match-4 video game with a deeply dedicated community. I have a feeling that I can represent Puyo Puyo boards as a constraint solving problem (similar to 3-SAT). As such, I've been kinda playing around with this (assumed) NP-complete, or harder (#P-complete problems).

Anyone who has completed Comp. Sci. algorithms courses knows that NP Complete (or harder classes: such as #P Complete) are "unscalable", its assumed impossible to invent an algorithm that can solve this problem in a scalable manner. And as an undergraduate, I mistakenly thought that this means that if you ever proven (or assumed) a problem to be NP-complete or harder, that you should "give up" and find another problem.

This was a mistaken thought. NP Complete (and harder) problems scale very poorly. But that means that all problems below a certain size complete almost instantly.

Just because you have an NP-complete problem (or harder) doesn't mean you're hopeless. If you can prove that your problem's scope is limited, then the problem can suddenly become solvable.

Consider Chess exploration or Chess AIs. Trying to solve Chess is infeasible. But if you limit the problem's scope to a certain size, such as "All Board positions with 7 or fewer pieces", it suddenly becomes possible to solve. The 7-man Sygzy Tablebase has been completed, exhaustively proving and solving all possible chess positions of a certain size (7-pieces): https://syzygy-tables.info/

Another chess-example is the Perft problem: calculating and enumerating all possible openings. Perft(11) has been solved in less than an hour on modern hardware, and perft(15) has been claimed to be solved by some people. This problem scales poorly (exponentially more moves every perft(n+1)), but drop down just a few numbers to Perft(10, and you pretty much solve the problem instantly on modern hardware.

In the case of Puyo-puyo, I'm not necessarily trying to fully solve PuyoPuyo, but instead trying to make a super-human AI that can chain better than any human. (Which is rather difficult, humans are really, really good at making chains in this game).

I guess: my initial post is... "NP Complete" or harder (#P complete) problems are not hopeless! You can still make useful programs in the face of these unscalable problems. Learning how to write code that tackles this extremely hard class of problems can be a joy. Your goal is to solve as large of a problem as possible.
Last edited:


Staff member
May 14, 2004
20,997 (3.49/day)
Processor Core i7-4790K
Memory 16 GB
Video Card(s) GTX 1080
Display(s) 30" 2560x1600 + 19" 1280x1024
Software Windows 7
Apr 24, 2020
491 (2.57/day)
Another goal can be to be "good enough", or even "better than nothing", also AI
Good point.

A good demonstration of "good enough" is the following piece of code:

$ cat test.c
int main(){
    for(int i=1; i!=0; i++);
    // Time how long it takes to exhaustively explore a 32-bit number

$ gcc -O1 test.c; time ./a.out

real    0m2.107s
user    0m2.107s
sys    0m0.000s
That's right, you can exhaustively run through all 32-bit numbers in less than 5-seconds these days. It shouldn't be a surprise: the 32-bit space is ~4-billion, and modern CPUs have a heartbeat of 4GHz (4-billion per second).

If you're unable to exhaustively search the search-space entirely... randomly searching 4-billion spots and then picking "Best of 4-billion" is often "good enough" in these kinds of problems. Ex: Explore 4-billion random solutions to "Traveling Salesman", and pick the best solution you found. Its not likely to be the true optimal solution, but its probably good enough for most purposes.


If you use multiple threads, or GPUs to explore the state-space, you might be surprised how good modern computers are at brute force.
Aug 20, 2007
13,596 (2.82/day)
System Name Pioneer
Processor Intel i9 9900k
Motherboard ASRock Z390 Taichi
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory G.SKILL TridentZ Series 32GB (4 x 8GB) DDR4-3200 @ 14-14-14-34-2T
Video Card(s) EVGA GeForce RTX 2080 SUPER XC ULTRA
Storage Mushkin Pilot-E 2TB NVMe SSD
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) VGA HDMI->Panasonic SC-HTB20/Schiit Modi MB/Asgard 2 DAC/Amp to AKG Pro K7712 Headphones
Power Supply Seasonic Prime Titanium 750W
Keyboard WASD CODE 104-Key w/ Cherry MX Green Keyswitches, Doubleshot Vortex PBT White Transluscent Keycaps
Software Windows 10 Enterprise (Product of work, yes it's legit)
Benchmark Scores www.3dmark.com/fs/23478641 www.3dmark.com/spy/13863605 www.3dmark.com/pr/306218
I guess I should dump this here, this is more programmers humor than anything:

I was trying to correct a piece of code in a KSP related mod. The purpose of this code is to walk up a list/tree of bodies orbits to find and return a root star. I submitted the following in a stroke of latenight, braindead genius:

/// <summary>
/// Returns the host star directly from the given body.
/// </summary>
public static CelestialBody GetLocalStar(CelestialBody body)
while (body?.orbit?.referenceBody != null)
if (body.orbit.referenceBody.isStar)
body = body.orbit.referenceBody;

return body;

Yeah, that's pretty much the most messed up loop I've ever submitted. I may as well be drunk. First it checks if the reference body (the thing it's orbiting ) is not null (meaning body is not a star, not what we want). Then it enters a while loop that only advances under a select set of circumstances. In short, if you feed a moon to this code, it will infinite loop that while and crash the ENTIRE game.

And yes, I submitted it as an official pull commit to my hobby project, to the mocking of my peers:

Never code at work while sleep deprived. It's bad kids.
Apr 24, 2020
491 (2.57/day)
John Lakos's "Large Scale C++" was one of the best written 90s books on software engineering.

Hearing that Lakos has made a 2nd edition, updated to year 2020 sensibilities, made me instantly buy the new book. However, Lakos is going to split the new 2nd edition into two books, only the first book has been written so far. My copy arrived today, so I've been giving it a skim through.

Great book, would recommend to any C++ programmer. I eagerly await for the 2nd half for next year.
Dec 24, 2008
1,614 (0.37/day)
Volos, Greece
System Name ATLAS
Processor Q6600 QUAD
Motherboard ASUS P5QC
Cooling ProlimaTech Armageddon
Memory HYPER-X KHX1600C8D3T1K2 /4GX PC3-12800 1600MHz
Video Card(s) Sapphire HD 5770 VAPOR-X
Storage WD Raptors 73Gb - Raid1 10.000rpm
Display(s) DELL U2311H
Case HEC Compucase CI-6919 Full tower (2003) moded .. hec-group.com.tw
Audio Device(s) X-Fi Music + mods Audigy front Panel (full working)
Power Supply HIPER 4M780 PE 980W Peak
Mouse MX510
Keyboard Microsoft Digital media 3000
Software Win 7 Pro x64 ( Retail )
Consider Chess exploration or Chess AIs. Trying to solve Chess is infeasible. But if you limit the problem's scope to a certain size, such as "All Board positions with 7 or fewer pieces", it suddenly becomes possible to solve.
This is much close to the concept of proper thinking when repairing electronics, but still you need to be a master in awareness of its piece properties.
And the most important lesson that no school delivers, this is of how you to control the range of your own focus over a problem.
Apr 24, 2020
491 (2.57/day)
If you haven't programmed a GPU yet, its a lot of fun and I suggest everyone to try it out. Its actually very intimidating to try to reach the base level of utilization of a GPU. GPUs have a large number of "threads", for example: AMD's Vega64 has 64-compute units, and each compute unit has 64-vALUs in it, and each vALU requires 4-threads before its fully utilized.

That's right: you need 16384 Threads to fill up a Vega64 to occupancy 1.

But that's not all: Much like "Hyperthreading" on Intel Skylake or SMT on AMD Zen, GPUs (both NVidia and AMD) support multiple "software threads" per "core". It is recommended to reach at least occupancy 4 (that's 4-threads per core), meaning you really should be aiming at 65536 threads running on a Vega64.

Vega64 supports up to 10-threads per hardware resource, meaning a total of 163840 threads are possible. This is significantly oversubscribed however, and it seems unlikely to me that 40-threads per vALU would be efficient (even if the system supports it, it doesn't make it a good idea). Occupancy 4 seems to be the general number to aim at.

To emphasize how intimidating Occupancy 4 is on Vega64, there is 8GBs of VRAM. Occupancy 4 provides the programmer a total of 128kB of VRAM per thread. You're far below the "640kB is enough for everybody" line. Now of course, GPUs aren't designed to really have "independent" threads working on independent memory by themselves. Instead, GPUs are about SIMD-collaboration on the same problem sets (ex: pixel shaders working on the same 1080x1920 screenbuffer to display a video game).

As such: SIMD programming is a very different feel than normal programming. NVidia may market that "SIMT" allows you to think like "a bunch of threads", but the whole GPU methodology couldn't be more different! SIMD is strange, but wonderful. You have HUGE compute potential at your disposal, especially good for these "brute force" problems I've been talking about in this thread so far.