there are lots of reasons to write assembly, and none of them have to do with beating C compilers. if you are porting an operating system to a new CPU architecture, the very first code you write will invariably be assembly, because bootloaders are machine specific.
i like to illustrate the idea of man battling the compiler with the legend of john henry.
john henry is a figure of american folklore, a giant of a man with incredible strength and stamina. he worked building railroads in the olden times, driving steel into rock by hand. a man showed up one day, claiming his steam-powered drill could get through solid rock at a rate unmatched by any man. john henry took him up on the challenge, and with a 20 lb hammer in each hand, competed against the steam machine.
he won, only to die from over-exertion in his moment of victory.
that is kind of what writing raw assembly to beat a compiler is like today. sure, in some small cases, you could absolutely outperform a compiler. it’d be hard work and you’d have to know the CPU inside and out (and i mean like, know every single stage of the CPU pipeline, which subcomponents are involved in which pipeline stage, etc.)
you could do it, but you’d want to kill yourself afterwards. there’s a reason we don’t write in lisp or assembly anymore for most tasks.
when we start to talk about very large programs that pull in hundreds and hundreds of libraries and such, compilers will win for two reasons. the first (and most pragmatic) reason is that people don’t write large, high-level abstract programs in assembly. you’d spend literally decades doing a job that would take a month. your program would be a fucking feat of strength, but it’d be completely impenetrable to anybody but you. you can barely understand what the fuck someone else was trying to do with their C code, let alone assembly.
the more interesting reason is that compilers now can make really calculated judgement calls based off the performance characteristics of completely separate and seemingly unrelated pieces of the program that will, in the end, have an impact on performance (like how inlining a function in one place changes register pressure somewhere else). these are things you would literally never think of, even if you were dennis ritchie himself.
there are things hand-written in assembly in order to get the fastest, most efficient implementation, though. the most prominent examples are encryption algorithms and hashing algorithms. these are generally very small and self-contained mathematical operations that can be boiled down to asm very easily, and implementing them by hand in asm is a necessity. this is because such hashing/encryption/checksumming algorithms are meant to be standardized (like sha-1/sha-256, chacha20, blowfish, etc) and ported to every operating system trying to stay current. they’ll be used by hugely powerful computers to do everything from securing your facebook account to securing the united states’ secrets, so by virtue of them being run all the time, whatever performance you squeeze out of your implementation will be multiplied by a trillion.
also, you are usually competing against other algorithms in things like NIST’s competitions (the ones that picked AES and SHA-3), in which the winner’s algo becomes the federal standard.
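to give a feel for how small these kernels really are, here is a rough sketch in C of the chacha20 quarter round, the core step of the cipher as described in RFC 8439. the names `rotl32`, `quarter_round` and `quarter_round_selftest` are made up for illustration and don’t come from any particular library, and this is only the inner step, not a complete or hardened cipher.

```c
#include <assert.h>
#include <stdint.h>

/* rotate a 32-bit word left by n bits (valid for 0 < n < 32) */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32 - n));
}

/* one chacha20 quarter round: only add, xor and rotate on four 32-bit words.
 * (function name is an illustrative choice, not a library API) */
void quarter_round(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* hypothetical self-check using the quarter-round test vector
 * from RFC 8439, section 2.1.1 */
void quarter_round_selftest(void)
{
    uint32_t a = 0x11111111u, b = 0x01020304u;
    uint32_t c = 0x9b8d6f43u, d = 0x01234567u;

    quarter_round(&a, &b, &c, &d);

    assert(a == 0xea2a92f4u);
    assert(b == 0xcb1cf8ceu);
    assert(c == 0x4581472eu);
    assert(d == 0x5881c4bbu);
}
```

the whole thing is adds, xors and rotates on four 32-bit words, which is why it maps so directly onto a handful of instructions when someone decides to write it by hand.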