computer stuff

ampguy
Brian - not sure why that other thread was closed, but regarding the i5/i7: the later Nehalem-based i5s, all i7s, and the Nehalem-and-later Xeons can be optimized for loops and cache utilization in ways the Core and pre-late-Nehalem CPUs can't:

I know several programmers who, after the most recent Intel IDF, are only now taking advantage of some advanced cache tuning ...

http://www.agner.org/optimize/optimizing_assembly.pdf

http://software.intel.com/sites/products/collateral/hpc/vtune/performance_analysis_guide.pdf
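
For anyone who hasn't been through those guides, the canonical example is loop tiling: restructure a traversal so the working set fits in cache. A minimal C sketch (N and BLOCK are made-up numbers here; you'd size the tile to the actual L1/L2):

#include <stddef.h>

#define N     4096   /* made-up array dimension */
#define BLOCK 64     /* tile size; tune to the target's cache */

/* Naive transpose: the column-order writes miss cache constantly. */
void transpose_naive(double dst[N][N], double src[N][N])
{
    size_t i, j;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            dst[j][i] = src[i][j];
}

/* Tiled transpose: work in BLOCK x BLOCK tiles so the reads and the
   writes both stay within cache-resident lines. */
void transpose_tiled(double dst[N][N], double src[N][N])
{
    size_t ii, jj, i, j;
    for (ii = 0; ii < N; ii += BLOCK)
        for (jj = 0; jj < N; jj += BLOCK)
            for (i = ii; i < ii + BLOCK; i++)
                for (j = jj; j < jj + BLOCK; j++)
                    dst[j][i] = src[i][j];
}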
 
(I was just about to join the original thread when it closed - dang :eek: - but here we are again :rolleyes: ).

I love these off-topic meanders down Memory Lane. I was not a system programmer, but what you might call a 'user-space' user of avionics-test type systems. I remember back in the '70s when we had a dedicated system based around a Honeywell, used only intermittently, that had been supplied by the unit's sub-contractor with only the run-time software. But as was customary in those days, it had a nice row of toggle switches with which you could patiently enter programs bit by bit in machine code :D . One of the lads knocked up a nice little editor program, so we could use it as an extra machine to generate source code on paper tape for application programs for the main project system.

I mostly use Debian 'Lenny', dual-booting to XP when needed for things like software for the scanner.
BTW I find that Photoshop Elements 2 runs very nicely under both Wine and CrossOver Office.
Reading the comments in the earlier thread, I might be tempted to try out OS X ... :rolleyes:
 
The other thread got way off topic. Apple finances seemed to dominate towards the end.

These days, I code for embedded systems. "Power, Size, Weight, Speed". Where else can you really rely on hand-optimized assembly language code?

I need to read the Architecture manual and Assembly language manual for the modern Intel processors, and compare them with the ASC vector instruction set. On the latter, I think the record was collapsing a section of code that required 25 lines of FORTRAN into one assembly-language instruction. The vectors on the ASC could handle a 3-deep nested loop, indexing 3-dimensional arrays, with stride counts on the inner two loops. The triple-buffered inputs allowed the memory unit to gather non-contiguous operands to present to the pipeline without stalling. I memorized the paths through the machine and wrote FORTRAN to match.
 

Fixed vector length, register-to-register operations. The Cray could be thought of as similar, with a maximum vector length of 64. The ASC and STAR-100 (remade as the Cyber-203 and Cyber-205) were memory-to-memory vector machines. It would be the equivalent of putting a "REP" in front of any of the arithmetic operations on an Intel processor. The STAR and Cybers would use the equivalent of ESI, EDI, and EDX to control the streaming. Of course you would have to add a second source register (ESI2, ESI3), as vector instructions are typically 3- (or 4-) operand: two (or three) sources, one destination. The ASC used multiple sets of registers to control a 3-deep loop collapsed to one vector instruction.

      DO 15 K = KB, KE
        DO 10 J = JB, JE, JS
          DO 5 I = IB, IE, IS
            OUTPUT(I, J, K) = IN1(I, J, K) + IN2(I, J, K)
    5     CONTINUE
   10   CONTINUE
   15 CONTINUE

"Begin", "End", "Stride"

collapsed to one assembly-language vector add.
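
For comparison - my sketch, not anything out of the ASC manuals - the nearest thing on a current Intel chip is SSE/AVX intrinsics, and those only vectorize the innermost unit-stride loop; the outer loops and the strides stay as scalar code:

#include <stddef.h>
#include <immintrin.h>

/* out, in1, in2 stand in for OUTPUT, IN1, IN2, flattened to 1-D.
   One AVX add covers 4 doubles at a time. */
void vec_add(double *out, const double *in1, const double *in2, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d a = _mm256_loadu_pd(&in1[i]);
        __m256d b = _mm256_loadu_pd(&in2[i]);
        _mm256_storeu_pd(&out[i], _mm256_add_pd(a, b));
    }
    for (; i < n; i++)          /* scalar remainder */
        out[i] = in1[i] + in2[i];
}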

But it did cost $8m, and took three water-cooled air conditioning systems to keep it from overheating. At least you did not have to lower it into liquid nitrogen as the Cray-2 required.

I was looking at my "NIC" (Network Interface Card) from 1994, and saw that the key chips were all marked "prototype". Bleeding edge. The protocol stack was done in R3000 assembly, i960 assembly, and finally across the PC into the Pentium. The i960 was cool - you could set up big-endian and little-endian memory regions simultaneously. Made it easier to go from big-endian network addresses to the little-endian Pentium.
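
Without hardware help like that, the conversion is the byte swap network code still does by hand (it's all ntohl() amounts to on x86); a minimal C sketch:

#include <stdint.h>

/* Network byte order is big-endian; the Pentium is little-endian, so
   every address or length field pulled from a packet header gets swapped. */
uint32_t read_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}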
 
... At least you did not have to lower it into liquid nitrogen as the Cray-2 required.
"Fluorinert", not liquid nitrogen. This coolant, according to Wikipedia, is made for the purpose by 3M, and operates around room temperature. It doesn't even go through a phase change (in normal circumstances :)), but circulates as a liquid past a vanilla heat exchanger. Given the power requirement, I couldn't afford a Cray-2 even if given to me as a gift.
 
I mostly use Debian 'Lenny', dual-booting to XP when needed for things like software for the scanner.

Is there a reason why you still use Lenny? Squeeze has been out for a while now.

Then again, I still have an installation of Sarge on a Mac SE/30 now, even though I almost never boot it - the machine usually runs A/UX 3.1.1 instead.

Reading the comments in the earlier thread, I might be tempted to try out OS X ... :rolleyes:

If you have an OS X license, you can try it on a generic PC using the instructions from places like this and this and this. Go ahead, it's fun.
 

Not *at all* wishing to start a differently themed flame war here :rolleyes: , but I use Lenny + Backports so that I can still use KDE 3.5
- it gives me just the right amount of desktop integration
- it doesn't impose a lot of unwanted eye-candy :bang:
- and is therefore 'crisp' in operation
- konqueror is still sane IMHO

If I need to boot up 'Squeeze', I use LXDE, Fluxbox, anything but KDE >= 4
(BTW I read that Torvalds agrees with me :rolleyes: ).
KDE Trinity for Squeeze is a good effort, but it doesn't have quite the functionality of the genuine Debian KDE3 in my experience.

Sarge is good, but for newer device support I would need to do a lot of compilation from source ...

Thanks for the links for OS X ! :)
 
It's always funny... I think of myself as moderately low-level, but I rarely have a need to optimize down to cache lines. :) I certainly appreciate those who do.

I do enjoy knocking down L2 misses, though most of that has been data reorganization and the occasional prefetching intrinsic. That and crash-dump asm reading are about as low-level as I get day to day. I spend more of my time bridging the gap between the low-level folks and the APIs other devs need.
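
For the curious, "data reorganization" here usually means something like hot/cold splitting, plus the odd prefetch hint; a rough C sketch (structs and numbers invented for illustration):

#include <stddef.h>
#include <xmmintrin.h>   /* _mm_prefetch */

/* Keep only the fields the inner loop touches packed together ("hot"),
   so every cache line fetched is all useful data; everything else goes
   in a separate "cold" struct. */
struct particle_hot  { float x, y, z, vx, vy, vz; };
struct particle_cold { float mass; int id; /* rarely-touched state */ };

void integrate(struct particle_hot *p, size_t n, float dt)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (i + 16 < n)   /* hint upcoming elements into cache ahead of use */
            _mm_prefetch((const char *)&p[i + 16], _MM_HINT_T0);
        p[i].x += p[i].vx * dt;
        p[i].y += p[i].vy * dt;
        p[i].z += p[i].vz * dt;
    }
}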

/I work in the game industry on the Xbox 360, PS3 and PC. That's a good mix of low-level, high-level and artistic culture.
 
On the MIPS R3000 core used in the LSI 64360 "ATMizer", the L1 cache was "disabled" and instead directly addressable. You could load program instructions and data into addressable segments in the cache, anything else into SRAM. Code running from IRAM executed 5x faster than from SRAM. The compiler, and even the assembler, had no notion that some memory was faster than the rest, so you had to do things like turn off code reordering and basically hand-optimize. We also used a CAM (content-addressable memory) with it, which really sped up network address resolution. You had 2.7 µs to process each incoming packet, on a 50 MHz processor.
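
(For scale: 2.7 µs on a 50 MHz clock is 50 x 2.7 = 135 cycles per packet, so there was no room for a single unnecessary stall.)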
 
A sad day. Rest in Peace.

I did meet Rob Pike and Ken Thompson, never Dennis though. Hard to believe that this was the same AT&T we have today ....
 
Noooo, seriously???

C is the first language that captivated my mind. It's elegant but never loses its edge (you'll shoot yourself in the foot quickly if you are not really thinking).

To Dennis, thank you for letting me "C", Sir.
 
Ugh. This one hits close to home for many of us...

Dennis Ritchie, the creator of the C programming language and a key developer of the Unix operating system, has passed away.

The only C manual that I ever read.

Rest in Peace.

I would ask that any C/C++ developers consider putting a dedication in their source code.
 
It's not every day that we lose a demigod. :) I met DMR at Usenix 1980 in Santa Monica - a great destination in January. I had previously passed email messages to him (via Usenet, of course) regarding a C optimizer bug. At the conference, he presented a new kernel facility called Streams. This was to replace a more primitive character buffering facility called clists. Anyway, I saw him sitting on the curb outside the hotel like a regular joe, and joined him to shoot the breeze for a few minutes. Besides the brilliance, he was clearly down-to-earth!
 
I met Grace Hopper in 1980, while I was in school. I had the manual that she wrote for the IBM Mark I, surplus from the library at work - which had held the first computer conference in 1949.

So after the talk, I put the manual in front of her and asked for an autograph. She was surprised to see it: "Where the hell did you get this?" She looked at the names and started telling stories about them.

"This guy, he was something else. You'd go to program the computer and find out he rewired it the night before and changed how the instructions worked".

[picture: the manual and the Leica]


Not long ago, I found the proceedings of the 1949 computer symposium in our library, and that same person had written some articles on the computer and on optimizing instructions - as in changing how they worked in hardware.

I think that influenced my career.

The Leica in the picture was made about the same time the IBM Mark I project kicked off.
 
Hi Brian

In the '90s we looked at these products closely, while developing our own. We found that most of the NICs with embedded MIPS cores could not keep up with wire speed (OC-3 for ATM, 100/1000 Mbps for Ethernet).

Our partners and OEMs/ODMs, as well as independent network testing groups found the same, all the way to the Alteon GbE NIC cards.

We also used CAMs, and the first TCP/IP hardware-assist in-chip designs, and embraced the open source folks. We had Donald Becker out for a week, and you can see a lot of our first-gen GbE NIC chip in his hamachi.c code for Linux.


On the MIPS R3000 core used in the LSI 64360 "ATMizer", the L1 cache was "disabled" and instead directly addressable. ...
 
My code could keep up with fully loaded ATM OC-3 rates, but it required a lot of fine-tuning: interleaving instruction execution so the processor would not insert wait states while waiting for results to become available. The destination register of one operation should not be the source of the next; execute an instruction using different sources and you get a free clock cycle. "Register update hazard" is what we called it on the ASC. We also worked with the "never released" 64363 and then the 64364 follow-on. Those were fun days. The card was done in '94. The i960CF also helped move data around. These cards were custom-designed to develop and demonstrate the protocols, not to be a commercial NIC.
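
That dodge survives today as "breaking dependency chains". A minimal C sketch of the same idea (my own example, nothing to do with the ASC or ATMizer code):

#include <stddef.h>

/* Two independent accumulators: consecutive multiply-adds feed
   different registers, so neither waits on the other's result. */
double dot(const double *a, const double *b, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        s0 += a[i]     * b[i];
        s1 += a[i + 1] * b[i + 1];
    }
    if (i < n)
        s0 += a[i] * b[i];
    return s0 + s1;
}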
 
Donald is incredible

He's been a great kernel and networking contributor to Linux since the very early days. A great friend of mine since the early '90s, and I'm proud to have worked with him at 3 different companies, one of which we co-founded.


Ahh, Donald Becker. A name I've known for quite some time - from my heavy Linux days...
 