How is 3D handled by hardware?

agent_x007 · Jun 17, 2020

You are forcing a very weird scenario with this low-RAM, "max VRAM" example.
It's 2020 now: you don't buy 2GB DDR4 memory sticks (you either get 4GB [lol] or 8GB), and current-gen GPUs can't work with a 32-bit OS (there are no 32-bit drivers for them to use).
Lastly, most hardware hasn't supported WinXP for quite some time, and the limitations you keep bringing up in your examples only apply there.

Under 32-bit WinXP, PCIe GPUs usually reserve 256MB of RAM.
The only card that reserved more for me so far was the Titan X(M) (on a modded driver it went to 512MB).

Houdini · Jun 17, 2020

Are you serious? ...
OK, 4 gigs, OK? 4 gigs ...
It doesn't matter whether it's 2 or 4 or 8; I was just illustrating what happens if VRAM is much larger than RAM...
The guy said that data goes to VRAM, and if VRAM overflows it uses RAM as an extension of VRAM.
Stop complicating things; I'm asking you for the 4th time...

Repeating the question:

OK, I'm talking about 3ds Max; let's narrow it down to 3ds Max only.

"Creating a sphere in 3ds Max is content creation": does the word "content" change a lot? I don't think so...
Everything in the 3D world is content creation...

"If it is too big for VRAM, the data will spill over to RAM"

That is very interesting, because it means that the CPU accesses VRAM directly (and never touches RAM while working in 3D unless an overflow happens) during what you called "content creation".

Does it mean that if you have 32 gigs of VRAM and only 2 gigs of system RAM,
objects will not fill the RAM at all, because they fit freely into VRAM?

According to that, creating 10 gigs' worth of triangles in 3ds Max will not touch system RAM at all, right?
It will only fill RAM when VRAM overflows.

londiste · Jun 17, 2020

Everything is content, but creating that content is separate from rendering. In most use cases, the content is created in advance. Rendering itself always happens on content that already exists. In your 3ds Max sphere example, anything before "now, render this stuff" goes to the API is content creation and irrelevant to how 3D is handled by hardware - or by the extended version of hardware, because layers like drivers and the API are closely tied to hardware.

I said VRAM overflows to RAM. The GPU accesses system RAM (more or less) directly. The way VRAM is set up these days (again, depending on things like API, OS, drivers or vendor), if VRAM runs out, system RAM can be used as a VRAM extension - over PCIe - to hold the overflowing part. Obviously that comes with a very noticeable performance hit.

Houdini · Jun 17, 2020

Please forget about the rendering of those polygons and all that rasterization stuff; I don't mind what happens inside the GPU. I'm only interested in how the sphere gets into the GPU: directly, bypassing RAM, or not.
If it goes directly into VRAM, bypassing RAM, then it means: if you have 10 gigs of VRAM and only 1 gig of RAM, it will be OK, because the object fits entirely into VRAM and never touches RAM.
Is the above true for 3ds Max, Maya, Houdini and Blender? (I'm sure they work the same way; at least the steps of object creation are the same.) It's something that should be common to host software like those I've mentioned.

Deleted member 197986 · Jun 17, 2020

Is it just me, or has nobody in this thread mentioned yet how the developers wrote the code and which programming language they used?

dragontamer5788 · Jun 18, 2020

Houdini said:
I'm only interested in how the sphere gets into the GPU: directly, bypassing RAM, or not.

That's a very complicated question, because today's computers have so many optimizations that I don't believe we can really generalize anymore.

Let's start with the data format. The sphere is first transformed into a list of triangles, typically a triangle strip: https://en.wikipedia.org/wiki/Triangle_strip. For a sphere, it would look something like this: [image]

If you look closely at curves in video games, you can see the triangles on the edges.

Okay, so the CPU first needs to make a list of triangles to pass to the GPU. It's simply a list of every point in the strip, like "(0,0,0), (1,0,0), (1,1,0), (1,1,1)", etc. A big list like that. This is called "vertex information".
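To make the "list of every point in the strip" idea concrete, here is a minimal C++ sketch of how a strip of N points is read as N - 2 triangles. The names `Vec3`, `Triangle`, `expandStrip` and `exampleStrip` are made up for illustration; no particular engine or 3D package is assumed to do it this way.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// One vertex position: x, y, z.
struct Vec3 {
    float x, y, z;
};

// Three indices into the vertex list, forming one triangle.
using Triangle = std::array<std::size_t, 3>;

// Expand a triangle strip of `vertexCount` vertices into individual triangles.
// Every vertex after the first two adds one triangle, so N vertices give N - 2.
// Odd-numbered triangles swap two indices to keep a consistent winding order.
std::vector<Triangle> expandStrip(std::size_t vertexCount) {
    std::vector<Triangle> triangles;
    if (vertexCount < 3) {
        return triangles;
    }
    triangles.reserve(vertexCount - 2);
    for (std::size_t i = 0; i + 2 < vertexCount; ++i) {
        if (i % 2 == 0) {
            triangles.push_back({i, i + 1, i + 2});
        } else {
            triangles.push_back({i + 1, i, i + 2});
        }
    }
    return triangles;
}

// The four points from the post above, read as one strip (hypothetical helper).
std::vector<Vec3> exampleStrip() {
    return {{0.0f, 0.0f, 0.0f}, {1.0f, 0.0f, 0.0f}, {1.0f, 1.0f, 0.0f}, {1.0f, 1.0f, 1.0f}};
}
```

Calling `expandStrip(exampleStrip().size())` yields two triangles for the four points, which is why strips are a compact way to describe a surface.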

A "Geometry Shader" (or a tessellation shader) is a GPU program that may make more triangles from data passed in from the CPU, while a "Vertex Shader" transforms the vertices it is given ("shaders" run on the GPU instead of the CPU). GPU code is highly customizable and can pretty much do anything these days, so it's really hard to generalize exactly what format CPUs even pass to GPUs anymore. But... let's assume a triangle strip.

So... where am I going with this?

I guess what I'm saying is... anything can happen, so long as someone wrote the code for it. Code runs on the GPU, code runs on the CPU. At best, I can tell you what happens in a specific situation: let's assume we have a simple triangle strip, and let's assume we have a mip-mapped texture mapped to those triangles. Let's assume the CPU is storing the data in DDR4 (not necessarily true! It could be in L3 cache, it could be on the hard drive, it could be on the internet and the CPU doesn't even have the data yet).

It all depends on how a particular game engine works, and what optimizations are enabled, whether the GPU is an iGPU (shares internal bus with the CPU), or a dGPU (must be external over PCIe), device drivers... every piece of the puzzle is changing constantly because everyone involved is trying to make the whole process faster.

If it goes directly into VRAM, bypassing RAM, then it means: if you have 10 gigs of VRAM and only 1 gig of RAM, it will be OK, because the object fits entirely into VRAM and never touches RAM.
Is the above true for 3ds Max, Maya, Houdini and Blender? (I'm sure they work the same way; at least the steps of object creation are the same.) It's something that should be common to host software like those I've mentioned.

The CPU cannot access GPU RAM directly, but it can pass data to the GPU somewhat efficiently (roughly 15GB/s is the speed of PCIe 3.0 x16). So even if the CPU doesn't have 10GB of RAM, it can certainly be programmed to pass 10GB of generated data to the GPU.

Does that happen commonly? No. Most computers have more DDR4 RAM than VRAM. DDR4 is way, way cheaper after all. 16GB of DDR4 is like $80, while GPUs with 10GB+ of GDDR6 VRAM are hundreds of dollars. So most code probably assumes that the data is in DDR4.

But what about 3ds Max, Maya, Houdini and Blender? As offline renderers, they're designed to support scenes that are dozens of GB in size, in excess of a computer's DDR4 capacity. Unlike a video game (which has speed requirements: usually 60fps, 30fps, or some similar target), offline renders can spend days on a single animation. So... they can probably swap data that doesn't fit out to the hard drive or something.

You're asking a lot of questions, but things are very complicated and not really simple to answer.

----------

EDIT: Okay, so let's choose Blender. Because Blender is easy. Why? Because Cycles / Blender could be a CPU-only renderer. No GPU discussion happens at all. Bam. Simple answer to a simple question.

Houdini · Jun 18, 2020

dragontamer5788, thank you for the comprehensive answer.

I was talking about less RAM and more VRAM not because it happens frequently, but to illustrate the point.

OK, yesterday I ran an experiment with my hardware and 3ds Max. It uses a Direct3D viewport, which means it uses the DirectX API.
Here is my configuration:

1) 3ds Max with the DirectX API
(everything that happens in Max and needs hardware to do something (drawing triangles, in this case) goes into VRAM through the API)

2) 4 gigs of VRAM (1050 Ti)

3) 16 gigs of system RAM

Now take a look. I thought that if the CPU could put triangles directly into VRAM, it would not use RAM at all; a guy here said that RAM is used only when VRAM is full (it looks like his statement is false).
I had 3.5 gigs of free VRAM after I launched Max with an empty viewport (it looks like 500MB is eaten by the OS and other stuff).

Let's just call that Max's idle state: 3.5 gigs of free VRAM and 1 gig of system RAM taken by Max. From there I started monitoring the situation.

I created 10 heavy spheres. That instantly took 1 gig of RAM and also filled 20+% of my VRAM (so the process happens in parallel in both memories; it places those spheres in VRAM and RAM at the same time...).

So my theory is: when the OS starts, the DirectX API runs in system RAM, right? When Max launches, it also goes into RAM.
So when spheres are created, Max sends that info into VRAM through the API, which is running in RAM.

If this logic is correct, then we always have duplication of the same objects: we have the spheres in RAM and also in VRAM. Is that accurate? I'm talking only about 3ds Max (never mind how other super-specialized software works).

dragontamer5788 · Jun 18, 2020

Houdini said:
If this logic is correct, then we always have duplication of the same objects: we have the spheres in RAM and also in VRAM. Is that accurate? I'm talking only about 3ds Max (never mind how other super-specialized software works).

I don't know how 3d studio max works. But sure. That seems reasonable.

The CPU can talk to DDR4 RAM at about 50GBps.

The GPU can talk to its VRAM at speeds anywhere from 128GBps to 1000GBps (depending on GDDR5, GDDR6, HBM2 and other details).

The CPU can talk to GPU VRAM at only 15GBps, while the GPU can talk to DDR4 RAM at only 15GBps.

So by duplicating the data, the CPU can go at 50GBps and the GPU at 300GBps (or whatever its VRAM is clocked at). That's way faster than waiting on the 15GBps PCIe bus. But really, it depends on what the programmer felt was most important in any given situation.
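As a rough back-of-the-envelope check, the figures above can be turned into transfer times. This is only a sketch: the constants are the approximate numbers quoted in this post, the names (`transferSeconds`, `kDdr4`, `kVram`, `kPcie3x16`) are made up, and latency and protocol overhead are ignored, so real transfers will be slower.

```cpp
#include <cstdio>

// Seconds needed to move `gigabytes` of data over a link of `gbPerSecond`.
// Ignores latency and protocol overhead.
constexpr double transferSeconds(double gigabytes, double gbPerSecond) {
    return gigabytes / gbPerSecond;
}

// Approximate bandwidths quoted in the post, in GB/s. Real hardware varies.
constexpr double kDdr4 = 50.0;      // CPU <-> system RAM
constexpr double kVram = 300.0;     // GPU <-> its own VRAM
constexpr double kPcie3x16 = 15.0;  // anything that crosses the PCIe bus

// Print how long one pass over `gigabytes` of data takes on each path.
void printTransferTimes(double gigabytes) {
    std::printf("CPU reading DDR4: %.1f ms\n", 1000.0 * transferSeconds(gigabytes, kDdr4));
    std::printf("GPU reading VRAM: %.1f ms\n", 1000.0 * transferSeconds(gigabytes, kVram));
    std::printf("Crossing PCIe:    %.1f ms\n", 1000.0 * transferSeconds(gigabytes, kPcie3x16));
}
```

For 1 GB of vertex data this works out to roughly 20 ms from DDR4, about 3.3 ms from VRAM, and about 67 ms over PCIe, which is the gap that keeping a copy on each side avoids.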

I'm talking only about 3ds Max (never mind how other super-specialized software works).

Every program works differently. It's very difficult to generalize.

Houdini · Jun 18, 2020

--------------------------------------------------------------------------------------------

dragontamer5788: "I don't know how 3d studio max works. But sure. That seems reasonable."

I don't think that statement is quite right. I believe it doesn't matter how Max works at all; what matters is which API it uses to communicate with the hardware, and we know it's DirectX.

--------------------------------------------------------------------------------------------
So the whole big topic I started here now narrows down to one question: why does it hold the spheres in both places?
Or is that 20+% of VRAM even the spheres at all? (Most likely it is.)
What is going on step by step is still not answered.

dragontamer5788 · Jun 18, 2020

Houdini said:
I believe it doesn't matter how max works at all

At a minimum, there's the 1GB vertex data in your CPU's DDR4 RAM. Maybe. Again, I don't know what 3d Studio Max is doing, it could be putting it in the hard drive for all I know.

But that's all I'd know. Why would 3d Studio Max transmit all vertex data to the GPU? 3d Studio Max only outputs a simplified output to the viewport. The actual render is going to happen later, offline. All that needs to happen in 3d Studio Max is to have a simplified, easy-to-render version for the artist to kinda-sorta see what he's working with. In isolation, I don't even know if 3d Studio Max transmits all 1GB of vertex data, or if it's been decimated (math run over the data to leave fewer vertices).

For all I know, 3d studio max decimates all data to say, 100,000 polygons or something so that the viewport remains speedy. It really depends on what the programmer for the program decided was best. Maybe its a configuration option. I don't know, and it's unlikely that anyone would really know. It really is getting into the weeds into how specific programs operate. All programs have their own code and rendering systems.

--------

Maybe 3d Studio Max calculates the points that are on the screen, and only sends those points to the GPU. Maybe it's dependent on the view angle (why send vertices if they are not in the camera's view? If a vertex is behind the camera, there's no need to send it to the GPU to render). Because as I've said before, it's about 15GB/s to send new data to the GPU. If the 3d Studio Max programmers determined that's fast enough for shifting viewports, maybe they're willing to have the scroll button be non-realtime / slightly laggy.

So in this model, the CPU would only send the GPU the vertex data if it's on the screen. Or maybe 3d Studio Max is a very complicated program, which calculates the amount of VRAM on the GPU, automatically sends data that fits, and dynamically determines what should and shouldn't be on the GPU (which is almost certainly the case IMO). At which point there's no easy answer anymore; it's really up to the programmers who programmed the system.
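Purely as an illustration of that last guess, here is a minimal C++ sketch of a program deciding what to keep on the GPU under a VRAM budget. Everything here is hypothetical (`MeshInfo`, `chooseResident`, the greedy rule); it is not how 3d Studio Max is known to work, only one simple policy a programmer could choose.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical description of one mesh held in system RAM.
struct MeshInfo {
    std::uint64_t sizeBytes;  // size of its vertex data
    bool visible;             // whether the camera currently sees it
};

// Return the indices of meshes to keep resident in VRAM: visible meshes,
// taken in order, as long as they still fit in the remaining budget.
// Meshes that are skipped would have to be sent over PCIe when needed.
std::vector<std::size_t> chooseResident(const std::vector<MeshInfo>& meshes,
                                        std::uint64_t vramBudgetBytes) {
    std::vector<std::size_t> resident;
    std::uint64_t used = 0;  // never exceeds vramBudgetBytes
    for (std::size_t i = 0; i < meshes.size(); ++i) {
        const MeshInfo& m = meshes[i];
        if (!m.visible) {
            continue;  // not on screen: leave it in system RAM only
        }
        if (m.sizeBytes > vramBudgetBytes - used) {
            continue;  // does not fit in what is left of the budget
        }
        used += m.sizeBytes;
        resident.push_back(i);
    }
    return resident;
}
```

Note that in this sketch the RAM copy of every mesh is kept regardless of the decision, since the program needs it to re-send data when the view changes.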

So the whole big topic I started here now narrows down to one question: why does it hold the spheres in both places?

Why not? I already told you earlier. GPU VRAM is 300GBps speed (dependent on GPU model; an RTX 2060 Super has 448 GB/s to VRAM). CPU DDR4 is about 50GBps. PCIe is only 15GBps. Keeping the data in both locations is one of the easiest ways to speed up programs I can think of.

If the GPU VRAM doesn't have the data, then the GPU would only be able to access it at 15GBps, or roughly 1/20th the speed compared to if it were already loaded in GPU VRAM.

Houdini · Jun 18, 2020

You told me earlier, yes, and the speed figures are quite reasonable, but overall you're not sure. I need to put this into an article, so I need to know 100% what is going on. You are assuming quite well, but we still don't know.

In Max, if you create millions of objects, viewport performance drops drastically, let's say down to 1 frame per second. You don't need to delete them; you just move the camera away so it doesn't see them, and you have 300+ fps again, but both RAM and VRAM are still filled with those objects.

You said: "Why would 3d Studio max transmit all vertex data to the GPU? 3d Studio Max only outputs a simplified output to the viewport".
According to the above, it can't be like that.

Can you imagine how many times the CPU would have to transfer data, filling and emptying RAM and VRAM, when you just move the camera from one side to another?
Again, when moving the camera from the objects to empty space, only the FPS changes; RAM and VRAM stay filled with the same amount of data.

Vya Domus · Jun 18, 2020

dragontamer5788 said:
The CPU cannot access GPU RAM directly, but it can pass data to the GPU somewhat efficiently (roughly 15GB/s is the speed of PCIe 3.0 x16). So even if the CPU doesn't have 10GB of RAM, it can certainly be programmed to pass 10GB of generated data to the GPU.

"Efficiently" is a severe overstatement; if you need to swap data, performance-wise it's over. It's not just the bandwidth: the latency is colossal too.

Houdini said:
So the whole big topic I started here now narrows down to one question: why does it hold the spheres in both places?

Because if you have geometry stored somewhere on disk, you cannot transfer it directly to GPU VRAM; first it has to be loaded into RAM. After the transfer is done, it doesn't have to remain in RAM. There are also ways to generate geometry directly in VRAM, without having to store it in RAM first.

Houdini said:
filling and emptying RAM and VRAM, when you just move the camera from one side to another

When the "camera moves" (which is inaccurate, in reality the whole scene moves and the camera remains fixed, it's complicated to explain why but that's more efficient), the data doesn't change. There is something called a transformation matrix which "moves the camera".
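A minimal C++ sketch of that idea, assuming a translation-only camera for brevity (a real renderer uses a full 4x4 matrix and applies it in a vertex shader on the GPU, not on the CPU as here). The names `Vec3`, `ViewTransform` and `viewSpace` are made up for illustration.

```cpp
#include <vector>

struct Vec3 {
    float x, y, z;
};

// A translation-only "view transform": a stand-in for the full 4x4 matrix.
struct ViewTransform {
    Vec3 cameraPosition;

    // Express a world-space point relative to the camera. The input is
    // taken by const reference, so the stored vertex data is never modified.
    Vec3 apply(const Vec3& world) const {
        return {world.x - cameraPosition.x,
                world.y - cameraPosition.y,
                world.z - cameraPosition.z};
    }
};

// Transform every vertex for one frame. Only the small ViewTransform changes
// between frames; `vertices` stays exactly as it was uploaded.
std::vector<Vec3> viewSpace(const std::vector<Vec3>& vertices,
                            const ViewTransform& view) {
    std::vector<Vec3> out;
    out.reserve(vertices.size());
    for (const Vec3& v : vertices) {
        out.push_back(view.apply(v));
    }
    return out;
}
```

Moving the camera only changes `cameraPosition`, a handful of floats, so nothing in the vertex buffer has to be emptied or refilled.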

Houdini · Jun 18, 2020

I don't mind at all what the camera transformation is called...

When loading a file, of course it goes to RAM.
Again, forget about disks entirely; talk about the creation process only.

So, the empty software loads into RAM and only then creates a sphere.

"Because if you have geometry stored somewhere on disk, you cannot transfer it directly to GPU VRAM; first it has to be loaded into RAM. After the transfer is done, it doesn't have to remain in RAM. There are also ways to generate geometry directly in VRAM, without having to store it in RAM first."

First part, about disk and RAM: yes!

Second part, "After the transfer is done, it doesn't have to remain in RAM": absolutely not!

The data still remains in RAM anyway. It fills RAM by 1 gig and then fills VRAM to 20%, yet the object is still in RAM even though VRAM has not overflowed...

If it behaved like you said, I would understand the whole thing, but it still keeps the data in RAM after sending it to VRAM.

Vya Domus · Jun 18, 2020

Code Viewer. Source code: src/1.getting_started/3.3.shaders_class/shaders_class.cpp

Look at this portion :

Code:

    float vertices[] = {
        // positions         // colors
         0.5f, -0.5f, 0.0f,  1.0f, 0.0f, 0.0f,  // bottom right
        -0.5f, -0.5f, 0.0f,  0.0f, 1.0f, 0.0f,  // bottom left
         0.0f,  0.5f, 0.0f,  0.0f, 0.0f, 1.0f   // top
    };

This is the vertex data, which lives in RAM and gets copied into the buffer, which goes into VRAM.

Code:

glBufferData(GL_ARRAY_BUFFER, sizeof(vertices), vertices, GL_STATIC_DRAW);

This function takes a pointer; after it is called, the pointer and whatever it points to can be deleted. In this case "vertices" is a statically allocated array, but it might as well have been dynamically allocated at run time and then deleted after the buffer is initialized.

So no, the vertex data can always be deleted after the buffer gets created.
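The copy-then-free pattern can be sketched without a graphics context. In the block below, `FakeGpuBuffer` is a hypothetical stand-in, not an OpenGL type: it only mimics the one property that matters here, namely that the upload call makes its own copy of the data, as `glBufferData` does.

```cpp
#include <cstddef>
#include <cstring>
#include <memory>
#include <vector>

// Stand-in for a GPU buffer. It is NOT an OpenGL object; it only models
// the fact that an upload copies the caller's data into separate storage.
class FakeGpuBuffer {
public:
    // Copy `size` bytes from `data` into storage owned by the buffer.
    // The caller keeps ownership of `data` and may free it afterwards.
    void upload(const void* data, std::size_t size) {
        storage_.resize(size);
        if (size > 0) {
            std::memcpy(storage_.data(), data, size);
        }
    }

    // Read back the float stored at position `index`.
    float floatAt(std::size_t index) const {
        float value;
        std::memcpy(&value, storage_.data() + index * sizeof(float), sizeof(float));
        return value;
    }

    std::size_t sizeBytes() const { return storage_.size(); }

private:
    std::vector<unsigned char> storage_;
};

// Build vertex data on the heap, upload it, then free the CPU-side copy.
FakeGpuBuffer uploadAndFree() {
    const std::size_t count = 18;  // 3 vertices * (3 position + 3 color) floats
    auto vertices = std::make_unique<float[]>(count);
    for (std::size_t i = 0; i < count; ++i) {
        vertices[i] = static_cast<float>(i);
    }

    FakeGpuBuffer buffer;
    buffer.upload(vertices.get(), count * sizeof(float));
    vertices.reset();  // the RAM copy is gone; the buffer's copy is unaffected
    return buffer;
}
```

Whether a given application actually frees its RAM copy after the upload is a separate design decision; the API only makes it possible.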

Houdini · Jun 18, 2020

"Can be deleted", maybe, but in fact it's not deleted...
It copies that data dumbly, 1:1; it looks like that's correct.

It means Max creates ONLY float data with the help of ONLY the CPU, and of course it lives in system RAM (because the CPU can't draw without a GPU, or at least an APU).
Then that float and index data is sent to VRAM with the help of API instructions, to draw a shell over those points (the visual triangle creation process, based on point index integers and position floats).
But it still stays in RAM, because if you need to change the sphere's shape, Max changes the point positions in RAM (the floats get updated in RAM, and then the updated float data is sent to VRAM to draw/update the shape).

When you add points to an object, it adds floats and indices in RAM and sends them, and so on and so forth.
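The edit-in-RAM, then resend pattern described above can be sketched in C++ roughly like this. It is an assumption about how an editor could be structured, not a description of 3ds Max internals; `EditableMesh` and its members are hypothetical, and the "GPU copy" is just a second vector standing in for a vertex buffer in VRAM.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical editable mesh: the authoritative copy lives in system RAM,
// and gpuCopy_ stands in for the vertex buffer in VRAM.
class EditableMesh {
public:
    explicit EditableMesh(std::vector<float> positions)
        : cpuCopy_(std::move(positions)), dirty_(true) {}

    // Edits only touch the RAM copy and mark the VRAM copy as stale.
    void setPosition(std::size_t index, float value) {
        cpuCopy_.at(index) = value;
        dirty_ = true;
    }

    // Called once per frame: re-send the data only if something changed.
    // Returns true when an upload happened.
    bool syncToGpu() {
        if (!dirty_) {
            return false;
        }
        gpuCopy_ = cpuCopy_;  // in a real program: a buffer upload over PCIe
        dirty_ = false;
        return true;
    }

    const std::vector<float>& gpuCopy() const { return gpuCopy_; }

private:
    std::vector<float> cpuCopy_;
    std::vector<float> gpuCopy_;
    bool dirty_;
};
```

With this layout, deleting the RAM copy after the first upload would mean reading the data back over PCIe before every edit, which is one plausible reason an editor keeps both copies.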

In other words, the logical way to do it would be the following steps:

1) CPU sends data to GPU with help of API (that sits in RAM) BUT...
2) data goes to VRAM (and logically it should be cleared from RAM) and only if the data is more than VRAM can handle does it use RAM, to avoid crashing

Why does it dumbly duplicate the data, instead of clearing RAM after the data was sent to VRAM?

admin.. police... somebody delete this topic ))...

The question is still open.

dragontamer5788 · Jun 18, 2020

Houdini said:
2) data goes to VRAM (and logically it should be cleared from RAM)

Why? CPUs can't access VRAM as quickly as RAM (15GBps to VRAM, since it's over PCIe, while RAM is 50GBps). So any program will be way faster if it keeps the data in both locations.

You might be surprised at how much copying happens in programming. Extra copies of data running around make code way simpler. If all data were "don't repeat yourself" (DRY), you'd need to pass a lot of pointers around. You've got pointers, aliasing issues, etc. An extra copy is just easier to program, easier to think about, and way faster in most cases.

-----------

EDIT: Another note. Clearing RAM takes time. Unless you've actually filled RAM up, why spend time clearing RAM? If you just copy the data to both locations, you skip the "clearing" step and have faster code.

spectatorx · Jun 18, 2020

I think this channel would be helpful for what you want to achieve: javidx9 (www.youtube.com).

The blog of the author of Banished is another great source for deeper knowledge on this kind of thing: Shining Rock Software (www.shiningrocksoftware.com).

Houdini · Jun 18, 2020

dragontamer5788 "Why? CPUs can't access VRAM as quickly as RAM "

Yes, I believe it's faster that way, but you still don't understand the main question.

When, as you said, you quickly accessed RAM, wrote 1 gig of floats there and then sent it to VRAM,
why didn't you delete that data from RAM after it was sent and placed in VRAM? Why are you accumulating it in RAM?

If the answer to my question above is something like "logically it's correct but hard to do" or "it's correct but we don't want to clear RAM that frequently",
then it explains the main question, but it is still speculation; we don't know what really happens there.
Surely there is a more technically correct explanation of that behavior
(maybe it has nothing to do with the API at all), because the same happens in OpenGL Maya and DirectX Max.

spectatorx, I don't want to become a programmer to know the answer to the question above; I just want to understand that RAM + VRAM step from 3ds Max's point of view.
If you know the answer, please tell us.

dragontamer5788 · Jun 18, 2020

The only one who can answer your question for 3d Studio Max is the person who programmed 3d Studio Max.

-------

I'm a different person, a different programmer. I can tell you what I would think in that situation. If I were the programmer, then I wouldn't clear RAM. Because clearing RAM in most cases is wasted effort. There's almost no legitimate programming reason to clear data before you've run out of RAM. If, and only if, my code runs the risk of running out of RAM, will I start to clear out some data. (or more specifically: call "free" on pointers to the memory allocator).

"Clearing RAM" is a step. A step that takes CPU time away from the rest of the things the program wants to do.

Houdini · Jun 18, 2020

"The only one who can answer your question for 3d Studio Max is the person who programmed 3d Studio Max." That is not correct either!

Because Maya does the same, Houdini does the same, Max does the same, and I can expect all software, or at least most of it, to do the same...
That means the explanation lies somewhere outside of what a particular programmer or team decided to do.
If this is common behaviour, it is something that generally happens when creating 3D data, regardless of Max and its developers' preferences.

Surely there is an answer that can be laid out step by step like a cartoon, simply and in simple words: step 1, step 2, step 3, step 4, that's it.

spectatorx · Jun 18, 2020

Other materials I would recommend looking into are John Carmack's keynotes and Ars Technica's "War Stories", especially the ones about older games like the first Prince of Persia or Crash Bandicoot.

If you really want to explain how image rendering and resource management work, you can't do it properly without saying something from the programmer's perspective, and you do not need to be a programmer to understand these things.

Computerphile is another great channel for this kind of knowledge.
