Beautiful Pixels: Presentation

Monday, March 12, 2012

Game Developers Conference 2012 presentation: The Bleeding Edge of Open Web Tech

I presented at GDC last week, and it went well. Here is a pre-recorded version of the talk. You can also check out the notes for the presentation.

Tuesday, August 23, 2011

Casual Connect 2011 HTML5 Games Presentation

My Casual Connect 2011 HTML5 Games Presentation was recorded, and the 30-minute video is up.

I discuss the current availability of some key HTML5 features, give an overview of the browser tech used in games today, and touch on monetization and distribution. There are lots of resource references towards the end.

The next best way to learn more is to come to the New Game Conference in November.

Friday, May 1, 2009

Rapid Prototyping (and Rapid Iteration) with Gamebryo LightSpeed, Presentation

In April I presented at the 2009 Triangle Game Conference on Rapid Prototyping and Rapid Iteration. I'm making the slides, audio, and video available here:

Slides (6MB) (view in Google Docs)
Audio (15MB)
Videos (55MB)
(Place the videos next to the RapidPrototyping.ppt file so they launch when clicked from inside PowerPoint.)

Abstract:

Studios succeed by securing solid publisher deals, and then delivering games on time and budget. Great games can't be started until that deal is in place, which places great prototypes as one of the most essential stages of development. This presentation discusses several technical strategies that can be used to facilitate rapid prototyping. These include discussions on asset management systems; live tool-game connections; and data driven designer tools and extensions. This presentation is intended for attendees experienced with game development. It will dive into the technical design of these systems and demonstrate their features. Concepts learned will be directly applicable by developers preparing to build a game content pipeline and tool set.

The demonstrations come from Emergent's latest product, Gamebryo LightSpeed.

Saturday, August 23, 2008

Multi-Platform Multi-Core Architecture Comparison (PC, Wii, Xbox 360, PS3, CUDA, Larrabee)

I just gave a presentation at the Game Connection Developers Conference in Leipzig. It dealt with Multi-Platform support for Multi-Core development... which we've solved at Emergent with Floodgate.

I've presented on this before, but what I added this time was a series of architecture block diagrams to illustrate the wide range of systems out there. They specifically focus on the memory topology relevant for code.

Some quick notes:

Sizes and distances between boxes don't have meaning in these diagrams, just the topology.
There are simplifications (e.g. I haven't added EDRAM on the 360). However, the high level structure of the systems is valuable to contrast, and I've focused on what general processing typically accesses. If I've goofed something, let me know, but also perhaps I omitted it to keep things simpler.
R stands for registers, L1 and L2 for caches, Mem for memory, and GMem for graphics memory.

We start with simple PCs and Multi-Core PCs. Memory is cached, but even with multi-core systems the programmer doesn't have to worry about consistency. As long as synchronization primitives are used to avoid race conditions, the systems take care of getting the right data when you fetch it. (This takes some work, since invalid data could be in an L1 cache that should be replaced by data currently in a write queue from another CPU.)
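As a minimal sketch of that point, assuming standard C++ threads on a cache-coherent multi-core PC: the function name `count_with_mutex` is hypothetical and not part of Floodgate. The only thing the code does to stay correct is take a lock around the shared counter; there is no explicit cache flushing or data movement.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical helper: every worker increments one shared counter.
// The mutex is the only synchronization; the hardware keeps caches coherent.
long long count_with_mutex(int num_threads, int iterations) {
    assert(num_threads > 0 && iterations >= 0);
    long long counter = 0;
    std::mutex counter_mutex;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, &counter_mutex, iterations] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<std::mutex> lock(counter_mutex);
                ++counter;  // safe: guarded by the lock
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```

Calling `count_with_mutex(4, 10000)` returns the full count of increments; without the lock the increments would race and the result would be unpredictable.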

Getting into consoles, we start with the Wii. There are two types of memory, both accessible by CPU and GPU. However, what's really interesting is the ability to lock a portion of the L1 cache and explicitly manage it with DMA transfers. In one test case, we saw a 2.5x performance improvement by explicitly managing Floodgate transfers with the locked cache!

The Xbox 360 looks quite a bit like a multi-core PC, with multiple hardware threads per core. The main thing to note is the single memory used for "system" and graphical resources. Also, the GPU happens to be the memory controller and has access to L2, but programmers needn't concern themselves with this, and only a few developers take advantage of GPU L2 access.

The PlayStation 3 (CELL processor) is the earliest architecture that really rocked the boat. A series of co-processors named SPUs have dedicated memory for instructions and data called Local Stores. These must be managed explicitly by DMA transfers. PlayStation 3 is why we built Floodgate, but as you'll see, it's not the only system that can benefit.
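To illustrate what "managed explicitly by DMA transfers" means for code, here is a rough sketch of the common double-buffering pattern. It is not Floodgate code and does not use the real SPU DMA interface: `fake_dma_get` is a hypothetical, synchronous stand-in built on `memcpy`, and `kChunkFloats` is an arbitrary buffer size. On real hardware the transfer would be asynchronous, so fetching the next chunk would overlap with processing the current one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t kChunkFloats = 256;  // hypothetical local buffer size

// Stand-in for a DMA "get" from main memory into local store (assumption:
// synchronous here; a real transfer would be asynchronous and waited on).
void fake_dma_get(float* local, const float* main_mem, std::size_t count) {
    assert(count <= kChunkFloats);  // a transfer must fit the local buffer
    std::memcpy(local, main_mem, count * sizeof(float));
}

// Sums a large array by streaming it through two small local buffers.
float sum_streamed(const std::vector<float>& main_mem) {
    float local[2][kChunkFloats];
    const std::size_t total = main_mem.size();
    float sum = 0.0f;
    std::size_t offset = 0;
    std::size_t count = std::min(kChunkFloats, total);
    int current = 0;
    if (count > 0) fake_dma_get(local[current], main_mem.data(), count);
    while (count > 0) {
        // Kick off the transfer of the next chunk into the other buffer.
        const std::size_t next_offset = offset + count;
        const std::size_t next_count = std::min(kChunkFloats, total - next_offset);
        if (next_count > 0) {
            fake_dma_get(local[1 - current], main_mem.data() + next_offset, next_count);
        }
        // Process the chunk that is already resident locally.
        for (std::size_t i = 0; i < count; ++i) sum += local[current][i];
        offset = next_offset;
        count = next_count;
        current = 1 - current;
    }
    return sum;
}
```

The processing loop only ever touches the small local buffers, which is the property that matters on a machine where a co-processor cannot read main memory directly.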

nVidia's CUDA is certainly an interesting architecture. It differs significantly from other systems, being a large collection of fairly small microprocessors. Each microprocessor block has a shared register file and a large number of threads that are very efficiently switched by a hardware-implemented scheduler. Each block also has a shared memory cache that must be explicitly managed by code.

The left side of the diagram is the CPU of the system; I left it as a dual-core just as an example.

Intel's Larrabee looks like a many-core system in many ways. Again, I left a generic dual-core CPU on the left side. The architecture feature to note is that the L2 cache has been broken up, with a portion dedicated to each core, and each core runs 4 hardware threads. However, there is a high speed ring bus that provides access to any L2 from any core. The caches maintain coherency, so programmers need only worry about race conditions, not about data barriers, write queues, and caches. However, high performance code will take advantage of the faster access to the "local L2 cache".
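One generic way to keep a working set inside a core's local share of cache is to process data in tiles. The sketch below is ordinary cache blocking applied to a matrix transpose, not Larrabee-specific or Floodgate code; the function name `transpose_tiled` and the `tile` parameter are assumptions for illustration, and a suitable tile size would have to be measured on the target hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Transposes a row-major rows x cols matrix one tile at a time, so that the
// source and destination tiles touched together stay small enough to sit in
// a (hypothetical) per-core cache budget.
std::vector<float> transpose_tiled(const std::vector<float>& src,
                                   std::size_t rows, std::size_t cols,
                                   std::size_t tile) {
    assert(tile > 0 && src.size() == rows * cols);
    std::vector<float> dst(src.size());
    for (std::size_t r0 = 0; r0 < rows; r0 += tile) {
        for (std::size_t c0 = 0; c0 < cols; c0 += tile) {
            const std::size_t r1 = std::min(r0 + tile, rows);
            const std::size_t c1 = std::min(c0 + tile, cols);
            for (std::size_t r = r0; r < r1; ++r) {
                for (std::size_t c = c0; c < c1; ++c) {
                    dst[c * rows + r] = src[r * cols + c];
                }
            }
        }
    }
    return dst;
}
```

The result is identical to a naive transpose; only the order of memory accesses changes.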

Some things to summarize:

There is a wide variety of machine types currently on the market, or about to be here.
Some architectures have non-uniform memory, and many require explicit memory management.
Systems that don't require explicit memory management still benefit from it. e.g.:

Wii with Locked Cache
CUDA with Shared Memory
360 with prefetching
Larrabee with "right sized" "local L2 cache" data

Large numbers of computing elements are coming. CUDA already exposes a very high count, and so does Larrabee. These systems will require efficient blends of both functional decomposition and data decomposition.
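For readers unfamiliar with the two terms in that last point, here is a small sketch using standard C++ `std::async`. The names `sum_data_decomposed`, `minmax_functionally_decomposed`, and `MinMax` are hypothetical and unrelated to Floodgate. Data decomposition runs the same operation on disjoint slices of the data; functional decomposition runs different operations concurrently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

// Data decomposition: the same work (a sum) on disjoint slices of one array.
long long sum_data_decomposed(const std::vector<int>& data, std::size_t num_slices) {
    assert(num_slices > 0);
    const std::size_t slice = (data.size() + num_slices - 1) / num_slices;
    std::vector<std::future<long long>> parts;
    for (std::size_t begin = 0; begin < data.size(); begin += slice) {
        const std::size_t end = std::min(begin + slice, data.size());
        parts.push_back(std::async(std::launch::async, [&data, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            return std::accumulate(first, last, 0LL);
        }));
    }
    long long total = 0;
    for (auto& part : parts) total += part.get();
    return total;
}

struct MinMax { int min; int max; };

// Functional decomposition: two different tasks run at the same time.
MinMax minmax_functionally_decomposed(const std::vector<int>& data) {
    assert(!data.empty());
    auto min_task = std::async(std::launch::async, [&data] {
        return *std::min_element(data.begin(), data.end());
    });
    const int max_value = *std::max_element(data.begin(), data.end());
    return MinMax{min_task.get(), max_value};
}
```

Functional decomposition alone rarely yields more than a handful of concurrent tasks, which is why machines with many computing elements also need the data-decomposed form.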

Ed Holzwarth and I designed Floodgate in 2005/2006 to deal with many of these issues on PS3 & Xbox 360. I'm pleased to find our approach has positioned us well for upcoming hardware architectures we didn't know about then (CUDA, Larrabee). If you'd like more info on Floodgate, for now I'll just send you to some marketing material and a white paper. Also, much credit to those who actually implemented and maintain the system: David Asbell, Stephen Chenney, Michael Noland, Dan Amerson, & Joel Bartley (sorry if I missed someone).

Tuesday, July 22, 2008

Parallel Rendering with DirectX Command Buffers

I'm at Microsoft XNA Gamefest 2008 presenting Practical Parallel Rendering with DirectX 9 and 10. You can find the slides and open source code. Bo Wilson helped out on the design and prototype.

The short form of the presentation is:

We made a command buffer (or "display list") format for DirectX 9.
Multiple CPUs can record command buffers simultaneously.
One main thread that owns the device can play back command buffers.
For many games, this provides an effective way to improve performance with minimal changes to existing render code architecture: they can "simply" swap in one of our special recording devices.
The code is open source.
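The record-then-play-back idea in the list above can be sketched in a few lines. This is an illustration only and not the API of the open source library: `Device`, `RecordingDevice`, `RenderObject`, and `RenderFrame` are hypothetical names, and the two "device" calls are simplified stand-ins rather than real DirectX methods.

```cpp
#include <cassert>
#include <string>
#include <thread>
#include <vector>

enum class Op { SetTexture, DrawPrimitive };
struct Command { Op op; int arg; };

// Stand-in for the real device; only the owning (main) thread calls it.
class Device {
public:
    void SetTexture(int id) { log_.push_back("tex " + std::to_string(id)); }
    void DrawPrimitive(int count) { log_.push_back("draw " + std::to_string(count)); }
    const std::vector<std::string>& log() const { return log_; }
private:
    std::vector<std::string> log_;
};

// Same method names as Device, but calls are stored instead of executed.
class RecordingDevice {
public:
    void SetTexture(int id) { commands_.push_back({Op::SetTexture, id}); }
    void DrawPrimitive(int count) { commands_.push_back({Op::DrawPrimitive, count}); }
    void Playback(Device& device) const {
        for (const Command& c : commands_) {
            switch (c.op) {
                case Op::SetTexture:    device.SetTexture(c.arg);    break;
                case Op::DrawPrimitive: device.DrawPrimitive(c.arg); break;
            }
        }
    }
private:
    std::vector<Command> commands_;
};

// Existing render code: unchanged whether it talks to a real or recording device.
template <typename DeviceLike>
void RenderObject(DeviceLike& device, int texture_id, int triangle_count) {
    device.SetTexture(texture_id);
    device.DrawPrimitive(triangle_count);
}

// Worker threads record one buffer each; the main thread plays them back in order.
std::vector<std::string> RenderFrame(int num_threads) {
    assert(num_threads > 0);
    std::vector<RecordingDevice> buffers(num_threads);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&buffers, t] { RenderObject(buffers[t], t, 100 + t); });
    }
    for (auto& w : workers) w.join();
    Device device;
    for (const RecordingDevice& buffer : buffers) buffer.Playback(device);
    return device.log();
}
```

Because each worker writes only to its own buffer, recording needs no locks, and playback order is fixed by the main thread regardless of how the workers were scheduled.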

Diagram: Example application framework using command buffers