[Guide] Hardware Transcoding: The JDM way! QuickSync and NVENC

Interesting the talk about getting 20 h264 transcodes “easily”.

I struggle to get 20 1080p h264 transcodes on a brand new Intel Xeon E-2276G (Coffee Lake generation, Intel P630 graphics). These are 15-20 Mbps files with DTS 5.1 audio.

64GB RAM, Intel P3600 NVMe

If I try to do any more than 20 it starts buffering - despite disk, network and CPU usage all being low - which to me suggests the QSV chip is maxing out.

What OS?

Ubuntu Server 18.04

Seems to be a bottleneck regardless of transcode quality. Even transcoding to 720p 4 Mbps, I’m limited to 21 or so transcodes.

Also, if I add PGS subtitles, CPU usage skyrockets even though the video is HW encoded/decoded, and transcodes are limited to around 17 or so.

Would be interested to see what performance you achieve with files with DTS 5.1 audio and/or PGS subtitles.

Unfortunately I don’t believe 20+ transcodes is always easily achievable in QSV as there are so many variables that impact CPU usage outside of QSV.

I don’t have any experience with the E-2276G. I’d recommend trying to upgrade to 20.04 as recommended in the guide. As far as I know, subtitle transcoding takes place exclusively on the CPU, which is different than video stream transcoding.

Also, if you’re doing anything else on the machine, it changes the environment, negating any of the results stated here.

I’m running on a bare-metal testbed - nothing but Plex and netdata running.

QSV capability of the E-2276G is the same if not higher than the G5400 (Same gen- higher spec iGPU which utilises more system RAM).

QSV also works out of the box on 18.04 server. What info do you have that QSV performs better on 20.04?

Are you able to test 1080p h264 with DTS 5.1? Genuinely curious if you can run more than 20 simultaneous transcodes without buffering.

Did some more testing, and the results are interesting.

HARDWARE
CPU: Intel Xeon E-2276G (6 Core/12 Thread 3.8GHz CPU, 12146 Passmark, P630 iGPU, Coffee Lake)
RAM: 64GB DDR4
DISK: 2x 1.2TB Intel P3600 RAID0 (content saved on RAM/transcoding in RAM)
GPU: Nvidia GTX 1080 8GB

SOURCE FILE
Video h264 1080P 15763kbps
Audio DCA DTS 5.1 1536 kbps

TEST
24x simultaneous transcodes to 8 Mbps 1080P

COMPARISON
Quick Sync Video vs Nvidia NVENC

Note: To remove disk performance from the equation, content was saved on a 64GB ramdisk and transcoding also occurred in RAM.
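To put the raw data rates of this test in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the bitrates listed above and assumes every stream is read and written at exactly playback speed (1x); the function name is made up for illustration, and the output figure covers video only because the target audio bitrate isn't stated.

```python
# Bitrates taken from the SOURCE FILE and TEST sections above.
SOURCE_VIDEO_KBPS = 15763
SOURCE_AUDIO_KBPS = 1536
TARGET_VIDEO_KBPS = 8000   # 8 Mbps target; target audio bitrate is not stated
STREAMS = 24


def aggregate_mbytes_per_sec(kbps_per_stream, streams):
    """Total data rate in MB/s (decimal) for `streams` streams at 1x speed."""
    bits_per_sec = kbps_per_stream * 1000 * streams
    return bits_per_sec / 8 / 1_000_000


read_rate = aggregate_mbytes_per_sec(SOURCE_VIDEO_KBPS + SOURCE_AUDIO_KBPS, STREAMS)
write_rate = aggregate_mbytes_per_sec(TARGET_VIDEO_KBPS, STREAMS)

print(f"source read rate : {read_rate:.1f} MB/s")
print(f"video write rate : {write_rate:.1f} MB/s")
```

Under those assumptions the whole test moves on the order of tens of MB/s, which is why storage throughput was not expected to be the limiting factor even before moving everything to RAM.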

RESULTS
QSV
Starts to buffer at around 20 transcodes - with significant IOWAIT - despite files and transcoding stored in RAM. System load is minimal (2-3), and user/system CPU usage is also minimal (20%).

See streams buffering and transcode speeds less than 1.0 for reference.
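For anyone wondering why a transcode speed below 1.0 matters: a speed of 1.0 means the transcoder produces one second of media per second of wall-clock time, so anything lower can't keep up with playback. A minimal sketch of that reasoning, with made-up sample numbers and hypothetical function names (this is not how Plex itself decides anything):

```python
def buffering_streams(speeds, threshold=1.0):
    """Return the indexes of streams whose transcode speed is below real time."""
    return [i for i, speed in enumerate(speeds) if speed < threshold]


def seconds_until_buffer_empty(buffered_seconds, speed):
    """Estimate how long playback lasts before the buffer runs dry.

    Playback consumes 1 second of media per second while the transcoder
    adds `speed` seconds per second, so the buffer shrinks by (1 - speed)
    each second. Returns None if the transcoder keeps up (speed >= 1.0).
    """
    if speed >= 1.0:
        return None
    return buffered_seconds / (1.0 - speed)


# Hypothetical speeds for four streams, not taken from the test above.
sample_speeds = [1.4, 0.9, 1.0, 0.5]
print(buffering_streams(sample_speeds))
print(seconds_until_buffer_empty(30, 0.5))
```

So a stream can look fine for a while on whatever it has buffered and then start stalling, which is why the speed value is more telling than a simple count of active transcodes.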

NVENC
Streams 24x transcodes with no buffering. All files were sitting in throttled transcode.

SUMMARY
Whilst QSV performs adequately for less than 20 transcodes - and CPU usage remains low throughout the test - QSV transcoding generates noticeable IOWAIT despite disk activity being minimal. IOWAIT increases in step with the number of transcodes running. It does not occur once a particular threshold is met- instead it builds slowly as transcodes are added. Once IOWAIT hits approx 60% transcodes begin to buffer.

It is important to remember that IOWAIT isn’t specifically disk related.
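If anyone wants to reproduce the IOWAIT numbers without a monitoring tool, here is a rough sketch of how the percentage can be derived on Linux from two samples of the aggregate `cpu` line in /proc/stat. The field order (user, nice, system, idle, iowait, irq, softirq, steal) is the standard one; the function names and the sample lines in the comments are my own, and this only measures the figure, it doesn't explain what the CPU is waiting on.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into its first 8 counters.

    Order: user, nice, system, idle, iowait, irq, softirq, steal.
    The trailing guest fields are skipped because they are already
    included in user and nice.
    """
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in parts[1:9]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(parse_cpu_line(before), parse_cpu_line(after))]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat (Linux only)."""
    with open(path) as stat_file:
        return stat_file.readline()
```

To use it, call read_cpu_line() once, wait a few seconds while the transcodes are running, call it again, and pass both lines to iowait_percent().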

The exact same test on NVENC does not see this same IOWAIT side effect, running on the same hardware, same disks, same transcode setup. As a result - 24 simultaneous transcodes run without buffering.

Don’t transcode audio at the same time and report back results.

Why? Neither NVENC nor QSV transcodes audio.

Both comparisons are processing video via the GPU, and audio via the CPU

Because QuickSync inherently takes place on the CPU.

No it doesn’t. QSV transcoding occurs on a dedicated Quick Sync Core - separate from the standard compute CPU cores/threads.

Both comparisons have the same stress being placed on the CPU cores/threads.

Quick Sync
Video - QSV Core
Audio - CPU Core

NVENC
Video - GPU Core
Audio - CPU Core

Right, but the iGPU is part of the CPU. It’s a shame you won’t just try it, I’m not saying it’s the issue, but you seem more interested in proving a point than actually testing it.

The iGPU is a part of the die - but the processing is separate from the standard compute cores. Processing on the QSV core has no impact on the standard CPU cores - the same as if NVENC is transcoding.

So the two scenarios tested are equal in terms of the stress being placed on the CPU cores.

I’m happy to test it - I’m as curious as you are. Problem is, I can’t find a h264 video file that requires video transcoding but direct plays audio.

It seems as soon as you force transcode video, it’ll also transcode audio.

I think the scenario of “transcoded video - direct play audio” is rare enough that it’s not worth testing.

Almost all users that are transcoding video will also be transcoding audio.

Is your setup transcoding audio even while video is direct?

Have you tried files that are stereo audio only, not surround?

No, when video is direct play, audio is direct play too.

But when you force a video transcode (for testing purposes), Plex will also transcode the audio. Regardless of whether your client can direct play the audio.

So the scenario of “Transcode Video, Direct Play Audio” would be quite rare- particularly with h264 content.

In summary- Quick Sync Video is a fantastic solution, if you need less than 18 or so simultaneous transcodes (that’s when IOWAIT gets too high).

If you need more than 18 transcodes, a dedicated NVIDIA GPU with more than 8GB of VRAM is still the way to go.

If you’re running Windows you’ll need a Quadro. On Linux, however, the hacked drivers that unlock multiple transcodes on GTX cards work fine (my testing was with a GTX 1080 on Linux).

Your experience is an anecdote at this point. It may be valid, but without more data it’s only a point and not a trend or necessarily a representation. You’re also using hardware that’s not covered in this guide, I wonder if that has anything to do with it.

Huh? I’ve presented more data than you have.

The only data you’ve provided is a single screenshot from Plex saying “21 transcodes” - not showing the file types, transcode types or transcode speeds. For all a reader knows, half of these could have been buffering.

Post a detailed screen shot of 20+ simultaneous 1080p transcodes showing detailed transcode information for each stream (in particular transcode info and transcode speeds).

Also the hardware I’ve used is significantly higher quality than what’s being recommended in this thread.

I’m new here and no expert, but in my real-world testing plenty of my movies and TV shows will transcode video while direct playing audio, or vice versa. I always thought it was the player that was the deciding factor in whether something got transcoded, not necessarily the source.

If you could get 18 transcodes of both video and audio in your testing, wouldn’t you expect to get well over 21 if you were just transcoding video?

Like I said, I’m a noob here, but the argument seems weak; further testing is probably necessary.

If you have your player set to direct play - there are rare occasions where video will transcode with direct play audio.

Audio transcoding requires minimal CPU compute.

If you refer to both tests - you’ll see that user/system CPU usage is at most 25%. That’s the result of 24 DCA DTS 5.1 audio transcodes. Audio isn’t a deciding factor in this test case.

The cause of the buffering on Quick Sync is IOWAIT - not CPU usage as a result of audio transcoding.

I disagree about the argument being weak - in that both tests were like for like:

  • Video transcoded on dedicated GPU cores
  • Audio transcoded on CPU cores
  • Same machine, files, OS. No other processes running
  • Files stored and transcoded on RAM to remove disk IO as a potential bottleneck

But I agree that more testing can only be a good thing.

I’d love to see evidence that Quick Sync can transcode 20+ 1080P h264 files with a healthy transcode buffer. I’m all for Quick Sync over Nvidia. It’s cheaper and less power hungry and I can ditch my GPU.

The issue I’m facing is I’m unable to perform more than 20 1080P transcodes with Quick Sync, with modern, powerful hardware and an ideal test environment. Furthermore, I haven’t seen any evidence that proves it’s achievable on any hardware. I’m hoping that evidence exists!

I even ran the same test on another machine I have available (i7 7700k, 64GB RAM) and experienced the same buffering and 60% IOWAIT at about 18-20 1080P h264 transcodes.