ST-2110: Codecs

In the previous parts, we retraced the history of ST-2110 and detailed its intrinsic advantages: stream independence, routing flexibility, scalability…​

This third part explores in detail the formats supported by ST-2110: uncompressed and compressed video, PCM and AES3 audio, metadata and subtitles. We will analyze the economic impact of these technical choices on infrastructures and examine whether it’s possible to reconcile broadcast quality with economic viability.

1 - History

Technologies that preceded the ST-2110 standard.

2 - Overview

The fundamental strengths and limitations of ST-2110.

3 - Codecs and Infrastructure

Impact on networks and format selection.

4 - Transport and Signaling

Transmission protocols, limitations and improvement directions.

5 - NMOS

The essential building block for ST-2110 production.

6 - Conclusion

Should we invest in coaxial cable recycling companies?

ST-2110: Inventory

To establish a clear framework for our analysis, we will first go through the different parts of the standard. We can then set this functional scope against the intended operational domain.

Video

ST 2110-20: Uncompressed Video Transport

An SDI-to-ST 2110 gateway with no added value.

As its title suggests, this document defines how to transport and signal uncompressed video. This part of the standard is interesting in cases where:

  • quality must be maximized, with no deterioration throughout the processing chain,

  • latency must be minimal,

  • computing resources are limited,

  • available bandwidth is significant and at low cost.

ST 2110-22: Constant Bitrate Compressed Video

Compressed video transport… constraining, and with no guarantee of interoperability.

This document defines how to transport and signal compressed video in an ST-2110 ecosystem. There are two major things to note:

  • The standard requires the transmission of constant-bitrate streams. This is surprising, and runs counter to the advantages IP brings over SDI: IP equipment is perfectly capable of routing streams whose bitrate varies. While this choice understandably simplifies the work of network equipment, it forgoes the optimization of network resources that VBR transport allows, notably dynamic allocation with statmux-type mechanisms [1].

  • On the other hand, by choosing not to impose any codec, this part of the standard offers no guarantee of interoperability between equipment, which is nevertheless a crucial point. Two pieces of equipment labeled as ST 2110-22 compatible can each fully comply with the standard while being totally incompatible if their manufacturers have chosen different codecs.

Audio

Historically, SDI allowed both PCM and AES3 streams to be embedded. Unsurprisingly, the audio part of ST-2110 follows the same logic and describes the transport of these two formats.

ST 2110-30: PCM Digital Audio

An AES67 profile guaranteeing interoperability with the IP audio ecosystem.

ST 2110-30 presents itself as a restricted profile of AES67, designed for broadcast. It requires support for the 48 kHz sampling frequency and removes VoIP-oriented mechanisms. This brings the immense advantage of guaranteeing backward compatibility with AES67 or compatible infrastructures.

The chosen format is a linear PCM stream at 16 or 24 bits.
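
As an order of magnitude, the bitrates involved are easy to estimate. The sketch below, in Python, considers the payload only and ignores RTP/UDP/IP overhead; it illustrates why ST 2110-30 audio weighs very little next to video:

    # Approximate payload bitrate of an ST 2110-30 linear PCM stream,
    # ignoring RTP/UDP/IP overhead.
    def pcm_bitrate_mbps(channels, sample_rate_hz=48_000, bit_depth=24):
        return channels * sample_rate_hz * bit_depth / 1e6

    print(pcm_bitrate_mbps(2))   # stereo, 24-bit: ~2.3 Mbps
    print(pcm_bitrate_mbps(16))  # 16 channels, 24-bit: ~18.4 Mbps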

ST 2110-31: AES3 Transparent Transport

An SDI-to-ST 2110 gateway contrary to the standard's philosophy.

ST 2110-31 allows transporting AES3 streams in an ST-2110 infrastructure. However, AES3 is an encapsulation protocol that can contain:

  • linear PCM,

  • compressed audio formats like Dolby E, Dolby ED2, AC-3/E-AC-3, MP1L2 or AAC,

  • but also, in certain cases, data that isn’t even audio (for example metadata).

We are therefore faced with an IP encapsulation of a stream that itself encapsulates an essence, contrary to ST-2110’s initial objective which aimed to clarify essence transport. The comparison with video is striking: where ST 2110-22 explicitly specifies the transported codec, ST 2110-31 limits itself to indicating "AES3," leaving the receiver to determine what the stream actually contains.

From a normative perspective, a first problem appears: the same essence can now be transported in several different ways. Thus, a PCM stream can transit either in ST 2110-30 or ST 2110-31. Similarly, certain metadata streams can be broadcast indifferently in ST 2110-31 or ST 2110-4X.

Finally, this approach once again raises interoperability problems: no restrictions are defined concerning codecs or a minimal compatibility profile, paving the way for mutually incompatible implementations.

Ancillary Data

ST 2110-40: SMPTE ST 291-1 Ancillary Data

An SDI-to-ST 2110 gateway with no added value and no room for evolution.

This standard is a gateway that allows transporting ancillary data contained in the blanking intervals (again!) of an SDI signal. It exists only as a backward compatibility mechanism and should be progressively replaced by alternatives better suited to the IP world.

ST 2110-41: Fast Metadata Framework

A universal means of transmitting metadata.

ST 2110-41 aims to offer a generic metadata transport mechanism. The principle is appealing: an agnostic format, capable of encapsulating standardized JSON or XML as well as proprietary data.

ST 2110-43: Timed Text Markup Language for Captions and Subtitles

A transport mode that gives subtitles back their full essence status.

This part of the standard defines the transport of subtitles and captions using TTML (Timed Text Markup Language), a widely used format adapted to the IP world.

The interest is twofold:

  • Subtitles are no longer relegated to simple ancillary data hidden in SDI blanking intervals, but are treated as a full essence, like video or audio.

  • The use of TTML opens the way to better interoperability and much more flexible processing than old subtitles encapsulated in the VBI.
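
For readers unfamiliar with the format, here is a minimal, purely illustrative TTML document of the kind ST 2110-43 is designed to carry; the text and timings are invented for this example, shown here as a Python string:

    # A minimal, illustrative TTML subtitle document (hypothetical content),
    # of the kind ST 2110-43 is designed to carry.
    ttml_doc = """\
    <tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
      <body>
        <div>
          <p begin="00:00:01.000" end="00:00:03.500">Welcome to the show.</p>
          <p begin="00:00:04.000" end="00:00:06.000">Subtitles travel as a full essence.</p>
        </div>
      </body>
    </tt>
    """
    print(ttml_doc)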

From Inventory to Usage

Context and Use Case Framework

Before advancing in our thinking, it’s necessary to specify when it’s relevant to consider adopting ST-2110 streams. However, SMPTE doesn’t define use cases anywhere: the standard describes how to transport media streams over IP, essentially those that were once carried in SDI. This is quite obvious in the inventory: a large part of the documents struggle to break free from SDI-inherited reflexes.

If we consider that the standard’s purpose is to replace SDI, then its technical choices must be able to cover every stage from image capture to the final control room, before encoding. Since not all broadcasting is necessarily live, we will also need to think about recording and editing issues. Finally, since cloud compatibility is one of the advantages mentioned in the previous chapter, the technical choices must also be workable on cloud infrastructures.

Infrastructure Constraints

In SDI, the cost of an infrastructure was independent of the transported bitrate: once the cable and equipment were in place, broadcasting a 4K stream cost the same as an SD stream. In IP, costs increase in proportion to the consumed bandwidth.

To contain these costs, it therefore becomes essential to optimize bandwidth usage according to the functionality actually required. This principle already applies to on-premise architectures, where bitrate directly determines network dimensioning. But it’s even more crucial in the cloud, where every megabit is billed.
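
As a rough illustration of this point, the sketch below estimates the monthly egress cost of a continuously transmitted stream; the per-gigabyte price used here is a hypothetical placeholder, not an actual provider tariff:

    # Rough monthly cloud egress cost for a continuous stream.
    # The per-GB price is a hypothetical placeholder, not a real tariff.
    def monthly_egress_cost_usd(bitrate_mbps, price_per_gb_usd=0.05):
        seconds_per_month = 30 * 24 * 3600
        gigabytes = bitrate_mbps * seconds_per_month / 8 / 1000
        return gigabytes * price_per_gb_usd

    print(round(monthly_egress_cost_usd(78)))    # ~1,264 USD/month for a 78 Mbps stream
    print(round(monthly_egress_cost_usd(1040)))  # ~16,848 USD/month for a ~1 Gbps stream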

For similar reasons, it’s important to keep an eye on the computing power needed for processing.

Expectations on Essences

Once infrastructure constraints are stated, it’s necessary to define expectations for essence broadcasting, particularly in terms of quality and latency.

In SDI, baseband transmission guarantees maximum quality and quasi-zero latency: it’s therefore tempting to expect the same thing in IP.

However, we must not forget that while this baseband can be preserved end to end during live productions, that is no longer the case as soon as recording enters the chain. In practice, XDCAM remains the most widely used format in this context, with XAVC Intra ready to take over; yet both formats inherently imply a degradation of the visual signal while remaining compliant with broadcasters' expectations.

In terms of latency, it seems reasonable to accept an order of magnitude of a few frames between the beginning of the chain and the final control room output: the delays introduced by the distribution chain are larger by several orders of magnitude, including in streaming.

Analysis by Essence

Audio

ST 2110-30, complemented by S-ADM carried over ST 2110-41, covers the needs perfectly.

For audio, the situation is relatively simple. The necessary resources, both in bandwidth and computing power, are negligible compared to video. There is therefore no reason to forgo linear PCM transport, as defined by ST 2110-30, which guarantees direct interoperability with the already widely deployed AES67.

When descriptive metadata is necessary, for example for immersive production, the stream can be enriched via S-ADM transported in ST 2110-41. This combination (2110-30 + 2110-41) allows preserving essence clarity without compromising the future.

Ancillary Data

ST 2110-41 and ST 2110-43 fulfill the expected role. SCTE 104 remains stuck in its blanking interval.

For metadata as well as subtitles, the building blocks are present. ST 2110-41 offers a generic framework flexible enough for most needs, while ST 2110-43 gives subtitles back their full essence status. Despite some questionable technical choices, the whole is exploitable and coherent with the IP spirit.

SCTE-104, used to signal advertising insertion points, today remains a prisoner of ST 2110-40, where it transits as simple ancillary data inherited from SDI. This approach ensures compatibility, but it remains out of step with the IP spirit, where this metadata would deserve native transport. ST 2110-41 offers an ideal framework, if only an official mapping were defined. In the absence of a standard, some actors have already taken the lead: SiriusXM has registered a Data Item Type (DIT) to transport SCTE-104 in JSON over 2110-41, as sketched below.
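
For illustration only, here is a purely hypothetical sketch of what an SCTE-104 splice request could look like once rendered as JSON for carriage in a 2110-41 stream. The field names are indicative and do not reproduce the registered payload format:

    # Purely illustrative: an SCTE-104 splice request rendered as JSON.
    # Field names are indicative only and do not match any registered DIT payload.
    import json

    splice_request = {
        "operation": "splice_request",
        "splice_insert_type": "spliceStart_normal",
        "splice_event_id": 1234,
        "pre_roll_ms": 4000,       # warning time before the splice point
        "break_duration_s": 30,    # expected duration of the ad break
        "avail_num": 1,
        "auto_return": True,
    }

    print(json.dumps(splice_request, indent=2))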

Video

The analysis reveals a paradox: ST 2110-20 offers perfect technical compatibility but is economically unsustainable, while ST 2110-22 proposes an economically viable solution but compromises interoperability. VSF TR-08 defines a JPEG-XS-based profile that reconciles these two requirements.

Uncompressed

As stated in the section describing ST 2110-20, uncompressed video transport is only conceivable when bandwidth is available at low cost… and for now, that's far from being the case. Schematically, a 25 Gbps link can only carry 16 1080p25 streams, 4 2160p25 streams, or a single 8K stream. This quickly imposes 100 or 400 Gbps network cores, and infrastructure costs that aren't justified by the advantages brought by the standard. The paradox is obvious: this mode can only exist in a laboratory or on an isolated set, a scale at which SDI remains perfectly sufficient and much cheaper.

Compressed

For its part, ST 2110-22 brings a perfectly valid technical response, providing total flexibility in the choice of transported codec. This approach allows the format to be adapted to the needs and capabilities of each stage of the chain: we can thus imagine a camera output in XAVC Intra, a mixer output in visually lossless JPEG XS, then transport to the cloud in HEVC at 25 Mb/s after watermark insertion.

Unfortunately, this freedom compromises interoperability. Fortunately, several external initiatives offer a way out of this impasse, while also allowing a drastic reduction in infrastructure costs.

The VSF TR-08 document constitutes a direct response to the problem: it defines a JPEG-XS interoperability profile specifically designed for ST 2110-22, relying on RFC 9134 for the transport layer. Beyond protocol aspects, TR-08 specifies critical parameters (authorized compression ranges, encoding parameters) that allow implementations from different manufacturers to be aligned. This pragmatic approach has been validated by the EBU, which cites TR-08 in its reference document Tech 3371.

In practice, officializing TR-08 as the reference JPEG-XS profile for ST 2110-22 would immediately solve the identified interoperability problems, while bringing substantial economic benefits:

  • Drastic reduction in infrastructure costs: JPEG-XS compression ratios allow dividing the necessary bandwidth by a factor of 13.

  • Cloud deployment facilitation: the decrease in network consumption makes the use of remote infrastructures economically feasible.

  • Quality preservation: visually lossless compression adapted to broadcast requirements.

  • Minimal latency: less than one line of delay, compatible with real-time constraints.

This approach thus reconciles the technical, economic and operational requirements of modern broadcast IP.
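
The arithmetic behind the tables below is straightforward; the sketch that follows reproduces the orders of magnitude, counting the payload only and ignoring RTP/UDP/IP overhead:

    # Payload bitrates behind Tables 1 and 2 (RTP/UDP/IP overhead ignored).
    def uncompressed_mbps(width, height, fps, bits_per_pixel=20):
        # 4:2:2 10-bit averages 20 bits per pixel (Y plus alternating Cb/Cr).
        return width * height * fps * bits_per_pixel / 1e6

    def jpeg_xs_mbps(width, height, fps, bpp):
        # JPEG-XS bitrate is simply the bit budget per pixel times the pixel rate.
        return width * height * fps * bpp / 1e6

    for name, w, h in [("1080p25", 1920, 1080), ("2160p25", 3840, 2160)]:
        raw = uncompressed_mbps(w, h, 25)
        print(f"{name} uncompressed: {raw / 1000:.2f} Gbps")
        for bpp in (1.5, 4):
            xs = jpeg_xs_mbps(w, h, 25, bpp)
            print(f"  JPEG-XS {bpp} bpp: {xs:.0f} Mbps (ratio {raw / xs:.1f}:1)")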

Table 1. Comparison of 1080p25 bitrates (4:2:2, 10-bit)

  Format             Bitrate      Ratio
  ST 2110-20         1.04 Gbps    1:1
  JPEG-XS 1.5 bpp    78 Mbps      13.3:1
  JPEG-XS 4 bpp      207 Mbps     5:1

Table 2. Comparison of 2160p25 bitrates (4:2:2, 10-bit)

  Format             Bitrate      Ratio
  ST 2110-20         4.15 Gbps    1:1
  JPEG-XS 1.5 bpp    311 Mbps     13.3:1
  JPEG-XS 4 bpp      829 Mbps     5:1

Figure 1. Number of video streams on a 25 Gbps link.

Bibliography

[1] SMPTE ST 2110-20:2022. Professional Media Over Managed IP Networks: Uncompressed Active Video. Society of Motion Picture and Television Engineers, 2022.

[2] SMPTE ST 2110-22:2022. Professional Media Over Managed IP Networks: Constant Bit-Rate Compressed Video. Society of Motion Picture and Television Engineers, 2022.

[3] SMPTE ST 2110-30:2022. Professional Media Over Managed IP Networks: PCM Digital Audio. Society of Motion Picture and Television Engineers, 2022.

[4] SMPTE ST 2110-31:2022. Professional Media Over Managed IP Networks: AES3 Transparent Transport. Society of Motion Picture and Television Engineers, 2022.

[5] SMPTE ST 2110-40:2018. Professional Media Over Managed IP Networks: SMPTE ST 291-1 Ancillary Data. Society of Motion Picture and Television Engineers, 2018.

[6] SMPTE ST 2110-41:2022. Professional Media Over Managed IP Networks: Fast Metadata Framework. Society of Motion Picture and Television Engineers, 2022.

[7] SMPTE ST 2110-43:2021. Professional Media Over Managed IP Networks: Timed Text Markup Language for Captions and Subtitles. Society of Motion Picture and Television Engineers, 2021.

[9] SCTE 104:2018. Automation System to Compression System Communications Applications Program Interface (API). Society of Cable Telecommunications Engineers, 2018.

[10] W3C TTML2. Timed Text Markup Language 2 (TTML2). World Wide Web Consortium, 2018.

[11] ISO/IEC 21122-1:2019. Information technology - JPEG XS low-latency lightweight image coding system - Part 1: Core coding system. International Organization for Standardization, 2019.

[12] ITU-T H.265 (2024). High efficiency video coding. International Telecommunication Union, 2024.

[15] SMPTE. SMPTE ST2110-41 Administrative Register. Society of Motion Picture and Television Engineers.


1. Statistical multiplexing is a mechanism long used in DVB broadcasting that dynamically allocates each stream's bitrate in real time according to scene complexity or any other criteria.