# A Tight Signal-Band Power Bound on Mismatch Noise in a Mismatch-Shaping Digital-to-Analog Converter

Jared Welz, Member, IEEE, and Ian Galton, Member, IEEE

Abstract-Many applications employ digital-to-analog converters (DACs) to obtain the advantages of digital processing (e.g., low power and physical size, resilience to noise, etc.) to generate signals, such as voltages, that are analog in nature. Given the appropriate numerical representation of its input, the DAC ideally behaves as a linear gain element. However, as a result of inevitable component mismatches, the output of a multibit DAC (i.e., a DAC designed to output more than two analog levels) is a nonlinear function of its input. The resulting distortion, called DAC noise, limits the overall signal-to-noise ratio (SNR) and hence the obtainable accuracy of the DAC. Mismatch-shaping DACs exploit built-in redundancy to suppress the DAC noise in the input signal's frequency band. Although mismatch-shaping DACs are widely used in commercial products, little theory regarding the structure of their DAC noise has been published to date. Consequently, designers have been forced to rely upon simulations to estimate DAC noise power and behavior, which can be misleading because the DAC noise depends on the DAC input. This paper addresses this problem. It presents an analysis of the DAC noise power spectral density (PSD) in a commonly used mismatch-shaping DAC: the dithered first-order low-pass tree-structured DAC. This design ensures that its DAC noise has a spectral null at dc (i.e., zero frequency) by generating digital, dc-free sequences using the same techniques that have been developed for line codes. An expression is derived for the DAC noise PSD that depends on the statistics of these sequences and is used to show various properties of the DAC noise. Specifically, an attainable bound is derived for the signal-band DAC noise power that can be used to predict worst case performance in practical circuits.

Index Terms—Analog-to-digital, data converters, dc-free sequences, delta–sigma ( $\Delta\Sigma$ ), digital-to-analog, dynamic element matching, mismatch shaping, multibit, sigma–delta, spectral shaping.

## I. INTRODUCTION

In many applications, such as telecommunications, information that is processed digitally must be converted to an analog signal using a digital-to-analog converter (DAC). This device receives a digital input (i.e., an abstract element from a given finite set called an *alphabet*) and produces an analog physical quantity, such as a constant voltage, that is unique for the provided input. For most DACs, all possible outputs are of the same nature (e.g., only voltages and not currents) and differ only by a scalar multiplication. Therefore, the set of $M_u$ possible DAC outputs can be characterized as $\{\Delta \cdot u_i : i = 1, \dots, M_u\}$, where each $u_i$ is a unique, unitless constant, and $\Delta$ represents the physical quantity. Thus, from a mathematical point of view, the DAC provides a bijective mapping from the digital input's alphabet to a set of analog values.

Manuscript received June 25, 2002; revised Nov. 27, 2003. This work was supported by the University of California Communications Research Program under Grant core00-10069 and by the National Science Foundation under Grant CCR-0073552.

I. Galton is with the Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA 92093-0407 USA (e-mail: galton@ece.ucsd.edu).

Communicated by G. Battail, Associate Editor At Large.

Digital Object Identifier 10.1109/TIT.2004.825022

Designers usually simplify this mapping by exploiting the arbitrariness of the names or "values" of the digital inputs. As previously mentioned, each DAC output can be written as the product of an element in $\{u_i : i = 1, \ldots, M_u\}$ and the physical quantity $\Delta$. By attributing the value $u_i$ to the digital input that produces $\Delta \cdot u_i$, the ideal DAC can be considered a gain element whose magnitude is $\Delta$. This representation of the DAC input's alphabet is not unique, so the designer typically uses one that is most convenient for the application.

Most DACs are not built with the intent to perform a single conversion. The DAC input is normally a temporal sequence of digital values that are converted at specific instants in time, which are typically periodic. Such a sequence will be denoted by y[n], where the dummy variable n corresponds to the specific instant in time, called a *sample time*, at which the sequence is evaluated or converted, and y[n] represents either the value of the sequence at that instant or the sequence in its entirety, depending on the context. It is assumed that the (n + 1)st sample, y[n + 1], always occurs after the nth sample, y[n]. In the periodic case, each sample follows its previous sample by the same amount of time. Thus, the DAC input, y[n], is discrete in both time and value.

The DAC output, on the other hand, is a continuous-time physical quantity. It is assumed, without loss of generality, that the DAC holds its output at a constant value from the time that the given digital value is converted to the next sample time. Thus, between the *n*th and (n + 1)st sample times, the ideal DAC output is  $\hat{z}(t) = \Delta y[n]$ . Any pulse shaping or interpolation can be achieved by appropriately filtering this DAC output. Even though the DAC output is a continuous-time signal, it is uniquely determined at the sample times and therefore is more conveniently represented as the sequence  $\hat{z}[n] = \Delta y[n]$ .

To understand the physical implementation of a DAC, it is first necessary to understand how a digital sequence is implemented with analog circuitry. The digital sequence is typically represented by means of a sequence of analog voltages that are interpreted as discrete quantities by comparing them with given thresholds. For instance, a zero-voltage threshold is used for a binary sequence; thus, the values of the digital sequence are determined by the polarities of its representative voltage signal at each sample time. Samples that have positive and negative voltages are referred to as *high* and *low* samples, respectively. If the digital sequence's alphabet has more than two elements, the sequence is usually represented by a set of binary digital sequences (e.g., $N_b$ binary sequences can be used to represent a $2^{N_b}$-valued digital sequence).

J. Welz was with the Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA 92093-0407 USA. He is now with Northrop Grumman Space Technology, Redondo Beach, CA 90278 USA (e-mail: jaredshirlene@yahoo.com).

With regard to the physical implementation of a DAC, first consider the two-level (i.e., 1-bit) case. This DAC can be implemented using a controllable current source. At each sample time, the voltage polarity that constitutes the input 1-bit digital sequence determines whether the current source is to be switched on or off for the entire sampling period. Ideally, the current supplied by this source is held constant for the entire sampling period. The current source is connected to a resistor to generate the 1-bit DAC's output voltage. Thus, during each sample interval, the 1-bit DAC generates one of two voltages depending on how the digital input switches the current source. If the two output voltages are nominally $\pm \Delta/2$ volts, then the alphabet of the 1-bit DAC input can be chosen to be $\{-1/2, 1/2\}$, where 1/2 and -1/2 correspond to high and low samples, respectively, so that the ideal 1-bit DAC behaves as a gain element whose magnitude is $\Delta$ volts.

Multibit DACs (i.e., those whose outputs are designed to include more than two levels) are often constructed by summing the outputs of several 1-bit DACs. In each case, the multibit DAC input is mapped to a set of 1-bit sequences which drive a bank of 1-bit DACs whose outputs are summed. This sum operation can be achieved with current source 1-bit DACs by connecting all of the current source outputs to a common node. As previously described, the multibit DAC is usually designed and its input alphabet chosen so that its output is ideally a scalar multiple of its input: $\hat{z}[n] = \Delta y[n]$. However, mismatches among the 1-bit DACs, which are inevitably introduced during fabrication, cause the actual DAC output z[n] to be a memoryless, nonlinear function of the input. The resulting error can be viewed, without approximation, as a constant gain error and additive offset, and an additive zero-mean sequence referred to as the DAC noise. In other words, the DAC output can be written as $z[n] = \alpha y[n] + \beta + e[n]$, where $\alpha, \beta$ are constants ($\alpha$ equals $\Delta$ multiplied by the constant gain error factor) and e[n] is the DAC noise. In most cases, the performance criteria for the multibit DAC are substantially more sensitive to the DAC noise than to the gain error and offset. For example, the DAC noise usually limits the effective resolution of the DAC and can contain spurious tones that limit the converter's spurious-free dynamic range (SFDR).

As an example, consider the case where a multibit DAC is used in a *delta-sigma* ( $\Delta\Sigma$ ) analog-to-digital converter (ADC). A  $\Delta\Sigma$ ADC extracts a high-resolution digital version of its input from a low-resolution version by ensuring that most of the coarse version's quantization noise power resides in a separate frequency band from that of the ADC input so that it can be removed by filtering. To accomplish this noise spectral shaping, the coarse sequence is generated by quantizing a signal that consists of both the ADC input and previous samples of the coarse sequence that have been converted back to analog. When a multibit DAC performs this conversion, it generates DAC noise that shares the same signal path as the ADC input and is thus not spectrally shaped like the quantization noise. Consequently, much of the DAC noise cannot be removed, and this noise, therefore, limits the obtainable resolution of the  $\Delta\Sigma$ ADC.

Mismatch-shaping DACs are commonly used to reduce the harmful effects of DAC noise in such $\Delta\Sigma$ data converters where the signal of interest is restricted to a *signal band* that is narrow relative to the sample rate [1]–[6]. These DACs use digital logic to scramble the input sequences to the bank of 1-bit DACs in an input-dependent fashion such that the DAC noise is attenuated in the signal band. For example, if the input sequence's power spectral density (PSD) is confined to a small region around dc (i.e., zero frequency), then the digital logic can be used to scramble the inputs to the 1-bit DACs so that the PSD of the DAC noise has a high-pass shape with most of its power outside of the signal band. By passing the DAC output through a low-pass filter, most of the power from the DAC noise can be removed while that from the DAC input is preserved, which increases the DAC's effective resolution. Such mismatch-shaping DACs have facilitated multibit $\Delta\Sigma$ modulation [7]–[9] for data conversion and have proven to be enabling components in most of today's high-performance $\Delta\Sigma$ data converters [10]–[16].

Nevertheless, despite the widespread commercial use of mismatch-shaping DACs, few theoretical results have been published to date that can be used to quantify their performance. Most of the previously published theoretical analyses have been limited to showing that the DAC noise PSD vanishes at some frequency (e.g., see [17] and [18]). Consequently, designers rely heavily on simulations to evaluate the power and tonal properties of the DAC noise. However, these simulations can be misleading because the DAC noise depends on both the chosen DAC input and mismatches.

This paper presents a theoretical analysis of the DAC noise in two versions of a widely used mismatch-shaping DAC architecture: the dithered first-order low-pass tree-structured DAC [6], [15]–[21]. The DAC noise of this device is a linear combination of digital sequences, called switching sequences, that are generated inside it. The two versions are distinguished by how these sequences are implemented. In the analysis of both versions of this device, expressions for the DAC noise PSDs are derived as functions of the switching sequence statistics and 1-bit DAC mismatches. These PSD expressions are used to derive a bound on the signal-band DAC noise power for each of the two DAC versions. These bounds are independent of the multibit DAC input and can be used as a worst case estimate in the design of data converters that employ these DACs. Moreover, each bound is shown to be tight as there exist a set of DAC mismatches and an input sequence that give rise to DAC noise that achieves the bound.

The paper is divided into three main sections and an Appendix. Section II reviews the operation of the dithered first-order low-pass tree-structured DACs. This section shows how line coding techniques are used to ensure that the DAC noise PSD has a spectral null at dc. Section III presents and discusses the expressions for the switching sequence PSD and signal-band power. Section IV addresses the differences between the two versions of the tree-structured DAC as it presents and discusses the DAC noise signal-band power bound for each. The Appendix presents the derivations of most of the main results.

Fig. 1. A 9-level tree-structured DAC.

## II. TREE-STRUCTURED DAC

An example 9-level tree-structured DAC is shown in Fig. 1. In general, the $(2^b + 1)$-level tree-structured DAC, where *b* is a positive integer, consists of a bank of $2^b$ 1-bit DACs and a *digital encoder*. The DAC input y[n] is a digital sequence whose values belong to the alphabet $\{-2^{b-1}, -2^{b-1}+1, \ldots, 2^{b-1}-1, 2^{b-1}\}$, which is a set that consists of $(2^b + 1)$ values that are designed to be converted to the same number of analog levels. The digital encoder converts y[n] into $2^b$ 1-bit sequences that are denoted $x_1[n], \ldots, x_{2^b}[n]$ from bottom to top. Like the example in the Introduction, each of these 1-bit sequences takes on values in the alphabet $\{-1/2, 1/2\}$. The *i*th 1-bit DAC converts $x_i[n]$ into an analog sample $y_i[n]$ as follows:

$$y_i[n] = \begin{cases} \frac{\Delta_D}{2} + e_{h_i}, & \text{if } x_i[n] = 1/2\\ -\frac{\Delta_D}{2} + e_{l_i}, & \text{if } x_i[n] = -1/2 \end{cases} \tag{1}$$

where  $\Delta_D$  is the nominal smallest step size of the tree-structured DAC, and  $e_{h_i}$  and  $e_{l_i}$  are the 1-bit DAC's high and low errors, respectively. These error terms result from inevitable inaccuracies in the fabrication of the 1-bit DACs and are taken to be arbitrary constants. The digital encoder consists of *b* vertical layers of *switching blocks*, labeled  $S_{k,r}$ , where  $k = 1, \ldots, b$  is the layer number, and  $r = 1, \ldots, 2^{b-k}$  is the horizontal depth within the layer. The switching blocks are described in more detail later in this section.

Typically, y[n] consists of the sum of a data signal and noise. The noise component's power can be spread across all frequencies while the data signal's power is confined to the radial frequencies in the interval  $(-\pi/O, \pi/O)$ , where O is the oversampling ratio (OSR). This terminology was chosen because the tree-structured DAC is most often used as a component in  $\Delta\Sigma$  converters where the data signal is oversampled. Thus, the normalized radial frequency  $2\pi$  corresponds to the Nyquist frequency of the data signal.

Ideally, the DAC output is a scaled version of the DAC input:  $\hat{z}[n] = \Delta_D y[n]$ . To ensure that the DAC approaches this ideal behavior when the 1-bit DAC error terms approach zero, the digital encoder outputs must satisfy the following equality:

$$x_1[n] + \ldots + x_{2^b}[n] = y[n]. \tag{2}$$

This equality must hold for any multibit DAC that is constructed by combining $2^b$ 1-bit DACs of the same nominal step size with a digital encoder as shown in Fig. 1. For each value of y[n] except $\pm 2^{b-1}$, there are several possible ways to choose which digital encoder outputs are 1/2 and which are -1/2 under the constraint that (2) is satisfied. For example, if y[n] = 0, (2) is satisfied when the number of digital encoder outputs that are 1/2 equals the number of outputs that are -1/2. This inherent redundancy is exploited by the mismatch-shaping DAC to control certain characteristics of its DAC noise. In the tree-structured DAC, the processing of the switching blocks, as described next, makes this relationship between the choices of digital encoder and the DAC noise manifest.
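The amount of redundancy available under (2) can be counted directly. The following is a minimal sketch, assuming only that each of the $2^b$ encoder outputs is $\pm 1/2$; the function name `encoder_choices` is introduced here for illustration and does not appear in the paper.

```python
from math import comb


def encoder_choices(y: int, b: int) -> int:
    """Count the assignments of +1/2 and -1/2 to 2**b outputs that sum to y."""
    n = 2 ** b
    # If `highs` outputs are +1/2 and the rest are -1/2, the sum is highs - n/2.
    highs = y + n // 2
    if not 0 <= highs <= n:
        raise ValueError("y is outside the DAC input alphabet")
    return comb(n, highs)


if __name__ == "__main__":
    # 9-level DAC (b = 3): only the extreme inputs leave no choice.
    for y in range(-4, 5):
        print(y, encoder_choices(y, b=3))
```

The count is largest at y[n] = 0 and drops to one at $y[n] = \pm 2^{b-1}$, which matches the observation that only the extreme inputs admit a single encoder output pattern.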

Let $x_{k,r}[n]$ denote the input to $S_{k,r}$. With the digital encoder outputs written as $x_{0,i}[n] \triangleq x_{2^b+1-i}[n]$, for $i = 1, \ldots, 2^b$, the switching blocks are interconnected so that the top and bottom outputs of the switching block $S_{k,r}$ are $x_{k-1,2r-1}[n]$ and $x_{k-1,2r}[n]$, respectively. To ensure that (2) is satisfied and that $x_i[n] = \pm 1/2$, it is sufficient, as proven in [6], that each switching block satisfies the following two-part Number Conservation Rule: the two outputs of each switching block must belong to the set $\{-2^{k-2}, -2^{k-2} + 1, \ldots, 2^{k-2} - 1, 2^{k-2}\}$, where k is the layer number, and their sum must equal the input to this switching block

$$x_{k-1,2r-1}[n] + x_{k-1,2r}[n] = x_{k,r}[n]. \tag{3}$$

Fig. 2. The signal processing performed by the switching block.

When all the switching blocks comply with this rule, all the switching block inputs are integer-valued sequences. This rule is satisfied using the switching block architecture shown in Fig. 2, which consists of a switching sequence generator, an adder, a subtracter, and two divide-by-two elements. Fig. 2 indicates that

$$x_{k-1,2r-1}[n] = \frac{1}{2} \left( x_{k,r}[n] + s_{k,r}[n] \right) \tag{4}$$

and

$$x_{k-1,2r}[n] = \frac{1}{2} \left( x_{k,r}[n] - s_{k,r}[n] \right) \tag{5}$$

where  $s_{k,r}[n]$  is called the *switching sequence*. To motivate the description of the switching sequence generator, the relationship between the switching sequences and the DAC noise is shown next.

As proven in [6], the DAC output can be written as

$$z[n] = \alpha y[n] + \beta + e[n] \tag{6}$$

where y[n] is the DAC input,  $\alpha$  and  $\beta$  are constants that are functions of the 1-bit DAC errors, and e[n], called the *DAC noise*, is given by

$$e[n] = \sum_{k=1}^{b} \sum_{r=1}^{2^{b-k}} \Delta_{k,r} s_{k,r}[n] \tag{7}$$

where

$$\Delta_{k,r} = \frac{1}{2^k} \sum_{i=(r-1)2^k+1}^{(r-1)2^k+2^{k-1}} \left[ (e_{h_i} - e_{l_i}) - (e_{h_{i+2^{k-1}}} - e_{l_{i+2^{k-1}}}) \right] \tag{8}$$

and, as previously stated, $e_{h_i}$ and $e_{l_i}$ are the *i*th 1-bit DAC's high and low errors, respectively, which are taken to be constants. Thus, since each $\Delta_{k,r}$ is a constant, the DAC noise is a constant-coefficient linear combination of the switching sequences. At each sample time, the collection of the switching sequences that satisfy the Number Conservation Rule gives rise to the different choices for how the digital encoder selects its output values so that (2) is satisfied. As shown next, the switching sequence generators choose their switching sequences to control characteristics of the DAC noise.
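The decomposition in (6) through (8) can be checked numerically. The sketch below makes several assumptions that are not taken from the paper: the closed forms used for $\alpha$ and $\beta$ are derived here directly from (1); the errors in (8) are indexed in the same top-to-bottom order as the encoder outputs $x_{0,i}[n]$; and the signs of the nonzero switching samples are drawn at random, which respects the Number Conservation Rule but is not the dc-free generator described later. All function names are hypothetical.

```python
import random


def tree_encode(y, b, rng):
    """Return the leaves x_{0,1..2^b} and switching samples s[(k, r)] for one input y."""
    layer = [y]  # inputs x_{k,r} of the current layer, ordered r = 1, 2, ...
    s = {}
    for k in range(b, 0, -1):
        nxt = []
        for r, x in enumerate(layer, start=1):
            odd = int(x + 2 ** (k - 1)) % 2          # nonzero sample only when this is 1
            s[(k, r)] = odd * rng.choice((-1, 1))    # random sign: an illustrative choice
            nxt += [(x + s[(k, r)]) / 2, (x - s[(k, r)]) / 2]  # equations (4) and (5)
        layer = nxt
    return layer, s


def delta_coeff(k, r, eps):
    """Coefficient from (8); eps[i - 1] = e_h_i - e_l_i, indexed like x_{0,i}."""
    start = (r - 1) * 2 ** k   # zero-based index of the first leaf under S_{k,r}
    half = 2 ** (k - 1)
    return sum(eps[i] - eps[i + half] for i in range(start, start + half)) / 2 ** k


def check(b=3, step=1.0, trials=200, seed=1):
    """Largest |z - (alpha*y + beta + e)| over random inputs and fixed random errors."""
    rng = random.Random(seed)
    n = 2 ** b
    e_h = [rng.gauss(0, 0.01) for _ in range(n)]
    e_l = [rng.gauss(0, 0.01) for _ in range(n)]
    eps = [h - l for h, l in zip(e_h, e_l)]
    alpha = step + sum(eps) / n                       # assumption: derived from (1)
    beta = sum(h + l for h, l in zip(e_h, e_l)) / 2   # assumption: derived from (1)
    worst = 0.0
    for _ in range(trials):
        y = rng.randint(-(n // 2), n // 2)
        leaves, s = tree_encode(y, b, rng)
        # 1-bit DAC model (1), summed over the bank.
        z = sum(step / 2 + e_h[i] if x > 0 else -step / 2 + e_l[i]
                for i, x in enumerate(leaves))
        e = sum(delta_coeff(k, r, eps) * v for (k, r), v in s.items())  # equation (7)
        worst = max(worst, abs(z - (alpha * y + beta + e)))
    return worst


if __name__ == "__main__":
    print(check())
```

Under these assumptions `check()` should return a value at the level of floating-point rounding error, which illustrates that the error is captured without approximation by a gain, an offset, and the weighted switching sequences.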

In the dithered, first-order low-pass tree-structured DAC, the switching sequence generator in  $S_{k,r}$  selects its switching sequence under the following constraints:

1) It satisfies the following:

$$s_{k,r}[n] = \pm o_{k,r}[n] \tag{9}$$

where  $o_{k,r}[n]$ , called the *parity sequence* of  $S_{k,r}$ , is 1 when  $x_{k,r}[n] + 2^{k-1}$  is odd and 0 otherwise.

2) It is a dc-free sequence (i.e., it has a spectral null at dc).

3) It contains no tones for any choice of the switching block input.
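The interaction between the first constraint and the switching block equations (4) and (5) can be illustrated with a short sketch. It is only an illustration under stated assumptions: the helper names `parity` and `switching_block` are hypothetical, and the sign of the switching sample is passed in by the caller rather than produced by a real switching sequence generator.

```python
def parity(x: int, k: int) -> int:
    """Parity value o_{k,r}[n]: 1 when x + 2**(k-1) is odd, 0 otherwise."""
    return (x + 2 ** (k - 1)) % 2


def switching_block(x: int, k: int, sign: int):
    """Split the input x of a layer-k block using (4) and (5) with s = sign * o."""
    s = sign * parity(x, k)
    top = (x + s) / 2
    bottom = (x - s) / 2
    return top, bottom, s


if __name__ == "__main__":
    # Exhaustively confirm the Number Conservation Rule for layers 1 to 4.
    for k in range(1, 5):
        limit = 2 ** (k - 2)            # largest allowed output magnitude in layer k
        for x in range(-2 ** (k - 1), 2 ** (k - 1) + 1):
            for sign in (-1, 1):
                top, bottom, _ = switching_block(x, k, sign)
                assert top + bottom == x
                assert abs(top) <= limit and abs(bottom) <= limit
    print("Number Conservation Rule holds for k = 1..4")
```

For k = 1 the two outputs are always $\pm 1/2$, and for higher layers they are integers, because the switching sample is nonzero exactly when it is needed to make the two halves land in the allowed set.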

Condition 1 ensures that the switching block satisfies the range requirement of the Number Conservation Rule, while Conditions 2 and 3 ensure that the DAC noise is a dc- and tone-free sequence, respectively. With these constraints, the switching sequence can be viewed as a pseudoternary line code for its respective parity sequence. Since the switching sequence generator is a finite-state machine, it follows from [22] that provided the parity sequence  $o_{k,r}[n]$  consists of independent and identically distributed (i.i.d.) bits, then  $s_{k,r}[n]$  is a dc-free sequence if and only if its running digital sum, given by

$$R_{k,r}(m) \triangleq \sum_{n=0}^{m} s_{k,r}[n] \tag{10}$$

takes on only a finite number of values for all m. However, the parity sequence is not necessarily a sequence of i.i.d. bits, but, as shown in the Proposition in the Appendix, this necessary and sufficient condition holds whenever the PSD of  $s_{k,r}[n]$  exists.

A common line code that satisfies the first two constraints is the bipolar code [23] (where  $o_{k,r}[n]$  represents the data and  $s_{k,r}[n]$  is the code). When it is used, the nonzero switching sequence values always alternate between 1 and -1; thus,  $R_{k,r}(m)$  takes on only two values. The undithered switching block presented in [19] generates this switching sequence. However, if  $o_{k,r}[n] = 1$  for all n, then  $s_{k,r}[n] = \cos(\pi n)$ , which implies that the bipolar code does not satisfy the third constraint.
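The tonal behavior of the bipolar code is easy to reproduce. The following minimal sketch assumes the first nonzero sample is +1 (the starting polarity is an arbitrary choice made here); the function name `bipolar_code` is illustrative.

```python
def bipolar_code(parity_bits):
    """Bipolar code: nonzero outputs alternate between +1 and -1."""
    out, next_sign = [], 1          # assumption: the first nonzero sample is +1
    for o in parity_bits:
        if o:
            out.append(next_sign)
            next_sign = -next_sign
        else:
            out.append(0)
    return out


if __name__ == "__main__":
    # With an all-ones parity sequence the output is a pure tone at half the sample rate.
    print(bipolar_code([1] * 8))
    # With zeros interspersed the nonzero samples still alternate in sign.
    print(bipolar_code([1, 0, 0, 1, 1, 0]))
```

With the all-ones parity sequence the output alternates in sign at every sample, which is the sequence $\cos(\pi n)$ and therefore a tone, even though the running digital sum stays bounded.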

To satisfy all three conditions, the switching sequence is constructed by concatenating two types of *symbols* 

$$\text{Type 1:}\quad 1,\ \underbrace{0, \dots, 0}_{\text{until next } o_{k,r}[n]=1},\ -1,\ \underbrace{0, \dots, 0}_{\text{until next } o_{k,r}[n]=1} \tag{11}$$

and

$$\text{Type 2:}\quad -1,\ \underbrace{0, \dots, 0}_{\text{until next } o_{k,r}[n]=1},\ 1,\ \underbrace{0, \dots, 0}_{\text{until next } o_{k,r}[n]=1} \tag{12}$$

the choice of which is made randomly by an approximated fair coin toss. Using such symbols to generate  $s_{k,r}[n]$  ensures that  $R_{k,r}(m) \in \{-1, 0, 1\}$  which implies that  $s_{k,r}[n]$  has a spectral null at dc.

Fig. 3 shows the finite-state transition diagram (FSTD) for the switching sequence generator where the states correspond to the values of  $R_{k,r}(m)$  and the edge labels are the outputs that occur with the associated changes of state. The state of the switching sequence generator changes only at sample times when the parity sequence is 1. Moreover, it changes from 0 to 1 at sample times when a Type 1 symbol begins in the switching sequence, and it changes from 0 to -1 at sample times when a Type 2 symbol begins in the switching sequence; the choice of which, as previously described, is random. Example implementations of this switching sequence generator using two D-type flip-flops are presented in [15] and [19].
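The state machine described above can be sketched in a few lines. This is an illustrative software model, not the flip-flop implementation of [15] or [19]; the fair coin toss is approximated with Python's pseudorandom generator, and the function name `switching_sequence` is hypothetical.

```python
import random


def switching_sequence(parity_bits, rng):
    """Generate s[n] from parity bits by concatenating Type 1 and Type 2 symbols.

    The state is the running digital sum, which stays in {-1, 0, 1}.
    """
    state, out = 0, []
    for o in parity_bits:
        if not o:
            s = 0                       # state and output unchanged when the parity bit is 0
        elif state == 0:
            s = rng.choice((1, -1))     # a new symbol starts: Type 1 (+1) or Type 2 (-1)
        else:
            s = -state                  # second nonzero sample returns the sum to 0
        state += s
        out.append(s)
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    bits = [rng.randint(0, 1) for _ in range(20)]
    print(bits)
    print(switching_sequence(bits, rng))
```

Because every symbol contains one +1 and one -1, the running digital sum returns to zero at the end of each symbol regardless of the parity sequence.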



Fig. 3. The FSTD for the switching sequence generator where the state corresponds to the value of  $R_{k,r}(m)$ .

If a symbol starts at the present sample time $n_0$, the random symbol type selection implies that, regardless of the parity sequence $o_{k,r}[n]$, the present and future samples of the switching sequence are uncorrelated with the past samples

$$E\{s_{k,r}[n_0 - l]s_{k,r}[n_0 + m]\} = 0 \tag{13}$$

for all  $m \ge 0$  and l > 0. Since every other nonzero sample of  $s_{k,r}[n]$  is the start of a new symbol, this implies that the switching sequence does not contain tones regardless of its associated parity sequence.

The choice of the symbol type in $s_{k,r}[n]$ is made randomly with the 1-bit *dither sequence* $d_{k,r}[n]$. The dither sequence approximates a sequence of uniformly distributed, i.i.d. bits whose values are taken from the alphabet $\{-1/2, 1/2\}$. If a symbol starts at sample time $n_0$, then that symbol is a Type 1 symbol if $d_{k,r}[n_0] = 1/2$, and it is a Type 2 symbol if $d_{k,r}[n_0] = -1/2$. For (13) to hold, it is sufficient that $d_{k,r}[n]$ be independent of $x_{k,r}[n]$. Therefore, all the switching blocks in the same layer k can share the same dither sequence (i.e., $d_{k,r}[n] = d_k[n]$ for each layer k). Implementations utilizing this dithering scheme require only b dither sequences, which are realized by pseudorandom sequence generators as demonstrated in [15]. As shown in Section IV, a much tighter bound on the DAC noise power is obtained when an independent dither sequence is employed by each switching block; however, this implementation requires $2^{b} - 1$ dither sequences.

## III. SWITCHING SEQUENCE SPECTRUM

As reviewed in the previous section, the DAC noise in the tree-structured DAC is a linear combination of the switching sequences. Thus, the DAC noise PSD is a function of the switching sequence PSDs and cross spectra. This section presents and discusses an expression for the switching sequence PSD and signal-band power. The switching sequence cross spectrum is addressed in the Appendix. First, some intuition behind the switching sequence PSD and its derivation is provided along with some required terminology.

The dependence of the switching sequence on the parity sequence in (9) prevents a conventional analysis of its PSD. If  $o_{k,r}[n]$  were a sequence of i.i.d. bits, then  $s_{k,r}[n]$  could be written as a function of the Markov chain  $R_{k,r}(m)$ , and techniques such as those presented in [24] could be used to analyze the PSD. If  $o_{k,r}[n]$  were periodic, then  $s_{k,r}[n]$  would be a cyclostationary sequence, and its PSD could be determined by the commonly known techniques (e.g., see [25]) that were introduced in [26]. However, in general,  $o_{k,r}[n]$  is neither periodic nor a sequence of i.i.d. bits, so a new technique must be developed to determine the PSD of  $s_{k,r}[n]$ . The technique presented in this paper relies on the randomness in the symbol type selection. As a consequence of this randomness, samples of  $s_{k,r}[n]$  that are in different symbols are orthogonal in the sense that if  $n_0$  and  $n_1$  are sample times such that  $s_{k,r}[n_0]$  and  $s_{k,r}[n_1]$  are in different symbols, then

$$E\{s_{k,r}[n_0]s_{k,r}[n_1]\} = 0. \tag{14}$$

Therefore, the PSD of  $s_{k,r}[n]$  depends only on the correlation between samples of  $s_{k,r}[n]$  that are within the same symbol. These intrasymbol correlation statistics are conveniently described using the terminology presented next.

Let the symbols described in (11) and (12) be divided into two "halves" where the first $\pm 1, 0, \ldots, 0$ segment is called the *head* of the symbol, and the second such segment is called the *tail* of the symbol. The *head length* of a symbol is defined to be the number of samples of $s_{k,r}[n]$ that constitute the head of that symbol. Let the *head-length process* $H_{k,r}$ be the random process that represents the head lengths of symbols in $s_{k,r}[n]$; thus, $H_{k,r}[m]$ is the number of samples in the head of the *m*th symbol in $s_{k,r}[n]$. The definitions of *tail length* and the *tail-length process* $T_{k,r}$ are analogous to those for the head length and head-length process, respectively.

Theorem 1: The PSD of  $s_{k,r}[n]$  is

$$S_{k,r}\left(e^{j\omega}\right) = 2\sigma_{k,r}^2 E\left\{\sin^2\left(\frac{\omega H_{k,r}[m]}{2}\right)\right\} \tag{15}$$

where  $\sigma_{k,r}^2 \triangleq E\{s_{k,r}^2[n]\}$ , and the signal-band power of  $s_{k,r}[n]$  is

$$P_{k,r}(O) = \frac{\sigma_{k,r}^2 E\left\{1 - \operatorname{sinc}\left(\frac{H_{k,r}[m]}{O}\right)\right\}}{O} \tag{16}$$

where  $\operatorname{sinc}(x) \triangleq \sin(\pi x) / (\pi x)$ , and O is the oversampling ratio.

*Proof:* Presented in the Appendix. $\Box$
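The two expressions in Theorem 1 can be evaluated for any assumed head-length distribution. The sketch below treats $\sigma_{k,r}^2$ and the head-length probabilities as free inputs and compares (16) with a numerical integral of (15). It assumes that the signal-band power is the PSD integrated over $(-\pi/O, \pi/O)$ and divided by $2\pi$, a normalization that is not stated explicitly in this part of the paper. The example distribution and all function names are illustrative.

```python
import math


def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def psd(omega, sigma2, pmf):
    """Evaluate (15); pmf maps each head length h to its probability."""
    return 2 * sigma2 * sum(p * math.sin(omega * h / 2) ** 2 for h, p in pmf.items())


def band_power(osr, sigma2, pmf):
    """Evaluate (16) for oversampling ratio osr."""
    return sigma2 * sum(p * (1 - sinc(h / osr)) for h, p in pmf.items()) / osr


def band_power_numeric(osr, sigma2, pmf, steps=20000):
    """Midpoint-rule integral of (15) over the signal band, divided by 2*pi."""
    d = (2 * math.pi / osr) / steps
    total = sum(psd(-math.pi / osr + (i + 0.5) * d, sigma2, pmf) for i in range(steps))
    return total * d / (2 * math.pi)


if __name__ == "__main__":
    pmf = {1: 0.5, 3: 0.3, 8: 0.2}   # hypothetical head-length distribution
    print(band_power(16, 0.4, pmf))
    print(band_power_numeric(16, 0.4, pmf))
```

The two printed values should agree to within the accuracy of the numerical integration, and `psd(0, ...)` is zero for every distribution, which reflects the spectral null at dc.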

Some properties of the above switching sequence PSD can be discerned even though it depends on the switching sequence head-length statistics and variance. For example, it is shown next that this PSD has a continuous derivative, which implies that the switching sequence cannot contain tones. Let

$$\phi_{H_{k,r}}(\omega) \triangleq E\{e^{j\omega H_{k,r}[m]}\}$$

which implies that

$$\operatorname{Re}\left\{\phi_{H_{k,r}}(\omega)\right\} = E\left\{\cos\left(\omega H_{k,r}[m]\right)\right\}$$

where  $\operatorname{Re}\{\cdot\}$  is the real-part operator. Therefore, it follows from (15) that the switching sequence PSD can be written as

$$S_{k,r}\left(e^{j\omega}\right) = \sigma_{k,r}^{2}\left(1 - \operatorname{Re}\left\{\phi_{H_{k,r}}\left(\omega\right)\right\}\right). \tag{17}$$

Provided  $E\{H_{k,r}[m]\} < \infty$ , then  $\phi_{H_{k,r}}(\omega)$  has a continuous derivative because it is the characteristic function of  $H_{k,r}[m]$  [27]. Therefore, it follows from (17) that  $S_{k,r}(e^{j\omega})$  also has this property in this case. However, if  $E\{H_{k,r}[m]\} = \infty$ , then  $\sigma_{k,r}^2 = 0$  and thus  $S_{k,r}(e^{j\omega}) = 0$  because, as proven in Lemma A1 in the Appendix

$$\sigma_{k,r}^2 = \frac{2}{E\{H_{k,r}[m]\} + E\{T_{k,r}[m]\}}. \tag{18}$$



Fig. 4. The function  $1 - \operatorname{sinc}(x)$ .

Therefore, $S_{k,r}(e^{j\omega})$ has a continuous derivative in this case too. By the same reasoning, the real part of the cross spectrum of two switching sequences, as given in Theorem A1 in the Appendix, also has a continuous derivative. This implies that the DAC noise PSD also has this property and thus contains no spurious tones.

Properties of the switching sequence signal-band power can also be derived using (16). Shown in Fig. 4 is a portion of the function that is the argument of the expectation operator in (16). Since this function approaches zero as its argument approaches zero, it follows from (16) that the switching sequence power, and thus the DAC noise power, can be made arbitrarily small by increasing the oversampling ratio O. Additionally, for a fixed O, (16) and (18) imply that the signal-band switching sequence power can be decreased by sufficiently decreasing or increasing the head lengths of symbols in  $s_{k,r}[n]$ . This suggests that, as proven in the next section, there is an upper bound for the switching sequence signal-band power.

Consider the following simplified scenario. Let  $o_{k,r}[n]$  be a sequence of i.i.d. Bernoulli trials with  $p \triangleq P(o_{k,r}[n] = 1)$  and  $q \triangleq P(o_{k,r}[n] = 0) = 1 - p$ . The desired switching sequence statistics are then

$$\sigma_{k,r}^2 = P(o_{k,r}[n] = 1) = p \tag{19}$$

and

$$P(H_{k,r}[m] = h) = q^{h-1}p. \tag{20}$$

Substituting (19) and (20) into (15) gives the following switching sequence PSD:

$$S_{k,r}\left(e^{j\omega}\right) = \frac{2p\left(1+q\right)\sin^{2}\left(\omega/2\right)}{p^{2}+4q\sin^{2}\left(\omega/2\right)}. \tag{21}$$

Fig. 5(a) shows the switching sequence PSD given above for varying values of p. Additionally, Fig. 5(b) shows the switching sequence signal-band power for varying values of p and O.
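The closed form (21) can be cross-checked against (15) by summing the expectation over the geometric head-length distribution (20). The sketch below truncates the infinite sum at a finite head length, which is an approximation chosen here; the function names are illustrative.

```python
import math


def psd_closed_form(omega, p):
    """Equation (21) for an i.i.d. Bernoulli parity sequence with P(o = 1) = p."""
    q = 1 - p
    sin2 = math.sin(omega / 2) ** 2
    return 2 * p * (1 + q) * sin2 / (p ** 2 + 4 * q * sin2)


def psd_from_theorem(omega, p, h_max=2000):
    """Equation (15) with sigma^2 = p from (19) and the geometric pmf (20).

    The sum over head lengths is truncated at h_max (an approximation).
    """
    q = 1 - p
    expectation = sum(q ** (h - 1) * p * math.sin(omega * h / 2) ** 2
                      for h in range(1, h_max + 1))
    return 2 * p * expectation


if __name__ == "__main__":
    for p in (0.1, 0.5, 0.9):
        for omega in (0.3, 1.0, math.pi):
            print(p, omega, psd_closed_form(omega, p), psd_from_theorem(omega, p))
```

For the values of p used here the truncated tail is negligible, so the two columns should agree to many digits.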

Fig. 5(b) shows that, for this simplified parity sequence, the switching sequence signal-band power, as a function of p, is bounded above by a value that depends on the oversampling ratio O. As shown in the next section, this is true in general as the switching sequence signal-band power is bounded by a value that depends on O regardless of the statistics of the parity sequence.

## IV. DAC NOISE POWER BOUND

A key part of the proof of the DAC noise power bound is the derivation of the switching sequence power bound, which is provided next.

*Theorem 2:* The signal-band power of  $s_{k,r}[n]$  is bounded as follows:

$$P_{k,r}(O) \le \frac{2}{O(O+1)} \tag{22}$$

and the bound is achieved if and only if  $H_{k,r}[m] = O$  and  $T_{k,r}[m] = 1$  almost surely (a.s.) (i.e., with probability one).

*Proof:* Since the tail length of every symbol is at least one sample, it follows from (18) that

$$\sigma_{k,r}^2 \le \frac{2}{E\{H_{k,r}[m]\} + 1}. \tag{23}$$

Additionally, for any positive integer H, Lemma A2 in the Appendix provides

$$1 - \operatorname{sinc}\left(\frac{H}{O}\right) \le \frac{H+1}{O+1} \tag{24}$$

where equality is obtained if and only if H = O. Substituting (23) and (24) into (16) proves (22), and equality is obtained if and only if  $P(H_{k,r}[m] = O)$  and  $P(T_{k,r}[m] = 1)$  are both 1.
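Inequality (24) and the bound (22) can be spot-checked numerically. The sketch below restricts attention to deterministic head and tail lengths and integer oversampling ratios, which is a simplification made here and covers only a special case of the theorem; the function names are illustrative.

```python
import math


def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def band_power(osr, head, tail):
    """Equation (16) for constant head and tail lengths, with sigma^2 from (18)."""
    sigma2 = 2 / (head + tail)
    return sigma2 * (1 - sinc(head / osr)) / osr


def bound(osr):
    """Right-hand side of (22)."""
    return 2 / (osr * (osr + 1))


if __name__ == "__main__":
    osr = 16
    worst = max(band_power(osr, h, t) for h in range(1, 4 * osr) for t in range(1, 4))
    print(worst, bound(osr))
    # The maximizing case named in Theorem 2: head length O and tail length 1.
    print(band_power(osr, osr, 1))
```

Within this restricted search the largest signal-band power occurs at head length O and tail length 1, where it meets the bound to within rounding error.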



Fig. 5. The (a) PSD and (b) signal-band power of  $s_{k,r}[n]$  given its input parity sequence is an i.i.d. Bernoulli sequence with  $p = P(o_{k,r}[n] = 1)$ .

Because the DAC noise is a linear combination of the switching sequences as shown in (7), Theorem 2 implies that a DAC noise power bound could be obtained as a function of the oversampling ratio and the switching sequence coefficients ( $\Delta_{k,r}$  for all k and r). However, in practical circuits, the values of these coefficients are not known, and the DAC noise power is typically estimated as a function of the oversampling ratio and matching statistics of the 1-bit DACs. Thus, to obtain a more useful result, the DAC noise power bounds presented in this paper are functions of the matching statistics of the 1-bit DACs and not the  $\Delta_{k,r}$  coefficients. Before the bounds are presented, some additional definitions are required concerning the matching characteristics of the 1-bit DACs.

Denote  $e_{h_i} - e_{l_i}$  as the *step-size error* of the *i*th 1-bit DAC. Let the relative step-size error of the *i*th 1-bit DAC be defined as

$$\delta_i \triangleq (e_{h_i} - e_{l_i}) - \frac{1}{2^b} \sum_{j=1}^{2^b} (e_{h_j} - e_{l_j}). \tag{25}$$

Thus,  $\delta_i$  is the difference between the step size of the *i*th 1-bit DAC,  $\Delta_i$ , and the sample average of the step sizes of the 1-bit DACs

$$\bar{\Delta}_D \triangleq \frac{1}{2^b} \sum_{j=1}^{2^b} \Delta_j.$$

Let the sample variance of the step-size errors be denoted

$$\bar{\sigma}_{\delta}^2 \triangleq \frac{1}{2^b} \sum_{i=1}^{2^b} \delta_i^2. \tag{26}$$

As shown next, the signal-band DAC noise power is bounded by a function of the oversampling ratio and the sample variance given above.

Theorem 3: If a dither sequence is shared by all the switching blocks in each layer, the DAC noise power is bounded as follows:

$$D_O \le \frac{4^b \,\bar{\sigma}_\delta^2}{2 \cdot O\left(O+1\right)} \tag{27}$$

and when  $\bar{\sigma}_{\delta}^2 \neq 0$ , this bound is achieved if and only if the following two conditions hold.

1. $H_{1,r}[m] = O$ and $T_{1,r}[m] = 1$ a.s. for each $r \in \{1, \ldots, 2^{b-1}\}$.
2. There exists a constant $\hat{\delta}$ such that $\delta_{2j-1} = \hat{\delta}$ and $\delta_{2j} = -\hat{\delta}$ for each $j \in \{1, \ldots, 2^{b-1}\}$.

Moreover, if a unique dither sequence is used in each switching block, then the DAC noise power is bounded as follows:

$$D_O \le \frac{2^b \bar{\sigma}_\delta^2}{O\left(O+1\right)} \tag{28}$$

and when  $\bar{\sigma}_{\delta}^2 \neq 0$ , the bound is achieved if and only if the first condition from the previous case holds and the second condition is relaxed to be the following:

2'. $\delta_{2j-1} = -\delta_{2j}$ for each $j \in \{1, \dots, 2^{b-1}\}$.

*Proof:* Presented in the Appendix.

Theorem 3 implies that, for either dithering scenario, the DAC noise power bound is achieved if the relative mismatch errors satisfy Condition 2, the states of the switching sequence generators in layer one are reset to 0 at sample time n = 0, and the DAC input is given by

$$y[n] = \begin{cases} 0, & \text{if } n \bmod (O+1) = 0 \text{ or } O\\ 2^{b-1}, & \text{otherwise.} \end{cases} \tag{29}$$

In this scenario, $H_{1,r}[m] = O$ and $T_{1,r}[m] = 1$ for each $r = 1, \ldots, 2^{b-1}$ and all $m > 0$, which satisfies Condition 1 in the theorem.
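For simulation purposes, the worst-case input in (29) is straightforward to generate. The following is a minimal sketch; the function name and argument names are illustrative only.

```python
def worst_case_input(n, osr, bits):
    """DAC input y[n] from (29): 0 when n mod (O+1) is 0 or O,
    and 2^(b-1) otherwise."""
    phase = n % (osr + 1)
    return 0 if phase in (0, osr) else 2 ** (bits - 1)

# One period of the input for O = 4 and b = 3 (period length O + 1 = 5).
period = [worst_case_input(n, osr=4, bits=3) for n in range(5)]
print(period)  # [0, 4, 4, 4, 0]
```

The sequence is periodic with period $O+1$, so a simulation only needs to repeat one such period after resetting the layer-one switching sequence generators as described above.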

The DAC noise power bound is larger in the case where a dither sequence is shared by switching blocks in the same layer because the switching sequences can be correlated in this case. If a symbol starts in $s_{k,r_1}[n]$ and $s_{k,r_2}[n]$ $(r_1 \neq r_2)$ at the same sample time, then the type of each symbol is chosen by the same dither sequence because $d_{k,r_1}[n] = d_{k,r_2}[n] = d_k[n]$. Therefore, these symbols are the same type, and this event gives rise to correlation between the two switching sequences. Although correlation between switching sequences can increase or decrease the DAC noise power, it increases the DAC noise power bound. By using an independent dither sequence in each switching block, a smaller DAC noise power bound is obtained at the cost of additional hardware.

Theorem 3 can be used to discern a guideline concerning the circuit layout of the tree-structured DAC. To achieve either power bound, $\delta_{2j-1} = -\delta_{2j}$ for $j = 1, \dots, 2^{b-1}$. Therefore, to minimize either bound, the DAC should be laid out to optimize the matching between the $(2j-1)$st and $(2j)$th 1-bit DACs. Typically, this is achieved by placing these 1-bit DACs as close as possible to each other or, if possible, interlacing the components of these 1-bit DACs on the integrated circuit. This guideline conflicts with the often-used practice of common-centroid layout, where the goal is to optimize matching among all the 1-bit DACs.

The DAC noise power bound can be used for noise budgeting in the design of circuits, such as  $\Delta\Sigma$  data converters, that employ the first-order tree-structured DAC. The worst case matching among 1-bit DACs is often characterized by the " $3\sigma$ " relative mismatch error, which represents a practical maximum. This error is typically given as a percent, denoted here as  $100\xi\%$ , of the sample average of the step sizes  $\overline{\Delta}_D$ . This implies that  $|\delta_i| \leq \xi \overline{\Delta}_D \approx \xi \Delta_D$ , which, with (26), leads to  $\overline{\sigma}_\delta \leq \xi \Delta_D$ . Substituting this inequality into (27) and (28) gives

$$\frac{D_O}{\Delta_D^2} \le \frac{4^b}{2} \left(\frac{\xi}{\sqrt{O\left(O+1\right)}}\right)^2 \tag{30}$$

and

$$\frac{D_O}{\Delta_D^2} \le 2^b \left(\frac{\xi}{\sqrt{O\left(O+1\right)}}\right)^2 \tag{31}$$

respectively. These upper bounds are shown as functions of  $\xi/\sqrt{O(O+1)}$  for b=3,4,5 in Fig. 6. Thus, the size of the tree-structured DAC (i.e., b), the oversampling ratio, the worst case matching percent, and the dithering scheme can be chosen using (30) and (31) to ensure the DAC noise power is less than the value budgeted to it in a given application.
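A noise-budget calculation based on (30) and (31) might be sketched as follows. The function names and the example values of $b$, $O$, and $\xi$ are illustrative assumptions and do not come from the paper.

```python
import math

def bound_shared_dither(bits, osr, xi):
    """Relative bound (30): one dither sequence shared per layer."""
    return (4 ** bits) / 2.0 * xi ** 2 / (osr * (osr + 1))

def bound_unique_dither(bits, osr, xi):
    """Relative bound (31): a unique dither sequence per switching block."""
    return (2 ** bits) * xi ** 2 / (osr * (osr + 1))

def to_db(power_ratio):
    """Convert a power ratio to decibels."""
    return 10.0 * math.log10(power_ratio)

if __name__ == "__main__":
    bits, osr, xi = 4, 64, 0.01  # hypothetical design point, 1% mismatch
    print("shared dither bound (dB):", to_db(bound_shared_dither(bits, osr, xi)))
    print("unique dither bound (dB):", to_db(bound_unique_dither(bits, osr, xi)))
```

For a given $b$, the two bounds differ by the constant factor $2^{b-1}$, independent of $O$ and $\xi$, which quantifies the benefit of using a unique dither sequence per switching block.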

## V. CONCLUSION

Expressions for the switching sequence PSD and signal-band power in the dithered first-order low-pass tree-structured DAC have been derived. These expressions have been used to obtain an attainable bound on the signal-band DAC noise power for both versions of this DAC. Necessary and sufficient conditions have been given for the bound to be achieved in each case. Additionally, it has been shown that by using an independent dither sequence in each switching block as opposed to each layer, the DAC noise power bound is smaller and achieved under less stringent conditions on the mismatch errors. Therefore, this dithering scheme is better suited in applications where the bound is used as an estimate for the DAC noise power. It has also been shown that, regardless of the dither scheme, the switching sequence PSD has a continuous derivative, which implies that the DAC noise in both implementations is void of spurious tones.

Fig. 6. DAC noise power bound relative to  $\Delta_D^2$  as a function of percent mismatch and oversampling ratio with a unique dither sequence used in (a) each switching block and (b) each layer.

#### APPENDIX

The following material provides most of the mathematics to support the theory that is presented in this paper. It is tacitly assumed throughout that all spectral densities considered exist and all sequences are ergodic.

*Proposition:* Suppose 1) that s[n] is the output of a finite sequential state machine driven by an input sequence o[n] which takes on a finite number of values for all n, and 2) that s[n] has a PSD. Then, s[n] has a spectral null at dc if and only if its running digital sum

$$R(m) \triangleq \sum_{n=0}^m s[n]$$

takes on a finite number of values for all m.

*Proof:* First, suppose that R(m) takes on a finite number of values for all m. This implies R(m) is a bounded sequence, i.e., there exists a constant B such that $|R(m)| \leq B$ for all m. Therefore, Lemma 1 in [18], which is a generalization of Lemma 1 in [28] (the proof of this lemma does not require that the underlying probability measure be a Markov measure), proves that s[n] has a spectral null at dc.

Suppose s[n] has a spectral null at dc. Let  $z_n$  represent the *state* of the finite-state sequential machine at time n. If the machine input is an i.i.d. sequence, then it follows from [22] that there exists a complex-valued function  $\phi(\cdot)$  such that

$$s[n] = \phi(z_{n+1}) - \phi(z_n). \tag{32}$$

However, any sequence o[n] can be a sample path of an i.i.d. sequence, so (32) must hold in general. Therefore,  $R(m) = \phi(z_{m+1}) - \phi(z_0)$ , which implies that R(m) can take on only a finite number of values for all m.

Notation and Definitions: Given the layer number $k$ and two depth values $r_1$ and $r_2$, let $s_1[n] \triangleq s_{k,r_1}[n]$ and $s_2[n] \triangleq s_{k,r_2}[n]$. Two symbols in the switching sequences $s_1[n]$ and $s_2[n]$ are called *joint symbols* if they start at the same sample time. Let $H_1[m]$ and $H_2[m]$ represent the head lengths of the *m*th symbols in $s_1[n]$ and $s_2[n]$, respectively. Let $\hat{H}_1[m]$ and $\hat{H}_2[m]$ be the head lengths of the *m*th joint symbols in $s_1[n]$ and $s_2[n]$, respectively.

Theorem A1. Switching Sequence Cross Spectrum: Given  $s_1[n]$  and  $s_2[n]$  employ the same dither sequence, the real part of the cross spectrum of  $s_1[n]$  and  $s_2[n]$  is given by

$$S_{s_1,s_2}\left(e^{j\omega}\right) = \sigma_1 \sigma_2 \sqrt{\rho_1 \rho_2} E\left\{\sin^2\left(\frac{\omega \hat{H}_1[m]}{2}\right) + \sin^2\left(\frac{\omega \hat{H}_2[m]}{2}\right) - \sin^2\left(\frac{\omega \left(\hat{H}_1[m] - \hat{H}_2[m]\right)}{2}\right)\right\} \tag{33}$$

where  $\sigma_1$  and  $\sigma_2$  are the standard deviations of  $s_1[n]$  and  $s_2[n]$ , respectively, and  $\rho_1$  and  $\rho_2$  are the probabilities that symbols in  $s_1[n]$  and  $s_2[n]$ , respectively, are joint.

*Proof:* For  $\lambda = 1, 2$ , let  $w_{\lambda,i}[n]$  be a window sequence that equals one when  $s_{\lambda}[n]$  is an element of the *i*th joint symbol and zero otherwise. Additionally, let

$$w_{\lambda,0}[n] \triangleq 1 - \sum_{i=1}^{\infty} w_{\lambda,i}[n].$$

Therefore, each switching sequence can be written as

$$s_{\lambda}[n] = \sum_{i=0}^{\infty} w_{\lambda,i}[n] s_{\lambda}[n]. \tag{34}$$

For any positive *i* and *l* with $i \neq l$, $w_{1,i}[n]s_1[n]$ and $w_{2,l}[m]s_2[m]$, given $x_{k,r_1}[n]$ and $x_{k,r_2}[m]$, are independent zero-mean random variables for any *n* and *m* because the signs of each are determined by independent, uniform dither sequences. By the same reasoning, given $x_{k,r_1}[n]$ and $x_{k,r_2}[m]$, $w_{1,0}[n]s_1[n]$ is independent of $w_{2,l}[m]s_2[m]$ for every *l*, and $w_{1,i}[n]s_1[n]$ is independent of $w_{2,0}[m]s_2[m]$ for every *i*. This implies

$$E_x \{ w_{1,i}[n] s_1[n] w_{2,l}[m] s_2[m] \} = 0 \tag{35}$$

for any n, m, when either i or l is zero, or  $i \neq l$ , where  $E_x\{\cdot\}$  is the conditional expectation operator given the switching block inputs (i.e.,  $E_x\{\cdot\}$  only averages over the possible symbol type choices).

The cross spectrum is derived below by taking the expected value of a time-averaged estimate. Let  $\hat{N}_1$  and  $\hat{N}_2$  be the number of samples of  $s_1[n]$  and  $s_2[n]$ , respectively, that include the first N joint symbols. Let  $N_s = \max\{\hat{N}_1, \hat{N}_2\}$ . The time-averaged cross-spectrum estimate can be written as

$$P_N\left(e^{j\omega}\right) = \frac{1}{N_s} \left(\sum_{n=0}^{N_s-1} s_1[n]e^{-j\omega n}\right) \left(\sum_{m=0}^{N_s-1} s_2[m]e^{j\omega m}\right). \tag{36}$$

Since only N joint symbols are included in this spectrum estimate, it follows that

$$P_{N}\left(e^{j\omega}\right) = \frac{1}{N_{s}} \left(\sum_{n=0}^{N_{s}-1} \sum_{i=0}^{N} w_{1,i}[n]s_{1}[n]e^{-j\omega n}\right) \left(\sum_{m=0}^{N_{s}-1} \sum_{l=0}^{N} w_{2,l}[m]s_{2}[m]e^{j\omega m}\right). \tag{37}$$

Let $S_N(e^{j\omega}) = E_x\{P_N(e^{j\omega})\}$, which, upon rearranging the sums in (37), can be written as

$$S_{N}\left(e^{j\omega}\right) = \frac{1}{N_{s}} E_{x} \left\{ \left( \sum_{i=0}^{N} \sum_{n=0}^{N_{s}-1} w_{1,i}[n]s_{1}[n]e^{-j\omega n} \right) \left( \sum_{l=0}^{N} \sum_{m=0}^{N_{s}-1} w_{2,l}[m]s_{2}[m]e^{j\omega m} \right) \right\}. \tag{38}$$

From (35), the cross terms, with respect to window indexes, in the above expectation are all zero (i.e., the terms where  $i \neq l$ ). Moreover, any term in (38) that includes an index of i = 0 or l = 0 is also zero. Therefore, (38) can be simplified to

$$S_{N}\left(e^{j\omega}\right) = \frac{1}{N_{s}} \sum_{i=1}^{N} E_{x} \left\{ \left( \sum_{n=0}^{N_{s}-1} w_{1,i}[n]s_{1}[n]e^{-j\omega n} \right) \left( \sum_{m=0}^{N_{s}-1} w_{2,i}[m]s_{2}[m]e^{j\omega m} \right) \right\}. \tag{39}$$

Let $N[i]$ denote the sample time of the start of the *i*th joint symbol $(i \leq N)$, and let $d[i] = \pm 1/2$ be the dither sequence sample that chooses the symbol type of the *i*th joint symbol. The sequences $w_{1,i}[n]s_1[n]$ and $w_{2,i}[m]s_2[m]$ (for $i > 0$) are nonzero for only two samples (i.e., the first element of the head and the first element of the tail of the symbol), and so

$$\sum_{n=0}^{N_s-1} w_{1,i}[n] s_1[n] e^{-j\omega n} = 2d[i] e^{-j\omega N[i]} \left(1 - e^{-j\omega \hat{H}_1[i]}\right) \tag{40}$$

and

$$\sum_{m=0}^{N_s-1} w_{2,i}[m] s_2[m] e^{j\omega m} = 2d[i] e^{j\omega N[i]} \left(1 - e^{j\omega \hat{H}_2[i]}\right). \tag{41}$$

Substituting (40) and (41) into (39) gives

$$S_N\left(e^{j\omega}\right) = \frac{1}{N_s} \sum_{i=1}^N E_x \left\{ 4d^2[i] \left( 1 - e^{-j\omega \hat{H}_1[i]} \right) \left( 1 - e^{j\omega \hat{H}_2[i]} \right) \right\}. \tag{42}$$

However,  $4d^2[i] = 1$  for each *i*, which implies that there is no randomness with respect to the dither sequence in the above argument of the expectation operator; thus,

$$S_N\left(e^{j\omega}\right) = \frac{1}{N_s} \sum_{i=1}^N \left( 1 - e^{-j\omega \hat{H}_1[i]} \right) \left( 1 - e^{j\omega \hat{H}_2[i]} \right). \tag{43}$$

Let  $\hat{S}_N(e^{j\omega})$  be the real part of  $S_N(e^{j\omega})$ ; it follows from the linearity of the real-part operator that

$$\hat{S}_N\left(e^{j\omega}\right) = \frac{2}{N_s} \sum_{i=1}^N \left\{ \sin^2\left(\frac{\omega\hat{H}_1[i]}{2}\right) + \sin^2\left(\frac{\omega\hat{H}_2[i]}{2}\right) - \sin^2\left(\frac{\omega\left(\hat{H}_1[i] - \hat{H}_2[i]\right)}{2}\right) \right\}. \tag{44}$$

Let $N_1$ and $N_2$ be the total number of symbols in $s_1[n]$ and $s_2[n]$, respectively, up to and including the $N$th joint symbol. Because a switching sequence is nonzero $(\pm 1)$ only twice within a symbol, the time-averaged estimates of the variances of $s_1[n]$ and $s_2[n]$ are

$$\bar{\sigma}_1^2 \triangleq \frac{1}{N_s} \sum_{n=0}^{N_1 - 1} s_1^2[n] = \frac{2N_1}{N_s} \tag{45}$$

and

$$\bar{\sigma}_2^2 \triangleq \frac{1}{N_s} \sum_{n=0}^{N_2 - 1} s_2^2[n] = \frac{2N_2}{N_s} \tag{46}$$

respectively. Additionally, after N joint symbols, the fraction of symbols in  $s_1[n]$  and  $s_2[n]$  that are joint is given by

$$\bar{\rho}_1 \triangleq \frac{N}{N_1} \tag{47}$$

and

$$\bar{\rho}_2 \triangleq \frac{N}{N_2} \tag{48}$$

respectively. Thus, (45)-(48) are substituted into (44) to give

$$\hat{S}_{N}\left(e^{j\omega}\right) = \bar{\sigma}_{1}\bar{\sigma}_{2}\sqrt{\bar{\rho}_{1}\bar{\rho}_{2}}\,\frac{1}{N}\sum_{i=1}^{N}\left\{\sin^{2}\left(\frac{\omega\hat{H}_{1}[i]}{2}\right) + \sin^{2}\left(\frac{\omega\hat{H}_{2}[i]}{2}\right) - \sin^{2}\left(\frac{\omega\left(\hat{H}_{1}[i] - \hat{H}_{2}[i]\right)}{2}\right)\right\}. \tag{49}$$

With $E_N\{\cdot\}$ defined as the time-averaged expectation operator, (49) becomes

$$\hat{S}_{N}\left(e^{j\omega}\right) = \bar{\sigma}_{1}\bar{\sigma}_{2}\sqrt{\bar{\rho}_{1}\bar{\rho}_{2}}E_{N}\left\{\sin^{2}\left(\frac{\omega\hat{H}_{1}[m]}{2}\right) + \sin^{2}\left(\frac{\omega\hat{H}_{2}[m]}{2}\right) - \sin^{2}\left(\frac{\omega\left(\hat{H}_{1}[m] - \hat{H}_{2}[m]\right)}{2}\right)\right\}. \tag{50}$$

Under the ergodicity assumption, the time averages in (50) converge to ensemble averages as  $N \to \infty$ . Therefore, with  $S_{s_1,s_2}(e^{j\omega}) = \lim_{N\to\infty} \hat{S}_N(e^{j\omega})$ , (33) follows from (50).  $\Box$ 

*Corollary A1. Cross Spectrum Area:* Given an oversampling ratio of $O$, and given that $s_1[n]$ and $s_2[n]$ employ the same dither sequence, the signal-band area of the real part of the cross spectrum of $s_1[n]$ and $s_2[n]$ is given by

$$A_O = \frac{\sigma_1 \sigma_2 \sqrt{\rho_1 \rho_2}}{2 \cdot O} E \left\{ 1 - \operatorname{sinc} \left( \frac{\hat{H}_1[m]}{O} \right) - \operatorname{sinc} \left( \frac{\hat{H}_2[m]}{O} \right) + \operatorname{sinc} \left( \frac{\hat{H}_1[m] - \hat{H}_2[m]}{O} \right) \right\}. \tag{51}$$

*Proof:* Given Theorem A1, the cross-spectrum area is

$$A_O \triangleq \frac{1}{2\pi} \int_{-\pi/O}^{\pi/O} S_{s_1,s_2} \left( e^{j\omega} \right) d\omega. \tag{52}$$

Because the argument of the expectation operator in (33) consists of bounded functions, Fubini's theorem [29] implies that the integral and expected value, implied in (52), can be swapped. Thus, (51) results upon evaluating this integral.
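The integration step that takes (33) to (51) can be checked numerically for fixed head lengths. The sketch below integrates the argument of the expectation in (33) over the signal band with a midpoint rule and compares the result with the bracketed term of (51) divided by $2O$, taking $\operatorname{sinc}(x) = \sin(\pi x)/(\pi x)$. The function names and the step count are illustrative choices.

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi*x)/(pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def integrand(omega, h1, h2):
    """Argument of the expectation in (33) for fixed head lengths."""
    return (math.sin(omega * h1 / 2.0) ** 2
            + math.sin(omega * h2 / 2.0) ** 2
            - math.sin(omega * (h1 - h2) / 2.0) ** 2)

def band_area_numeric(h1, h2, osr, steps=20000):
    """Midpoint-rule estimate of (1/(2*pi)) * integral over [-pi/O, pi/O]."""
    start = -math.pi / osr
    step = 2.0 * math.pi / osr / steps
    total = sum(integrand(start + (i + 0.5) * step, h1, h2) for i in range(steps))
    return total * step / (2.0 * math.pi)

def band_area_closed(h1, h2, osr):
    """Bracketed term of (51) divided by 2*O, for fixed head lengths."""
    return (1.0 - sinc(h1 / osr) - sinc(h2 / osr)
            + sinc((h1 - h2) / osr)) / (2.0 * osr)

if __name__ == "__main__":
    for h1, h2, osr in ((3, 5, 8), (8, 8, 8), (1, 12, 16)):
        print(h1, h2, osr, band_area_numeric(h1, h2, osr), band_area_closed(h1, h2, osr))
```

Because the prefactor $\sigma_1 \sigma_2 \sqrt{\rho_1 \rho_2}$ does not depend on $\omega$, agreement for fixed head lengths is all that is needed before the expectation is taken.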

The above results are now used to prove Theorem 1.

Proof of Theorem 1: With  $s_1[n] = s_2[n] = s_{k,r}[n]$ ,  $\sigma_1 \sigma_2 = \sigma_{k,r}^2$ , and since every symbol in the same switching sequence starts at the same sample time,  $\rho_1 = \rho_2 = 1$  and  $\hat{H}_1[m] = \hat{H}_2[m] = H_{k,r}[m]$ . Substituting these values into (33) and (51) leads to (15) and (16), respectively.

*Theorem A2. DAC Noise PSD:* Given each switching block in the same layer shares a dither sequence, the DAC noise PSD is given by

$$D\left(e^{j\omega}\right) = \sum_{k=1}^{b} \left( \sum_{r=1}^{2^{b-k}} \Delta_{k,r}^2 S_{k,r}\left(e^{j\omega}\right) + 2\Delta_{k,r} \left( \sum_{\hat{r}=1}^{r-1} \Delta_{k,\hat{r}} S_{k,r,\hat{r}}\left(e^{j\omega}\right) \right) \right) \tag{53}$$

where  $S_{k,r}(e^{j\omega})$  is the switching sequence PSD for  $s_{k,r}[n]$  as given by (15), and  $S_{k,r,\hat{r}}(e^{j\omega})$  is the real part of the cross spectrum of  $s_{k,r}[n]$  and  $s_{k,\hat{r}}[n]$  as given by (33). Moreover, if a unique dither sequence is used in each switching block, the DAC noise PSD is

$$D\left(e^{j\omega}\right) = \sum_{k=1}^{b} \sum_{r=1}^{2^{b-k}} \Delta_{k,r}^2 S_{k,r}\left(e^{j\omega}\right). \tag{54}$$

*Proof:* First, suppose that switching blocks in the same layer share the same dither sequence. Because switching sequences in different layers employ independent dither sequences, these switching sequences are uncorrelated and so they have zero cross spectrum. Therefore, only the cross spectra of switching sequences in the same layer contribute to the DAC noise power.

Let  $u_{k,r}[n]$  be the sequence

$$u_{k,r}[n] = \sum_{\hat{r}=1}^{r} \Delta_{k,\hat{r}} s_{k,\hat{r}}[n]. \tag{55}$$

To apply mathematical induction, suppose for some  $r_0 = 1, ..., 2^{b-k} - 1$ , that the PSD of  $u_{k,r_0}[n]$  is

$$U_{k,r_{0}}\left(e^{j\omega}\right) = \sum_{r=1}^{r_{0}} \Delta_{k,r}^{2} S_{k,r}\left(e^{j\omega}\right) + 2\Delta_{k,r}\left(\sum_{\hat{r}=1}^{r-1} \Delta_{k,\hat{r}} S_{k,r,\hat{r}}\left(e^{j\omega}\right)\right). \tag{56}$$

The PSD of  $u_{k,r_0+1}[n]$  can be written as

$$U_{k,r_0+1}\left(e^{j\omega}\right) = U_{k,r_0}\left(e^{j\omega}\right) + \Delta_{k,r_0+1}^2 S_{k,r_0+1}\left(e^{j\omega}\right) + 2\Delta_{k,r_0+1}C_{k,r_0}\left(e^{j\omega}\right) \tag{57}$$

where  $C_{k,r_0}(e^{j\omega})$  is the real part of the cross spectrum of  $u_{k,r_0}[n]$  and  $s_{k,r_0+1}[n]$ , which, given (55), is calculated to be

$$C_{k,r_{0}}\left(e^{j\omega}\right) = \sum_{\hat{r}=1}^{r_{0}} \Delta_{k,\hat{r}} S_{k,\hat{r},r_{0}+1}\left(e^{j\omega}\right). \tag{58}$$

Substituting (56) and (58) into (57) gives

$$U_{k,r_{0}+1}\left(e^{j\omega}\right) = \sum_{r=1}^{r_{0}+1} \Delta_{k,r}^{2} S_{k,r}\left(e^{j\omega}\right) + 2\Delta_{k,r}\left(\sum_{\hat{r}=1}^{r-1} \Delta_{k,\hat{r}} S_{k,r,\hat{r}}\left(e^{j\omega}\right)\right). \tag{59}$$

Therefore, it follows from mathematical induction that (56) holds for each  $r_0$ . Since the switching sequences in different layers are uncorrelated, it follows from (7) and (55) that

$$D\left(e^{j\omega}\right) = \sum_{k=1}^{b} U_{k,2^{b-k}}\left(e^{j\omega}\right). \tag{60}$$

Substituting (56) (with  $r_0 = 2^{b-k}$ ) into (60) gives (53).

When an independent dither sequence is employed by each switching block, all of the switching sequences are uncorrelated, which implies that  $S_{k,r,\hat{r}}(e^{j\omega}) = 0$  for all  $\omega$ , k, and  $\hat{r} \neq r$ . Substituting this into (53) leads to (54).

*Corollary A2. DAC Noise Signal-Band Power:* If an independent dither sequence is shared by all the switching blocks in each layer, the signal-band DAC noise power is

$$D_{O} = \sum_{k=1}^{b} \left( \sum_{r=1}^{2^{b-k}} \Delta_{k,r}^{2} P_{k,r}(O) + 2\Delta_{k,r} \left( \sum_{\hat{r}=1}^{r-1} \Delta_{k,\hat{r}} A_{k,r,\hat{r}}(O) \right) \right) \tag{61}$$

where  $P_{k,r}(O)$  is the signal-band power of  $s_{k,r}[n]$  (as in (16)) and  $A_{k,r,\hat{r}}(O)$  is the signal-band area of the cross spectrum of  $s_{k,r}[n]$  and  $s_{k,\hat{r}}[n]$  (as in (51)). If a unique dither is used in each switching block, then the signal-band DAC noise power is

$$D_O = \sum_{k=1}^{b} \sum_{r=1}^{2^{b-k}} \Delta_{k,r}^2 P_{k,r}(O). \tag{62}$$

*Proof:* The proof follows directly from Corollary A1, Theorem 1, Theorem A2, and the linearity of the integral.  $\Box$ 

Lemma A1: The switching sequence variance is

$$\sigma_{k,r}^2 = \frac{2}{E\{H_{k,r}[m]\} + E\{T_{k,r}[m]\}}. \tag{63}$$

*Proof:* Let $M_s$ be the number of samples in the first $M$ symbols of $s_{k,r}[n]$. Given the ergodicity assumption, it follows that

$$\sigma_{k,r}^2 = \lim_{M \to \infty} \frac{1}{M_s} \sum_{l=0}^{M_s} s_{k,r}^2[l]. \tag{64}$$

Since  $s_{k,r}^2[n] = 1$  twice within every symbol, (64) can be simplified to

$$\sigma_{k,r}^2 = \lim_{M \to \infty} \frac{2M}{M_s}. \tag{65}$$

Additionally, the ergodicity assumption implies

$$E\{H_{k,r}[m] + T_{k,r}[m]\} = \lim_{M \to \infty} \frac{1}{M} \sum_{i=1}^{M} (H_{k,r}[i] + T_{k,r}[i]). \tag{66}$$

However,  $\sum_{i=1}^{M} (H_{k,r}[i] + T_{k,r}[i])$  is the total number of samples comprising the first M symbols, i.e.,  $M_s$ . This implies that

$$E\{H_{k,r}[m] + T_{k,r}[m]\} = \lim_{M \to \infty} \frac{M_s}{M}. \tag{67}$$

Equations (65) and (67) imply (63).
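Lemma A1 can be illustrated on a short, hand-built switching sequence. The sketch below assumes the symbol structure implied by (40): each symbol is $\pm 1$ at the first sample of its head, the opposite sign at the first sample of its tail, and zero elsewhere. The helper names and the example head and tail lengths are hypothetical.

```python
def build_switching_sequence(heads, tails, signs):
    """Concatenate symbols: `sign` at the first head sample, `-sign` at the
    first tail sample, zeros elsewhere (structure assumed from (40))."""
    seq = []
    for head, tail, sign in zip(heads, tails, signs):
        seq += [sign] + [0] * (head - 1) + [-sign] + [0] * (tail - 1)
    return seq

def time_averaged_power(seq):
    """Time average of s^2 over a whole number of symbols, as in (64)."""
    return sum(x * x for x in seq) / len(seq)

def lemma_a1_variance(heads, tails):
    """Right-hand side of (63) with sample means in place of expectations."""
    mean_head = sum(heads) / len(heads)
    mean_tail = sum(tails) / len(tails)
    return 2.0 / (mean_head + mean_tail)

def running_sums(seq):
    """Running digital sum R(m) of the sequence."""
    total, sums = 0, []
    for x in seq:
        total += x
        sums.append(total)
    return sums

heads, tails, signs = [3, 1, 4], [1, 2, 1], [1, -1, 1]
seq = build_switching_sequence(heads, tails, signs)
print(time_averaged_power(seq), lemma_a1_variance(heads, tails))
print(sorted(set(running_sums(seq))))  # [-1, 0, 1]
```

In this example the running digital sum stays within $\{-1, 0, 1\}$, which is the bounded-sum property that the Proposition associates with a spectral null at dc.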

Lemma A2: Given H and O are positive integers

$$1 - \operatorname{sinc}\left(\frac{H}{O}\right) \le \frac{H+1}{O+1} \tag{68}$$

where equality is obtained if and only if H = O.

*Proof:* This proof is based on the analysis of the following two functions:

$$f(x) \triangleq \frac{1 - \operatorname{sinc}(x)}{x} \tag{69}$$

and

$$f_{\gamma}(x) \triangleq \frac{1 - \operatorname{sinc}(x)}{x + \gamma} \tag{70}$$

where $\gamma$ is a constant in the interval (0,1) and $x > 0$. Upon evaluating the derivative of $f(x)$ and setting it to zero (i.e., $f'(x) = 0$), the First Derivative Theorem [30] indicates that all local maxima of $f(x)$ are less than 1/2 for $x \ge 2$. Since $f(2) = 1/2$, this implies that $f(x) \le 1/2$ for all $x \ge 2$, and since $f_{\gamma}(x) < f(x)$, this also implies

$$f_{\gamma}\left(x\right) < \frac{1}{2} \tag{71}$$

for all  $x \ge 2$ .

Evaluating the derivative of  $f_{\gamma}(x)$  (i.e.,  $f'_{\gamma}(x)$ ) indicates that this function is strictly increasing for  $x \in (0,1]$  and strictly decreasing for  $x \in [3/2,2]$ . This implies that there is at least one local maximum of this function in the interval (1,3/2).

Let  $g(x) \triangleq f'_{\gamma}(x) + \gamma$ . Evaluating the derivative of g(x) indicates that g(x) is a strictly increasing function for  $x \in (1, 3/2)$ . Therefore, the expression  $g(x) = \gamma$ , which is equivalent to  $f'_{\gamma}(x) = 0$ , has at most one solution for  $x \in (1, 3/2)$ . The First Derivative Theorem then implies that there is at most one local maximum of the function  $f_{\gamma}(x)$  in this interval. This and the previous arguments imply that  $f_{\gamma}(x)$  has exactly one local maximum for  $x \in (0, 2]$ , and because  $f_{\gamma}(1) = 1/(1+\gamma) > 1/2$ , this local maximum is the global maximum of  $f_{\gamma}(x)$  for x > 0.

As shown next, this global maximum occurs for values of $x$ in the interval $(1, 1 + \gamma)$. Evaluating $f_{\gamma}(x)$ at the values $x = 1$ and $x = 1 + \gamma$ provides

$$f_{\gamma}(1+\gamma) = \left(\frac{1+\gamma\left(1+\operatorname{sinc}\left(\gamma\right)\right)}{1+2\gamma}\right) f_{\gamma}(1). \tag{72}$$

Since  $\operatorname{sinc}(\gamma) < 1$  for all  $\gamma \in (0, 1)$ , (72) implies that  $f_{\gamma}(1+\gamma) < f_{\gamma}(1)$ . Therefore,  $f_{\gamma}(x)$ , which is a strictly increasing function for  $0 < x \le 1$ , must start decreasing for some value of  $x \le 1+\gamma$ . This and previous arguments imply that the global maximum of  $f_{\gamma}(x)$  occurs for some value of x between 1 and  $1+\gamma$ .

Fix the value of O > 1, and consider the function  $f_{\frac{1}{O}}\left(\frac{H}{O}\right)$ , where H is a positive integer. Since

$$f_{\frac{1}{O}}\left(\frac{O}{O}\right) = O/\left(O+1\right) > 1/2$$

it follows from the previous arguments that the maximum of this function is achieved at either H = O or H = O + 1. However, substituting  $\gamma = 1/O$  into (72) indicates that

$$f_{\frac{1}{O}}\left(\frac{O}{O}\right) > f_{\frac{1}{O}}\left(\frac{O+1}{O}\right)$$

Therefore, the global maximum of  $f_{\frac{1}{O}}(\frac{H}{O})$  is O/(O+1), which implies (68), and it is achieved only when H = O.  $\Box$ 
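The two facts that drive this proof, the identity (72) and the location of the integer maximizer of $f_{1/O}(H/O)$, can be spot-checked numerically. The following sketch uses $\operatorname{sinc}(x) = \sin(\pi x)/(\pi x)$; the function names and the search range of $4O$ head lengths are illustrative choices (beyond $x = 2$ the proof already gives $f_{\gamma}(x) < 1/2$).

```python
import math

def sinc(x):
    """Normalized sinc, sin(pi*x)/(pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def f_gamma(x, gamma):
    """f_gamma(x) as defined in (70)."""
    return (1.0 - sinc(x)) / (x + gamma)

def rhs_72(gamma):
    """Right-hand side of (72)."""
    factor = (1.0 + gamma * (1.0 + sinc(gamma))) / (1.0 + 2.0 * gamma)
    return factor * f_gamma(1.0, gamma)

def best_head_length(osr, max_head=None):
    """Integer H in 1..max_head that maximizes f_{1/O}(H/O)."""
    if max_head is None:
        max_head = 4 * osr
    return max(range(1, max_head + 1),
               key=lambda h: f_gamma(h / osr, 1.0 / osr))

if __name__ == "__main__":
    for gamma in (0.1, 0.25, 0.5, 0.9):
        print(gamma, f_gamma(1.0 + gamma, gamma), rhs_72(gamma))
    print([best_head_length(osr) for osr in (2, 8, 64)])
```

For each oversampling ratio tried, the maximizing head length equals $O$, and the maximum value is $O/(O+1)$, in agreement with (68).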

Notation and Definitions: Let $\vec{\nu}_{k,r}$ be a $2^b$-length column vector whose *i*th component is defined to be

$$\nu_{k,r,i} \triangleq \begin{cases} \left(\frac{1}{2}\right)^{k/2}, & \text{if } (r-1)2^k < i \le (r-1)2^k + 2^{k-1} \\ -\left(\frac{1}{2}\right)^{k/2}, & \text{if } (r-1)2^k + 2^{k-1} < i \le r2^k \\ 0, & \text{otherwise.} \end{cases} \tag{73}$$

Moreover, let $\vec{\delta}$ be the $2^b$-length column vector whose *i*th component is $\delta_i$.

*Lemma A3:* Given  $c_{k,r}$  is a nonnegative constant for each k and r

$$\sum_{k=1}^{b} \sum_{r=1}^{2^{b-k}} c_{k,r} \Delta_{k,r}^2 \le \max_{k,r} \{2^{b-k} c_{k,r}\} \bar{\sigma}_{\delta}^2 \tag{74}$$

and equality is obtained if and only if

$$\vec{\delta} = \sum_{(k,r)\in K} b_{k,r} \vec{\nu}_{k,r} \tag{75}$$

where each  $b_{k,r}$  is a constant, and

$$K \triangleq \{(k,r) \mid 2^{b-k}c_{k,r} = \max_{k,r} \{2^{b-k}c_{k,r}\}\}. \tag{76}$$

*Proof:* It follows from the definitions of  $\Delta_{k,r}$ ,  $\delta_i$ , and  $\vec{\nu}_{k,r}$  as given in (8), (25), and (73), respectively, that

$$\Delta_{k,r} = \frac{\vec{\nu}_{k,r}^T \vec{\delta}}{2^{k/2}} = \frac{\vec{\delta}^T \vec{\nu}_{k,r}}{2^{k/2}}. \tag{77}$$

This and the distributive and associative properties of matrices imply that the left-hand side of (74) can be written as

$$\sum_{k=1}^{b} \sum_{r=1}^{2^{b-k}} c_{k,r} \Delta_{k,r}^2 = \vec{\delta}^T \underbrace{\left(\sum_{k=1}^{b} \sum_{r=1}^{2^{b-k}} \frac{c_{k,r}}{2^k} \vec{\nu}_{k,r} \vec{\nu}_{k,r}^T\right)}_{\triangleq \mathbf{D}} \vec{\delta}. \tag{78}$$

Given $(k_1, r_1) \neq (k_2, r_2)$ (where each is a plausible pair of layer number and depth) and, without loss of generality, $k_1 \geq k_2$, it follows from (73) that $\nu_{k_1,r_1,i}$ is a constant function of *i* for all values of *i* where $\nu_{k_2,r_2,i} \neq 0$. This implies that $\vec{\nu}_{k_1,r_1}^T \vec{\nu}_{k_2,r_2} = 0$ because the set of nonzero values of $\vec{\nu}_{k_2,r_2}$ consists of an equal number of values that are $(1/2)^{k_2/2}$ and $-(1/2)^{k_2/2}$. Moreover, (73) implies that $\vec{\nu}_{k,r}^T \vec{\nu}_{k,r} = 1$ for each *k* and *r*. Therefore, the $2^b - 1$ vectors, $\vec{\nu}_{k,r}$ for all *k* and *r*, that compose the matrix **D** are orthonormal. This implies that the expression for the matrix **D** in (78) is the spectral decomposition of the matrix [31], and each vector $\vec{\nu}_{k,r}$ is an eigenvector of this matrix with an associated eigenvalue of $\lambda_{k,r}$, which is given by

$$\lambda_{k,r} = \frac{c_{k,r}}{2^k}.\tag{79}$$

Since D is a symmetric matrix, the Rayleigh–Ritz theorem [31] implies that the quadratic expression on the right-hand side of (78) is bounded above by  $\lambda_{\max} \vec{\delta}^T \vec{\delta}$ , where  $\lambda_{\max}$  is the maximum eigenvalue of D. This and (79) imply that

$$\sum_{k=1}^{b} \sum_{r=1}^{2^{b-k}} c_{k,r} \Delta_{k,r}^2 \le \max_{k,r} \left\{ \frac{c_{k,r}}{2^k} \right\} \vec{\delta}^T \vec{\delta} \tag{80}$$

which, given  $2^b \bar{\sigma}_{\delta}^2 = \vec{\delta}^T \vec{\delta}$ , proves (74). Additionally, it follows from the Rayleigh–Ritz theorem that the bound is achieved if and only if  $\vec{\delta}$  is a linear combination of the eigenvectors whose associated eigenvalues are equal to  $\lambda_{\text{max}}$  as given in (75).
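A small numerical experiment can make Lemma A3 concrete. The sketch below builds the vectors $\vec{\nu}_{k,r}$ from (73), computes $\Delta_{k,r}$ through (77), and evaluates both sides of (74). The function names, the random test vector, and the choice $c_{k,r} = 2^{b-k+1}$ (the value used in the proof of Theorem 3) are illustrative.

```python
import random

def nu_vector(k, r, bits):
    """Vector nu_{k,r} from (73); the 1-based index i maps to list slot i-1."""
    amp = 0.5 ** (k / 2.0)
    lo = (r - 1) * 2 ** k
    mid = lo + 2 ** (k - 1)
    hi = r * 2 ** k
    return [amp if lo < i <= mid else (-amp if mid < i <= hi else 0.0)
            for i in range(1, 2 ** bits + 1)]

def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

def lemma_a3_sides(delta, coeff, bits):
    """Return (lhs, rhs) of (74); coeff(k, r) plays the role of c_{k,r},
    and Delta_{k,r} is computed from (77)."""
    lhs, scale = 0.0, 0.0
    for k in range(1, bits + 1):
        for r in range(1, 2 ** (bits - k) + 1):
            big_delta = dot(nu_vector(k, r, bits), delta) / 2 ** (k / 2.0)
            lhs += coeff(k, r) * big_delta ** 2
            scale = max(scale, 2 ** (bits - k) * coeff(k, r))
    variance = dot(delta, delta) / 2 ** bits  # sample variance from (26)
    return lhs, scale * variance

if __name__ == "__main__":
    bits = 3
    random.seed(1)
    delta = [random.uniform(-1.0, 1.0) for _ in range(2 ** bits)]
    mean = sum(delta) / len(delta)
    delta = [d - mean for d in delta]  # relative errors sum to zero, as in (25)
    print(lemma_a3_sides(delta, lambda k, r: 2 ** (bits - k + 1), bits))
```

With $b = 2$ and $\vec{\delta} = [1, -1, 2, -2]^T$, which satisfies Condition 2' of Theorem 3, both sides evaluate to 20, illustrating the equality case (75).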

*Lemma A4:* The real-part of the signal-band area of the cross-spectrum of the sequences  $\Delta_{k,r_1}s_{k,r_1}[n]$  and  $\Delta_{k,r_2}s_{k,r_2}[n]$  satisfies

$$\Delta_{k,r_1} \Delta_{k,r_2} A_{k,r_1,r_2} (O) \le \frac{\Delta_{k,r_1}^2 P_{k,r_1} (O) + \Delta_{k,r_2}^2 P_{k,r_2} (O)}{2} \tag{81}$$

where equality is achieved if and only if  $\Delta_{k,r_1} = \Delta_{k,r_2} = 0$  or  $s_{k,r_1}[n] = s_{k,r_2}[n]$  a.s. and  $\Delta_{k,r_1} = \Delta_{k,r_2}$ .

*Proof:* Let  $w[n] = \Delta_{k,r_1} s_{k,r_1}[n] - \Delta_{k,r_2} s_{k,r_2}[n]$ . By computing the PSD of w[n] and integrating it across the range of the signal band, the power of this sequence is found to be

$$P_{w}(O) = \Delta_{k,r_{1}}^{2} P_{k,r_{1}}(O) + \Delta_{k,r_{2}}^{2} P_{k,r_{2}}(O) - 2\Delta_{k,r_{1}}\Delta_{k,r_{2}}A_{k,r_{1},r_{2}}(O). \tag{82}$$

Since  $P_w(O) \ge 0$ , (81) follows from (82).

The bound is trivially achieved if $\Delta_{k,r_1} = \Delta_{k,r_2} = 0$; therefore, assume that this does not hold for the remainder of the proof. If $\Delta_{k,r_1} s_{k,r_1}[n] = \Delta_{k,r_2} s_{k,r_2}[n]$ a.s., then $w[n] = 0$ a.s. Therefore, $P_w(O) = 0$ in this case, and, upon substituting this into (82), equality is obtained in (81). Because $s_{k,r_1}[n]$ and $s_{k,r_2}[n]$ are both constrained to the range $\{-1, 0, 1\}$, $\Delta_{k,r_1} s_{k,r_1}[n] = \Delta_{k,r_2} s_{k,r_2}[n]$ a.s. if and only if $\Delta_{k,r_1} = \pm \Delta_{k,r_2}$ and $s_{k,r_1}[n] = \pm s_{k,r_2}[n]$ a.s. However, two switching sequences are only correlated when a symbol in each starts at the same sample time and the same dither sequence is used to choose their symbol types and, in such cases, the switching sequences have positive correlation. Therefore, $s_{k,r_1}[n] \neq -s_{k,r_2}[n]$ a.s., which implies that $\Delta_{k,r_1} s_{k,r_1}[n] = \Delta_{k,r_2} s_{k,r_2}[n]$ a.s. if and only if $\Delta_{k,r_1} = \Delta_{k,r_2}$ and $s_{k,r_1}[n] = s_{k,r_2}[n]$ a.s.

If  $\Delta_{k,r_1} \neq \Delta_{k,r_2}$  and  $s_{k,r_1}[n] = s_{k,r_2}[n]$  a.s., then  $w[n] = (\Delta_{k,r_1} - \Delta_{k,r_2})s_{k,r_1}[n]$  a.s., and  $P_w(O) > 0$ . This and (82) imply equality is not achieved in (81) in this case.

Suppose $s_{k,r_1}[n] \neq s_{k,r_2}[n]$ a.s. Recall the notation used in Theorem A1 and that $\hat{H}_1[m]$ represents the head length of the *m*th joint symbol in $s_{k,r_1}[n]$. Let $\tilde{H}_1[m]$ be the head length of the *m*th *nonjoint* symbol in $s_{k,r_1}[n]$. By averaging the joint and nonjoint symbols, it follows from (15) that the PSD of $s_{k,r_1}[n]$ can be written as

$$S_{k,r_1}\left(e^{j\omega}\right) = 2\sigma_1^2 \rho_1 E\left\{\sin^2\left(\frac{\omega \hat{H}_1[m]}{2}\right)\right\} + 2\sigma_1^2\left(1-\rho_1\right) E\left\{\sin^2\left(\frac{\omega \tilde{H}_1[m]}{2}\right)\right\}. \tag{83}$$

Furthermore, consider the analogous definition and result for  $s_{k,r_2}[n]$ .

Suppose, for purpose of contradiction, that  $P_w(O) = 0$ . The PSD of w[n] is

$$S_w\left(e^{j\omega}\right) = \Delta_{k,r_1}^2 S_{k,r_1}\left(e^{j\omega}\right) + \Delta_{k,r_2}^2 S_{k,r_2}\left(e^{j\omega}\right) - 2\Delta_{k,r_1}\Delta_{k,r_2} S_{k,r_1,r_2}\left(e^{j\omega}\right) \tag{84}$$

where  $S_{k,r_1,r_2}(e^{j\omega})$  is the real part of the cross spectrum of  $s_{k,r_1}[n]$  and  $s_{k,r_2}[n]$  as given in (33). Since  $S_w(e^{j\omega})$  is continuous, w[n] has no signal-band power if and only if  $S_w(e^{j\omega}) = 0$  for all  $\omega \in (-\pi/O, \pi/O)$ . Therefore, the second derivative of  $S_w(e^{j\omega})$  is zero at  $\omega = 0$ . However, it follows from (33), (83), and Fatou's lemma [29] that

$$\lim_{\omega \to 0} \frac{S_w(e^{j\omega})}{\omega^2} \ge \frac{1}{2} \left( \Delta_{k,r_1}^2 \sigma_1^2 (1-\rho_1) + \Delta_{k,r_2}^2 \sigma_2^2 (1-\rho_2) \right). \tag{85}$$

Since $s_{k,r_1}[n] \neq s_{k,r_2}[n]$ a.s., there is a nonzero probability that symbols in both switching sequences are not joint, i.e., $\rho_1 < 1$ and $\rho_2 < 1$. This and (85) imply that the second derivative of $S_w(e^{j\omega})$ at $\omega = 0$, if it exists, is greater than 0, which is a contradiction. Therefore, $P_w(O) > 0$ in this case, which implies that equality is not obtained in (81).
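As a sketch of where the right-hand side of (85) can come from, consider only the nonjoint term of (83). The step below assumes that every head length is a positive integer, so that $\tilde{H}_1[m] \ge 1$; this assumption is not restated in this appendix. Because $\sin^2(x)/x^2 \to 1$ as $x \to 0$, Fatou's lemma gives

$$\liminf_{\omega \to 0} \frac{2\sigma_1^2(1-\rho_1)}{\omega^2} E\left\{\sin^2\left(\frac{\omega \tilde{H}_1[m]}{2}\right)\right\} \ge 2\sigma_1^2(1-\rho_1) E\left\{\frac{\tilde{H}_1^2[m]}{4}\right\} \ge \frac{1}{2}\sigma_1^2(1-\rho_1).$$

Scaling by $\Delta_{k,r_1}^2$ and adding the analogous term for $s_{k,r_2}[n]$ reproduces the right-hand side of (85), provided the remaining joint-symbol terms of (84) contribute a nonnegative amount, which this sketch does not verify.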

The above results are now used to prove Theorem 3.

*Proof of Theorem 3:* Consider the case where an independent dither sequence is used only for each layer of the DAC. Substituting the inequality in (81) into (61) indicates

$$D_{O} \leq \sum_{k=1}^{b} \left\{ \sum_{r=1}^{2^{b-k}} \Delta_{k,r}^{2} P_{k,r}(O) + \sum_{r_{1}=1}^{2^{b-k}} \sum_{r_{2}=1}^{r_{1}-1} \left( \Delta_{k,r_{1}}^{2} P_{k,r_{1}}(O) + \Delta_{k,r_{2}}^{2} P_{k,r_{2}}(O) \right) \right\}. \tag{86}$$

Simplifying (86) gives

$$D_O \le \sum_{k=1}^{b} \sum_{r=1}^{2^{b-k}} 2^{b-k} \Delta_{k,r}^2 P_{k,r}(O). \tag{87}$$
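The step from (86) to (87) is a counting argument: within layer k, each index r appears in exactly $2^{b-k} - 1$ of the pairs $(r_1, r_2)$ with $r_2 < r_1$, so the pair sum contributes $2^{b-k} - 1$ copies of each term and the single sum contributes one more. The following Python sketch checks this identity numerically for one layer. The function names and the randomly drawn values standing in for $\Delta_{k,r}^2 P_{k,r}(O)$ are illustrative assumptions, not quantities from the paper.

```python
import random


def layer_sum_86(a):
    """Braced term of (86) for one layer, with a[r] standing in for
    Delta_{k,r}^2 * P_{k,r}(O) (placeholder values, not real data)."""
    n = len(a)
    single = sum(a)
    # Pairs with r2 < r1, as in the double sum of (86).
    pairs = sum(a[r1] + a[r2] for r1 in range(n) for r2 in range(r1))
    return single + pairs


def layer_sum_87(a):
    """Same layer written as in (87): 2^{b-k} copies of each term."""
    return len(a) * sum(a)


random.seed(0)
b, k = 4, 2  # hypothetical DAC size and layer index
a = [random.random() for _ in range(2 ** (b - k))]
print(layer_sum_86(a), layer_sum_87(a))
```

The two printed values should agree up to floating-point rounding for any choice of b, k, and nonnegative placeholder values.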

Substituting the power bound in (22) into (87) leads to

$$D_O \le \frac{1}{O(O+1)} \sum_{k=1}^{b} \sum_{r=1}^{2^{b-k}} 2^{b-k+1} \Delta_{k,r}^2. \tag{88}$$

Applying Lemma A3 with  $c_{k,r} \triangleq 2^{b-k+1}$ , the inequality in (74) is substituted into (88) to give

$$D_O \le \frac{1}{O(O+1)} \max_{k,r} \{2 \cdot 4^{b-k}\} \bar{\sigma}_{\delta}^2. \tag{89}$$

Since  $\max_{k,r} \{ 2 \cdot 4^{b-k} \} = 4^b/2$ , (27) follows from (89).

Now, consider the case where an independent dither sequence is used in each switching block. Substituting the power bound from (22) into (62) gives

$$D_O \le \frac{1}{O(O+1)} \sum_{k=1}^{b} \sum_{r=1}^{2^{b-k}} 2 \cdot \Delta_{k,r}^2. \tag{90}$$

Applying Lemma A3 again but with  $c_{k,r} \triangleq 2$ , the inequality in (74) is substituted into (90) to give

$$D_O \le \frac{\bar{\sigma}_{\delta}^2}{O(O+1)} \max_{k,r} \{2^{b-k+1}\}. \tag{91}$$

Since  $\max_{k,r} \{2^{b-k+1}\} = 2^b$ , (28) follows from (90) and (91).
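The two maximizations above can be checked by direct enumeration over the layer index k. The Python sketch below evaluates $\max_{k,r}\{2^{b-k} c_{k,r}\}$ for both choices of $c_{k,r}$ and forms the right-hand sides of (89) and (91). The function names, the parameter `osr` (playing the role of O), and `sigma2` (playing the role of $\bar{\sigma}_{\delta}^2$) are illustrative assumptions introduced here.

```python
def layer_dither_constant(b):
    """max over k of 2^{b-k} * c_{k,r} with c_{k,r} = 2^{b-k+1}, as in (89)."""
    return max(2 * 4 ** (b - k) for k in range(1, b + 1))


def block_dither_constant(b):
    """max over k of 2^{b-k} * c_{k,r} with c_{k,r} = 2, as in (91)."""
    return max(2 ** (b - k + 1) for k in range(1, b + 1))


def noise_power_bounds(b, osr, sigma2=1.0):
    """Right-hand sides of (89) and (91); osr stands in for O and
    sigma2 for sigma-bar_delta^2 (both supplied by the caller)."""
    scale = sigma2 / (osr * (osr + 1))
    return layer_dither_constant(b) * scale, block_dither_constant(b) * scale


for b in range(1, 6):
    # Both maxima are attained at k = 1.
    assert layer_dither_constant(b) * 2 == 4 ** b
    assert block_dither_constant(b) == 2 ** b

print(noise_power_bounds(b=3, osr=16))
```

Under these expressions, the per-layer dithering bound exceeds the per-block dithering bound by a factor of $2^{b-1}$.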

For both dithering schemes, $\max_{k,r} \{2^{b-k}c_{k,r}\}$ is achieved with k = 1. Thus, Lemma A3 implies that the relative mismatch error vector $\delta$ achieves equality in this case if and only if it is a linear combination of the vectors $\vec{\nu}_{1,r}$ for $r = 1, \ldots, 2^{b-1}$. From (73), such a vector is characterized by having $\delta_{2j} = -\delta_{2j-1}$ for $j = 1, \ldots, 2^{b-1}$. With these relative mismatch errors, (8) implies that, for k > 1, $\Delta_{k,r} = 0$ for each r. In this case, the DAC noise is solely a linear combination of switching sequences in the first layer.

From Theorem 2, the signal-band power of  $s_{k,r}[n]$  is maximized only when  $H_{k,r}[m] = O$  and  $T_{k,r}[m] = 1$  a.s. In order for each switching sequence in layer  $k_0$  to satisfy this condition, each parity sequence in this layer must a.s. be a deterministic function of the DAC input and thus not dependent on a dither sequence. For this to hold,  $s_{k,r}[n] = 0$  a.s. for each  $k > k_0$  and r, and  $s_{k,r}[n]$  is a.s. not a deterministic sequence for each  $k < k_0$  and r. Moreover, since O is assumed to be greater than 1, this condition holds only if  $s_{k_0,r_1}[n] = s_{k_0,r_2}[n]$  a.s. for each  $r_1$  and  $r_2$ .

The inequality given in (28) depends only on the inequalities in Lemma A3 and Theorem 2. Therefore, it follows from the previous arguments that equality is obtained in (28) if and only if $\delta_{2j} = -\delta_{2j-1}$ for each $j = 1, \ldots, 2^{b-1}$, and $H_{1,r}[m] = O$ and $T_{1,r}[m] = 1$ a.s. for each r.

The inequality in (27) also depends on that in Lemma A4. As previously discussed, if $H_{1,r}[m] = O$ and $T_{1,r}[m] = 1$ a.s. for each r, then $s_{1,r_1}[n] = s_{1,r_2}[n]$ a.s. for each $r_1$ and $r_2$. Therefore, given this holds, equality is achieved in (81) for every $r_1 \neq r_2$ if and only if there exists a constant $\hat{\delta}$ such that $\Delta_{1,r} = \hat{\delta}$ for each $r = 1, \ldots, 2^{b-1}$. Given this condition holds, (8) implies that

$$\delta_{2j} - \delta_{2j-1} = 2\hat{\delta}.\tag{92}$$

If, in addition, $\delta_{2j} = -\delta_{2j-1}$, as required to achieve equality in (74), then (92) implies that $\delta_{2j} = -\delta_{2j-1} = \hat{\delta}$ for each *j*. Therefore, the bound in (27) is achieved if and only if this condition holds and $H_{1,r}[m] = O$ and $T_{1,r}[m] = 1$ a.s. for each *r*.
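To make the equality condition on the mismatch errors concrete, the following Python sketch builds a relative mismatch error vector with $\delta_{2j} = -\delta_{2j-1} = \hat{\delta}$ and checks that every first-layer coefficient equals $\hat{\delta}$. The function names are hypothetical, and the first-layer relation $\Delta_{1,r} = (\delta_{2r} - \delta_{2r-1})/2$ is an assumption inferred from (92), since (8) is not reproduced in this appendix.

```python
def equality_mismatch_vector(b, delta_hat):
    """Relative mismatch errors with delta_{2j} = -delta_{2j-1} = delta_hat
    for j = 1..2^{b-1}; list index 0 holds delta_1."""
    return [-delta_hat, delta_hat] * (2 ** (b - 1))


def first_layer_deltas(delta):
    """Assumed form of (8) for layer 1, inferred from (92):
    Delta_{1,r} = (delta_{2r} - delta_{2r-1}) / 2."""
    return [(delta[2 * r + 1] - delta[2 * r]) / 2 for r in range(len(delta) // 2)]


delta = equality_mismatch_vector(b=3, delta_hat=0.01)
print(delta)
print(first_layer_deltas(delta))
```

Every entry of the second printed list should equal the chosen `delta_hat`, which matches the requirement that $\Delta_{1,r}$ be the same constant for each r.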

#### ACKNOWLEDGMENT

The authors would like to thank the Associate Editor At Large, Gérard Battail, for his extensive review and recommendations that have made this paper much more accessible to the diverse audience of this journal. His efforts exceeded expectations, and we very much appreciated the time and the insight he provided.

#### REFERENCES

- [1] B. H. Leung and S. Sutarja, "Multi-bit sigma-delta A/D converter incorporating a novel class of dynamic element matching techniques," *IEEE Trans. Circuits Syst. II*, vol. 39, pp. 35–51, Jan. 1992.
- [2] M. J. Story, "Digital to Analogue Converter Adapted to Select Input Sources Based on a Preselected Algorithm Once per Cycle of a Sampling Signal," U.S. Patent 5 138 317, Aug. 11, 1992.
- [3] R. T. Baird and T. S. Fiez, "Linearity enhancement of multi-bit ΔΣ A/D and D/A converters using data weighted averaging," *IEEE Trans. Circuits Syst. II*, vol. 42, pp. 753–762, Dec. 1995.
- [4] R. Schreier and B. Zhang, "Noise-shaped multi-bit D/A converter employing unit elements," *Electron. Lett.*, vol. 31, no. 20, pp. 1712–1713, Sept. 1995.
- [5] R. W. Adams and T. W. Kwan, "Data-Directed Scrambler for Multi-Bit Noise Shaping D/A Converters," U.S. Patent 5 404 142, Apr. 4, 1995.

- [6] I. Galton, "Spectral shaping of circuit errors in digital-to-analog converters," *IEEE Trans. Circuits Syst. II*, vol. 44, pp. 808–817, Oct. 1997.
- [7] W. Chou and R. M. Gray, "Dithering and its effects on sigma-delta and multistage sigma-delta modulation," *IEEE Trans. Inform. Theory*, vol. 37, pp. 500–513, May 1991.
- [8] N. He, F. Kuhlmann, and A. Buzo, "Multiloop sigma-delta quantization," *IEEE Trans. Inform. Theory*, vol. 38, pp. 1015–1028, May 1992.
- [9] I. Galton, "Granular quantization noise in a class of delta-sigma modulators," *IEEE Trans. Inform. Theory*, vol. 40, pp. 848–859, May 1994.
- [10] T. W. Kwan, R. W. Adams, and R. Libert, "A stereo multibit sigma delta DAC with asynchronous master-clock interface," *IEEE J. Solid-State Circuits*, vol. 31, pp. 1881–1887, Dec. 1996.
- [11] R. Adams, K. Nguyen, and K. Sweetland, "A 113-dB SNR oversampling DAC with segmented noise-shaped scrambling," *IEEE J. Solid-State Circuits*, vol. 33, pp. 1871–1878, Dec. 1998.
- [12] T. Brooks, D. Robertson, D. Kelly, A. Del Muro, and S. Harston, "A cascaded sigma-delta pipeline A/D converter with 1.25 MHz signal bandwidth and 89 dB SNR," *IEEE J. Solid-State Circuits*, vol. 32, pp. 1896–1906, Dec. 1997.
- [13] A. Yasuda, H. Tanimoto, and T. Iida, "A third-order ΔΣ modulator using second-order noise-shaping dynamic element matching," *IEEE J. Solid-State Circuits*, vol. 33, pp. 1879–1886, Dec. 1998.
- [14] I. Fujimori, L. Longo, A. Hairapetian, K. Seiyama, S. Kosic, J. Cao, and S. Chan, "A 90 dB SNR, 2.5 MHz output-rate ADC using cascaded multibit delta-sigma modulation at 8 × oversampling ratio," *IEEE J. Solid-State Circuits*, vol. 35, pp. 1820–1828, Dec. 2000.
- [15] E. Fogleman, I. Galton, W. Huff, and H. Jensen, "A 3.3 V single-poly CMOS audio ADC delta-sigma modulator with 98 dB peak SINAD and 105-dB peak SFDR," *IEEE J. Solid-State Circuits*, vol. 35, pp. 297–307, Mar. 2000.
- [16] E. Fogleman, J. Welz, and I. Galton, "An audio ADC delta-sigma modulator with 100 dB SINAD and 102 dB DR using a second-order mismatch-shaping DAC," *IEEE J. Solid-State Circuits*, vol. 36, pp. 339–348, Mar. 2001.
- [17] R. K. Henderson and O. Nys, "Dynamic element matching techniques with arbitrary noise shaping function," in *Proc. IEEE Int. Symp. Circuits and Systems*, May 1996, pp. 293–296.

- [18] J. Welz and I. Galton, "Necessary and sufficient conditions for mismatch shaping in a general class of multi-bit DACs," *IEEE Trans. Circuits Syst. II*, vol. 49, pp. 748–759, Dec. 2002.
- [19] J. Welz, I. Galton, and E. Fogleman, "Simplified logic for first-order and second-order mismatch-shaping digital-to-analog converters," *IEEE Trans. Circuits Syst. II*, vol. 48, pp. 1014–1028, Nov. 2001.
- [20] J. Welz and I. Galton, "The mismatch-noise PSD from a tree-structured DAC in a second-order delta-sigma modulator with a midscale input," in *Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing*, vol. 4, May 7–11, 2001, pp. 2625–2628.
- [21] J. Grilo, I. Galton, K. Wang, and R. Montemayor, "A 12-mW ADC delta-sigma modulator with 80 dB of dynamic range integrated in a single-chip bluetooth transceiver," *IEEE J. Solid-State Circuits*, vol. 37, pp. 271–278, Mar. 2002.
- [22] G. L. Pierobon, "Codes for zero spectral density at zero frequency," *IEEE Trans. Inform. Theory*, vol. IT-30, pp. 435–439, Mar. 1984.
- [23] H. Kobayashi, "A survey of coding schemes for transmission or recording of digital data," *IEEE Trans. Commun.*, vol. COM-19, pp. 1087–1099, Dec. 1971.
- [24] G. Bilardi, R. Padovani, and G. Pierobon, "Spectral analysis of functions of Markov chains with applications," *IEEE Trans. Commun.*, vol. COM-31, no. 7, pp. 853–861, July 1983.
- [25] A. Papoulis, Probability, Random Variables, and Stochastic Processes. New York: McGraw-Hill, 1991.
- [26] W. R. Bennett, "Statistics of regenerative digital transmission," *Bell Syst. Tech. J.*, vol. 37, pp. 1501–1542, Nov. 1958.
- [27] R. Durrett, *Probability: Theory and Examples*. New York: Duxbury, 1996.
- [28] B. H. Marcus and P. H. Siegel, "On codes with spectral nulls at rational submultiples of the symbol frequency," *IEEE Trans. Inform. Theory*, vol. IT-33, pp. 557–568, July 1987.
- [29] G. B. Folland, Real Analysis: Modern Techniques and Their Applications. New York: Wiley, 1999.
- [30] G. Thomas and R. Finney, *Calculus and Analytic Geometry*. Reading, MA: Addison-Wesley, 1988.
- [31] R. Horn and C. Johnson, *Matrix Analysis*. New York: Cambridge Univ. Press, 1985.