# A Time Amplifier Assisted Frequency-to-Digital Converter Based Digital Fractional-*N* PLL

Eslam Helal, Enrique Alvarez-Fontecilla, Amr I. Eissa, and Ian Galton, *Fellow, IEEE*


Abstract—This article presents a wide input-range delay chain based time amplifier (TA) and its application to a 6.5-GHz digital fractional-N phase-locked loop (PLL). The TA includes a delay-averaging linearity enhancement technique and the PLL is based on an improved dual-mode ring oscillator (DMRO) delta-sigma ( $\Delta\Sigma$ ) frequency-to-digital converter (FDC). The TA mitigates contributions to the PLL's phase noise from DMRO flicker noise, which would otherwise degrade the PLL's in-band phase noise, and from  $\Delta\Sigma$  FDC quantization error, which would otherwise degrade the PLL's phase noise at high bandwidth settings. This paper also presents a delay-free asynchronous DMRO phase sampling scheme, and the first experimental demonstration of a recently-proposed  $\Delta\Sigma$  FDC digital gain calibration technique. The TA-assisted PLL achieves a random jitter of 145 fs<sub>rms</sub>, a total jitter that ranges from 151 to 270 fs<sub>rms</sub> as a result of fractional spurs, and a worst-case fractional spur of -49 dBc without requiring nonlinearity calibration.

Index Terms—Averaging resistors, delta-sigma ( $\Delta\Sigma$ ) modulation, digital phase-locked loop (PLL), dual-mode ring oscillator (DMRO), frequency synthesizer, frequency-to-digital converter (FDC), gain calibration, jitter, phase sampling, time amplifier (TA).

# I. INTRODUCTION

Many types of phase-locked loops (PLLs) use a phase-frequency detector (PFD) with subsequent circuitry to measure the time differences between corresponding edges of the reference signal and a divided-down version of the PLL output signal. In such PLLs, using a time amplifier (TA) to amplify the edge time differences prior to the PFD and subsequently dividing the measured time differences by the gain of the TA attenuates the noise introduced by the measurement process without otherwise changing the loop dynamics.

Several TAs have been proposed over the last two decades [1]–[6], yet most suffer from significant drawbacks such as narrow input range [1]–[4], gain and input range dependency on technology parameters [1]–[4], high nonlinearity [1]–[5], a tradeoff between gain and input range [1]–[3], and a tradeoff between linear input range and noise [4]. The TA presented in [6] avoids most of these issues, but its relatively complicated implementation limits its noise performance which reduces its suitability for high-performance PLLs.

Manuscript received October 17, 2020; revised December 8, 2020 and December 22, 2020; accepted December 23, 2020. Date of publication February 2, 2021; date of current version August 26, 2021. This article was approved by Associate Editor Pietro Andreani. This work was supported by the National Science Foundation under Award 1617545. (*Corresponding author: Eslam Helal.*)

The authors are with the Electrical and Computer Engineering Department, University of California at San Diego, San Diego, CA 92092-0407 USA (e-mail: ehelal@ucsd.edu).

Color versions of one or more figures in this article are available at https://doi.org/10.1109/JSSC.2020.3048650.

Digital Object Identifier 10.1109/JSSC.2020.3048650

Fig. 1. (a) High-level block diagram of the PLL, (b) simplified block diagram of the TA-assisted DMRO $\Delta\Sigma$ FDC, and (c) details of the RPC with gain calibration.

A low-noise inverter based delay chain TA with an analog delay-averaging nonlinearity mitigation technique is presented in this paper. The gain of the TA is nearly constant across a wide input range and is relatively insensitive to process, voltage, and temperature (PVT) variations, as it depends on a ratio of inverter delays. The TA's principle of operation is similar to that of the TA presented in [6], but its implementation is simpler and it achieves better noise performance.

The proposed TA is demonstrated in the context of a 6.5-GHz digital fractional-*N* PLL based on a dual-mode ring oscillator (DMRO) delta-sigma ( $\Delta \Sigma$ ) frequency-to-digital converter (FDC) [7]–[9]. As demonstrated in [7], this type of PLL can achieve good fractional spur performance, but the DMRO's $1/f^3$ phase noise component degrades the PLL's in-band phase noise, and $\Delta \Sigma$ FDC quantization error limits the PLL's performance at high bandwidth settings. The PLL presented in this paper applies the proposed TA to overcome these issues by attenuating both noise sources by approximately 16 dB. Additionally, it incorporates, and provides the first experimental demonstration of, several $\Delta \Sigma$ FDC improvements proposed in [9]. These improvements include an all-digital background gain calibration technique that simplifies the DMRO design, and various architecture changes that relax the $\Delta \Sigma$ FDC's timing constraints. A modified delay-free asynchronous DMRO phase sampling scheme is also incorporated in the PLL to further relax the $\Delta \Sigma$ FDC's timing constraints.

0018-9200 © 2021 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Fig. 2. Behavioral model of the TA-assisted DMRO $\Delta\Sigma$ FDC with the gain calibration technique details omitted.

# II. PLL HIGH-LEVEL ARCHITECTURE

## A. PLL Overview

A high-level block diagram of the PLL is shown in Fig. 1(a), where  $v_{ref}(t)$  and  $v_{PLL}(t)$  are the output waveforms of the reference oscillator and the PLL, respectively. Ideally,  $v_{PLL}(t)$ is periodic with frequency  $f_{PLL} = 2(N + \alpha) f_{ref}$ , where  $f_{ref}$ is the reference frequency, N is a positive integer, and  $\alpha$  is a fractional frequency offset that ranges from -1/2 to 1/2.
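As a small illustration of the relation $f_{PLL} = 2(N + \alpha) f_{ref}$, the following sketch splits a target output frequency into an integer $N$ and a fractional offset $\alpha$ in the stated range. The function name `split_ratio` and the 80 MHz reference frequency are hypothetical, chosen only for illustration; they are not taken from this article.

```python
def split_ratio(f_pll_hz, f_ref_hz):
    """Return (N, alpha) with f_pll = 2 * (N + alpha) * f_ref,
    N an integer and alpha between -1/2 and 1/2."""
    ratio = f_pll_hz / (2.0 * f_ref_hz)
    n = round(ratio)            # nearest integer keeps |alpha| <= 1/2
    alpha = ratio - n
    return n, alpha


# Hypothetical numbers: 6.5 GHz output with an assumed 80 MHz reference.
N, alpha = split_ratio(6.5e9, 80e6)
print(N, alpha)
```

Rounding to the nearest integer, rather than truncating, is what keeps $\alpha$ within the range from -1/2 to 1/2.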

The PLL consists of a TA-assisted DMRO $\Delta \Sigma$ FDC, a digital loop controller (DLC) with quantization noise cancellation (QNC), a digitally-controlled oscillator (DCO), and a divide-by-2 block with output $v_{\text{div2}}(t)$. The $\Delta \Sigma$ FDC generates two $f_{\text{ref}}$-rate digital sequences, y[n] and $-\hat{e}_q[n]$. Ideally,

$$y[n] = -\alpha - e_{\text{PLL}}[n] + e_q[n] - 2e_q[n-1] + e_q[n-2], \quad (1)$$

where  $e_{PLL}[n]$  is a measure of the PLL's average frequency error over the *n*th reference period and  $e_q[n]$  is  $\Delta \Sigma$  FDC quantization error [8], [9]. The sequence  $\hat{e}_q[n]$  is an estimate of  $e_q[n]$ , and it is used to cancel most of the contribution of  $e_q[n]$  prior to the digital loop filter (DLF) [7]–[12].

Fig. 1(a) and (1) imply that the DLF input, p[n], is a measure of the PLL average phase error over the *n*th reference period plus a first-order highpass shaped version of the residual  $\Delta\Sigma$  FDC quantization error,  $e_q[n] - \hat{e}_q[n]$ . The p[n] sequence is lowpass filtered by the DLF, the output of which controls the DCO.
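The step from second-order shaping in (1) to first-order shaping at the DLF input can be checked numerically. The sketch below assumes, for illustration only, that the frequency-to-phase conversion ahead of the DLF is a plain accumulation and that the error sequence is zero before n = 0; the function names are hypothetical.

```python
from itertools import accumulate
import random


def second_order_shape(e):
    """e[n] - 2 e[n-1] + e[n-2], taking e[n] = 0 for n < 0."""
    p = [0.0, 0.0] + list(e)
    return [p[i + 2] - 2.0 * p[i + 1] + p[i] for i in range(len(e))]


def first_order_shape(e):
    """e[n] - e[n-1], taking e[n] = 0 for n < 0."""
    p = [0.0] + list(e)
    return [p[i + 1] - p[i] for i in range(len(e))]


random.seed(0)
e_q = [random.uniform(-0.5, 0.5) for _ in range(1000)]

# Accumulating the second-order shaped error telescopes to a first difference.
accumulated = list(accumulate(second_order_shape(e_q)))
expected = first_order_shape(e_q)
max_diff = max(abs(a - b) for a, b in zip(accumulated, expected))
print(max_diff)
```

The two sequences agree to within floating-point rounding, which is the telescoping-sum identity behind the first-order highpass statement.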

Fig. 1(b) shows a simplified block diagram of the TA-assisted DMRO  $\Delta \Sigma$  FDC. It consists of a PFD, a DMRO, a digital ring phase calculator (RPC), a multi-modulus divider (MMD), and a TA. The signal  $v_{samp}(t)$ , which is an inverted version of  $v_{ref}(t)$ , is used within the RPC to sample the DMRO phase each reference period. The signal processing details of the RPC including the gain calibration technique are shown in Fig. 1(c) [9].

## B. TA-Assisted DMRO $\Delta \Sigma$ FDC Behavior

An analysis similar to that presented in [8] but modified to include the TA and the improvements presented in [9] yields the $\Delta \Sigma$ FDC behavioral model shown in Fig. 2. In this model, $J_{\text{TA}}[n]$ is the TA's output jitter during the *n*th reference period, $\theta_{\text{PLL}}(t)$, $\theta_{\text{ref}}(t)$, and $\theta_{\text{DMRO}}(t)$ are the respective phase errors in cycles of $v_{\text{PLL}}(t)$, $v_{\text{ref}}(t)$, and the DMRO, $\tau_n$, $t_n$, $\rho_n$, and $\gamma_n$, for n = 0, 1, 2, ..., are the respective times of the *n*th rising edges of $v_{\text{div}}(t)$, $v_{\text{ref}}(t)$, $v_{\text{TA}}(t)$, and $v_{\text{samp}}(t)$, $T_{\text{ref}} = 1/f_{\text{ref}}$, and $T_{\text{PLL}} = 1/f_{\text{PLL}}$.<sup>1</sup> The behavioral model in Fig. 2 does not include error sources corresponding to the PFD, the MMD, or the divide-by-2 block. Simulations performed by the authors indicate that these blocks do not significantly affect the PLL's phase noise, so they are omitted in the figure for simplicity.

The MMD is identical to those in analog PLLs, so, as illustrated in Fig. 2,  $\tau_n$  is an accumulated version of  $2T_{PLL}(N - v[n-1])$  plus noise, where  $2T_{PLL}$  is the divide-by-2 block's output period and N - v[n-1] is the MMD modulus.

As explained in Section III-B, the TA is implemented as a chain of  $N_{\text{TA}}$  nominally identical delay cells. The propagation delay of each delay cell is  $\tau_{\text{fast}}$  when  $v_{\text{ref}}(t)$  is low and  $\tau_{\text{slow}}$  otherwise, where  $\tau_{\text{fast}} < \tau_{\text{slow}}$ . As also explained in Section III-B, the TA delays the rising edges of  $v_{\text{div}}(t)$  such that the pulse-width of the PFD output, u(t), during the *n*th reference period, i.e.,  $u_n = \rho_n - t_n$ , is given by

$$u_n = -A_{\mathrm{TA}}(t_n - \tau_n) + N_{\mathrm{TA}}\tau_{\mathrm{slow}} + J_{\mathrm{TA}}[n], \qquad (2)$$

where

$$A_{\rm TA} = \tau_{\rm slow} / \tau_{\rm fast} \tag{3}$$

is the TA gain and  $N_{\text{TA}}\tau_{\text{slow}}$  is a constant offset term introduced as a byproduct of the TA's operation. Thus, the combined behavior of the TA and the PFD is equivalent to that of an inverting amplifier with input  $t_n - \tau_n$  and additive noise and offset terms.

The DMRO is a ring of $N_R$ nominally identical delay cells. Ideally, its frequency is $f_{\text{high}}$ when u(t) is high and $f_{\text{low}}$ otherwise, where $f_{\text{high}} > f_{\text{low}}$. As explained in [8] and illustrated in Fig. 2, the behavior of the DMRO is that of an accumulator with gain

$$A_{\rm DMRO} = f_{\rm high} - f_{\rm low} \tag{4}$$

followed by an additive noise source, an additive $nf_{low}T_{ref}$ term, and a quantizer, $Q_r$, with quantization step-size $\Delta_r = (2N_R)^{-1}$.

<sup>1</sup>By definition, $\theta_{ref}(t)$ is the phase error at time t of $v_{ref}(t)$ in units of cycles of $v_{ref}(t)$. Accordingly, $T_{ref}\theta_{ref}(t)$ has units of seconds and it represents the reference oscillator's absolute jitter. Similarly, $\theta_{PLL}(t)$ represents the phase error at time t of $v_{PLL}(t)$ in units of cycles of $v_{PLL}(t)$, so $T_{PLL}\theta_{PLL}(t)$ has units of seconds and it represents the PLL output signal's absolute jitter.

As explained in [9], the RPC extracts the information encoded in the sampled and quantized DMRO phase and computes a fixed-point measure of  $-\alpha - e_{PLL}[n]$  each reference period. This measure is quantized to the nearest integer to compute y[n], and the resulting quantization error,  $\hat{e}_q[n]$ , is used within the DLC to perform QNC. This *coarse* quantization operation is represented by a unity step-size quantizer,  $Q_c$ , in Fig. 2.

An analysis similar to that presented in [9] shows that the DMRO locks to an average frequency of $Mf_{ref}$, and the average u(t) pulse-width, $T_{\bar{u}}$, is

$$T_{\bar{u}} = A_{\text{DMRO}}^{-1} (M - T_{\text{ref}} f_{\text{low}}). \tag{5}$$

The parameter *M* is chosen so that the falling edges of u(t) occur between rising edges of  $v_{ref}(t)$  and  $v_{samp}(t)$ , i.e.,  $t_n < \rho_n < \gamma_n$  for all *n*. This with the TA operation described in Section III-B causes the rising edges of  $v_{div}(t)$  to precede the rising edges of  $v_{ref}(t)$ , i.e.,  $\tau_n < t_n$  for all *n*, so that

$$\gamma_{n-1} < \tau_n < t_n < \rho_n < \gamma_n \tag{6}$$

when the PLL is locked.
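To make (5) concrete, the sketch below evaluates the average u(t) pulse-width and confirms that it yields $M$ DMRO cycles per reference period. The values of $f_{\text{high}}$ and $f_{\text{low}}$ are those quoted in Section III-C; $M = 2.5$ and $f_{\text{ref}} = 80$ MHz are assumptions made purely for illustration and are not taken from this article.

```python
def average_pulse_width(m_cycles, f_ref_hz, f_high_hz, f_low_hz):
    """Average u(t) pulse-width from (5): (M - T_ref * f_low) / A_DMRO."""
    a_dmro = f_high_hz - f_low_hz          # DMRO gain, as in (4)
    t_ref = 1.0 / f_ref_hz
    return (m_cycles - t_ref * f_low_hz) / a_dmro


F_HIGH, F_LOW = 730e6, 60e6                # values quoted in Section III-C
M_ASSUMED, F_REF_ASSUMED = 2.5, 80e6       # hypothetical, for illustration

t_u = average_pulse_width(M_ASSUMED, F_REF_ASSUMED, F_HIGH, F_LOW)

# Consistency check: cycles accumulated at f_high during the pulse plus
# cycles accumulated at f_low during the rest of the reference period.
t_ref = 1.0 / F_REF_ASSUMED
cycles_per_ref = F_HIGH * t_u + F_LOW * (t_ref - t_u)
print(t_u, cycles_per_ref)
```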

A simplified version of the TA-assisted  $\Delta \Sigma$  FDC behavioral model that is valid for constant  $g_n$  is shown within the dashed contour in Fig. 2, wherein all noise components are input-referred and lumped into  $e_{PLL}[n]$ , the offset components are omitted, and the quantizers  $Q_r$  and  $Q_c$  are replaced by their respective additive error sequences,  $e_{qr}[n]$  and  $\hat{e}_q[n]$ . The model implies that the behavior of the  $\Delta \Sigma$  FDC when

$$g_n = \frac{1}{2T_{\rm PLL}A_{\rm TA}A_{\rm DMRO}}\tag{7}$$

is identical to that of a second-order  $\Delta \Sigma$  modulator, the output of which is given by (1) with

$$e_q[n] = g_n e_{qr}[n] + \hat{e}_q[n]. \tag{8}$$

As explained in [9], when (7) is not satisfied,  $\hat{e}_q[n]$  is imperfectly canceled by QNC so it leaks into the DLF input, thereby degrading the PLL's phase noise. The gain calibration technique shown in Fig. 1(c) causes  $g_n$  to converge to the right side of (7), which effectively circumvents this problem [9].

It follows from Fig. 2 and (7) that the power contribution of the DMRO's phase noise to y[n], and, hence, to the PLL's phase noise, is proportional to both  $A_{TA}^{-2}$  and  $A_{DMRO}^{-2}$ . The original DMRO  $\Delta\Sigma$  FDC PLL presented in [7] does not incorporate a TA, so it corresponds to the case of  $A_{TA} = 1$ in (7), and its in-band phase noise is dominated by the DMRO's  $1/f^3$  phase noise component. In the absence of a TA, modifying the DMRO to increase  $A_{DMRO}$  and/or reduce the DMRO's  $1/f^3$  phase noise component are the only options that would have mitigated this problem.

Unfortunately, these options are not attractive. In principle, increasing the widths of the transistors that make up the DMRO's delay cells increases $A_{\text{DMRO}}$ via (4) and decreases the DMRO's $1/f^3$ phase noise component by reducing transistor flicker noise, but in practice $A_{\text{DMRO}}$ increases only up to a point beyond which parasitic capacitances and supply resistance cause $A_{\text{DMRO}}$ to decrease with further transistor width increases. After this point, $A_{\text{DMRO}}$ can only be increased further by reducing the number of DMRO delay cells, $N_R$. Given that $e_{qr}[n]$ is proportional to $\Delta_r = (2N_R)^{-1}$, this would reduce the effectiveness of QNC, which would require the PLL bandwidth to be reduced to compensate for the increase in quantization noise power. Interestingly, increasing the number of DMRO delay cells does provide a modest net benefit. For example, doubling $N_R$ reduces $A_{\text{DMRO}}^2$ by 6 dB, but as shown in [13] it decreases the power of the DMRO's $1/f^3$ phase noise component by 9 dB. Hence, each doubling of $N_R$ reduces the power contribution of the DMRO's $1/f^3$ phase noise component to the PLL's phase noise by 3 dB. Unfortunately, achieving large phase noise reductions in this manner typically requires impractically large numbers of DMRO delay cells.

These tradeoffs are avoided in this work because the TA provides amplification prior to the DMRO. As described above, the DMRO's contribution to the PLL's phase noise is proportional to  $A_{TA}^{-2}$ , so each doubling of  $A_{TA}$  reduces the power of the DMRO's contribution to the PLL's phase noise by 6 dB.
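The decibel figures follow directly from the $A_{TA}^{-2}$ proportionality. The short sketch below evaluates it for a doubling of the gain and for $A_{TA} = 7$, the implemented value quoted in Section III-B; the helper name `db_power` is hypothetical.

```python
import math


def db_power(power_ratio):
    """Express a power ratio in decibels."""
    return 10.0 * math.log10(power_ratio)


# Doubling A_TA scales the DMRO's contribution by 2**-2 in power.
per_doubling_db = db_power(2.0 ** -2)

# A_TA = 7 is the implemented TA gain quoted in Section III-B.
implemented_db = db_power(7.0 ** -2)

print(round(per_doubling_db, 2), round(implemented_db, 2))
```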

Both the TA and the DMRO are made up of dual-delay inverter based delay cells, but the TA is an open-loop chain and the DMRO is a ring, so transistor flicker noise gives rise to 1/f noise in  $J_{TA}[n]$  and  $1/f^3$  noise in  $\theta_{DMRO}(\gamma_n)$ . Nevertheless, as implied by Fig. 2, the contributions of  $J_{TA}[n]$  and  $\theta_{DMRO}(\gamma_n)$ to y[n] are first-order and second-order highpass shaped, respectively, so flicker noise injected by each TA transistor has a similar contribution to the PLL's phase noise as that injected by each DMRO transistor.

Yet, it is not the case that using the TA simply transfers the problem of reducing the effect of flicker noise from the DMRO to the TA. As implied by (3),  $A_{TA}$  depends on a ratio of inverter delays, so the TA's flicker noise can be reduced by increasing transistor widths without significantly decreasing  $A_{TA}$  or incurring other side effects similar to those mentioned above that come with reducing the DMRO's  $1/f^3$ phase noise component. Furthermore, the TA is only active for a fraction of each reference period, whereas the DMRO operates continuously. As the power of the noise introduced by a chain of inverters grows at least proportionally to the number of inverters that transition as explained in [13], it follows that the TA noise contribution can be made small compared to that of the DMRO.

For instance, in the implemented PLL, 100 TA delay cells transition each reference period, whereas 660 DMRO delay cells transition each reference period on average. Moreover, in contrast to the TA, each delay cell within the DMRO transitions four to six times each reference period. Given that flicker noise changes slowly relative to $T_{\rm ref}$, having an edge propagating four to six times through the same delay cell effectively increases the power of the DMRO's $1/f^3$ phase noise component by approximately 3-5 dB compared to the case where the edge propagates through four to six different delay cells. These features made it possible for the TA to suppress the DMRO's contribution to the PLL's phase noise without the TA's noise being a limitation.

Fig. 3. Block diagram of the PLL showing implementation details and the four different power domains in dashed boxes.

Fig. 4. High-level block diagram of the PNR digital block and clocking scheme.

# III. IMPLEMENTATION DETAILS

The implemented PLL is shown in Fig. 3. It has four power supply domains, which correspond to the dashed boxes in Fig. 3. The place-and-route (PNR) digital block is clocked at a rate of  $f_{PLL}/8$  by  $v_{clk}(t)$  and contains the DLC, the DCO control logic, the  $\Delta\Sigma$  FDC's  $z^{-1}$  register, and all RPC components except the cycle counter and phase-sampling flip-flops.

As shown in Fig. 4, the PNR digital block comprises three sub-blocks, FDC digital, DLC, and DCO digital, that are clocked sequentially by gated versions of  $v_{clk}(t)$ . The signal  $v_{rdy}(t)$  is timed such that it goes high once each reference period when the DMRO phase information is ready to be processed by the PNR digital block. The  $clk_{FDC}$ ,  $clk_{DLC}$ , and  $clk_{DCO}$  clock signals are generated by the flip-flop chain driven by  $v_{rdy}(t)$ , and the numbers of flip-flops between adjacent clock signals are such that enough time is allocated for each digital sub-block to meet digital timing constraints across PVT variations for an input clock frequency of 1 GHz.

The details of the sub-blocks within the PNR digital block are similar to those presented in [7]. Most of the differences are in the $\Delta\Sigma$ FDC's digital sub-block to incorporate the improvements proposed in [9], which include the gain calibration technique shown in Fig. 1(c). As explained in [9], the $f_{ref}$-rate multiplier prior to the RPC's accumulator in Fig. 1(c) represents most of the gain calibration technique's added complexity. Its inputs have respective bit-widths of 12 and 14 bits, and its output has a bit-width of 25 bits. The RPC's accumulator would have required 24 bits in the absence of the multiplier, so the inclusion of the gain calibration technique negligibly increases the power consumption and circuit area of the RPC accumulator and subsequent digital sub-blocks. Furthermore, the relaxed timing of the implemented $\Delta\Sigma$ FDC architecture relative to that presented in [7] causes the power consumption and circuit area of the multiplier to be negligible relative to those of the overall digital block.

## A. Timing

## B. TA

Fig. 6(a) shows a conceptual block diagram of the proposed TA. It consists of  $N_{\text{TA}}$  nominally identical inverter based delay cells, where  $N_{\text{TA}}$  is an even number. The delay of each delay cell,  $\tau_{\text{delay}}$ , takes on one of two values:  $\tau_{\text{fast}}$  when  $v_{\text{ref}}(t)$  is low and  $\tau_{\text{slow}}$  when  $v_{\text{ref}}(t)$  is high.

Fig. 5. PLL timing diagram.

It follows from (6) that during the *n*th reference period, the time, $t_n$, of the rising edge of $v_{ref}(t)$ occurs after the time, $\tau_n$, of the corresponding rising edge of $v_{div}(t)$, but before the time, $\rho_n$, at which the rising edge of $v_{div}(t)$ finishes propagating through the TA. Therefore, at time $\tau_n$, when the rising edge of $v_{div}(t)$ starts propagating through the TA, the delay cells have a delay of $\tau_{\text{fast}}$. When $v_{\text{ref}}(t)$ goes high at time $t_n$, the rising edge of $v_{\text{div}}(t)$ has already propagated through $\lfloor (t_n - \tau_n)/\tau_{\text{fast}} \rfloor$ delay cells and a fraction, given by $(t_n - \tau_n)/\tau_{\text{fast}} - \lfloor (t_n - \tau_n)/\tau_{\text{fast}} \rfloor$, of a delay cell. Thus, at time $t_n$, the rising edge of $v_{\text{div}}(t)$ has propagated through an equivalent of $(t_n - \tau_n)/\tau_{\text{fast}}$ delay cells, including both integer and fractional parts. At this time, the TA's delay cells are switched to have a delay of $\tau_{\text{slow}}$, so the remaining TA delay cells through which the edge must propagate contribute a combined delay of $(N_{\text{TA}} - (t_n - \tau_n)/\tau_{\text{fast}})\tau_{\text{slow}}$. Consequently, the time, $\rho_n$, at which $v_{\text{TA}}(t)$ goes high is given by

$$\rho_n = \tau_n + \left(\frac{t_n - \tau_n}{\tau_{\text{fast}}}\right) \tau_{\text{fast}} + \left(N_{\text{TA}} - \frac{t_n - \tau_n}{\tau_{\text{fast}}}\right) \tau_{\text{slow}}. \tag{9}$$

This implies that the pulse-width of u(t) during the *n*th reference period,  $u_n = \rho_n - t_n$ , is given by (2) with  $A_{\text{TA}}$  given by (3), where the jitter term,  $J_{\text{TA}}[n]$ , represents the combined effect of all transistor noise sources within the TA.
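The step from (9) to (2) can be checked with a minimal behavioral sketch of the ideal TA, with $J_{\text{TA}}[n] = 0$ and an instantaneous delay change. The function names and the example edge times are hypothetical; $N_{\text{TA}}$, $\tau_{\text{fast}}$, and $\tau_{\text{slow}}$ are the implemented values quoted later in this section. The model only applies when $0 < t_n - \tau_n < N_{\text{TA}}\tau_{\text{fast}}$.

```python
def ta_output_time(tau_n, t_n, n_ta, tau_fast, tau_slow):
    """Rising-edge time rho_n of v_TA(t) per (9) for an ideal TA."""
    cells_done = (t_n - tau_n) / tau_fast     # integer plus fractional cells
    if not 0.0 < cells_done < n_ta:
        raise ValueError("edge time difference is outside the TA input range")
    return tau_n + cells_done * tau_fast + (n_ta - cells_done) * tau_slow


def pulse_width(tau_n, t_n, n_ta, tau_fast, tau_slow):
    """u_n from (2) with the jitter term J_TA[n] set to zero."""
    a_ta = tau_slow / tau_fast                # TA gain, as in (3)
    return -a_ta * (t_n - tau_n) + n_ta * tau_slow


# Implemented values quoted later in this section; times in picoseconds.
N_TA, TAU_FAST, TAU_SLOW = 100, 10.0, 70.0
tau_n, t_n = 0.0, 600.0                       # hypothetical edge times

rho_n = ta_output_time(tau_n, t_n, N_TA, TAU_FAST, TAU_SLOW)
u_from_9 = rho_n - t_n
u_from_2 = pulse_width(tau_n, t_n, N_TA, TAU_FAST, TAU_SLOW)
print(rho_n, u_from_9, u_from_2)
```

With these numbers the edge has crossed 60 cells when $v_{\text{ref}}(t)$ rises, so the remaining 40 cells contribute 2800 ps, and both expressions give the same pulse-width.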

It follows from the explanation above that for the TA to provide time-difference amplification it is necessary to ensure:

$$0 < t_n - \tau_n < N_{\rm TA} \tau_{\rm fast}. \tag{10}$$

Otherwise, the TA would only introduce a fixed delay between $v_{\text{div}}(t)$ and $v_{\text{TA}}(t)$. Fig. 5 implies that the time at which the MMD loads its inputs also imposes a constraint on the maximum value of $t_n - \tau_n$. Specifically, the MMD must load its inputs at the time of the rising edge of $clk_{\text{FDC}}$ at the earliest, which can occur up to $7T_{\text{PLL}}$ after the falling edge of $v_{\text{ref}}(t)$. Therefore, $t_n - \tau_n$ must satisfy

$$t_n - \tau_n < \tfrac{1}{2}T_{\text{ref}} - 7T_{\text{PLL}} \tag{11}$$

in addition to (10). Moreover, for the  $\Delta \Sigma$  FDC to work properly, u(t) must go low before the DMRO phase is sampled at time  $\gamma_n$ , which requires

$$0 < u_n < \tfrac{1}{2}T_{\rm ref} + 10T_{\rm PLL}. \tag{12}$$

Equations (10)-(12) impose design constraints on the TA parameters  $N_{\text{TA}}$ ,  $\tau_{\text{slow}}$ ,  $\tau_{\text{fast}}$ , and  $A_{\text{TA}}$ .
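As a worked step, offered as a reading aid under the idealization $J_{\text{TA}}[n] = 0$, substituting the two endpoints of (10) into (2) with $A_{\text{TA}}$ given by (3) shows how the TA input range maps onto the pulse-width range:

$$t_n - \tau_n \to 0 \;\Rightarrow\; u_n \to N_{\text{TA}}\tau_{\text{slow}}, \qquad t_n - \tau_n \to N_{\text{TA}}\tau_{\text{fast}} \;\Rightarrow\; u_n \to N_{\text{TA}}\tau_{\text{slow}} - \frac{\tau_{\text{slow}}}{\tau_{\text{fast}}} N_{\text{TA}}\tau_{\text{fast}} = 0.$$

Hence, in this idealized case, (10) is equivalent to $0 < u_n < N_{\text{TA}}\tau_{\text{slow}}$, so the lower bound in (12) holds whenever (10) does, and the upper bound in (12) constrains the product $N_{\text{TA}}\tau_{\text{slow}}$.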

As shown in Fig. 6(b), each of the TA's dual-delay inverters consists of a standard inverter in parallel with a larger tri-state inverter. When $v_{ref}(t)$ goes high, the tri-state inverter is disabled by disconnecting its ground and power supply terminals from the supply rails, thereby increasing $\tau_{delay}$ from $\tau_{fast}$ to $\tau_{slow}$.

Fig. 6. (a) Dual-delay inverter chain based TA concept, (b) details of TA unit delay cell, and (c) illustration of $\tau_{\text{delay}}$ versus $t_n$ for low-to-high and high-to-low input transitions (not to scale for illustration purposes).

Ideally, $\tau_{\text{delay}}$ changes instantaneously from $\tau_{\text{fast}}$ to $\tau_{\text{slow}}$ when $v_{\text{ref}}(t)$ goes high, in which case the TA performs linear amplification. Unfortunately, the $\tau_{\text{fast}}$-to-$\tau_{\text{slow}}$ transitions are non-instantaneous in practice, which causes TA nonlinearity. Moreover, as illustrated in Fig. 6(c), this transition also depends on whether the cell's input, $d_{n-1}(t)$, goes from low to high or vice versa.

Fig. 7. (a) Proposed TA core including the nonlinearity mitigation technique, (b) implemented TA architecture with PS mode, (c) TA unit delay cell circuit details, and (d) histogram of the highest fractional spur power that results from the (simulated) TA's unit delay cells' random mismatches.

The TA topology shown in Fig. 7(a) is proposed to reduce such nonlinearity. It consists of two nominally identical delay chains in parallel, where the input of one delay chain is an inverted version of that of the other delay chain, both delay chains are controlled by $v_{ref}(t)$, and each pair of parallel delay cells is cross-connected with averaging resistors. As shown in Fig. 7(a) for the top and bottom delay chains in isolation, the odd-indexed and even-indexed delay cells have inputs that transition in opposite directions, so they have different $\tau_{\text{fast}}$-to-$\tau_{\text{slow}}$ transitions. This causes a quasi-periodic artifact in the input-output characteristics of the delay chains. Driving the bottom delay chain by an inverted version of $v_{div}(t)$ causes its input-output characteristic to be shifted with respect to that of the top delay chain such that, when averaged via the cross-coupled resistor network, the nonlinearity of the cross-coupled delay chains is considerably smaller than that of either delay chain in isolation. Behavioral simulations of the PLL in which the TA's nonlinear behavior is considered and all other spur-generation mechanisms are neglected suggest that the power of the PLL's worst-case fractional spur decreases by 7 dB when the proposed nonlinearity mitigation technique is used.

In addition to having improved linearity, the proposed TA topology's pseudo-differential nature can be exploited to implement a TA power-saving (PS) mode. Without the PS mode, the falling edge of  $v_{\text{div}}(t)$  propagates through the TA each reference period. This resets the delay cells' states for the next rising edge of  $v_{\text{div}}(t)$ , but the power consumed by the resulting delay cell transitions represents a significant portion of the TA's total power consumption. The idea behind the PS mode is to swap the differential inputs and swap the differential outputs of the TA each reference period to obviate the need to reset the delay cells, so the falling edge of  $v_{\text{div}}(t)$  can be prevented from propagating through the TA to save power.

The implemented TA, which includes the nonlinearity mitigation technique and PS mode option as described above, is shown in Fig. 7(b). It comprises the TA core shown in Fig. 7(a) as well as input and output swapping circuitry used when the PS mode is enabled. The transistor-level details of the TA's delay cells are shown in Fig. 7(c). The TA core was designed to maximize the value of  $A_{TA}$  while satisfying the constraints in (10)-(12). Specifically,  $N_{TA} = 100$ ,  $\tau_{fast} = 10$  ps,  $\tau_{slow} = 70$  ps, and  $A_{TA} = 7$ . Simulation results predict that the TA's gain varies by  $\pm 7\%$  across process corners,  $\pm 10\%$  across process corners and temperature variations (0 °C to 85 °C), and  $\pm 14\%$  across process corners, temperature variations, and supply voltage variations ( $\pm 10\%$ ).

The PS mode is enabled and disabled via the PS<sub>en</sub> signal. When enabled, the $\phi_1(t)$ and $\phi_2(t)$ signals are used to implement the input and output swapping operations. The signal $\phi_1(t)$, which is derived from $v_{\text{div}}(t)$, is used to swap the inputs and the outputs of the TA core each reference period, whereas $\phi_2(t)$ is used to control the input and output latches. As illustrated in the timing diagram shown in Fig. 7(b), these latches prevent the falling edges of $v_{\text{div}}(t)$ from propagating through the TA core, and also prevent the output swapping circuitry from disturbing $v_{\text{TA}}(t)$ while the swapping occurs.

Fig. 8. DMRO and delay-free asynchronous phase sampling scheme details.

The TA was laid out such that systematic mismatch among its unit cells is negligible, and the unit cells are sized such that the power of the PLL's worst-case fractional spur caused by random mismatches among the TA's delay cells is approximately -50 dBc. This was determined by performing a Monte Carlo simulation in Cadence to obtain 90 different TA input-output characteristics, which were then imported into a bit-exact, event-driven, custom behavioral PLL simulator. Fig. 7(d) shows a histogram of the simulated PLL's worst-case fractional spur power. As shown in Fig. 7(d), the worst-case fractional spur power's expected value is -51.7 dBc, and its standard deviation is 2.5 dB.

As mentioned in Section I, the proposed TA achieves better noise performance than a comparably configured TA of the type presented in [6]. One reason for this difference is that the TA in [6] incorporates two ring oscillators that both contribute noise to the output, whereas the proposed TA incorporates a single delay chain that contributes noise to the output. Another reason is that the TA presented in [6] requires NAND gate based delay cells, each of which introduces more phase noise than a comparable inverter based delay cell.

## C. DMRO and Phase Sampling Scheme

The DMRO, which is shown in Fig. 8, consists of $N_R = 127$ inverter delay cells and has $A_{\text{DMRO}} = 670$ MHz ( $f_{\text{high}} = 730$ MHz and $f_{\text{low}} = 60$ MHz). Each DMRO delay cell contains a dual-delay inverter that is similar to that used in the TA. It includes a standard ×1 inverter in parallel with a ×16 tri-state inverter, and the tri-state inverter's power and ground lines are connected to or disconnected from the supply rails when u(t) is high or low, respectively. This modulates each delay cell's propagation delay such that the DMRO frequency is $f_{\text{high}}$ when u(t) is high and $f_{\text{low}}$ when u(t) is low. In both cases, the DMRO outputs swing from rail to rail, which allows the DMRO outputs to drive standard digital logic without the need for level-shifting. The $\times 2$ inverter shown within the dashed box in Fig. 8 is used to buffer the delay cell's input to reduce the disturbance to the DMRO when its phase is sampled.

As explained in Section II-B, the TA causes the PLL phase noise contributed by the DMRO to be attenuated in power by a factor of  $A_{TA}^2$ . Additionally, the DMRO's  $1/f^3$  phase noise component is further mitigated by using a large number of stages [13]. This comes at the expense of higher digital complexity and higher power consumption, primarily due to the charging and discharging of the gates controlled by u(t).

To prevent the DMRO from running with multiple stages transitioning simultaneously, even for a brief period of time, the first delay cell includes a switch between the ground terminal of the  $\times 1$  inverter and the ground rail. At startup, both u(t) and the enable signal are set low. This opens the ring so that any transition propagating through it eventually reaches the first stage and stops propagating. The switch is subsequently closed after which the DMRO operates normally.

The DMRO phase sampling scheme is shown in Fig. 8. As explained below, it addresses the issue that the sampling clock,  $v_{samp}(t)$ , and the DMRO are asynchronous yet avoids the delay incurred by the DMRO sampling scheme in [7]. It consists of a cycle counter followed by sampling flip-flops and a phase decoder. The principle behind the sampling of the cycle counter's outputs is based on that of the asynchronous sampling schemes presented in [16] and [17]. To the knowledge of the authors, the proposed phase decoder implementation described below is introduced for the first time in this work.

The cycle counter consists of two 4-bit counters that are clocked, respectively, by the rising and falling edges of the DMRO delay cell with output $d_1(t)$. On each rising edge of the $f_{\text{ref}}$-rate signal $v_{\text{samp}}(t)$, the counter outputs $c_{\text{pos}}(t)$ and $c_{\text{neg}}(t)$ are sampled to generate $c_{\text{pos}}[n]$ and $c_{\text{neg}}[n]$, and the DMRO outputs $d_1(t), d_2(t), \ldots, d_{127}(t)$ are sampled to generate $d_1[n], d_2[n], \ldots, d_{127}[n]$. The phase decoder consists of a lookup table (LUT) that quantizes the sampled DMRO outputs to a 10-bit sequence, $t_R[n]$, which represents the fractional part of the sampled DMRO phase, and logic that computes $c_R[n]$, which represents the integer part of the sampled DMRO phase. The number of bits of $t_R[n]$ was chosen to ensure that the contribution to the PLL's phase noise from the error introduced by the LUT's quantization operation is negligible compared to those of the other error sources.

Fig. 9. MMD block diagram with example timing diagram.

The top and bottom counters in the cycle counter are clocked when  $t_R[n] \cong 0$  and  $t_R[n] \cong 126\Delta_r$ , respectively, where  $\Delta_r = 1/254$ . Hence,  $t_R[n]$  can be used to determine which counter output was not changing when the sampling event occurred. As shown in Fig. 8, whenever  $t_R[n]$  is between  $63\Delta_r$  and  $189\Delta_r$ ,  $c_R[n]$  is set to  $c_{\text{pos}}[n]$ . Ideally,  $c_R[n]$  should be set to  $c_{\text{neg}}[n]$  when  $t_R[n]$  is between  $190\Delta_r$  and  $253\Delta_r$ , and to  $c_{\text{neg}}[n] + 1$  when  $t_R[n]$  is between 0 and  $62\Delta_r$ , so as to account for the bottom counter being clocked half a DMRO cycle after the top counter is clocked. Yet to work correctly this would require  $c_{\text{pos}}(0) = c_{\text{neg}}(0)$  and the initial DMRO fractional phase to be such that the top counter is clocked before the bottom counter after startup, which are hard to ensure in practice.

These requirements are avoided via the $c_{\text{corr}}[n]$ correction logic shown in Fig. 8. As both sampled counter outputs are reliable when $t_R[n]$ is around $63\Delta_r$ and $190\Delta_r$, the $c_{\text{corr}}[n]$ logic block in Fig. 8 computes

$$c_{\rm corr}[n] = \begin{cases} c_{\rm pos}[n] - c_{\rm neg}[n] - 1, & \text{if } t_R[n] \in [53\Delta_r, 73\Delta_r], \\ c_{\rm pos}[n] - c_{\rm neg}[n], & \text{if } t_R[n] \in [180\Delta_r, 200\Delta_r], \\ c_{\rm corr}[n-1], & \text{otherwise}, \end{cases}$$
(13)

and  $c_R[n]$  is set to  $c_{neg}[n] + c_{corr}[n]$  when  $t_R[n]$  is between  $190\Delta_r$  and  $253\Delta_r$ , to  $c_{neg}[n] + c_{corr}[n] + 1$  when  $t_R[n]$  is between 0 and  $62\Delta_r$ , and to  $c_{pos}[n]$  otherwise.
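The selection and correction logic of (13) can be sketched in a few lines. The code below is an illustrative behavioral model and not the implemented logic: it represents $t_R[n]$ by an integer index `k` with $t_R[n] = k\Delta_r$, assumes the 4-bit counters wrap modulo 16, and uses a hypothetical helper, `ideal_counters`, that models idealized counter samples with arbitrary startup offsets.

```python
PHASE_STATES = 254      # 1 / Delta_r, with Delta_r = 1/254
COUNTER_MODULUS = 16    # assumption: 4-bit counters wrap modulo 16


def decode_integer_phase(k, c_pos, c_neg, c_corr_prev):
    """Return (c_R[n], c_corr[n]) for t_R[n] = k * Delta_r, with k in 0..253."""
    # Update the correction term per (13) only where both counters are reliable.
    if 53 <= k <= 73:
        c_corr = (c_pos - c_neg - 1) % COUNTER_MODULUS
    elif 180 <= k <= 200:
        c_corr = (c_pos - c_neg) % COUNTER_MODULUS
    else:
        c_corr = c_corr_prev

    # Use the counter output that was not changing at the sampling instant.
    if 190 <= k <= 253:
        c_r = (c_neg + c_corr) % COUNTER_MODULUS
    elif 0 <= k <= 62:
        c_r = (c_neg + c_corr + 1) % COUNTER_MODULUS
    else:
        c_r = c_pos
    return c_r, c_corr


def ideal_counters(p, pos_offset, neg_offset):
    """Idealized counter samples for a DMRO phase of p * Delta_r cycles (hypothetical model)."""
    k = p % PHASE_STATES
    c_pos = (p // PHASE_STATES + pos_offset) % COUNTER_MODULUS
    # The bottom counter is modeled as clocked half a cycle after the top counter.
    c_neg = ((p + PHASE_STATES // 2) // PHASE_STATES + neg_offset) % COUNTER_MODULUS
    return k, c_pos, c_neg


c_corr = 0
p = 60  # start inside the first window of (13) so that c_corr is initialized
for _ in range(5):
    k, c_pos, c_neg = ideal_counters(p, pos_offset=3, neg_offset=9)
    c_r, c_corr = decode_integer_phase(k, c_pos, c_neg, c_corr)
    print(k, c_pos, c_neg, c_r)
    p += 37
```

In this model, once $c_{\text{corr}}[n]$ has been updated at least once, $c_R[n]$ tracks the top counter's count for any pair of startup offsets, which is the property that removes the startup requirements described above.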

### D. MMD

As shown in Fig. 9, the MMD consists of a finite-state machine (FSM), a 4/5 prescaler, an edge-select flip-flop, and a resynchronization flip-flop. As explained below, the MMD causes the rising edges of $v_{\text{div}}(t)$ during the *n*th and (n + 1)th reference periods to be separated by N - v[n] periods of $v_{\text{div}2}(t)$.



Fig. 10. Die photograph.

TABLE I Area and Power Breakdown of the IC

| Block                      | Area (mm <sup>2</sup> ) | Power (mW)            |
|----------------------------|-------------------------|-----------------------|
| PNR Digital                | 0.0242                  | 4.76                  |
| Reference Buffers          | 0.00158                 | 0.165                 |
| $\Delta\Sigma$ FDC         | 0.00713                 | 9.44/8 <sup>(1)</sup> |
| DCO                        | 0.137                   | 8.75                  |
| Total Area                 | 0.6321                  | 23.15/21.7 <sup>(1)</sup> |
| Active Area <sup>(2)</sup> | 0.1683                  | 23.15/21.7 <sup>(1)</sup> |

<sup>1</sup> Without and with TA PS mode enabled, respectively.

<sup>2</sup> Without decoupling capacitors.

When the FSM's $p_{sel}(t)$ output bit is low, the prescaler divides by 4. Otherwise, it divides by 5. At the beginning of each MMD cycle, the FSM sets $p_{sel}(t)$ low for five periods of $v_{pres}(t)$, so the first five periods of $v_{pres}(t)$ each have a duration of four $v_{div2}(t)$ periods. Then, the FSM sets $p_{sel}(t)$ so that $\text{mod}_4[n]$ counts to 4 followed by $\text{mod}_5[n]$ counts to 5 occur, where

$$\begin{aligned} \text{mod}_5[n] &= N - v[n] - 20 - 4\lfloor (N - v[n] - 20)/4 \rfloor \quad \text{and} \\ \text{mod}_4[n] &= \lfloor (N - v[n] - 20)/4 \rfloor - \text{mod}_5[n], \end{aligned}$$
(14)

after which N - v[n] periods of  $v_{div2}(t)$  will have occurred.
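The counting scheme in (14) can be checked numerically. The sketch below evaluates (14) and confirms that the five initial counts to 4, followed by $\text{mod}_4[n]$ counts to 4 and $\text{mod}_5[n]$ counts to 5, add up to $N - v[n]$ periods of $v_{div2}(t)$. The function names and the example value of 41 are hypothetical, and the range check is an observation about (14) rather than a stated design constraint.

```python
def mmd_counts(n_minus_v):
    """Return (mod4, mod5) per (14) for a total division of N - v[n]."""
    remainder = n_minus_v - 20                 # periods left after five initial counts to 4
    mod5 = remainder - 4 * (remainder // 4)    # number of prescaler counts to 5
    mod4 = remainder // 4 - mod5               # number of additional prescaler counts to 4
    if remainder < 0 or mod4 < 0:
        raise ValueError("N - v[n] is too small for this counting scheme")
    return mod4, mod5


def total_div2_periods(mod4, mod5):
    """Total v_div2 periods: five counts to 4, then mod4 counts to 4 and mod5 counts to 5."""
    return 5 * 4 + 4 * mod4 + 5 * mod5


mod4, mod5 = mmd_counts(41)
print(mod4, mod5, total_div2_periods(mod4, mod5))  # prints: 4 1 41
```

With these definitions, both counts are non-negative for every $N - v[n] \geq 32$, and the example value yields $\text{mod}_5[n] = 1$, the case illustrated in the timing diagram of Fig. 9.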

As illustrated in the timing diagram shown in Fig. 9 for the example case of  $\text{mod}_5[n] = 1$ , the FSM's  $p_{\text{pass}}(t)$  output goes high at the start of the last full  $v_{\text{pres}}(t)$  period prior to the next rising edge of  $v_{\text{div}}(t)$ , which causes the edge-select flip-flop's output to go high on the next rising edge of  $v_{\text{pres}}(t)$ . The resynchronization flip-flop samples the edge-select flip-flop output on the next rising edge of  $v_{\text{div}2}(t)$  to prevent the MMD output edge from being corrupted by noise and modulus-dependent delay error that originated in the prior MMD components.

All MMD blocks were built using standard cells, with the exception of the resynchronization flip-flop which was custom-designed to minimize its contribution to the PLL's phase noise.

### E. DCO

The DCO is similar to that presented in [7]. It consists of a single-turn center-tapped inductor, a cross-coupled pair of nMOS transistors, a tail resonant tank of the type proposed in [18], a triode MOS transistor tail source, an integer frequency control element (FCE) bank driven by $c_I[p]$, and a fractional FCE bank driven by $c_F[p]$. The implemented FCEs are of the type presented in [15], and the minimum-size FCE has an equivalent frequency step of $\Delta_{\min} = 160$ kHz at 6.5 GHz. The DCO's 16-bit input sequence, d[n], is split into integer and fractional parts. The integer part is encoded to drive the integer FCE bank, which comprises eight $32\Delta_{\min}$ FCEs and five pairs of $16\Delta_{\min}$, $8\Delta_{\min}$, $4\Delta_{\min}$, $2\Delta_{\min}$, and $\Delta_{\min}$ FCEs. The fractional part is up-sampled and re-quantized by a second-order $\Delta \Sigma$ modulator that generates a five-level output sequence. This output sequence is scrambled by a dynamic element matching (DEM) encoder, the outputs of which drive four $\Delta_{\min}$ FCEs within the fractional FCE bank. The PLL controls the DCO over a range of 41 MHz with a minimum step size of 625 Hz.

Fig. 11. Measured PLL phase noise at $f_{PLL} = 6.56$ GHz with and without the TA enabled for (a) 1-MHz bandwidth and (b) 4.5-MHz bandwidth.
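The quoted range and step size are mutually consistent with the 16-bit input width, as the following arithmetic sketch shows. The split of d[n] into 8 integer and 8 fractional bits is inferred here from the ratio of $\Delta_{\min}$ to the minimum step size and is not stated explicitly in the text; the variable names are illustrative.

```python
DELTA_MIN_HZ = 160_000     # minimum-size FCE step at 6.5 GHz (from the text)
MIN_STEP_HZ = 625          # minimum DCO step size (from the text)
INPUT_BITS = 16            # width of the DCO input sequence d[n]

# Inferred, not stated in the text: number of fractional bits in d[n].
frac_levels = DELTA_MIN_HZ // MIN_STEP_HZ          # 256 fractional steps per Delta_min
frac_bits = frac_levels.bit_length() - 1           # 8 bits
int_bits = INPUT_BITS - frac_bits                  # 8 bits

# Full-scale tuning range covered by the 16-bit input.
tuning_range_hz = (2 ** INPUT_BITS) * MIN_STEP_HZ  # 40.96 MHz, i.e., about 41 MHz

# Total weight of the integer FCE bank in units of Delta_min:
# eight 32x elements plus pairs of 16x, 8x, 4x, 2x, and 1x elements.
integer_bank_units = 8 * 32 + 2 * (16 + 8 + 4 + 2 + 1)

print(frac_bits, int_bits, tuning_range_hz / 1e6, integer_bank_units)
```

Under this inferred split, the integer FCE bank's total weight of 318 $\Delta_{\min}$ exceeds the 255 $\Delta_{\min}$ needed by an 8-bit integer part.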

The DCO also contains a binary-weighted capacitor array controlled via a serial peripheral interface (SPI), which is in parallel with the integer and fractional FCE banks. The capacitor array has 7 bits of tuning over a frequency range of 5.6-6.6 GHz.

# IV. MEASUREMENT RESULTS

The prototype IC contains the PLL in Fig. 3 as well as an SPI port and test circuitry to measure internal signals during testing. It was fabricated in the GlobalFoundries 22-nm CMOS 22FDX technology. A die photograph is shown in Fig. 10, and area and power breakdowns are presented in Table I. The IC is packaged in a QFN28 package with a ground paddle and was tested with an Ironwood SG-MLF-7003 compression elastomer socket. Except where noted otherwise, all of the measurements presented below were taken with a common set of PLL parameters set via the SPI.

Unfortunately, the DCO tank's quality factor is severely degraded by a layout issue to the point that the DCO as-fabricated does not even oscillate, and the problem was not flagged by simulations prior to fabrication because of a post-layout extraction tool flaw. Removing metal near the DCO's main inductor via focused ion beam (FIB) surgery made the DCO functional, but even with its maximum current setting and its supply set to 0.9 V, its oscillation amplitude is extremely low. Consequently, the DCO's power consumption is that of a high-performance DCO, yet it achieves relatively poor phase noise performance (e.g., 10 dB worse at a 1-MHz offset than expected<sup>2</sup>) and its low oscillation amplitude makes it highly sensitive to interference from other circuit blocks. While the PLL's overall measured performance is nevertheless in line with the current state of the art, these issues limited its performance as quantified later in this section. The IC's measured output power is around -34 dBm, so an amplifier module was used to boost the output power to around -2 dBm.

Fig. 11 shows the measured phase noise of the PLL at $f_{\rm PLL} = 6.5$ GHz with and without the TA enabled for PLL bandwidths of 1 and 4.5 MHz. The integrated random jitter (i.e., the jitter omitting spurious tones), $\sigma_{\rm RJ}$, is also reported in Fig. 11, where the integration band extends from 10 kHz to 80 MHz. To estimate the expected noise reduction when the TA is enabled, $A_{\rm TA}$ was calculated indirectly from (7) using measured values of $g_n$ read through the SPI. It was found that $g_n$ converged to about 0.758 and 4.832 with and without the TA enabled, respectively, with which two equations based on (7) were solved to find $A_{\rm TA} = 6.37$. This suggests that the TA reduces the power of the portions of the PLL's phase noise contributed by both the DMRO's circuit noise and its quantization noise by 16 dB.
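The reported values can be reproduced with a short calculation. The sketch below assumes that, per (7), $g_n$ scales inversely with the TA gain so that the ratio of the two converged values gives $A_{\rm TA}$ directly; this proportionality is an assumption made here for illustration, since (7) itself is not restated in this section.

```python
import math

G_N_WITH_TA = 0.758      # converged g_n with the TA enabled (from the text)
G_N_WITHOUT_TA = 4.832   # converged g_n with the TA disabled (from the text)

# Assumption: g_n scales inversely with the TA gain, so the ratio gives A_TA.
a_ta = G_N_WITHOUT_TA / G_N_WITH_TA

# Noise power referred through the TA is attenuated by a factor of A_TA^2.
attenuation_db = 20 * math.log10(a_ta)

print(f"A_TA = {a_ta:.2f}, attenuation = {attenuation_db:.1f} dB")
```

The ratio evaluates to approximately 6.37, and the corresponding power attenuation of $A_{\rm TA}^2$ is approximately 16 dB, in agreement with the values quoted above.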

In the case of Fig. 11(a), the in-band spot phase noise at a 100-kHz offset frequency decreases from -99 to -107 dBc/Hz when the TA is enabled, whereas in the case of Fig. 11(b), the in-band spot phase noise at a 1-MHz offset frequency decreases from -100 to -112 dBc/Hz when the TA is enabled. In the former case, the PLL's in-band phase noise has comparable contributions from the DMRO, reference signal, and DCO, whereas in the latter case, the in-band phase noise is mostly dominated by the DMRO phase noise. Accordingly, as the TA suppresses the DMRO's contribution to the PLL's phase noise, the PLL's in-band spot phase noise reduction is more significant in Fig. 11(b). Nonetheless, as shown in Fig. 11(a), the spot phase noise at a 1-MHz offset frequency decreases from -100 to -112 dBc/Hz when the TA is enabled, which occurs because the PLL's phase noise is dominated by DMRO quantization error around that offset frequency.

<sup>2</sup> The spot phase noises of the DCO after the FIB surgery when tuned to 6.5 GHz are -59, -117, and -148 dBc/Hz at offset frequencies of 10 kHz, 1 MHz, and 100 MHz, respectively.

Fig. 12. Measured PLL phase noise at $f_{PLL} = 6.56$ GHz with the TA enabled for out-of-band fractional spurs at 18-MHz offset frequency.

Fig. 13. (a) Largest measured fractional spurious tone and (b) total integrated jitter ($\sigma_{TJ}$) as a function of the fractional frequency.

Fig. 12 shows the PLL's measured phase noise with  $\alpha f_{ref}$  set to 18 MHz, the PLL bandwidth set to 1 MHz, and the TA enabled. In this case, the integrated total jitter (i.e., the jitter including spurious tones),  $\sigma_{TJ}$ , was 151 fs<sub>rms</sub>. This represents the best-case total jitter because it corresponds to a case where the spurious tones are well outside the PLL bandwidth.
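To give a sense of how a spurious tone adds to $\sigma_{TJ}$, the sketch below combines a random jitter value with the jitter of a spur pair by root-sum-square. It assumes that each spur arises from small-angle phase modulation with symmetric sidebands, and the single -49 dBc case is a hypothetical illustration rather than a measured operating point. The function names are illustrative only.

```python
import math

F_PLL_HZ = 6.5e9   # PLL output frequency (from the text)


def spur_jitter_rms(spur_dbc, f_out_hz=F_PLL_HZ):
    """RMS jitter from one symmetric spur pair, each sideband at spur_dbc relative to the carrier."""
    phase_rms_rad = math.sqrt(2 * 10 ** (spur_dbc / 10))
    return phase_rms_rad / (2 * math.pi * f_out_hz)


def total_jitter_rms(random_jitter_s, spur_levels_dbc):
    """Root-sum-square of the random jitter and the jitter of each listed spur pair."""
    spur_power = sum(spur_jitter_rms(s) ** 2 for s in spur_levels_dbc)
    return math.sqrt(random_jitter_s ** 2 + spur_power)


sigma_rj = 145e-15                             # random jitter quoted in the abstract
sigma_spur = spur_jitter_rms(-49)              # one -49 dBc spur pair, about 123 fs
sigma_tj = total_jitter_rms(sigma_rj, [-49])   # about 190 fs for this hypothetical case

print(f"{sigma_spur * 1e15:.0f} fs, {sigma_tj * 1e15:.0f} fs")
```

This is consistent with the best-case $\sigma_{TJ}$ being close to the random jitter when the spurs are small and lie well outside the PLL bandwidth.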

The largest measured fractional spur and $\sigma_{TJ}$ versus $\alpha f_{ref}$ are shown in Fig. 13(a) and Fig. 13(b), respectively, for a PLL bandwidth of 1 MHz. The fractional frequency offset, $\alpha$, was swept such that $\alpha f_{ref}$ ranges from 1 kHz to 40 MHz with 20 equally-spaced values per decade on a log scale. The integration band of the jitter extends from 10 kHz to 80 MHz to include all significant spurs. The spur powers were measured with the spectrum analyzer's averaging option disabled, and for each value of $\alpha$, the instrument was configured to ensure that five negative and positive fractional spur harmonics were always visible. In each case, the largest fractional spur was one of the first three harmonics of $\alpha f_{ref}$, and was no higher than -49 dBc. The measured worst-case spurious tone powers are in line with those predicted by simulation results that include random mismatches among the TA delay cells.

Fig. 14. Representative PLL output spectrum.

Fig. 15. PLL phase noise with and without gain calibration (GC) enabled for a 4.5-MHz bandwidth.
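A sweep grid of this kind can be generated as follows. This is a minimal sketch of one way to obtain 20 log-spaced values per decade between 1 kHz and 40 MHz; the exact grid endpoints used in the measurements are not specified, so the choice to start exactly at 1 kHz and stop at the last grid point below 40 MHz is an assumption. The 80-MHz reference frequency is taken from Table II.

```python
import math

F_REF_HZ = 80e6          # reference frequency (Table II)
F_START_HZ = 1e3         # lowest alpha * f_ref in the sweep
F_STOP_HZ = 40e6         # highest alpha * f_ref in the sweep
POINTS_PER_DECADE = 20

# Number of whole grid steps that fit between the start and stop offsets.
num_steps = math.floor(POINTS_PER_DECADE * math.log10(F_STOP_HZ / F_START_HZ))

# alpha * f_ref values, equally spaced on a log scale.
offsets_hz = [F_START_HZ * 10 ** (i / POINTS_PER_DECADE) for i in range(num_steps + 1)]

# Corresponding fractional frequency offsets alpha.
alphas = [f / F_REF_HZ for f in offsets_hz]

print(len(offsets_hz), offsets_hz[0], round(offsets_hz[-1] / 1e6, 2))
```

With these assumptions the grid contains 93 points, and every value of $\alpha$ stays below 0.5.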

For some values of $\alpha f_{ref} > 5$ MHz, spurs with power lower than -60 dBc and frequencies that are not multiples of $\alpha f_{ref}$ were measured. The authors have not definitively determined the origin of these spurs, but suspect that they are from external interference that is parasitically coupled into the DCO and that their effect is exacerbated by the DCO's abnormally low amplitude. These interference spurs are not reported in Fig. 13(a), although their contribution to $\sigma_{TJ}$ is taken into account in Fig. 13(b), which is why $\sigma_{TJ}$ increases somewhat for $\alpha f_{ref} > 5$ MHz.

As shown in Fig. 14, the measured reference spur power is lower than -80 dBc. As mentioned above, the authors believe that the DCO's low oscillation amplitude makes it extremely sensitive to external interference. This theory is supported by the observation that increasing the DCO supply, which increases its oscillation amplitude somewhat, tends to reduce the measured spurs. For example, measurements taken with the DCO supply set to 1.1 V yield a reference spur of -85 dBc. Accordingly, the reported reference spur power in Fig. 14 is a worst-case bound on the reference spur performance of the PLL, as the power of this spur is expected to decrease when the DCO problem mentioned above is fixed in a future version of the PLL.

TABLE II Performance Summary and Comparison Table

|                                       | This work<br>(TA PS dis.) | This work<br>(TA PS ena.) | C. Weltin-Wu<br>JSSC'15<br>[7] | A. Elkholy<br>JSSC'15<br>[19] | M. Heo<br>ESSCIRC'17<br>[20] | C. Yao<br>JSSC'17<br>[21] | D. Liao<br>JSSC'17<br>[22] | Z. Xu<br>JSSC'16<br>[23] | Y. Wu<br>JSSC'17<br>[24] | L. Bertulessi<br>ISSCC'18<br>[25] | X. Gao<br>ISSCC'16<br>[26] | Z. Chen<br>ISSCC'15<br>[27] |
|---------------------------------------|---------------------------|---------------------------|--------------------|-------------------|--------------------|-----------------|------------------|-------------------|-------------------|------------------|--------------------|-------------------|
| Architecture                          | $\Delta\Sigma$ FDC+TA     | $\Delta\Sigma$ FDC+TA     | $\Delta\Sigma$ FDC | DTC+TDC+TA        | PI+TA              | TDC+SAR-ADC     | 2D Vernier TDC   | SAR-ADC TDC       | DTC+TDC           | Bang-Bang        | Digital Sampling   | Digital Sampling  |
| Technology                            | 22 nm                     | 22 nm                     | 65 nm              | 65 nm             | 40 nm              | 14 nm           | 55 nm            | 65 nm             | 40 nm             | 65 nm            | 28 nm              | 65 nm             |
| Supply (V)                            | 0.8 <sup>(1)</sup>        | 0.8 <sup>(1)</sup>        | 1.0                | 1.0               | 1.1                | -               | -                | 1.0               | 0.65/0.8/1.1      | -                | 1.05/1.5           | 1                 |
| Area (mm<sup>2</sup>)                 | 0.63/0.17 <sup>(2)</sup>  | 0.63/0.17 <sup>(2)</sup>  | 0.35               | 0.22              | 0.14               | 0.257           | 0.56             | 0.38              | 0.5               | 0.61             | 0.3                | 0.23              |
| $f_{\text{ref}}$ (MHz)                | 80                        | 80                        | 26                 | 50                | 32                 | 26              | 80               | 50                | 50                | 52               | 40                 | 49.15             |
| $f_{\text{PLL}}$ (GHz)                | 6.5                       | 6.5                       | 3.5                | 4.5               | 3.6                | 2.7             | 2.08             | 3.63              | 2                 | 3.8              | 5.83               | 2.68              |
| BW (kHz)                              | 1000                      | 1000                      | 140                | 750               | 1100               | 500 <sup>(4)</sup> | 1000          | 1000              | 800               | 150              | -                  | 700               |
| In-band PN<br>(dBc/Hz) <sup>(3)</sup> | -107<br>@100kHz           | -106<br>@100kHz           | -87.6<br>@100kHz   | -98.8<br>@100kHz  | -96.9<br>@300kHz   | -106<br>@100kHz | -97.1<br>@100kHz | -102.2<br>@500kHz | -98.7<br>@100kHz  | -102<br>@100kHz  | -104.6<br>@100kHz  | -102.9<br>@100kHz |
| Frac. Spur<br>(dBc)                   | -49                       | -50                       | -60                | -51.5             | -50                | -74.5 <sup>(4)</sup> | -55         | -41               | -42               | -50              | -54                | -62.3             |
| Ref. Spur<br>(dBc)                    | -80                       | -66 <sup>(5)</sup>        | -81                | -69               | -60                | -87.6           | -                | -39.6             | -                 | -                | -78                | -60               |
| Tot. Jitter<br>(fs<sub>rms</sub>) <sup>(6)</sup> | 151/270<br>10k-80MHz | 170/240<br>10k-80MHz | 665 <sup>(7)</sup><br>12k-20MHz | 440/490<br>10k-20MHz | 534 <sup>(7)</sup><br>10k-30MHz | 137 <sup>(7)</sup><br>10k-10MHz | 549/-<br>10k-10MHz | 390/622 <sup>(8)</sup><br>10k-10MHz | 330/490<br>1k-30MHz | 183 <sup>(7)</sup><br>1k-30MHz | 159 <sup>(7)</sup><br>10k-40MHz | 226/240<br>1k-100MHz |
| Power (mW)                            | 23.15                     | 21.7                      | 15.6               | 3.7               | 5                  | 13.4            | 9.9              | 9.7               | 10.7              | 5.28             | 8.2                | 11.5              |
| FoM<sub>jitter</sub> <sup>(9)</sup>   | -242.8/<br>-237.7         | -242/<br>-239             | -231.6             | -241.5/<br>-240.5 | -238.5             | -246            | -235.3/-         | -238.3/<br>-234.3 | -239.3/<br>-235.9 | -247.5           | -246.8             | -242.3/<br>-241.8 |

<sup>1</sup> DCO power supply is set to 0.9 V instead of 0.8 V.

<sup>2</sup> With and without decoupling capacitors, respectively.

<sup>3</sup> Phase noise (PN) normalized to 6.5 GHz.

<sup>4</sup> BW estimated from Fig. 15 and spur value taken from Fig. 17 in [21].

<sup>5</sup> Decreases to -70 dBc by raising the DCO supply to 1.1 V.

<sup>6</sup> Best and worst reported total integrated jitters (including spurs), $\sigma_{TJ}$.

<sup>7</sup> Not specified if it is random jitter ($\sigma_{RJ}$), or best/worst total jitter ($\sigma_{TJ}$).

<sup>8</sup> Worst jitter taken from Fig. 18(b) in [23].

<sup>9</sup> FoM<sub>jitter</sub> = 10log(jitter<sup>2</sup> × power/1mW) [28].

Fig. 15 shows the measured phase noise of the PLL with and without the gain calibration technique enabled for a PLL bandwidth of 4.5 MHz. The results demonstrate the effect of non-ideal  $\Delta \Sigma$  FDC forward path gain, i.e., the effect of  $g_n$  not satisfying (7), on the PLL's performance at high bandwidth settings. As indicated in Fig. 15, the spot phase noise at a 20-MHz offset frequency decreases by 32 dB when enabling the gain calibration technique, which causes  $\sigma_{TJ}$  to decrease from 2.7 ps<sub>rms</sub> to 248 fs<sub>rms</sub>.

Measurements indicate that enabling the TA PS mode has several effects: 1) it decreases the PLL's power consumption by 1.45 mW, which corresponds to 37% of the TA power consumption when the PS mode is disabled, 2) it increases the best-case  $\sigma_{TJ}$  by 20 fs because the swapping circuitry shown in Fig. 7(b) introduces noise into the reference path, 3) it decreases the worst-case  $\sigma_{TJ}$  by 30 fs due to the slightly better fractional spur performance, and 4) it increases the reference spur power by 14 dB. The authors believe that the reference spur power increase is related to coupling from the analog domain to the DCO, again because of the DCO's low oscillation amplitude.

Table II summarizes the performance of the PLL with and without the TA PS mode enabled, along with that of the best digital PLLs published to date [7], [19]–[27]. As shown in Table II, the PLL achieves one of the best in-band spot phase noises, and its spurious tone performance is comparable to that of other state-of-the-art digital PLLs, even though no dedicated spur mitigation technique is used. In contrast, automatic time-to-digital converter (TDC) gain tracking is used to reduce the fractional spur from -35 to -55 dBc in [22], a TDC calibration technique is used to reduce the fractional spur power from -43 to below -74 dBc in [21], and a phase interpolation nonlinearity calibration technique is used to reduce the fractional spur from -24.58 to -53.1 dBc in [20]. Similarly, digital-to-time converter (DTC) range reduction techniques are used in [19], [24], and [25] to improve fractional spur performance.

The PLL's best-case $\sigma_{TJ}$ is lower than those of most of the other PLLs in Table II, but its power consumption is higher than those of the other PLLs. As previously mentioned, the implemented DCO consumes the power of a DCO with much better phase noise. Simulations run by the authors suggest that for a properly designed DCO with similar phase noise to that of the implemented DCO, the power consumption should be around 4 mW instead of 8.75 mW. Alternatively, if the DCO had performed as expected, the PLL's best-case $\sigma_{TJ}$ would have been 115 fs<sub>rms</sub> instead of 151 fs<sub>rms</sub>. Furthermore, as mentioned in Section III, the PNR digital was overdesigned to be clocked at 1 GHz, which is supported by measurements given that the digital domain power supply can be reduced from 0.8 to 0.55 V without affecting the PLL's performance. In this case, the power consumption of the PNR digital goes down from 4.76 to 2.66 mW. Therefore, the implemented PLL's power consumption is higher than necessary, and it could potentially be lowered by approximately 6.85 mW.

Nonetheless, as shown in Table II, even with the higher-than-necessary digital power consumption and worse-than-expected DCO performance, the PLL achieves a Gao figure of merit (FoM) comparable to or better than prior-art digital PLLs [28]. Had the DCO performed as expected, i.e., with performance comparable to that of the DCO presented in [7], the PLL's best-case FoM would have been -245.1 dB and -245.4 dB with and without the TA PS mode disabled, respectively. Alternatively, had the PLL's power consumption been 6.85 mW lower as explained above, the PLL's best-case FoM would have been -243.7 dB with and without the TA PS mode disabled, respectively.
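The FoM values reported for this work in Table II follow directly from the definition in the table's note 9. The sketch below evaluates that definition for the measured best-case and worst-case total jitter and the corresponding power consumption, and also adds up the potential power savings discussed above. The function and variable names are illustrative only.

```python
import math


def fom_jitter_db(jitter_s, power_mw):
    """FoM_jitter = 10*log10((jitter / 1 s)^2 * (power / 1 mW)), per note 9 of Table II."""
    return 10 * math.log10(jitter_s ** 2 * power_mw)


# Measured operating points from Table II: (best, worst) total jitter and power.
ps_disabled = {"jitter_s": (151e-15, 270e-15), "power_mw": 23.15}
ps_enabled = {"jitter_s": (170e-15, 240e-15), "power_mw": 21.7}

for name, point in (("TA PS disabled", ps_disabled), ("TA PS enabled", ps_enabled)):
    best, worst = (fom_jitter_db(j, point["power_mw"]) for j in point["jitter_s"])
    print(f"{name}: {best:.1f} / {worst:.1f} dB")

# Potential savings discussed in the text: DCO (8.75 -> 4 mW) and PNR digital (4.76 -> 2.66 mW).
power_saving_mw = (8.75 - 4.0) + (4.76 - 2.66)
print(f"potential power saving = {power_saving_mw:.2f} mW")
```

Evaluating the same function with a 115-fs<sub>rms</sub> jitter at 23.15 mW gives approximately -245.1 dB, which matches the first hypothetical case quoted above.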

# ACKNOWLEDGMENT

The authors are grateful to Colin Weltin-Wu, Yiwu Tang, and Dongmin Park for helpful advice, Raghavendra Haresamudram for his constant support with different software tools, Julian Puscar and Mahmoud Abdellatif for digital-flow advice, Prof. Gabriel Rebeiz for the use of his Signal Source Analyzer, Roddy Cruz for FIB support, Mohammed Salah El-Hadri for the die photograph, and Tom McKay and Global Foundries for IC fabrication, process design kit (PDK) support, and helpful advice.

# REFERENCES

- [1] A. M. Abas, A. Bystrov, D. J. Kinniment, O. V. Maevsky, G. Russell, and A. V. Yakovlev, "Time difference amplifier," *Electron. Lett.*, vol. 38, no. 23, pp. 1437–1438, Nov. 2002.
- [2] M. A. Abas, G. Russell, and D. J. Kinniment, "Design of sub-10-picoseconds on-chip time measurement circuit," in *Proc. Design, Autom. Test Eur. Conf. Exhib.*, Feb. 2004, pp. 804–809.
- [3] M. Lee and A. A. Abidi, "A 9 b, 1.25 ps resolution coarse–fine time-to-digital converter in 90 nm CMOS that amplifies a time residue," *IEEE J. Solid-State Circuits*, vol. 43, no. 4, pp. 769–777, Apr. 2008.
- [4] S. Lee, Y. Seo, H. Park, and J. Sim, "A 1 GHz ADPLL with a 1.25 ps minimum-resolution sub-exponent TDC in 0.18 μm CMOS," *IEEE J. Solid-State Circuits*, vol. 45, no. 12, pp. 2874–2881, Dec. 2010.
- [5] S. Mandai, T. Iizuka, T. Nakura, M. Ikeda, and K. Asada, "Time-to-digital converter based on time difference amplifier with non-linearity calibration," in *Proc. IEEE Eur. Solid-State Circuits Conf. (ESSCIRC)*, Sep. 2010, pp. 266–269.
- [6] B. Kim, H. Kim, and C. H. Kim, "An 8bit, 2.6 ps two-step TDC in 65 nm CMOS employing a switched ring-oscillator based time amplifier," in *Proc. IEEE Custom Integr. Circuits Conf. (CICC)*, Sep. 2015, pp. 1–4.
- [7] C. Weltin-Wu, G. Zhao, and I. Galton, "A 3.5 GHz digital fractional-N PLL frequency synthesizer based on ring oscillator frequency-to-digital conversion," *IEEE J. Solid-State Circuits*, vol. 50, no. 12, pp. 2988–3002, Dec. 2015.
- [8] C. Weltin-Wu, E. Familier, and I. Galton, "A linearized model for the design of fractional-*N* digital PLLs based on dual-mode ring oscillator FDCs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 62, no. 8, pp. 2013–2023, Aug. 2015.
- [9] E. Alvarez-Fontecilla, A. I. Eissa, E. Helal, C. Weltin-Wu, and I. Galton, "Delta-sigma FDC enhancements for FDC-based digital fractional-N PLLs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, early access, Dec. 21, 2020, doi: 10.1109/TCSI.2020.3040346.
- [10] S. Pamarti, L. Jansson, and I. Galton, "A wideband 2.4-GHz delta-sigma fractional-N PLL with 1-Mb/s in-loop modulation," *IEEE J. Solid-State Circuits*, vol. 39, no. 1, pp. 49–62, Jan. 2004.

- [11] E. Temporiti, G. Albasini, I. Bietti, R. Castello, and M. Colombo, "A 700-kHz bandwidth ΣΔ fractional synthesizer with spurs compensation and linearization techniques for WCDMA applications," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1446–1454, Sep. 2004.
- [12] C. Venerus and I. Galton, "Quantization noise cancellation for FDC-based fractional-N PLLs," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 62, no. 12, pp. 1119–1123, Dec. 2015.
- [13] A. A. Abidi, "Phase noise and jitter in CMOS ring oscillators," *IEEE J. Solid-State Circuits*, vol. 41, no. 8, pp. 1803–1816, Aug. 2006.
- [14] C. Venerus and I. Galton, "Delta-sigma FDC based fractional-N PLLs," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 60, no. 5, pp. 1274–1285, May 2013.
- [15] C. Venerus and I. Galton, "A TDC-free mostly-digital FDC-PLL frequency synthesizer with a 2.8-3.5 GHz DCO," *IEEE J. Solid-State Circuits*, vol. 50, no. 2, pp. 450–463, Feb. 2015.
- [16] J. Daniels, W. Dehaene, and M. Steyaert, "All-digital differential VCO-based A/D conversion," in *Proc. IEEE Int. Symp. Circuits Syst.*, May 2010, pp. 1085–1088.
- [17] M. Baert and W. Dehaene, "A 5-GS/s 7.2-ENOB time-interleaved VCO-based ADC achieving 30.5 fJ/cs," *IEEE J. Solid-State Circuits*, vol. 55, no. 6, pp. 1577–1587, Jun. 2020.
- [18] E. Hegazi, H. Sjoland, and A. A. Abidi, "A filtering technique to lower LC oscillator phase noise," *IEEE J. Solid-State Circuits*, vol. 36, no. 12, pp. 1921–1930, Dec. 2001.
- [19] A. Elkholy, T. Anand, W.-S. Choi, A. Elshazly, and P. K. Hanumolu, "A 3.7 mW low-noise wide-bandwidth 4.5 GHz digital fractional-N PLL using time amplifier-based TDC," *IEEE J. Solid-State Circuits*, vol. 50, no. 4, pp. 867–881, Apr. 2015.
- [20] M. Heo, S. Bae, J. Lee, C. Kim, and M. Lee, "Quantizer-less proportional path fractional-N digital PLL with a low-power high-gain time amplifier and background multi-point spur calibration," in *Proc.* 43rd IEEE Eur. Solid State Circuits Conf. (ESSCIRC), Sep. 2017, pp. 147–150.
- [21] C.-W. Yao et al., "A 14-nm 0.14-ps<sub>rms</sub> fractional-N digital PLL with a 0.2-ps resolution ADC-assisted coarse/fine-conversion chopping TDC and TDC nonlinearity calibration," *IEEE J. Solid-State Circuits*, vol. 52, no. 12, pp. 3446–3457, Dec. 2017.
- [22] D. Liao, H. Wang, F. F. Dai, Y. Xu, R. Berenguer, and S. M. Hermoso, "An 802.11a/b/g/n digital fractional-N PLL with automatic TDC linearity calibration for spur cancellation," *IEEE J. Solid-State Circuits*, vol. 52, no. 5, pp. 1210–1220, May 2017.
- [23] Z. Xu, M. Miyahara, K. Okada, and A. Matsuzawa, "A 3.6 GHz low-noise fractional-N digital PLL using SAR-ADC-based TDC," *IEEE J. Solid-State Circuits*, vol. 51, no. 10, pp. 2345–2356, Oct. 2016.
- [24] Y. Wu, M. Shahmohammadi, Y. Chen, P. Lu, and R. B. Staszewski, "A 3.5–6.8-GHz wide-bandwidth DTC-assisted fractional-N all-digital PLL with a MASH ΔΣ-TDC for low in-band phase noise," *IEEE J. Solid-State Circuits*, vol. 52, no. 7, pp. 1885–1903, Jul. 2017.
- [25] L. Bertulessi, L. Grimaldi, D. Cherniak, C. Samori, and S. Levantino, "A low-phase-noise digital bang-bang PLL with fast lock over a wide lock range," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2018, pp. 252–254.
- [26] X. Gao et al., "9.6 A 2.7-to-4.3 GHz, 0.16 ps<sub>rms</sub>-jitter, -246.8 dB-FOM, digital fractional-N sampling PLL in 28 nm CMOS," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Jan. 2016, pp. 174–175.
- [27] Z. Chen et al., "14.9 sub-sampling all-digital fractional-N frequency synthesizer with -111dBc/Hz in-band phase noise and an FOM of -242dB," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1–3.
- [28] X. Gao, E. A. M. Klumperink, P. F. J. Geraedts, and B. Nauta, "Jitter analysis and a benchmarking figure-of-merit for phase-locked loops," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 56, no. 2, pp. 117–121, Feb. 2009.



**Eslam Helal** received the B.Sc. (Hons.) and M.Sc. degrees in electrical engineering from Ain Shams University (ASU), Cairo, Egypt, in 2014 and 2018, respectively. He is currently pursuing the Ph.D. degree with the University of California at San Diego, San Diego, CA, USA.

From 2014 to 2018, he was a part-time Analog Design Engineer with Si-Ware Systems (now Goodix Technology), Cairo. He was a Teaching and Research Assistant with the Electronics and Communications Engineering Department, ASU. His research interests include analog/mixed-signal integrated circuits and systems, frequency synthesizers, and data converters.



**Enrique Alvarez-Fontecilla** received the B.Sc. and M.Sc. degrees in electrical engineering from the Universidad Católica de Chile (PUC), Santiago, Chile, in 2011 and 2013, respectively. He is currently pursuing the Ph.D. degree with the University of California at San Diego, San Diego, CA, USA.

From 2012 to 2015, he was an Adjunct Assistant Professor with the School of Engineering and also with the Institute of Philosophy, PUC.



**Amr I. Eissa** received the B.Sc. (Hons.) and M.Sc. degrees in electrical engineering from Ain Shams University (ASU), Cairo, Egypt, in 2012 and 2016, respectively. He is currently pursuing the Ph.D. degree in electronic circuits and systems with the University of California at San Diego, San Diego, CA, USA.

From 2013 to 2016, he was a Teaching and Research Assistant with the Electronics and Communications Engineering Department, ASU. His research interests include the analysis and design of analog/mixed-signal integrated circuits and systems.



**Ian Galton** (Fellow, IEEE) received the B.Sc. degree from Brown University, Providence, RI, USA, in 1984, and the M.S. and Ph.D. degrees from the California Institute of Technology, Pasadena, CA, USA, in 1989 and 1992, respectively, all in electrical engineering.

Since 1996, he has been a Professor of Electrical Engineering with the University of California at San Diego, San Diego, CA, where he teaches and conducts research in the field of mixed-signal integrated circuits and systems for communications.

His research involves the invention, analysis, and integrated circuit implementation of critical communication system blocks such as data converters and phase-locked loops.