
This section summarizes the theses presented in the dissertation. The main theme of the dissertation is improving software switch performance through better data-flow-graph batch-scheduling and embedding. Below we present the three main contributions and their sub-results.

The first thesis presents a model to describe data-flow-graph batch-scheduling in software switches. The model enables formulating the optimal data-flow-graph batch-scheduling problem. An interesting equivalence is highlighted between the weighted-fair-queuing and run-to-completion scheduling modes under the assumptions of Section 2.3.2. Chapter 2 and [C2] present this work.

Thesis 1. [C2] I have designed a comprehensive analytical model for batch-scheduling in packet processing pipelines under delay constraints. The model covers the run-to-completion and WFQ (weighted-fair queuing) scheduling modes. I formulated the problem in a novel analytical model and gave an algorithm to compute the optimal batch-schedule for both the WFQ and the run-to-completion mode. Furthermore, I have shown a deep equivalence between run-to-completion and WFQ scheduling under assumptions A1, A2, A3, and A4.

The first result quantifies network function processing costs in software switches with the batchiness metric β. Details and the calculation of β are given in Section 2.3.1.

Thesis 1.1. [C2] By extensive measurements on real-life packet processing engines, I have shown that the batch processing time of common-case primitive packet processing functions exhibits a linear dependence on the number of packets in each batch.
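To illustrate the claim, the linear model can be written as T(b) = T_0 + β·b, where T_0 is the per-batch overhead and β is the per-packet (batchiness) cost. The sketch below fits this model to synthetic measurements; the function name and the numbers are illustrative, not taken from the dissertation.

```python
def fit_linear_cost(batch_sizes, times):
    """Least-squares fit of the linear per-batch cost model T(b) = T_0 + beta*b.

    Returns (T_0, beta): fixed per-batch overhead and per-packet cost.
    """
    n = len(batch_sizes)
    mean_b = sum(batch_sizes) / n
    mean_t = sum(times) / n
    cov = sum((b - mean_b) * (t - mean_t) for b, t in zip(batch_sizes, times))
    var = sum((b - mean_b) ** 2 for b in batch_sizes)
    beta = cov / var                # slope: per-packet cost
    t0 = mean_t - beta * mean_b    # intercept: per-batch overhead
    return t0, beta

# Synthetic measurements following T(b) = 50 + 10*b (in, say, nanoseconds):
sizes = [1, 2, 4, 8, 16, 32]
times = [60, 70, 90, 130, 210, 370]
t0, beta = fit_linear_cost(sizes, times)
print(t0, beta)  # → 50.0 10.0
```

A small (β close to zero) slope relative to the intercept means the function amortizes well over large batches, which is exactly what makes batching profitable.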

The second result is a comprehensive system model for data flow graphs. The model is described in Section 2.3.2.

Thesis 1.2. [C2] I have designed a system model to describe batch-processing packet processing pipelines. I have modeled packet processing primitives as a combination of an ingress queue and a network function at the egress, connected back-to-back. The model covers all types of packet processing primitives that perform buffering, processing, or splitting.

The model (Thesis 1.2) and the network function processing cost quantification (Thesis 1.1) enable formulating the optimal data-flow-graph batch-scheduling problem. A formulation for weighted-fair queuing (WFQ) is given in Thesis 1.3 and for run-to-completion (RTC) in Thesis 1.4. The formulations and detailed proofs are given for WFQ in Section 2.3.3 and for RTC in Section 2.3.4.

Thesis 1.3. [C2] I have shown that the optimal WFQ (weighted-fair queuing) data-flow-graph batch-scheduling is unique and can be obtained in polynomial time.

Proof. The mathematical program is convex; hence it can be solved in polynomial time by standard convex solvers [58], and the optimum is unique.

Observation. We can redefine assumption A3 (Feasibility) using the formulated program.

Feasibility holds if there is at least one feasible solution to the mathematical program.

Thesis 1.4. [C2] I have shown that the optimal run-to-completion data-flow-graph batch-scheduling is unique and can be obtained in polynomial time.

Proof. The mathematical program is convex.

A compelling finding of ours is that any feasible data-flow-graph batch-schedule under WFQ or RTC can be converted into one for the other mode by a simple transformation in our model. Section 2.3.5 details and proves this equivalence.

Thesis 1.5. [C2] I have shown that, under assumptions A1, A2, A3, and A4, any system state feasible under the WFQ scheduling model is also attainable with a run-to-completion schedule.

The second main contribution of this dissertation relaxes the assumptions of the model from Thesis 1.2 and introduces a controller framework to implement optimal data-flow-graph batch-scheduling on real software switches. Results are detailed in Chapter 3 and are published in [C2].

Thesis 2. [C2] I have designed a one-step receding-horizon optimal controller to implement the optimal batch-schedule under delay constraints in the run-to-completion model, and I deployed the controller in a real-life packet processing engine. My design includes two heuristic algorithms to adaptively compute the optimal queuing: a simplified gradient optimization algorithm that is shown to be optimal in steady state, and a much simpler on-off control algorithm that does not require a fractional buffer. I improved the performance of these control algorithms by introducing a heuristic to adaptively insert and remove buffers. My implementation also contains a feasibility-recovery algorithm and a pipeline decomposition heuristic that operate on different timescales to recover the pipeline from states in which delay constraints are temporarily violated.

The controller framework takes inspiration from Nagle's algorithm [60], which requires fine-grained control over queue backlogs. For this purpose, the first result is the fractional buffer, a queuing abstraction (Thesis 2.1). Details are shown in Section 3.3.1.

Thesis 2.1. [C2] I have introduced the fractional buffer queuing abstraction, which enables fine-grained control over the size of the batches a packet processing function receives.
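A minimal sketch of how such an abstraction might behave (the class and its semantics are assumptions for illustration, not the dissertation's implementation): the buffer carries a fractional credit between triggers so that the released batch sizes average out to a fractional target.

```python
class FractionalBuffer:
    """Illustrative fractional buffer: release batches whose long-run
    average size equals a fractional target, by carrying the fractional
    remainder (credit) between triggers."""

    def __init__(self, target):
        self.target = target   # fractional batch size, e.g. 4.5
        self.credit = 0.0      # accumulated fractional remainder
        self.queue = []

    def push(self, pkt):
        self.queue.append(pkt)

    def trigger(self):
        """Release one batch; its size is floor(target + credit)."""
        self.credit += self.target
        size = int(self.credit)
        self.credit -= size
        batch, self.queue = self.queue[:size], self.queue[size:]
        return batch

buf = FractionalBuffer(4.5)
for i in range(9):
    buf.push(i)
sizes = [len(buf.trigger()) for _ in range(2)]
print(sizes)  # → [4, 5]
```

With target 4.5 the buffer alternates between batches of 4 and 5 packets, so the downstream function effectively sees batches of average size 4.5.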

The receding-horizon controller is introduced in Section 3.3.2. The first control algorithm of the controller is a projected gradient control algorithm, presented in Section 3.3.3.

Thesis 2.2. [C2] I designed a gradient optimization algorithm to tune fractional buffer sizes. I have shown that this control algorithm attains the optimal batch-schedule in steady state.
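One control step might be sketched as follows (the step size, the box projection, and the delay-rescaling rule are illustrative assumptions, not the dissertation's exact algorithm): move the buffer sizes along the negative gradient of the cost, then project back onto the delay-SLO-feasible set.

```python
def projected_gradient_step(b, grad, step, b_max, delay, slo):
    """One projected-gradient control step over fractional buffer sizes.

    Moves each buffer size along -grad, clamps to the box [1, b_max],
    and shrinks all sizes proportionally if the measured delay
    exceeds the SLO (a crude feasibility projection).
    """
    b_new = [bi - step * gi for bi, gi in zip(b, grad)]
    b_new = [min(max(bi, 1.0), b_max) for bi in b_new]   # box projection
    if delay > slo:                                       # SLO projection
        scale = slo / delay
        b_new = [max(1.0, bi * scale) for bi in b_new]
    return b_new

# Negative gradients: larger batches reduce per-packet overhead here.
b = projected_gradient_step([8.0, 16.0], [-1.0, -1.0], step=2.0,
                            b_max=32.0, delay=50.0, slo=100.0)
print(b)  # → [10.0, 18.0]
```

Run at every control cycle, such a step converges to a fixed point of the projected dynamics; the dissertation's result is that this fixed point is the optimal batch-schedule in steady state.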

Thesis 2.3 presents an effective heuristic to improve the performance of the controller. Details are in Section 3.3.4.

Thesis 2.3. [C2] I have designed an algorithm to adaptively remove the buffer from the input of a module if (1) the module already receives large enough batches at its input, (2) the buffer would further fragment batches instead of reconstructing them, or (3) introducing the buffer would violate the delay-SLO.
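The three conditions can be summarized as a single predicate; the argument names and thresholds below are hypothetical, chosen only to make the decision rule concrete.

```python
def should_remove_buffer(avg_in_batch, avg_out_batch, batch_size,
                         delay_with_buffer, slo):
    """Decide whether to remove the buffer at a module's input.

    (1) input batches are already full-size,
    (2) the buffer fragments batches instead of reconstructing them
        (output batches smaller than input batches), or
    (3) buffering pushes the delay past the SLO.
    """
    return (avg_in_batch >= batch_size          # (1) already large batches
            or avg_out_batch < avg_in_batch     # (2) buffer fragments batches
            or delay_with_buffer > slo)         # (3) buffer violates delay-SLO

print(should_remove_buffer(32, 32, 32, 40, 100))  # → True (condition 1)
print(should_remove_buffer(8, 24, 32, 40, 100))   # → False: keep the buffer
```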

The controller contains a heuristic to recover the control algorithm from an infeasible (SLO-violating) state at each control cycle. Section 3.4.1 shows the details.

Thesis 2.4. [C2] I have developed a feasibility-recovery algorithm to bring the controller back to the feasible region.

Poor resource allocation can result in an infeasible state for the controller in which it is unable to set triggers that conform with the delay-SLOs. A mechanism to fix delay-SLO violations caused by poor resource allocation is introduced in Section 3.4.2.

Thesis 2.5. [C2] I have developed a pipeline decomposition method to reallocate CPU resources across different flows by decomposing the pipeline into multiple disjoint connected worker subgraphs.

The fractional buffer is not yet available in most software switches. To make the controller work with the simple queue or buffer implementations available in software switches, an on-off control algorithm is presented in Section 3.5.1.

Thesis 2.6. [C2] I have designed and implemented an on-off control algorithm that switches between two extreme control actions based on the delay-SLOs: it either inserts a full-batch-size buffer at the input of each module or removes queuing altogether. The on-off controller does not require a fractional buffer.
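A minimal sketch of the on-off rule (the batch size of 32 and the exact form of the decision are illustrative assumptions): per module, the controller picks one of the two extreme actions depending on the measured delay relative to the SLO.

```python
def on_off_action(measured_delay, slo, full_batch=32):
    """Return the buffer size to install at a module's input:
    a full batch when there is delay headroom, no buffering otherwise."""
    return full_batch if measured_delay < slo else 0

print(on_off_action(40.0, 100.0))   # → 32 (headroom: buffer a full batch)
print(on_off_action(120.0, 100.0))  # → 0  (SLO at risk: no queuing)
```

Because the action space is just {0, full batch}, this controller only needs an ordinary queue, which is why it works on switches without fractional-buffer support.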

The controller framework is implemented on BESS [22]. Implementation details are shown in Section 3.5.2. Extensive evaluation of the controller over real use cases is presented in Section 3.5.3.

Thesis 2.7. [C2] In comprehensive evaluations on 6 real-life reference scenarios, I have shown that my optimal batch-scheduling controller implementation provides up to 2–3× performance improvement while closely conforming to SLO requirements. In addition, I gave a comprehensive experimental evaluation of the operational regime in which the controller provides the highest performance improvement.

As Thesis 2.5 highlights, poor resource allocation limits the performance of software switches. Chapter 4 and [C3] focus on solving resource allocation issues through data-flow-graph embedding (i.e., the act of assigning modules to CPU cores) to provide feasibility, efficiency, and resiliency.

Thesis 3. [C3] I have designed a novel, comprehensive analytical model for packet processing pipeline resource allocation (embedding). I have designed a method to protect high-availability data flow graph flows against CPU errors. I have formulated the embedding problem as an Integer Linear Program to find an optimal solution, and I gave heuristic algorithms to obtain feasible solutions in polynomial time.

The first sub-result is a comprehensive model for data-flow-graph embedding presented in Section 4.1.

Thesis 3.1. [C3] I have designed a model to express resilience requirements on data flow graph flows. The model enables marking high-availability flows that require resilience against CPU errors. Additionally, I have designed a method to protect the high-availability flows against CPU errors.

The efficient embedding problem without resilience requirements is NP-hard, shown by reduction from minimum bin-packing [94]; hence the problem is NP-complete (Thesis 3.2). The problem remains NP-complete with resilience constraints (Thesis 3.3). The problem formulations and proofs are presented in Section 4.3.1.

Thesis 3.2. [C3]I have shown that both MinSumEmbed and MinMaxEmbed are NP-complete.

Thesis 3.3. [C3]I have shown that both ResMinSumEmbed and ResMinMaxEmbed are NP-complete.

An exact algorithm to obtain the optimal solution for the embedding problems is given in Section 4.3.2.

Thesis 3.4. [C3] I have formulated the feasible embedding and resilient embedding as decision problems. I have shown that an optimal solution can be found with exponential time complexity.

Heuristic algorithms find embedding solutions in polynomial time. The heuristics are presented in Section 4.3.3. The algorithms are evaluated numerically in Section 4.3.4 and in a case study in Section 4.3.5.

Thesis 3.5. [C3] I have developed a family of fast and effective heuristics for finding feasible and resilient embeddings in polynomial time. I have conducted an extensive numerical evaluation and a real-life case study, and showed that the best-fit heuristic provides the best approximation of the optimal embedding in real-life applications.
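As an illustration of the best-fit idea (the module names, CPU costs, and core capacity below are made up for the example), a best-fit-decreasing placement assigns each module to the core whose remaining capacity is smallest but still sufficient:

```python
def best_fit_embed(module_costs, n_cores, capacity):
    """Best-fit-decreasing embedding sketch: place modules (heaviest
    first) on the candidate core with the least leftover capacity.
    Returns {module: core} or None if no feasible placement is found."""
    loads = [0.0] * n_cores
    placement = {}
    for mod, cost in sorted(module_costs.items(),
                            key=lambda kv: -kv[1]):   # heaviest first
        candidates = [c for c in range(n_cores)
                      if loads[c] + cost <= capacity]
        if not candidates:
            return None                               # infeasible
        # Best fit: the core left with the least slack after placement.
        core = min(candidates, key=lambda c: capacity - (loads[c] + cost))
        loads[core] += cost
        placement[mod] = core
    return placement

placement = best_fit_embed({'NAT': 0.5, 'ACL': 0.3, 'DPI': 0.6},
                           n_cores=2, capacity=1.0)
print(placement)
```

Here DPI lands on core 0, NAT on core 1, and ACL fills the tighter remaining gap on core 0; a first-fit variant would differ only in the `core` selection rule.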

International Journals

[J1] Tamás Lévai and Gábor Rétvári, "Batch-scheduling Data Flow Graphs with Service-level Objectives on Multicore Systems", Infocommunications Journal, 2022. (Accepted)

[J2] Tamás Lévai, Gergely Pongrácz, Péter Megyesi, Péter Vörös, Sándor Laki, Felicián Németh, and Gábor Rétvári, "The Price for Programmability in the Software Data Plane: The Vendor Perspective", IEEE Journal on Selected Areas in Communications, vol. 36, pp. 2621–2630, 2018.

[J3] Tamás Lévai, István Pelle, Felicián Németh, and András Gulyás, "EPOXIDE: A Modular Prototype for SDN Troubleshooting", ACM SIGCOMM Computer Communication Review, vol. 45, no. 4, pp. 359–360, 2015.

International Conferences

[C1] Jane Yen, Tamás Lévai, Qinyuan Ye, Xiang Ren, Ramesh Govindan, and Barath Raghavan, "Semi-Automated Protocol Disambiguation and Code Generation", ACM SIGCOMM 2021, 2021.

[C2] Tamás Lévai, Felicián Németh, Barath Raghavan, and Gábor Rétvári, "Batchy: Batch-scheduling Data Flow Graphs with Service-level Objectives", 17th USENIX Symposium on Networked Systems Design and Implementation (NSDI 20), 2020.

[C3] Tamás Lévai and Gábor Rétvári, "Resilient Dataflow Graph Embedding for Programmable Software Switches", 11th International Workshop on Resilient Networks Design and Modeling (RNDM), 2019.


[C4] István Pelle, Tamás Lévai, Felicián Németh, and András Gulyás, "One tool to rule them all: a modular troubleshooting framework for SDN (and other) networks", 1st ACM SIGCOMM Symposium on Software Defined Networking Research (SOSR), 2015.

[C5] Tamás Lévai, "Cost-effective Network Emulation using Cloud Infrastructure", 20th International Scientific Student Conference POSTER 2016, 2016.

[C6] Tamás Lévai and Felicián Németh, “Distributed Network Emulation over Cloud Infrastructure”, Mesterpróba 2016, pp. 57–60.

Other publications

[O1] Jianfeng Wang, Tamás Lévai, Zhuojin Li, Marcos A. M. Vieira, Ramesh Govindan, and Barath Raghavan, "Galleon: Reshaping the Square Peg of NFV", arXiv preprint arXiv:2101.06466, 2021.

[O2] Vivian Fang, Tamás Lévai, Sangjin Han, Sylvia Ratnasamy, Barath Raghavan, and Justine Sherry, "Evaluating Software Switches: Hard or Hopeless?", EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2018-136, 2018.

Bibliography

[1] M. Moradi, K. Sundaresan, E. Chai, S. Rangarajan, and Z. M. Mao, "SkyCore: Moving core to the edge for untethered and reliable UAV-based LTE networks," in Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, 2018.

[2] NGMN Alliance, "5G white paper," Next Generation Mobile Networks, white paper, pp. 1–125, 2015.

[3] M. A. Jamshed, "OMEC over the Berkeley Extensible Soft Switch Infrastructure," September 2019.

[4] F. Voigtländer, A. Ramadan, J. Eichinger, C. Lenz, D. Pensky, and A. Knoll, "5G for robotics: Ultra-low latency control of distributed robotic systems," in 2017 International Symposium on Computer Science and Intelligent Controls (ISCSIC). IEEE, 2017, pp. 69–72.

[5] Y. Ren, G. Liu, V. Nitu, W. Shao, R. Kennedy, G. Parmer, T. Wood, and A. Tchana, "Fine-grained isolation for scalable, dynamic, multi-tenant edge clouds," in 2020 USENIX Annual Technical Conference (USENIX ATC 20). USENIX Association, Jul. 2020, pp. 927–942. [Online]. Available: https://www.usenix.org/conference/atc20/presentation/ren

[6] D. Firestone, A. Putnam, S. Mundkur, D. Chiou, A. Dabagh, M. Andrewartha, H. Angepat, V. Bhanu, A. Caulfield, E. Chung, H. K. Chandrappa, S. Chaturmohta, M. Humphrey, J. Lavier, N. Lam, F. Liu, K. Ovtcharov, J. Padhye, G. Popuri, S. Raindel, T. Sapre, M. Shaw, G. Silva, M. Sivakumar, N. Srivastava, A. Verma, Q. Zuhair, D. Bansal, D. Burger, K. Vaid, D. A. Maltz, and A. Greenberg, "Azure accelerated networking: SmartNICs in the public cloud," in 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). USENIX Association, Apr. 2018, pp. 51–66.

[7] ETSI, “Network functions virtualisation – introductory white paper,” 2012.

[8] G. P. Katsikas, T. Barbette, D. Kostić, R. Steinert, and G. Q. M. Jr., "Metron: NFV service chains at the true speed of the underlying hardware," in USENIX NSDI, 2018, pp. 171–186.

[9] T. Barbette, C. Soldani, and L. Mathy, "Fast userspace packet processing," in ACM/IEEE Symposium on Architectures for Networking and Communications Systems, 2015, pp. 5–16.

[10] P. Bosshart, G. Gibb, H.-S. Kim, G. Varghese, N. McKeown, M. Izzard, F. Mujica, and M. Horowitz, "Forwarding metamorphosis: Fast programmable match-action processing in hardware for SDN," SIGCOMM Comput. Commun. Rev., vol. 43, no. 4, pp. 99–110, 2013.

[11] S. Chole, A. Fingerhut, S. Ma, A. Sivaraman, S. Vargaftik, A. Berger, G. Mendelson, M. Alizadeh, S.-T. Chuang, I. Keslassy, A. Orda, and T. Edsall, "dRMT: Disaggregated programmable switching," in Proceedings of the Conference of the ACM Special Interest Group on Data Communication, ser. SIGCOMM '17, 2017, pp. 1–14.

[12] B. Li, K. Tan, L. L. Luo, Y. Peng, R. Luo, N. Xu, Y. Xiong, P. Cheng, and E. Chen, "ClickNP: Highly flexible and high performance network processing with reconfigurable hardware," in Proceedings of the 2016 ACM SIGCOMM Conference, ser. SIGCOMM '16. ACM, 2016, pp. 1–14.

[13] Z. Zhao, H. Sadok, N. Atre, J. C. Hoe, V. Sekar, and J. Sherry, "Achieving 100Gbps intrusion prevention on a single server," in 14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20). USENIX Association, Nov. 2020, pp. 1083–1100.

[14] R. Giladi, Network Processors: Architecture, Programming, and Implementation. Morgan Kaufmann, 2008.

[15] Y. Le, H. Chang, S. Mukherjee, L. Wang, A. Akella, M. M. Swift, and T. V. Lakshman, "UNO: Unifying host and smart NIC offload for flexible packet processing," in Proceedings of the 2017 Symposium on Cloud Computing, 2017, pp. 506–519.

[16] S. Han, K. Jang, K. Park, and S. Moon, "PacketShader: A GPU-accelerated software router," in ACM SIGCOMM, 2010, pp. 195–206.

[17] J. Kim, K. Jang, K. Lee, S. Ma, J. Shim, and S. Moon, "NBA (Network Balancing Act): A high-performance packet processing framework for heterogeneous processors," in EuroSys, 2015, pp. 22:1–22:14.

[18] Z. Zheng, J. Bi, H. Wang, C. Sun, H. Yu, H. Hu, K. Gao, and J. Wu, "Grus: Enabling latency SLOs for GPU-accelerated NFV systems," in IEEE ICNP, 2018, pp. 154–164.

[19] K. Zhang, B. He, J. Hu, Z. Wang, B. Hua, J. Meng, and L. Yang, "G-NET: Effective GPU sharing in NFV systems," in USENIX NSDI, 2018, pp. 187–200.

[20] X. Yi, J. Duan, and C. Wu, "GPUNFV: A GPU-accelerated NFV system," in ACM APNet, 2017, pp. 85–91.

[21] W. Sun and R. Ricci, "Fast and flexible: Parallel packet processing with GPUs and Click," in Architectures for Networking and Communications Systems. IEEE, 2013, pp. 25–35.

[22] S. Han, K. Jang, A. Panda, S. Palkar, D. Han, and S. Ratnasamy, "SoftNIC: A software NIC to augment hardware," EECS Department, University of California, Berkeley, Tech. Rep. UCB/EECS-2015-155, May 2015. [Online]. Available: http://www2.eecs.berkeley.edu/Pubs/TechRpts/2015/EECS-2015-155.html

[23] R. Morris, E. Kohler, J. Jannotti, and M. F. Kaashoek, “The Click modular router,” in ACM SOSP, 1999, pp. 217–231.

[24] D. Barach, L. Linguaglossa, D. Marion, P. Pfister, S. Pontarelli, and D. Rossi, "High-speed software data plane via vectorized packet processing," IEEE Communications Magazine, vol. 56, no. 12, pp. 97–103, December 2018.

[25] J. Wang, T. Lévai, Z. Li, M. A. M. Vieira, R. Govindan, and B. Raghavan, "Galleon: Reshaping the square peg of NFV," 2021.

[26] A. Panda, S. Han, K. Jang, M. Walls, S. Ratnasamy, and S. Shenker, "NetBricks: Taking the V out of NFV," in USENIX OSDI, 2016, pp. 203–216.

[27] W. Tu, Y.-H. Wei, G. Antichi, and B. Pfaff, "Revisiting the open vswitch dataplane ten years later," in Proceedings of the 2021 ACM SIGCOMM Conference, ser. SIGCOMM '21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 245–257. [Online]. Available: https://doi.org/10.1145/3452296.3472914

[28] “BPF and XDP Reference Guide,” https://docs.cilium.io/en/latest/bpf/.

[29] L. Rizzo, "netmap: A novel framework for fast packet I/O," in USENIX ATC, 2012.

[30] Intel, “Data Plane Development Kit,” http://dpdk.org.

[31] I. Zhang, A. Raybuck, P. Patel, K. Olynyk, J. Nelson, O. S. N. Leija, A. Martinez, J. Liu, A. K. Simpson, S. Jayakar, P. H. Penna, M. Demoulin, P. Choudhury, and A. Badam, "The Demikernel datapath OS architecture for microsecond-scale datacenter systems," in Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, ser. SOSP '21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 195–211. [Online]. Available: https://doi.org/10.1145/3477132.3483569

[32] N. McKeown, T. Anderson, H. Balakrishnan, G. Parulkar, L. Peterson, J. Rexford, S. Shenker, and J. Turner, "OpenFlow: Enabling innovation in campus networks," SIGCOMM Comput. Commun. Rev., vol. 38, no. 2, pp. 69–74, 2008.

[33] B. Pfaff et al., "The design and implementation of Open vSwitch," in USENIX NSDI, 2015, pp. 117–130.

[34] L. Molnár, G. Pongrácz, G. Enyedi, Z. L. Kis, L. Csikor, F. Juhász, A. Kőrösi, and G. Rétvári, "Dataplane specialization for high-performance OpenFlow software switching," in ACM SIGCOMM, 2016, pp. 539–552.

[35] P. Bosshart, D. Daly, G. Gibb, M. Izzard, N. McKeown, J. Rexford, C. Schlesinger, D. Talayco, A. Vahdat, G. Varghese, and D. Walker, "P4: Programming protocol-independent packet processors," SIGCOMM Comput. Commun. Rev., vol. 44, no. 3, 2014.

[36] W. P. Stevens, G. J. Myers, and L. L. Constantine, "Structured design," IBM Syst. J., vol. 13, no. 2, pp. 115–139, Jun. 1974.

[37] P. Arumí and X. Amatriain, "Time-triggered static schedulable dataflows for multimedia systems," in Multimedia Computing and Networking 2009, R. Rejaie and K. D. Mayer-Patel, Eds., vol. 7253, International Society for Optics and Photonics. SPIE, 2009, pp. 131–142.

[38] H.-Y. Chen et al., "TensorFlow: A system for large-scale machine learning," in USENIX OSDI, vol. 16, 2016, pp. 265–283.

[39] W. Taymans, S. Baker, A. Wingo, R. S. Bultje, and S. Kost, GStreamer Application Development Manual. United Kingdom: Samurai Media Limited, 2016.

[40] “Apache Kafka,” https://kafka.apache.org.

[41] I. Pelle, T. Lévai, F. Németh, and A. Gulyás, "One tool to rule them all: A modular troubleshooting framework for SDN (and other) networks," in Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research, ser. SOSR '15, 2015.

[42] J. R. Amazonas, A. Akbari-Moghanjoughi, G. Santos-Boada, and J. S. Pareta, "Service level agreements for communication networks: A survey," INFOCOMP Journal of Computer Science, vol. 18, no. 1, pp. 32–56, 2019.

[43] N. Mahmud et al., "Evaluating industrial applicability of virtualization on a distributed multicore platform," in IEEE ETFA, 2014.

[44] T. Lévai, G. Pongrácz, P. Megyesi, P. Vörös, S. Laki, F. Németh, and G. Rétvári, "The price for programmability in the software data plane: The vendor perspective," IEEE Journal on Selected Areas in Communications, vol. 36, no. 12, pp. 2621–2630, Dec. 2018.

[45] “Tipsy Source Code,” https://github.com/hsnlab/tipsy.

[46] S. G. Kulkarni, W. Zhang, J. Hwang, S. Rajagopalan, K. K. Ramakrishnan, T. Wood, M. Arumaithurai, and X. Fu, "NFVnice: Dynamic backpressure and scheduling for NFV service chains," in Proceedings of the Conference of the ACM Special Interest Group on Data Communication, ser. SIGCOMM '17. New York, NY, USA: Association for Computing Machinery, 2017, pp. 71–84. [Online]. Available: https://doi.org/10.1145/3098822.3098828

[47] C. A. Waldspurger and W. E. Weihl, "Stride scheduling: Deterministic proportional-share resource management," Massachusetts Institute of Technology, 1995.

[48] A. Belay, G. Prekas, M. Primorac, A. Klimovic, S. Grossman, C. Kozyrakis, and E. Bugnion, “The IX operating system: Combining low latency, high throughput, and efficiency in a protected dataplane,” ACM Transactions on Computer Systems (TOCS), vol. 34, no. 4, p. 11, 2017.

[49] E. Warnicke, “Vector Packet Processing - One Terabit Router,” July 2017.

[50] L. Linguaglossa, D. Rossi, S. Pontarelli, D. Barach, D. Marjon, and P. Pfister, "High-speed software data plane via vectorized packet processing," Tech. Rep., 2017, https://perso.telecom-paristech.fr/drossi/paper/vpp-bench-techrep.pdf.

[51] S. Lange, L. Linguaglossa, S. Geissler, D. Rossi, and T. Zinner, "Discrete-time modeling of NFV accelerators that exploit batched processing," in IEEE INFOCOM, April 2019, pp. 64–72.

[52] J. D. Brouer, “Network stack challenges at increasing speeds,” Linux Conf Au, Jan 2015.

[53] J. Corbet, “Batch processing of network packets,” Linux Weekly News, Aug 2018.

[54] R. Gandhi, H. H. Liu, Y. C. Hu, G. Lu, J. Padhye, L. Yuan, and M. Zhang, "Duet: Cloud scale load balancing with hardware and software," in ACM SIGCOMM, 2014, pp. 27–38.

[55] B. Stephens, A. Akella, and M. Swift, "Loom: Flexible and efficient NIC packet scheduling," in USENIX NSDI, Feb. 2019, pp. 33–46.

[56] J. D. Little, "A proof for the queuing formula: L = λW," Operations Research, vol. 9, no. 3, pp. 383–387, 1961.

[57] C. S. Wong, I. Tan, R. D. Kumari, and F. Wey, "Towards achieving fairness in the Linux scheduler," SIGOPS Oper. Syst. Rev., vol. 42, no. 5, pp. 34–43, Jul. 2008.

[58] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.

[59] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty, Nonlinear Programming: Theory and Algorithms. John Wiley & Sons, 2013.

[60] J. Nagle, "Congestion control in IP/TCP internetworks," Internet Requests for Comments, RFC Editor, RFC 896, January 1984.

[61] P. Arumí, D. Garcia, and X. Amatriain, “A data flow pattern language for audio and music computing,” 2006.

[62] M. Quigley, K. Conley, B. P. Gerkey, J. Faust, T. Foote, J. Leibs, R. Wheeler, and A. Y. Ng, "ROS: An open-source Robot Operating System," in ICRA Workshop on Open Source Software, 2009.

[63] C. E. Leiserson and J. B. Saxe, "Retiming synchronous circuitry," Algorithmica, vol. 6, no. 1–6, pp. 5–35, Jun. 1991.

[64] T. W. O'Neil, S. Tongsima, and E. H. Sha, "Extended retiming: Optimal scheduling via a graph-theoretical approach," in IEEE ICASSP, vol. 4, 1999.

[65] M. Goldenberg, P. Lu, and J. Schaeffer, "TrellisDAG: A system for structured DAG scheduling," in JSSPP, 2003, pp. 21–43.

[66] E. A. Lee and D. G. Messerschmitt, "Synchronous data flow," Proceedings of the IEEE, vol. 75, no. 9, pp. 1235–1245, 1987.

[67] E. A. Lee and D. G. Messerschmitt, "Static scheduling of synchronous data flow programs for digital signal processing," IEEE Transactions on Computers, vol. C-36, no. 1, pp. 24–35, 1987.

[68] A. H. Ghamarian, S. Stuijk, T. Basten, M. Geilen, and B. D. Theelen, "Latency minimization for synchronous data flow graphs," in 10th Euromicro Conference on Digital System Design Architectures, Methods and Tools (DSD 2007). IEEE, 2007, pp. 189–196.

[69] G. Bilsen, M. Engels, R. Lauwereins, and J. Peperstraete, "Cycle-static dataflow," IEEE Transactions on Signal Processing, vol. 44, no. 2, pp. 397–408, 1996.
