Industry-University Cooperation Programs in the context of HEP Computing and Instrumentation
E. M. GREGORES, R. L. IOPE, S. F. NOVAES
São Paulo Research and Analysis Center
Fundamental Research on High Energy Physics
◦ Physics analysis
◦ Data processing
◦ Scientific instrumentation
Innovation
◦ Develop new state-of-the-art technologies
◦ Partnership with leading players in the private sector
◦ Joint ventures with high-profile academic institutions
Training and Education
◦ Develop expertise in advanced fields
◦ Find and promote new talent
Outreach
◦ Share the knowledge with society
◦ Posters, websites, games, OSG applications, Masterclasses, etc.
UNESP Center for Scientific Computing
Human Resources
Tackling relevant problems with socio-economic impact
Highly Qualified State-of-art HRs
Expertise
Hardware Software
Key Partner
Key Activities Revenue Streams Value Propositions
Key Resources Cost Structure
“Business Model”
(Ref.: http://www.businessmodelgeneration.com/)
Intel / Unesp First Collaborative Activities
Cloud infrastructure (IaaS) focused on educational activities
◦ Partnership with the São Paulo State Secretariat of Education
◦ Automated, customized load-balancing cloud for high-school teacher training
◦ Deployment of two OpenStack-based cloud infrastructures
◦ São Paulo State Secretariat of Education datacenter
◦ NCC/Unesp datacenter
Cloud computing security
◦ Evaluation of Intel Trusted Execution Technology (Intel TXT)
◦ Key component of Intel vPro Technologies (Intel AMT, VT, TXT)
◦ One critical component of Intel TXT is the Trusted Platform Module (TPM)
(secure key generation and authenticated access to data encrypted by this key)
◦ Evaluation of the weaknesses of the Intel OpenAttestation SDK
◦ Detailed report sent to Intel development engineers
2012
Intel / Unesp Manycore Testing Lab
Jan/2013: NCC/Unesp => 1st Xeon Phi loaned to an academic institution in Brazil
◦ Procedures for installation / configuration of Xeon Phi coprocessor
◦ Development of first hands-on activities using Intel Xeon Phi
◦ Technical support to other academic groups
Mar/2013: Cooperative program for training & technical support on Xeon Phi
◦ Hardware: two servers and several Xeon Phi cards loaned by Intel
◦ Software: access to Intel software development tools
◦ One of the first manycore testing labs outside the US
First results: hands-on activities at
◦ INFIERI Summer Schools 2013, 2014 (Oxford, Paris)
◦ Intel Software Conference 2013 (NCC/Unesp and COPPE/RJ)
◦ SBAC/PAD, ERAD-SP, ERAD-RS Parallel Programming Marathons
2013
Intel Parallel Computing Center @ Unesp
NCC/Unesp joins the world’s elite Intel PCC program
◦ A distinction currently held by only a select group of institutions worldwide
◦ Explore parallelization / vectorization on new Intel manycore architectures
◦ R&D efforts to adapt HEP software tools (code modernization)
Main target: Parallelization of Geant (Geometry And Tracking)
◦ Essential simulation platform for High Energy Physics (HEP)
◦ Need to explore the new hardware architectures and software dev. tools
◦ Requires long calculation times
◦ Broad impact (beyond HEP): rad-hard electronics, medical applications
◦ Work plan associated with CERN and Fermilab development teams
Goals
◦ Contribute to the development of GeantV, the next generation of Geant, which will include massive parallelism natively
◦ Test vector-coprocessor prototypes in hybrid parallel systems
◦ Analyze the performance of Geant4 / GeantV on Intel Xeon Phi coprocessors
◦ Evaluate the redesign efforts to adopt the next generation of Intel manycore coprocessors
2014
Intel PCC @ Unesp: Program of Work
Intel PCC @ Unesp: GeantV Code Structure
Intel Modern Code Partner @ Unesp
https://modern-code.ncc.unesp.br/
https://indico.ncc.unesp.br/category/1/
https://software.intel.com/en-us/modern-code/live-workshops
Tutorials on the exploitation of multithreading and vectorization on multi/many-core architectures
Training sessions at international events
◦ INFIERI Summer Schools 2013-2017
◦ IEEE/ACM CCGrid 2016 (Cartagena, Colombia)
◦ VECPAR 2016 (Porto, Portugal)
◦ Universidad Distrital (Colombia)
Tutorials and mini courses at Brazilian HPC Regional Schools
◦ ERAD-SP, -RS, -RJ, -NE
◦ WSCAD’16
Workshops at several Brazilian institutions
◦ Engineering / Computing Weeks at Unesp and other universities
◦ Workshops at NCC-Unesp, CENAPAD-SP, TOTVS, FATEC-Santos, SENAI-CIMATEC, UFSCar, UFRN, Poli-USP, Ciências Moleculares USP, etc.
Intel special events
◦ Intel Software Days
◦ Advanced training with Intel experts (“training the trainers”)
◦ Intel HPC Developer Conference 2016 (oral presentation of a whitepaper)
2015
Intel MCP @ Unesp: Hardware Resources
Intel MCP @ Unesp: Results
Results achieved from May 2015 to July 2017
◦ Workshops delivered: 39
◦ International training sessions: 7 (Colombia, Germany, Portugal)
◦ Participants: 1673 (589 also took part in hands-on activities)
◦ Optimization of ProFrager - LNCC (presented at Intel HPC Dev. Conf. 2016)
New training activity (introduced in 2016) <= seed for the CoE for Machine Learning
◦ Introduction to Data Science
◦ Basic theory and hands-on activities using Intel DAAL and other tools
◦ https://intel-unesp-mcp.github.io/datascience-workshop/
Academic results
◦ 8 academic projects / experiments supported by the server infrastructure loaned by Intel
◦ Publications: 2 posters; 3 whitepapers; 2 book chapters; 2 academic papers (3 more under review)
Targets initially defined for the end of the 2nd year of the project
◦ Total number of training sessions: 40 (> 1.5 sessions per month)
◦ Total number of participants: 1600 (~40 participants per session)
2016
Intel / Unesp CoE for Machine Learning
Purpose
◦ Establish a Center of Excellence to tackle challenging projects related to Machine Learning
◦ Build an international network of partners interested in collaborating and exchanging knowledge in the area
◦ Train new, promising and motivated professionals in software development / code optimization
Activities
◦ R&D, consulting services, and delivery of training sessions in Data Science / Machine Learning
◦ Partners: Intel, CERN, Caltech
◦ Create a portfolio of highly qualified young researchers to work at Unesp CSC (2-4 data scientists)
◦ Create a portfolio of enterprises looking for ML solutions to overcome their problems
◦ Attack problems with scientific relevance and/or social and economic impact
◦ Partnership with enterprises to deliver ‘Proof of Concept’ projects => BB, Itaú, SERPRO, Petrobrás
Work already done or in progress
◦ White paper related to the Intel DAAL framework
◦ Detailed document on the deployment of Intel Machine Learning tools
◦ Training workshops: Itaú, BB, SERPRO
◦ Proof of Concept project ongoing with SERPRO
2017
Intel / Unesp CoE for Machine Learning
Topics of a typical training session (2-day workshop):
Statistical learning algorithms
◦ Linear and logistic regression
◦ Clustering techniques
Introduction to Neural Networks
◦ Biology-inspired computation
◦ Artificial Neurons
◦ Back-propagation
Synergies between machine learning and High Performance Computing
◦ HPC-optimized frameworks for Machine Learning
Practical exercises showing how to implement a simple Neural Network
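To give a flavor of the "simple Neural Network" exercise named in the topics above, here is a minimal sketch of a one-hidden-layer network trained with back-propagation on the XOR problem. It is not the workshop material: the function name `train_xor`, the network size, the learning rate and the choice of XOR are all assumptions made for illustration, and the code uses only the Python standard library rather than any HPC-optimized framework.

```python
import math
import random


def sigmoid(x):
    """Logistic activation used by every artificial neuron in this sketch."""
    return 1.0 / (1.0 + math.exp(-x))


def train_xor(hidden=3, epochs=5000, lr=0.5, seed=0):
    """Train a 2-input, one-hidden-layer, 1-output network on XOR.

    Returns (initial mean squared error, final mean squared error, predict).
    All hyperparameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
            ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
    # Each hidden neuron holds [weight for x0, weight for x1, bias].
    w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(hidden)]
    # Output neuron: one weight per hidden neuron, plus a bias as last entry.
    w_o = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
        y = sigmoid(sum(w_o[j] * h[j] for j in range(hidden)) + w_o[hidden])
        return h, y

    def loss():
        return sum((forward(x)[1] - t) ** 2 for x, t in data) / len(data)

    initial = loss()
    for _ in range(epochs):
        for x, t in data:
            h, y = forward(x)
            # Back-propagation for the error E = 0.5 * (y - t)^2.
            delta_o = (y - t) * y * (1.0 - y)
            # Hidden deltas use the output weights *before* they are updated.
            delta_h = [delta_o * w_o[j] * h[j] * (1.0 - h[j])
                       for j in range(hidden)]
            for j in range(hidden):
                w_o[j] -= lr * delta_o * h[j]
            w_o[hidden] -= lr * delta_o
            for j in range(hidden):
                w_h[j][0] -= lr * delta_h[j] * x[0]
                w_h[j][1] -= lr * delta_h[j] * x[1]
                w_h[j][2] -= lr * delta_h[j]

    return initial, loss(), lambda x: forward(x)[1]


initial_loss, final_loss, predict = train_xor()
print(f"mean squared error: {initial_loss:.3f} -> {final_loss:.3f}")
```

The explicit loops make each step of the gradient computation visible; an HPC-oriented version would typically replace them with vectorized matrix operations.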
Huawei / Unesp R&D on SDN
Huawei Technologies Co. Ltd.
◦ Leading global ICT provider
◦ Largest telecom equipment manufacturer in the world
◦ Over 170K employees (more than 45% engaged in R&D)
Project Proposal: R&D on Software-Defined Networking (SDN) over WAN for Data-Intensive Science
◦ Project started in January 2016
◦ Development of a new SDN Platform for Global Scale Science
◦ Wide-area SDN-based testbed integrated with OpenStack
◦ 3 “islands”: Unesp (Brazil), Caltech (USA), CERN (Europe)
◦ High-end WAN data transfer & SDN experimental system
◦ São Paulo - Miami at 100 Gbps
◦ Demonstrations at annual Supercomputing Conferences in the U.S.
R&D on SDN: Milestones & Funding
Project milestones / deliverables
◦ Development of an open-source SDN Platform
◦ Testbed design & deployment
◦ Deployment of web portal and monitoring tools
◦ Deployment of cloud-based systems (VMs + OVS) on each island
◦ SDON / T-SDN prototype system (next-gen DWDM systems)
◦ Development of control tools for WAN network orchestration
Huawei funding: up to 3 years
◦ 20-40% in hardware/software each year
◦ 4-6 full-time fellowships (Master / PhD level)
◦ Extra budget for covering third-party services and travel expenses
Wide area SDN-based testbed
The Kytos Project
Kytos: an SDN Platform being developed from the ground up
◦ Event Handler with "Pub&Sub" methods and decorators
◦ High-level language API for writing network applications (Napps)
◦ Event-driven
◦ Ecosystem with “Plug&Play” Network Applications repository
◦ User-friendly, with a responsive web UI
◦ 100% Open Source (MIT License)
◦ Always with the "keep it simple" paradigm in mind
◦ Designed to be vendor and protocol agnostic
◦ Highly customizable: Kytos distributions (a.k.a. “flavors”)
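The "Pub&Sub methods and decorators" idea in the list above can be illustrated with a generic sketch. This is not the Kytos API: the `EventBus` class, the `subscribe` decorator and the event name `switch.connected` are hypothetical names chosen only to show how a decorator can register a handler that is later called when an event is published.

```python
from collections import defaultdict


class EventBus:
    """Hypothetical publish/subscribe hub (illustration only, not Kytos code)."""

    def __init__(self):
        # Maps an event name to the list of handlers subscribed to it.
        self._handlers = defaultdict(list)

    def subscribe(self, event_name):
        """Decorator that registers the decorated function as a handler."""
        def decorator(func):
            self._handlers[event_name].append(func)
            return func  # the function itself is left unchanged
        return decorator

    def publish(self, event_name, payload):
        """Call every handler subscribed to event_name; collect their results."""
        return [handler(payload) for handler in self._handlers.get(event_name, [])]


bus = EventBus()
seen = []


@bus.subscribe("switch.connected")
def log_switch(payload):
    # A network application would react here, e.g. by installing flows.
    seen.append(payload["dpid"])
    return f"switch {payload['dpid']} connected"


results = bus.publish("switch.connected", {"dpid": "00:01"})
print(results)
```

The publisher never refers to the handlers directly, which is what lets applications be added or removed without touching the core.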
Kytos Development Cycle
[Figure: Kytos release timeline showing versions 2017.1 and 2017.2]
[Figure: network topology for the SC'17 (Denver) demonstrations. Recoverable labels: Sprace / Unesp; ANSP / São Paulo; RNP PoP-SP, PoP-RJ and PoP-CE; Juniper MX-480 routers; AmLight switches at RNP/Rio and RNP/Fortaleza; AmLight Santiago (Chile) and Miami; 100G (Atlantic) and 100G (Pacific) paths; OpenWave; CenturyLink; Corsa switches; 2x DTN servers (2x 100G each) at UNESP; 100G DTNs at RNP; DTN 100G hosts at the Caltech and Unesp booths; link speeds from 10G to 100G]
Data Transfer Nodes (DTNs): 1. LNCC (2 Gbps) 2. UFRJ 3. IFPE 4. USP (10 Gbps) 5. RNP-RJ 6. INPE 7. UERJ; perfSONAR 10G, 8 PoPs
Authors: CSC/Unesp, AmLight, RNP & ANSP engineering teams
Lessons learned (I)
Project Management for R&D is far from trivial
◦ the “research” part of the project is the main issue: the work is subject to unexpected developments and results
◦ managers have to be capable of continuously adjusting to new situations
Systems Engineering complements Project Management
◦ an experienced engineer who takes direct responsibility for developing and controlling activities, at a much deeper level than a non-technical project manager, is key to success
PI and management staff (PM, SE) need to handle a huge amount of non-technical matters
◦ ongoing interaction with lawyers and experts from the University Technology Transfer Office (TTO) is mandatory
The most fruitful partnerships take time and effort to bear fruit
◦ Unesp team has been interacting with Intel Brazil representatives for more than 12 years
Lessons learned (II)
Successes in HEP have always been closely tied to advances in instrumentation and computing
◦ HEP has a long history of inventing detectors and building computing infrastructures to address the science needs
◦ We need to push in both areas to be able to use transformative new technologies and approaches
Key computing areas that are increasingly becoming essential to HEP:
◦ distributed computing, XaaS, networks, virtualization, GPUs and accelerators, high-speed optical links, data analytics, machine learning, neural networks, SDN, SDS, SDDC, SD-WAN, code modernization techniques
We have learned that partnerships with the private sector can:
◦ help identify opportunities for innovation, technology transfer, and collaboration with other areas
◦ provide young researchers with opportunities to learn computing skills that are marketable for non-academic jobs
◦ provide career paths in HEP for researchers who work at the forefront of computation techniques
◦ provide a way to have uncommitted reserves to enable opportunistic investments or cover unexpected difficulties
Ref.: https://science.energy.gov/~/media/hep/hepap/pdf/201309/Demarteau_HEPAP.pdf
Unesp CSC Engineering / Development Team
Intel
◦ Julio Amaral
◦ Jefferson F. Coelho
◦ Silvio Stanzani
◦ Jose Ruiz Vargas
Huawei
◦ Ricardo Aguiar
◦ Macartur Carvalho
◦ Vitor Ferreira
◦ Marco Gomes
◦ Diego R. Oliveira
◦ Renan Rodrigo
◦ Carlos Eduardo Santos
◦ Erick Vermot
Unesp
◦ Marcio A. Costa
◦ Sidney T. Santos
◦ Jadir M. da Silva
◦ Allan Szu
Fundunesp
◦ André Cascadan
◦ Raphael M. O. Cobe
◦ Rogério L. Iope
◦ Beraldo C. Leal
◦ Ângelo S. Santos
◦ Artur Baruchi <= Nic.br
Thank you
Extra slides
GridUnesp Project
A distributed computational system with widely dispersed computing resources
Spin-off of the SPRACE Project
First Campus Grid in Latin America
◦ Provides scientific computing to Unesp
◦ Partnership with US Open Science Grid
◦ First VO outside US
2-tiered architecture
◦ 1 central cluster in the city of São Paulo
◦ 256 worker nodes
◦ 6 secondary clusters in other campuses
◦ 16 worker nodes
Unesp Center for Scientific Computing
Hosting / Colocation
◦ GridUnesp: Central Cluster
◦ 85 TFlops & 420 TB
◦ BR-SP-SPRACE: WLCG Tier-2 Cluster
◦ 15 TFlops & 1,200 TB
◦ Manycore Intel Xeon Phi Cluster
◦ Mirror of Unesp Administrative Systems
◦ Institutional Repository (Digital Library)
Certification Authority @ São Paulo
◦ ANSP Grid CA
Network Infrastructure
◦ International Connection: 2x 100 Gbps
◦ University Connection (UnespNet): 10 Gbps
◦ Experimental Connection (KyaTera): 10 Gbps
Activities
◦ Scientific Processing (HTC, HPC)
◦ Technical Support
◦ LHC data analysis & processing
◦ High Performance Network
◦ MegaTelecom & Telefonica links
◦ Technical Training
◦ Intel ‘Modern Code’, Intel CoE for ML
◦ Outreach activities
◦ International MasterClass on HEP
◦ 250+ High School Students/year
◦ Innovative Projects with the Private Sector
◦ R&D based on the federal tax-waiver program
◦ Demonstration on HSN @ SC Conferences
Unesp CSC - Quick Facts
BR-SP-SPRACE
First Official WLCG Tier-2 in Latin America
◦ MoU signed in April/2009
◦ FAPESP & CERN
Processing
◦ Physics analysis
◦ MC simulation
◦ Reconstruction
Storage of datasets
Network connection
◦ 100 Gbps
SPRACE LAN
SPRACE MAN/WAN
[Figure: network diagram. Recoverable labels: Academic Network of São Paulo (ANSP / NAP do Brasil); Brocade MLXe, to AmLight (4x 10G + 2x 100G); ~40 km WDM link (100 Gbps); Padtec Flexponder, Transponder and Mux / Demux units; optical patch cord (LC-LC); UNESP Center for Scientific Computing; Huawei CE 6870 and Huawei CE 8860; link speeds of 10G, 40G and 100G]
ANSP Grid CA
◦ Grid Certificate Authority for the State of São Paulo
◦ Initiative of NCC/Unesp
◦ Managed by the Academic Network at São Paulo (ANSP)
Hardware colocated at NCC
◦ Two Hardware Security Modules (HSM)
Accreditation:
◦ Approved by
◦ TAGPMA in April 2012
◦ IGTF in August 2012
◦ Included in TACAR (TERENA Academic CA Repository)
◦ October 2012
Intel PCC @ Unesp: GeantV Collaboration
Intel PCC @ Unesp: Presentations
Intel PCC @ Unesp: Joint Publications
Intel MCP @ Unesp: Hardware Resources
Computing resources
◦ Phi01: server loaned by Intel in 2013
◦ Phi02, Phi03: new high-end servers acquired in 2015 (Intel grant)
◦ Phi04: workstation with Intel Knights Landing (KNL)
◦ Phi05, Phi08: servers w/ Intel Knights Landing (KNL-F)
◦ Phi09, Phi10: new pair of servers with Xeon Phi Knights Mill (enhanced hardware for Machine Learning)
◦ DevPhi: workstation used for internal devel./testing
◦ Intel NUCs: used as auxiliary systems during training sessions
Servers phi01-phi10 will be integrated into a cluster
◦ External link: 10Gb Ethernet
◦ Internal network based on Intel Omni-Path (100Gbps)
Ref.: Tutorial para uso dos nós acelerados por Intel® Xeon® Phi™ no NCC-Unesp https://software.intel.com/pt-br/articles/tutorial-para-uso-dos-n-s-acelerados-por-intel-xeon-phi-no-ncc-unesp
Intel MCP @ Unesp: Projects / Experiments
Title | Institution
Optimization of complex numerical modelling applications | UERJ (Universidade Estadual do Rio de Janeiro)
Profrager Optimization | NCC/UNESP
Optimization of Geant | NCC/UNESP
Optimization/Modernization for PDE solvers applied to flow dynamics (Oil & Gas) | LNCC (Laboratório Nacional de Computação Científica)
Accelerating Weather Forecast microphysics using Heterogeneous Parallel Computing | UDistrital (Universidad Distrital - Bogotá, Colombia)
Optimizing BRAMS for Multicore Processors | INPE (Instituto Nacional de Pesquisas Espaciais)
Analysis of Energy Consumption and Performance Efficiency on the Intel MIC architecture | Unipampa (Universidade Federal do Pampa)
Dataflow Resiliency and Scalability | UERJ (Universidade Estadual do Rio de Janeiro), UFRJ (Universidade Federal do Rio de Janeiro)
(A detailed description of each project/experiment can be found at http://modern-code.ncc.unesp.br/?page_id=220)
Center of Excellence for Machine Learning
Ongoing consulting services to partners (industry, academia) on the following topics:
◦ Optimization of model training and inference performance
◦ Model improvement to achieve higher accuracy
◦ Help deploying tools and solutions on highly parallel servers (Intel Xeon Phi - Knights Landing)
◦ Tuning Intel-based servers to achieve best performance
What is SDN?
The SDN approach splits the switching function between a data plane and a control plane and keeps them on separate devices, contrary to what happens in traditional switches and routers.
The central concept behind SDN is to give developers and network managers the same type of control over network equipment that they have over servers and storage systems.
SDN platforms act as strategic control points in an SDN network, managing flow control on the switches/routers ‘below’ them and the applications and business logic ‘above’ them, in order to deploy intelligent networks.
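The split between control plane and data plane described above can be sketched in a few lines. This is a toy model under stated assumptions, not OpenFlow and not Kytos code: the `Controller` and `Switch` classes, the `packet_in` method name and the route table are hypothetical. The switch only looks up its local flow table; on a miss it asks the controller, which holds the network-wide policy and programs the switch.

```python
class Controller:
    """Control plane: holds the policy and programs switches (toy model)."""

    def __init__(self, routes):
        # Hypothetical policy: (switch name, destination) -> output port.
        self.routes = routes
        self.packet_ins = 0  # how many times a switch had to ask

    def packet_in(self, switch, dst):
        """Handle a table miss by installing a flow entry, if policy allows."""
        self.packet_ins += 1
        port = self.routes.get((switch.name, dst))
        if port is not None:
            switch.flow_table[dst] = port


class Switch:
    """Data plane: forwards using only its local flow table."""

    def __init__(self, name, controller):
        self.name = name
        self.controller = controller
        self.flow_table = {}  # destination -> output port

    def receive(self, dst):
        """Return the output port for dst, or None if the packet is dropped."""
        if dst not in self.flow_table:
            # Table miss: defer the decision to the control plane.
            self.controller.packet_in(self, dst)
        return self.flow_table.get(dst)


ctrl = Controller({("s1", "10.0.0.2"): 2})
s1 = Switch("s1", ctrl)

first = s1.receive("10.0.0.2")    # miss: controller installs the flow
second = s1.receive("10.0.0.2")   # hit: handled by the data plane alone
unknown = s1.receive("10.0.0.9")  # no policy for this destination
print(first, second, unknown, ctrl.packet_ins)
```

Only the first packet of a flow reaches the controller; later packets are forwarded from the switch's own table, which is why the control logic can live on a separate device.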
The Kytos Project
# kytosd -f
  _          _
 | |        | |
 | | ___   _| |_ ___  ___
 | |/ / | | | __/ _ \/ __|
 |   <| |_| | || (_) \__ \
 |_|\_\\__, |\__\___/|___/
        __/ |
       |___/
Welcome to Kytos SDN Platform!
We are doing a huge effort to make sure that this console
will work fine. But for now is still experimental.
Kytos website.: https://kytos.io/
Documentation.: https://docs.kytos.io/
OF Address....: tcp://0.0.0.0:6633
WEB UI........: http://0.0.0.0:8181/
kytos $>
R&D on SDN: Project evolution
Year One (2016)
◦ ‘In-house’ OF controller development (Kytos)
◦ Acquisition of servers and switches for the testbed
◦ Local testbed setup
◦ SC’16: first public demo of Kytos, record of data transfer (100Gbps, mem to mem)
Year Two (2017)
◦ OF controller: interaction with real-world environment and with network engineers (ANSP, AmLight)
◦ Consolidation of the remote testbed sites (CERN, Caltech)
◦ R&D on Software-Defined Optical Networking (SDON) or ‘Transport SDN’ on Huawei CDC-ROADMs
◦ SC’17: Kytos will be shown in production, high-end data transfers (100Gbps, disk to disk)