Time: August 19, 2022, 14:00-17:00
Venue: Nanjing, Jiangsu
Forum Co-Chairs:
徐梦炜, specially appointed Associate Research Fellow (talent-recruitment track) and Ph.D. supervisor at the School of Computer Science, Beijing University of Posts and Telecommunications. He received his bachelor's and Ph.D. degrees from Peking University, was selected for the Young Elite Scientists Sponsorship Program of the China Association for Science and Technology, has been a visiting scholar at Microsoft Research Asia (the "铸星计划"/StarTrack Program) and at Purdue University, and won the ACM SIGMobile China 2021 Outstanding Doctoral Dissertation Award. His main research areas are mobile/edge computing and system software; in recent years he has focused on the scheduling and optimization of low-power chips and end devices in edge scenarios. His results have been published in top international conferences and journals such as ACM MobiCom, MobiSys, UbiComp, USENIX ATC, and IEEE TMC, and he has been invited to review for venues including UbiComp, ICWS, and IEEE TMC. He leads multiple projects funded by the National Natural Science Foundation of China, a topic within a Ministry of Science and Technology key R&D program, the Beijing Nova Program, and the Baidu Songguo Fund.
蔡志成, Associate Professor and master's supervisor at Nanjing University of Science and Technology; visiting scholar at the University of Melbourne, Australia; Jiangsu Province "Science and Technology Vice President" (enterprise technology deputy); member of the CCF Technical Committees on Services Computing, Distributed Computing and Systems, and Cooperative Computing. He received his bachelor's and Ph.D. degrees from Southeast University and won the Outstanding Doctoral Dissertation Award of the Jiangsu Computer Society. He has led projects including NSFC General and Youth projects and a Jiangsu Provincial Natural Science Foundation Youth project, and has served as publicity co-chair of major conferences such as CCGrid 2021 and the 2022 China Digital Services Conference. His main research interests include resource scheduling and optimization in cloud, edge, and services computing environments; intelligent software operation and management based on deep reinforcement learning, control theory, and queueing systems; and the development of large-scale batch-processing and container open-source systems. He has published more than 20 papers as first or corresponding author in high-quality international and domestic journals and conferences such as TPDS, TC, TCC, TSC, FGCS, TASE, CCF THPC, CCPE, HPCC, ICSOC, ICPADS, ISPA, ICA3PP, SMC, CLOUD, CBD, CASE, and ChineseCSCW.
Forum Overview:
The conference features an "Edge Services Computing" special track that aims to share the latest research results at the intersection of services computing, edge computing, and cloud computing. The track covers methods for deploying, migrating, dynamically provisioning, scheduling, and optimizing the performance of multiple kinds of resources (e.g., virtual machines and containers) for various edge services (e.g., microservice systems, service workflows, and function computing); service discovery and recommendation, service orchestration, service composition and execution, and intelligent service collaboration in edge computing scenarios; and other theories and methods for cloud-edge-device collaborative services computing. Its goal is to promote the cross-fertilization and development of theories and technologies related to edge services computing.
Talk 1: Edge Cloud Technology Innovation - Making the "Cloud" Ubiquitous
Speaker: 付哲, Postdoctoral Researcher / Technical Expert, Alibaba Cloud
Abstract: With 5G and the Internet of Everything, the number of networked devices is growing rapidly and the pressure on traffic bandwidth keeps increasing. Edge computing extends computing, storage, networking, and security capabilities to the user edge, providing computing services with high coverage, low latency, reduced bandwidth consumption, locality, and security. It moves part or all of the computing tasks of the original cloud data center to the user edge for execution; edge and cloud computing complement each other, and their combination provides a fairly complete software and hardware platform for data computation and processing in the era of the Internet of Everything. Alibaba Cloud is one of the earliest companies in the industry to explore edge cloud computing. This talk introduces Alibaba Cloud's technical evolution roadmap and commercial practice in the edge cloud computing field.
Talk 2: Endpoint Communication Contention-Aware Cloud Workflow Scheduling
Speaker: 吴全旺, Associate Research Fellow, Chongqing University
Abstract: Existing cloud workflow scheduling algorithms are based on an idealized target platform model in which virtual machines are fully connected and all communications can be performed concurrently. A significant aspect they neglect is endpoint communication contention during workflow execution, which has a large impact on workflow makespan. This work investigates how to incorporate contention awareness into cloud workflow scheduling and proposes a new, more practical scheduling model.
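To make the contrast with the idealized fully connected model concrete, here is a minimal Python sketch (not the speaker's algorithm; all names and numbers are illustrative) in which each VM endpoint can carry only one transfer at a time, so transfers sharing an endpoint are serialized and lengthen the communication makespan:

# Minimal illustration (not the speaker's algorithm): endpoint communication
# contention means each VM can send/receive only one message at a time, so
# transfers sharing an endpoint must be serialized instead of overlapping.

def schedule_transfers(transfers, contention_aware=True):
    """Greedily assign start times to (src_vm, dst_vm, duration) transfers.

    With contention_aware=False every transfer starts at time 0 (the idealized
    fully-connected, fully-concurrent model); with contention_aware=True each
    endpoint stays busy until its current transfer finishes.
    """
    endpoint_free = {}          # VM -> earliest time its link is free
    finish_times = []
    for src, dst, duration in transfers:
        if contention_aware:
            start = max(endpoint_free.get(src, 0.0), endpoint_free.get(dst, 0.0))
        else:
            start = 0.0
        finish = start + duration
        endpoint_free[src] = finish
        endpoint_free[dst] = finish
        finish_times.append(finish)
    return max(finish_times)    # communication makespan

if __name__ == "__main__":
    # Three result transfers all leaving VM "A": the idealized model overlaps
    # them, while the contention-aware model serializes them at A's endpoint.
    transfers = [("A", "B", 4.0), ("A", "C", 3.0), ("A", "D", 2.0)]
    print("idealized makespan:       ", schedule_transfers(transfers, False))  # 4.0
    print("contention-aware makespan:", schedule_transfers(transfers, True))   # 9.0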
Talk 3: Romou: Rapidly Generate High-Performance Tensor Kernels for Mobile GPUs
Speaker: Ting Cao, Senior Research Manager, MSRA
Abstract: The mobile GPU, as a ubiquitous and powerful accelerator, plays an important role in accelerating on-device DNN (Deep Neural Network) inference. The frequent upgrades and diversity of mobile GPUs call for automatic kernel generation to enable fast DNN deployment. However, currently generated kernels perform poorly.
The goal of this work is to rapidly generate high-performance kernels for diverse mobile GPUs. The major challenges are that (1) the optimal kernel is unclear because of missing hardware knowledge, and (2) it must be generated rapidly from a large space of candidates. For the first challenge, we propose a cross-platform profiling tool, the first to disclose and quantify mobile GPU architectures. Directed by its results, we propose a mobile-GPU-specific kernel compiler, Romou. It supports unique hardware features in kernel implementations and prunes candidates that are inefficient with respect to hardware resources, so Romou can rapidly generate high-performance kernels.
Compared with state-of-the-art generated kernels, Romou achieves up to a 14.7× average speedup for convolution. It is even up to 1.2× faster on average than state-of-the-art hand-optimized implementations.
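As a rough, hypothetical illustration of the resource-based pruning idea in the abstract (this is not the actual Romou compiler; limits such as MAX_WORKGROUP_SIZE and the cost model are assumed placeholders), the following Python sketch discards candidate tilings that would exceed assumed mobile GPU resource limits before any expensive on-device tuning:

# Hedged sketch of resource-based candidate pruning, not the Romou
# implementation. Candidates whose tiling would exceed the (assumed) hardware
# limits of a mobile GPU are dropped before auto-tuning, shrinking the search
# space that must be compiled and measured on the device.
from itertools import product

# Hypothetical limits that a profiling tool might report for one mobile GPU.
MAX_WORKGROUP_SIZE = 256        # max threads per workgroup
MAX_REGS_PER_THREAD = 64        # registers available to each thread
MAX_LOCAL_MEM_BYTES = 32 * 1024

def estimate_resources(tile_x, tile_y, unroll):
    """Very rough per-candidate cost model (illustrative only)."""
    threads = tile_x * tile_y
    regs = 8 + 2 * unroll                 # accumulators grow with unrolling
    local_mem = tile_x * tile_y * 4 * 2   # two float32 input tiles
    return threads, regs, local_mem

def prune_candidates():
    kept = []
    for tile_x, tile_y, unroll in product([4, 8, 16, 32], [4, 8, 16, 32], [1, 2, 4, 8]):
        threads, regs, local_mem = estimate_resources(tile_x, tile_y, unroll)
        if (threads <= MAX_WORKGROUP_SIZE
                and regs <= MAX_REGS_PER_THREAD
                and local_mem <= MAX_LOCAL_MEM_BYTES):
            kept.append((tile_x, tile_y, unroll))
    return kept

if __name__ == "__main__":
    survivors = prune_candidates()
    print(f"{len(survivors)} of {4 * 4 * 4} candidates survive pruning")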
Talk 4: Failure-aware Elastic Cloud Workflow Scheduling
Speaker: 姚光顺, Professor and Vice Dean, School of Computer and Information Engineering, Chuzhou University
Abstract: With the increasing complexity and functionality of cloud data centers, fault tolerance becomes an essential requirement for tasks executed in clouds, especially for workflows with task precedences. Hosts and network devices are the main physical components of a cloud data center. The PB (Primary-Backup) model is a desirable approach to fault tolerance, and many PB-based workflow scheduling algorithms have been proposed for host faults. However, only a few studies consider network device faults in cloud workflow scheduling. This paper analyzes the fault-tolerance properties of scheduling dependent tasks and migrating VMs under the PB model, considering both host and network device faults in a cloud data center. A failure-aware elastic cloud workflow scheduling algorithm is designed to tolerate both host and network device faults. Additionally, an elastic resource provisioning mechanism is proposed and incorporated into the algorithm to improve resource utilization. Performance evaluations on both randomly generated and real-world workflows show that the proposed approach effectively improves resource utilization while guaranteeing fault tolerance.
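For readers unfamiliar with the PB model referenced above, the following minimal Python sketch (not the paper's algorithm; the topology and host names are hypothetical) shows its core placement constraint: the primary and backup copies of a task must share neither a host nor an access switch, so that a single host or network-device failure cannot disable both copies:

# Minimal sketch of the primary-backup (PB) placement constraint, assuming a
# simple two-level topology where every host hangs off exactly one access
# switch. Illustrative only, not the scheduling algorithm from the talk.

# Hypothetical data-center topology: host -> access switch.
HOST_SWITCH = {
    "h1": "sw1", "h2": "sw1",
    "h3": "sw2", "h4": "sw2",
}

def pb_placement_ok(primary_host, backup_host):
    """A placement survives one host or one switch failure only if the two
    copies share neither a host nor an access switch."""
    return (primary_host != backup_host
            and HOST_SWITCH[primary_host] != HOST_SWITCH[backup_host])

def place_task(candidate_hosts):
    """Return the first (primary, backup) pair satisfying the PB constraint."""
    for p in candidate_hosts:
        for b in candidate_hosts:
            if pb_placement_ok(p, b):
                return p, b
    return None

if __name__ == "__main__":
    print(pb_placement_ok("h1", "h2"))           # False: both behind switch sw1
    print(pb_placement_ok("h1", "h3"))           # True: different hosts and switches
    print(place_task(["h1", "h2", "h3", "h4"]))  # ('h1', 'h3')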