Bay Area Alumni Tech Salon

Large-scale data science and engineering with Spark (Reynold Xin)

1 March 2015

Event Info

  • Language: Chinese
  • Time: 1:30 PM to 4:00 PM, Sunday, March 1, 2015
  • Location: 1601 McCarthy Boulevard, Milpitas, CA 95035 (TIPark Silicon Valley)

Agenda

  • 1:30pm – 2:00pm: Reception and social time
  • 2:00pm – 3:30pm: Talk and Q&A
  • 3:30pm – 4:00pm: Offline networking

Abstract

Apache Spark has taken Big Data by storm, subsuming Hadoop MapReduce. In this talk, Reynold Xin from Databricks will give a quick introduction to Spark, with a focus on the latest development activities aimed at making large-scale data science and engineering more approachable. In particular, the following will be discussed:

  • Spark's basic programming API
  • the new DataFrame API for big data
  • machine learning pipeline integration
  • Databricks Cloud

Speaker’s bio

Reynold Xin is a committer and PMC member of Apache Spark and a co-founder of Databricks. He has been instrumental in the development of Spark as the maintainer of many of its components. He recently led an effort to scale up Spark that set a new world record in 100 TB sorting (Daytona Gray). Before Databricks, he was pursuing a PhD at the UC Berkeley AMPLab. He wrote the two most highly cited papers in SIGMOD 2011 and SIGMOD 2013.

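Organizer

Co-organizers

  • TIPark Silicon Valley (thanks to TIPark for sponsoring the venue)
  • Nanjing University Silicon Valley Alumni Association
  • Silicon Valley Tsinghua Network
  • University of Science and Technology of China Alumni Association Entrepreneurship Club
  • Zhejiang University Alumni Association Haina Innovation and Entrepreneurship Club
  • Peking University Alumni Association of Northern California
  • Wuhan University Alumni Association of Northern California
  • Southeast University Silicon Valley Alumni Association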
