Yuqing Zhang Presented at the 2019 Symposium on Data Science and Statistics

PhD candidate Yuqing Zhang attended the 2019 Symposium on Data Science and Statistics, held from May 29 to June 1, 2019, in Bellevue, Washington, where she presented her poster “Batch Effect Adjustment via Ensemble Learning”.

Yuqing shared the following about her experience at the symposium: "The Symposium on Data Science and Statistics (SDSS) is an annual conference sponsored by the American Statistical Association (ASA), where data scientists from both academia and industry gather to discuss advancements in machine learning, data visualization techniques, and the application of data science in solving real-world problems. This year’s conference featured presenters from RStudio, Google, H2O.ai, SAS, and others. One of the most interesting talks I attended was from Dr. Erin LeDell, chief machine learning scientist at H2O.ai. She introduced their AutoML module, which performs automated training and evaluation for machine learning algorithms. AutoML supports a typical workflow of training a machine learning model, from data processing to the validation of predictive performance, and it also comes with an R interface. Since we often use the same workflow in biomarker development at CBM, the AutoML module could be a very useful tool for future research. In addition, I had a chance to connect with engineers from RStudio. I discussed how to handle large datasets within R Shiny apps with Javier Luraschi and Kevin Kuo, who are engineers working on the R API of Spark. There has been a great deal of effort in developing R Shiny apps in my lab, and our discussion inspired me to consider incorporating cloud-based data storage into our software in development.

I presented my research as a poster titled “Batch Effect Adjustment via Ensemble Learning” and received constructive feedback. For example, my current ensemble learning framework for batch correction explores stacking weights, and I was asked about using boosting strategies. Another interesting suggestion was to leverage the estimates of batch effects in weighting the single learners. I discussed this feedback with my advisors after I returned from the conference, and we decided to incorporate these suggestions into our manuscript in preparation. Attending this conference has benefited my current project. It was also a wonderful opportunity to learn about advancements in data science and develop a professional network."


