Sunday, May 31, 2020

Statistically Significant


An academic perspective on the growing field of data science


Data science students from HKU compete annually for the SAS Innovative Data Mining Award, which recognizes projects that address real-world problems. Pictured are the winners from 2018.


By Nayantara Bhat

Data science has been around for longer than one might think, but it only began to pique public interest a few years ago. With greater technological integration into daily business functions and the rapid growth of startup tech ecosystems across the globe, data science has evolved from a little-known combination of data analytics and computer science into something of a necessity for startups and corporates alike.


In part, the growth of data science has been driven by success stories, with big companies implementing data management solutions to improve their KPIs. However, a big reason for the rise is the sheer volume of data the world is dealing with, to a point almost beyond control.


According to SINTEF, more data has been created in the past five years alone than in the entirety of human history before that. Startups can undoubtedly benefit from exploiting this data to unearth new opportunities in their current markets or discover new markets entirely. Before that’s possible, experts believe that both technology and regulations must evolve.


One by-product of this data explosion has been the creation of new technologies that process and analyze information, unearth patterns, and make predictions in nanoseconds. On the flip side, data privacy concerns loom over the field. Regulators are rushing to keep up with the pace of technological development in the face of incidents like the Cambridge Analytica scandal, which raised questions of election interference in the United States, and the Google+ and Cathay Pacific breaches in October 2018 that compromised user data.


Demand for data scientists has pushed universities to introduce data science programs. One such example is the recently introduced Master of Data Science at The University of Hong Kong (HKU). Jumpstart speaks to the two professors heading the program, program coordinator Dr Y.K. Chung (YKC) and program director Professor W.K. Li (WKL), about the growing field of big data. What can this field offer to individuals, startups, and corporations, where can it take us, and what will it cost to get us there?


Data science lessons are held on HKU’s main campus. Photos courtesy of HKU.


What led to the creation of the Master of Data Science program at HKU?


YKC: As you already know, big data, data science, and data analytics are in high demand from students and the workforce. Data scientists are trained as statisticians and computer scientists, so there’s a lot they can offer.


WKL: [The Department of Statistics and Actuarial Science] and the Computer Science Department are leading departments in Hong Kong and the region, and have been for many years, so it’s appropriate for us to launch this new Master of Data Science degree together to meet the needs and demands of society.


What kind of jobs are your graduates likely to get?


YKC: They have many opportunities. For example, companies have large datasets and need to mine these datasets to discover relationships in customer behavior. Companies also often manage geographic information systems (GIS) and global positioning systems (GPS) in their businesses. Our graduates can fill all of these needs.


WKL: Many companies may not traditionally be very quantitative, but after hearing success stories in the application of data science, they create new positions for data scientists.


What interests or excites you about data science?


YKC: I think what the Chief Economist of Google [Hal Varian] said is true. [Quote: “I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but who would’ve guessed that computer engineers would’ve been the sexy job of the 1990s?”]


WKL: Data science is now referred to as the fourth paradigm of science, which will open many doors for statistical research. There are many significant data challenges, and we need new statistical methodologies to address them. We’re dealing with unstructured data, high volumes of data, high-dimensional data, and so on.


YKC: Also, there are non-traditional types of data, such as images. They’re not numerical, so we have to invent new methods and apply new technologies.



What are some tangible ways data science can benefit people and the economy?


WKL: You use social media, right? Then you know: you’re surfing the web looking for products or services. After you’re done, you go on Facebook or Twitter, and you get recommendations. That’s one example. Also, some companies offer indoor navigation services, which use Wi-Fi signals to track your behavior as you window shop or walk around a space. They then provide this data to shops or businesses to help them optimize the customer experience.


Soon, we’ll be diagnosed by robots when we go to the doctor. Google DeepMind, which created AlphaGo, reported a successful case in diagnosing eye disease [DeepMind’s AI system can accurately interpret eye scans and recommend treatment]. The AI system can already perform better than human doctors, and the results were published in Nature Medicine.


YKC: Our department has started a big data analysis project with the Hospital Authority.


WKL: It’s about analyzing the risk of acute ischemic stroke based on 20-second scans of the patient. We can identify 95% of large vessel occlusion cases.


What are the main obstacles preventing data science from being more widely used?


WKL: Right now, many people think that data is money. It’s a valuable resource, so companies will try to keep it to themselves, which could be an obstacle. Also, privacy.


YKC: Governments are making slow progress in sharing their data with the public. We can go onto government websites and download data, but some of it is published in PDF files, which is not user-friendly. This is the situation for most governments.


Is privacy a big concern?


WKL: Take the example of the Hospital Authority project. Of course, for the researcher, the more data, the better. But if you’re a patient, you will be very sensitive about whether your personal information is being leaked.


You have to find the right balance between how much information you want and protecting the privacy of your patients. There’s no easy answer. It also depends on how extensive the database is. If you have a small database, then there’s a danger that personal information will be leaked. If you have a large dataset, then the probability of that is much lower.


YKC: I think it’s changing because we now have the General Data Protection Regulation (GDPR) in Europe, and people are more aware of information security issues.


WKL: As time goes by, there will be better regulations, and companies will care more about personal data. I also think there will be more incidents and scandals, but it’s through these incidents that things will improve.


YKC: The law needs to keep up with the pace of the new technology. Another obstacle is the shortage of data scientists. This is still a very new field.


How many people are enrolled in your master’s program?


YKC: Last year, we admitted 88 students.


WKL: That’s full-time and part-time, from over 1,200 applicants.


YKC: More people are aware of opportunities in this field. We want to stay ahead of [other universities] in launching this program.


WKL: When we screen applicants, we are looking for people who have a solid background in mathematics, statistics, and computer science. Few are competent in all three areas, so the selection is quite rigorous.


What are some changes we can expect going forward?


WKL: I definitely see many more applications of data science guiding you through all kinds of life decisions. In another HKU department, they are developing an app to guide people who have allergies on where to go and where not to go, as certain routes are less polluted than others. The potential for various applications is endless. On the technology side, there will be faster, more efficient methods of making computations and arriving at solutions. These capacities double every two years.


YKC: The field will mature, so there will be more career paths.


Nayantara is Jumpstart’s Editorial Associate.
