Transcript - Episode 20 - Patricia Wahyu Haumahu
Kiki (K): Hello everyone, we are back at Kartini Teknologi. This is the 20th episode, and we happen to be recording it at the beginning of the year. It’s been a long time since we had a new episode, Gal. This time the topic is close to Galuh’s world, so maybe Galuh will be talking more here. Anyway, our guest today is Patricia Ayu, a Commercial Data Analytics Manager at Blibli.com. Hello Ayu!
Ayu (A): Hi Kiki, Galuh. I’m good, thank you.
K: How about WFH so far?
A: WFH is really good, I can multitask with other things.
K: When did you start to have your career in data?
A: I obtained my bachelor’s degree in 2011, so it’s been 10 years since I started working in the data field. In data science, it’s only been 5 years, since 2016. Before that my work was still related to data, just not data science.
K: Tell me about the background—where did you study, how did you become interested in technology and data?
A: I studied at Undip, majoring in statistics, and graduated in 2011. After that I went to Jakarta to look for a job. At that time the paths for statistics graduates were still pretty traditional: most worked at a research company, in marketing research, R&D, or at a bank. Data science was rarely heard of in Indonesia 10 years ago. In 2012 I ended up working in marketing research as a researcher. Then I had the desire to continue my education, so I was busy collecting all kinds of information on what to take for my master’s. In 2012 the Harvard Business Review released an article, maybe Galuh knows it, saying that data scientist is the sexiest job of the 21st century. At that time there was apparently no data science in Indonesia yet. When I first read the article, my friend told me, “This data science thing is really interesting, and you want to take a master’s, right? Are you interested?” So I learned terms like data science, data mining, and big data. But 8-9 years ago I think it was mostly still just theory, so there was no clear curriculum for it. And in Indonesia there weren’t many tech companies yet. Then I got a scholarship and continued my master’s degree, and in 2014 my friends started resigning and joining tech companies like Traveloka or Gojek. But data scientist as a title, I hadn’t heard of it 6-7 years ago. When I returned to Indonesia, I heard that data scientists were being sought after. Actually, at that time I didn’t really think about becoming a data scientist. But because I liked the qualifications they were looking for, I just applied and got a position as a data scientist. That was 5 years ago. Since then I have switched careers from marketing researcher to data scientist.
K: So you became a data scientist after you graduated from your master’s right?
A: That’s right.
K: What did you study for your master’s?
A: For my bachelor’s I took statistics, and for my master’s I took applied statistics. At the university there was already a sense that data science was taking off, so they had begun to infuse data science into their courses. The curriculum wasn’t solid yet, but we were given the opportunity to take some programming classes, and there were materials about data science as well. The campus was quite fast to adapt. Even though my major was more about applied statistics, it also intersected with data science.
G: So you came from statistics, right? Meanwhile, data science is made up of a lot of fields that we have to be good at, for example stats, programming, etc. As for me, since my background is in computer science, I had to catch up a lot on statistics. It was taught in class, but not much. In your case, apart from statistics, what else did you have to learn, and where did you learn it?
A: What I really needed to catch up on was the programming part. Back when I was doing my bachelor’s degree, it was still as simple as SPSS. Then there was SAS. When I took my master’s, R and Python skills were in demand. My campus taught R for data science, so that I learned right in class. But because I also realized that Python and SQL might be necessary, I took online classes for them too. As an undergraduate and at my first job I didn’t really code, so I ended up learning it again later. Before I got the job I only knew simple SQL, so I had to study more complex SQL. Same for Python. So overall, I had to catch up on the technical skills.
G: Ayu is very good at R. When I did an internship I took a data science and analytics class, and I didn’t know what R was, so I asked for her help.
K: Data science has been mentioned a few times. Maybe you can first explain to friends who are not too familiar with the term: what is data science?
A: From my point of view, data science is a further development of basic sciences, namely mathematics and statistics, with additions from computer science. If we go back to statistics or mathematics, what does that mean? It’s more of a science for understanding data, presenting data, and getting insights or interesting conclusions from data. For example, we look at the data, then we draw conclusions, like “oh, this is the point here.” Sometimes it just ends there. But data science can also provide a solution. For example, maybe we can build a model to predict something from this data, or make something that automates other things, or makes predictions faster and at a bigger scale. With data science, we can go further in making use of the data.
K: So you could actually extract a lot from the data. Maybe it also has to do with the thinking structure of a data scientist, how a data scientist can go digging into that data so you can get interesting, useful insights. What kind of framework of thinking or what kind of character does a data scientist actually need?
A: In my opinion, the first thing is to be someone who likes to learn, wants to learn, and can study independently. Data science is a new science, and maybe in the future there will be new knowledge that combines it with several other sciences. In data science it doesn’t take long until there is a new method or a new approach. So it’s okay not to understand or know everything, as long as you are fast at learning new things and can adapt quickly, so you can always stay up to date. Sometimes I also forget the principles of certain models and still have to double-check, because there are so many of them. The important thing is that we can learn quickly and keep up. As for character, like what Kiki mentioned, you need to be able to think logically, dig into data, trace things, and connect the dots. That’s very important. When I interview data scientist candidates, I usually present a problem and ask them to walk through their thinking: how they connect the dots, how they scope the problem. Because at the end of the day we want the data scientist to solve the problem. Then, be skeptical but not pessimistic. I’m the type who likes questioning the data. For example, when someone gives me data I usually try to cross-check it with other facts or other data. In data there is a term, “garbage in, garbage out”: if the data is not good, the output is also not good. Data scientists have to be able to question and cross-check data so they can get to the source of truth. And data scientists shouldn’t only be skeptical, they should also be able to come up with an experiment. Another thing, and I don’t know whether this is important or not, is that I think data scientists should know when to stop. Sometimes I get curious, I dig into the data, I keep tinkering with the model, but in the end I don’t deliver. When we have deadlines, sometimes the good models are the ones that are useful or good enough. It doesn’t have to be perfect. If not, we keep looking and tinkering without delivering.
G: I think those are many points that are super important but rarely discussed. Tools and algorithms get discussed a lot, but in practice these points are what really matter. If I summarize: we must be able to adapt, and being skeptical is also super important. The last one is really true too, knowing when to stop. Data science is different from software engineering in that there’s no clear definition of done.
A: True.
G: In software engineering it’s maybe clear enough that there must be features A and B. For data science such a thing doesn’t exist, and sometimes we get too excited, thinking “I can do another experiment” while time is running out.
A: Yes, the clock is ticking.
G: Sometimes you have to know and you have to stop when it’s good enough. That’s sometimes hard.
K: Can you tell us about the kind of work you’ve done as a data scientist?
A: In data science, I often encounter things related to fraud, in all my current and previous companies. Somehow I keep encountering such projects. My boss once said that I’m a curious person, so I’m suited to handling fraud cases. Here is a broad example that we often encounter. In e-commerce there are flash sales, and customers try to be as fast as possible to get the flash sale items. But in some news reports we saw that some of these customers were using bots, so they weren’t real customers. Or, for example, a merchant opens a shop on an e-commerce platform but never actually sells anything: they just pretend to be the customer and use a discount, and then the e-commerce company has to reimburse the voucher or discount. There is a lot of that in tech companies. As long as companies are burning money and there are still promos, there will be fraud. Suppose there is a suspicious customer and we use humans to handle it. A human will check: has the customer ever shopped with us? Where does this customer live? We try to find clues or information that can help us identify whether this is a good or bad customer. It’s easy if there’s only one customer. But what if there are one million customers making transactions within one hour, and the scale is really big? There’s no way humans can check that quickly and prevent it. I once worked on building a model that can decide whether a transaction is suspicious or not, or predict whether a customer is a real customer or a bot. The model can be automated and scaled. So in one minute we can find out: oh, this looks like a bad customer, this looks like a good customer.
K: You kept mentioning that we have to know when to stop. How do you usually know if the model you made is already good enough? What are the indicators?
A: Usually, with a problem like this, I look at what percentage of potential fraudsters the model can catch. Whether it’s 50% or 90%, if it’s at least useful enough to reduce the current rate, I think it’s good enough. It doesn’t have to be 100%, because to get toward 100% we can combine it with humans. For example, we use a model, but if there are doubts we pass the case to humans. This is useful enough to drastically reduce the problem we have, and for the rest we will find a way to get closer to 100%. I sometimes find that stakeholders ask, “How come we didn’t catch this? How come there are false positives?” Stakeholders often feel like the model is too aggressive, even though there is actually a way to reduce that. The key is: don’t get too caught up in accuracy, false positive, and false negative numbers or metrics. Those can be guidance, but at the end of the day we are also racing against time. If we wait until it’s perfect, we will continue to have losses. So for me, if it’s useful enough and we can find a way to combine it with humans, it’s a green light.
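The model-plus-human setup Ayu describes can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Transaction` fields, the score values, and the two thresholds are hypothetical and do not come from any system mentioned in the interview. The idea is that confident cases are handled automatically, doubtful ones go to a person, and "good enough" is judged by the share of fraud that gets flagged at all.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    tx_id: str
    fraud_score: float  # hypothetical model output in [0, 1]
    is_fraud: bool      # ground-truth label, only known in hindsight


def triage(score: float, review_at: float = 0.5, block_at: float = 0.9) -> str:
    """Route a transaction by model score (thresholds are illustrative)."""
    if score >= block_at:
        return "block"          # model is confident: act automatically
    if score >= review_at:
        return "human_review"   # model is unsure: pass the case to a person
    return "approve"


def fraud_caught_rate(transactions, review_at: float = 0.5, block_at: float = 0.9) -> float:
    """Share of truly fraudulent transactions that were blocked or sent to review."""
    frauds = [t for t in transactions if t.is_fraud]
    if not frauds:
        return 0.0
    flagged = [
        t for t in frauds
        if triage(t.fraud_score, review_at, block_at) != "approve"
    ]
    return len(flagged) / len(frauds)


# Made-up sample data, purely for demonstration.
sample = [
    Transaction("t1", 0.95, True),
    Transaction("t2", 0.70, True),
    Transaction("t3", 0.20, True),
    Transaction("t4", 0.60, False),
    Transaction("t5", 0.05, False),
]

decisions = {t.tx_id: triage(t.fraud_score) for t in sample}
caught = fraud_caught_rate(sample)
print(decisions)
print(f"fraud caught: {caught:.0%}")
```

Lowering `review_at` catches more fraud but sends more legitimate customers to manual review, which is the trade-off behind the stakeholder complaint about an "aggressive" model.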
G: So yeah, for the model to have an impact, it must be connected to the business context. I want to ask you something, as someone who doesn’t come from statistics or math myself. Sometimes when I study in a class or course, there are statistical materials and theories, right? Lots of formulas. I used to struggle to imagine: I know the formula for a confidence interval, but where do we use it? Or sampling, et cetera. When I studied that, it was hard to imagine how to apply it. Can you share one example of a math or statistics concept that we usually find in the real world?
A: To understand a concept in statistics or mathematics, I follow something my lecturer taught me during my master’s degree: just try to find the intuition. So I usually add keywords when I search, for example, “what is a confidence interval and its intuition?” I usually look for examples. Intuition is more like… I know that this and that cannot go above a certain percentage. Even now, I know that I have to check this, but for more details I double-check with Google. So it doesn’t mean that if we learn something once and then don’t remember it, we have failed. The important thing is that we remember the basics, the intuition, and if we forget we can Google it later. When we come across a problem we can think, oh, we can use this tool, because I remember the basic principle. As for everyday life, there are lots of statistical concepts there. For example, understanding where potential bias comes from. Right now the public is being presented with Covid data. I don’t know whether it will affect the way people see data or not, but now on Instagram and Twitter there is a lot of Covid data: trends, tables, charts, lots of it. And there was this clamor around a chart that showed a misleading percentage of active cases. The chart took the number of active cases today and divided it by the total of all Covid cases that have existed from the beginning. For example, today there are 10,000 active cases, and in total our Covid cases are almost at 1 million. That’s about 1%. In the beginning the percentage is really big, but as time goes on it will keep going down. If we understand statistics, outliers, and misleading data, we will not be deceived by such statistical biases. At least our data literacy gets better, and we can draw more unbiased conclusions. That’s a simple statistical concept in everyday life.
If you want to make it a lot more complex, like optimization, there are data science models, and many of them use statistical concepts that are quite complex. At work, for example, statistics are usually very useful for data exploration. When we build a model we must know the weaknesses and advantages of our data. Weaknesses as in potential outliers and where the potential bias is. For optimization we usually use derivatives, and there are also basic stats.
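The misleading Covid chart Ayu mentions earlier in this answer can be reproduced with a few lines of arithmetic. This is only a sketch: the figure of 10,000 active cases and the total of about 1 million come from her example, while the other cumulative totals are made up to show the trend. Because the denominator (all cases since the beginning) can only grow, the percentage shrinks even when the number of active cases does not change at all.

```python
def active_share(active_cases: int, cumulative_cases: int) -> float:
    """Active cases divided by all cases recorded since the start."""
    if cumulative_cases <= 0:
        raise ValueError("cumulative_cases must be positive")
    return active_cases / cumulative_cases


# Hold active cases constant and let only the cumulative total grow.
active = 10_000
cumulative_totals = [20_000, 100_000, 500_000, 1_000_000]  # illustrative values

shares = [active_share(active, total) for total in cumulative_totals]

for total, share in zip(cumulative_totals, shares):
    print(f"cumulative={total:>9,}  active share={share:.1%}")
```

The share falls steadily while the outbreak, measured by active cases, is exactly as large as before, which is why the chart gives a falsely reassuring picture.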
K: From my point of view, as a non-data scientist, I feel like there’s a lot of room in terms of the job that you do. Do you usually determine the effectiveness of your model from the start, or what’s the process like?
A: For me, we usually start with a problem statement, for example, there are fraudulent activities from customers or merchants, and we want to reduce them or improve things by a certain percentage. So it usually starts from there. The bottom line is that this should be able to solve a problem, or at least improve the current situation. Usually that is what I translate into numbers or metrics. For example, if you want to reduce it by a percentage, what is a good number? From there, when we make a model, we test it. We may see that the model works, but it turns out it can solve only this much, or that there are cases we still miss. Then we check whether this is good enough, or whether we should do a few iterations: in the first iteration we solve this much, and later we improve. So there are a lot of steps, but it always starts from the problem statement: how big the scope is and, for example, how many cases you want to reduce. Galuh, do you have anything to add?
G: Same with me: start from the problem statement and determine the problem. Some people say, oh, we have this problem, but often that isn’t actually the problem. So it’s not that straightforward. Sometimes we, the data scientists and business stakeholders, spend a lot of time determining the problem. After knowing the problem, we also determine the goal together with the business stakeholders, for example how many fraudulent transactions they want to reduce.
K: You’re at Blibli now as a data analytics manager, which is a managerial position, right? What’s your day-to-day like?
A: I lead three people now. Usually every day I meet with them, coach them, and coordinate the projects. The difference from when I was an individual contributor (IC) is that I used to only work on one project: I was the one responsible, and I tried to really solve it. I didn’t really think about the stakeholders or the helicopter view. Now I’m doing more strategic thinking, thinking about the implications and whether we need this or not.
G: Was your early transition from IC to managerial difficult? Any challenges?
A: Well, in the beginning, before moving to a managerial position, I didn’t know if I could do it, because as an IC I really only worked on my own projects. So at first I asked a lot of people, then I read books. I tried not to be too involved in the details, which was a bit difficult because ICs are used to checking all the details. But now I have to trust that this person can do it, so when I discuss things with them I only discuss the outline, the strategic parts. The difficult thing is to not get too deep into the details and instead step back to see the relationships between things. I also had to up my interpersonal skills, because I now try to understand more about my team and their characters. If I want to lead projects, or know what they like so that their work is more enjoyable, I need to interact with them a lot. And then there are leadership skills. So it’s more about the soft skills than the technical skills.
K: You’ve been working on data for a really long time. What’s interesting about working in data for you?
A: I’m happy because… I think this is a bit counterintuitive, but in the data field we don’t have to memorize a lot of things. I took statistics in the first place because I didn’t like memorizing. I liked looking for patterns and seeing, oh, I think I can do this and this. What I like about data is that I don’t have to memorize a lot of things, but I can get a lot of insights from it. What’s also rewarding is when we can translate information into data, and in the end make something useful out of it or make people’s lives easier. I’m also a curious person, so I love tinkering around with data.
G: I’m not surprised if Ayu said she loves tinkering around with data because she’s a curious person. She’s always the first to know all kinds of gossip so she’s very suitable to work in the field. I’m not surprised.
A: When I was growing up I wanted to be a detective. I loved reading detective books. I thought it would be cool to work in CSI or the FBI, but what I do right now is catch fraud. And data is like the clues, like the evidence that I can use to get to something.
G: In a way your wish is granted.
A: But it’s never-ending, because such crimes keep happening.
G: In books, one book and it’s done, the culprit is caught. But in real life it’s never ending.
A: Yeah, there are always new ways and methods, and we need to think about how to gather this data and that data. But I think what’s most important is that we can reduce the fraud rate. It’s impossible to have it at 0.
K: What kind of challenges do you usually face in data science?
A: The challenge in data science is… the data scientists that I meet are generally not accustomed to presenting results and dealing with stakeholders. Stakeholders are like the last bosses in games that you have to overcome. You have to convince and persuade them, and you usually need the skills to make a good presentation and determine what matters and what doesn’t. Mostly, if people are somewhat unfamiliar with data science or data in general, they will sometimes challenge us with questions that are important but hard to answer convincingly. For example, they see that the model’s false positives are really high, that the model is too aggressive in catching suspicious transactions or people. Or they say, oh, this is no good. And in their position, sometimes they don’t know what good looks like. Sometimes stakeholders become anxious, so they really question us. So that’s the challenge: how can we educate the people who will eventually adopt our solutions, adopt this model, the people who will eventually say that this project can move forward? That’s the biggest challenge I feel, and ever since I moved into a managerial position I’ve been facing it even more often. In the past, when I was an IC, my lead would usually handle it. Now I’m the one who has to handle those stakeholders. How to convince them, that’s the difficult part.
K: That means good communication skills are needed to become a data scientist right?
A: Yes, and in the beginning I was defensive. Just imagine working on something and having it attacked or not accepted. But if in the end we can show that “if we don’t have this at all, the damage will be bigger” or “it’s not perfect but we can improve it later,” that’s good. I shouldn’t have been so defensive. Maybe they don’t really understand what this model is; they just know that it’s not perfect, so they get anxious. If we can understand where they are coming from and what their main concern is, we can convince them to accept our work. Because if not, despite our hard work, the project won’t move forward.
G: Yep. It’s interesting that the challenges are not only technical. We might already have a sophisticated model, with the most sophisticated technology and algorithm, but if stakeholders aren’t convinced that we should adopt it, then bye-bye. On the other hand, I also understand why they question us. After all, they are just doing their job.
A: Yes, because if anything goes wrong they also have to take responsibility, right? Like… “you knew this model wasn’t good, so why did you take it?” So it’s natural for them to keep chasing us and to think that this can be improved. But yes, it’s never-ending. If we wait until it’s perfect, the damage might continue to get bigger.
G: Yep, so we sometimes have to meet them in the middle.
A: Yes, there must be an “angel” in the middle of it to bridge the two parties.
K: I think such skills are needed in all lines of work right? Like how can we communicate what we do. I think that’s the case in all kinds of jobs.
A: Totally agree. Those are skills that should be possessed by everyone.
K: Yes, I mean, no matter how good your work is, if you can’t communicate it to others, then it’s goodbye. Finally, what message do you have for Kartini Teknologi listeners who are interested in becoming data scientists?
A: My first message is that maybe you don’t have to be a data scientist. More often than not, when people tell me they want to be a data scientist, I ask back: why do you want to be a data scientist? Is it because of the title, the job, or, for example, the salary? Or is it because it is currently in high demand? First find out what the motivation is, because if we know the motivation, we can decide whether to become a data scientist or not. Don’t get trapped by the title. I never thought I wanted to become a data scientist. I just like working on data and reading about slightly more advanced data topics, and it just so happens that the data scientist job description fits what I want. There could also be people who like working on data but want to engage more with the business side. These people could be in business intelligence or be data analysts, so not necessarily data scientists. So you need to explore the motivation. In my team, they didn’t want to be data scientists at first, they just liked working on data. There are two people from physics and one from math; they like working with data and modeling, and the job description of a data scientist happens to match them. Unfortunately all of them are guys; I rarely meet a female data scientist. When I majored in statistics there were many women, but somehow, somewhere along the way in tech, I don’t meet a lot of them.
G: In our team back then, though, all of us were women.
A: But overall there aren’t many, right?
G: Yeah just our team.
A: Also, learn a lot. My team took a data science acceleration class, and now there are many institutions that offer bootcamps or data science training classes. However, I found during interviews that some people seemed to have learned too quickly, so they didn’t get the chance to understand the basics or the intuition. It’s like teaching someone to cook: besides the cooking techniques, we need to know what kind of ingredients to use, but they gloss over these. So when they are tested, sometimes they only know the surface and not the depth. It shows when they’re asked to solve a problem. So my suggestion is, if you want to take classes or learn by yourself, which is totally okay (you can find materials on the Internet or in classes), make sure that you also understand the essence of it. There are some basic stats and math principles that we still have to understand. It’s not like we feed some data into some random model and that’s it; we have to understand it. Also, I’m not sure if this tip is related or not, but I told my juniors when I spoke at a seminar at my alma mater: improve your English if you can. There are many good data science materials, but they’re not in Indonesian. All the online courses back then were only in English at the start, right? So it’s good if you can improve your English.
K: Okay, maybe because I don’t really tinker around with data science, I think I really learned a lot about it from this chat: the characteristics, what needs to be learned, and, most important in my opinion, what way of thinking and character you need to be a data scientist. I think technical skills can be learned, but the way of thinking is the most basic foundation for learning anything. So I think it’s really important. So that’s it. If you’re also curious like Ayu, if you like data, statistics, and connecting the dots, don’t be afraid to learn data science. You can start from anywhere: join a course, sign up for a bootcamp. But if you do sign up for a bootcamp, don’t rely on it alone; hone your intuition too. And if you’re interested, Ayu herself said there are still few women in the field, so there are a lot of opportunities.
A: I really hope that there will be more female data scientists, because there is still a lot of demand for data scientists. My team is still looking, and I often see that many others are still looking for data scientists, but the supply is… the Indonesian curriculum is still slowly going in that direction. I really hope many women are also interested in data science.
K: Especially since you can work remotely too. Sometimes women have to do other things as well, especially if they have already started a family, so flexibility at work becomes a very important point.
A: True.
K: And now I’m campaigning for remote work again. Anyway, thank you Ayu for chatting with us. Hopefully many listeners also got insights from our discussion. Thank you!
A: Thank you!