A Data Scientist answered an interview question…
I had an hour to kill before my meeting, so I decided to stop by the university student center. That random visit turned out to be life-changing, as I walked in on an event about job opportunities for students. I would never have imagined that four years later I would be working at the FDA. Here is my story…
I am a very introverted person. I would rather leave the grocery store empty-handed than ask for help. With this nature, building a professional network in a new country seemed daunting at first. When I first moved to the USA in 2020, like many immigrants I faced the usual work authorization restrictions. Getting a U.S. degree seemed like the logical first step, since it makes your professional journey ten times easier. While a PhD was an option, I was not ready to commit to five years in a new environment. It seemed overwhelming, so I opted for a master’s program instead.
Most of my decisions have been very “spur of the moment”. I did what I felt was right at the time, and I was ready to give new opportunities a try even if they were not planned. Having said that, I have always been fascinated by biomedical data, and I am excited to learn how advances in technology improve the healthcare system. (I am the person who watches Grey’s Anatomy and The Big Bang Theory on repeat.)
When I was doing my master’s program in data science, we had to do end-to-end projects for all the courses: creating a problem statement, finding data, building models and sometimes deploying them. I did a total of seven projects in two years. I found that using the same old Kaggle datasets was not helping me learn or stand out in my class, because everyone else ended up picking similar data that was precleaned and easy to model. I was looking for interesting data where I could do some fancy analysis. My husband, who works in computational drug discovery, suggested that if I really wanted a challenge, I should use the Protein Data Bank (PDB). The PDB is an open-access database of three-dimensional structural data for large biological molecules such as proteins. We discussed it together and designed a project to predict the resolution of a protein’s 3D structure from its X-ray crystallography data.
I proposed this topic and mentioned that I was not sure whether the models would perform well, or whether I would even reach the modeling stage, given the unknown nature of the data. What worked in my favor was that my Master of Professional Studies program was taught by practicing data scientists who worked in industry and taught part-time at our university. They knew that most real-world projects are challenging and that you may not always get perfect results.
While working on the project, I was very frustrated. Everyone else was getting 98% accuracy and F1 scores of 0.95, and I was still doing the data cleaning work! But let me tell you, being patient with your data cleaning and preprocessing is the secret all great data scientists share. In the end, I finished my project on time with 85% accuracy. After this, I went all out on finding unique datasets that no one had even heard of.
Initially, I would do a literature review, come up with problems that machine learning could solve, and then look for suitable data. However, I realized that no matter how good a solution sounds, it is very difficult to actually implement it within a limited time. It is also difficult to find open, easily accessible biomedical or healthcare datasets because of privacy concerns. So I reversed the order: I started looking for biomedical data sources that are available for research, and then built a problem statement around the data I found. With this approach, I managed to do projects such as “Prediction of Drug Binding Affinity of Protein Using Spark ML”, “Predicting the Resolution of Protein Structures” and “Classification of Severity of Drug Adverse Effects”.
During my master’s degree, I also worked as a full-time research assistant on a weather forecasting project in collaboration with NASA. Most of my classes were online (the COVID era), so I rarely went to the university. One day I was on campus for a research assistant meeting with a professor and had an hour to kill, so I walked into an event at the student center about job opportunities for immigrant students. On the panel I saw one of my classmates, who I knew worked at the FDA. After speaking with him, I learned that if you have lived in the USA for more than three years, you are eligible to apply for an ORISE fellowship. Through these fellowships, you get to work with U.S. federal agencies on research projects.
I was browsing Zintellect, the fellowship application portal of the Oak Ridge Institute for Science and Education (ORISE), when I found an open position based on the FDA Adverse Event Reporting System (FAERS) dataset, the same data I had used for my capstone project on classifying drug adverse effects. The person who is now my supervisor was looking to characterize the FAERS drug adverse effect network using R. I didn’t have much experience with R or with statistical methods for network analysis, yet I applied for the position. During the interview, I proposed that it would be interesting to analyze this network with a machine learning approach, and I shared some resources on Graph Neural Networks (GNNs). A few weeks later, my supervisor got back to me with a new fellowship position to try GNNs on the FAERS dataset.
Coming from a computer science background, I landed this position purely because of my interest in the field. My current role has given me a great deal of domain knowledge about biomedical, clinical and pharmaceutical data analysis, which is very difficult to acquire if you are not from the public health, epidemiology or biostatistics domains.
These last 5 years of my life have been very exciting. There is no alternative to reaching out and networking if you want to grow. In-person social interactions make me nervous, so I go above and beyond to attend all the virtual events. I reach out to people through LinkedIn DMs (that’s how I came to write this article). I try to find study groups and volunteer working groups to network. In turn, people have started reaching out to me to ask about my professional journey, and I’ve begun creating technical content on LinkedIn and writing blogs on Medium. I find this is the best way to showcase my knowledge so that people with similar interests can reach out to me. I have been very lucky with the people and mentors I met along the way. All of these things truly shaped where I am right now. Next, I want to learn about statistics in public health. It might seem that I am going in the opposite direction on the learning path, but I feel it is necessary to have a strong foundation if I want to work as a Clinical Data Scientist.
If you are hesitating to send that LinkedIn DM or thinking of switching your industry, take that first step. Don’t let nervousness hold you back. I know I sound like one of those motivational speakers, but it worked for me, and it will work out for you as well. Whether you are interested in data science, healthcare, or pharmaceutical fields, connect with me on LinkedIn. I would love to hear about your journey and share insights that could help you along the way.