Table of Contents
- About me
- Machine Learning Projects
- Research Projects and Publications
- Blogs
- Class Projects/Notes
- Courseworks
About me
I am a data scientist with a broad range of technical experience in data science and materials science. I am interested in applications of machine learning (ML) in anomaly detection, data inference, pattern identification, data-driven decision-making, and model deployment.
My current work focuses on detecting anomalous behavior in semiconductor equipment. The anomaly detection system can help with yield improvement and preventive maintenance. I am also working with UN-ESCAP, Thailand to quantify the impact of agricultural burning and traffic on air pollution problems and health outcomes in ASEAN cities. The findings can help shape data-driven policy change.
I was the Principal Investigator of a scanning probe microscopy group at Suranaree University of Technology, Thailand. My group focused on how nanoscale material properties give rise to the rich phenomena observed in mixed-phase systems. We made extensive use of automated image and data processing techniques to analyze our large data sets and to reveal the underlying physical mechanisms of the phenomena. Topics investigated by my group include the ferroelectric and diode behavior of Sm-doped BiFeO3, resistive switching behavior in ZnO nanowires, and the structure and biomechanics of Spirulina and avian sperm. We also collaborated with Western Digital Thailand on the development of novel materials for the memory storage industry.
My PhD research focused on the development and applications of a scanning near-field microwave impedance microscope, a novel scanning probe technique capable of measuring the local dielectric constant and conductivity of materials. This technique allows us to study low-temperature physics involving metal-to-insulator transitions, such as colossal magnetoresistance and the quantum Hall edge state. Applications at room temperature have also been shown in semiconductor materials, nanowires, and graphene. This research lies at the intersection of microwave engineering, image analysis, microfabrication, MEMS technology, materials science, and condensed matter physics. I was involved in the commercialization of this microscopy technique through a spin-off company, PrimeNano Inc.
Machine Learning Projects
Anomaly Detection
- Design, build, runtime-optimize, and test anomaly detection systems for microfabrication equipment. Explore different algorithmic approaches such as clustering (scikit-learn), neural networks (PyTorch and Keras), and Markov chain Monte Carlo (pymc3).
- Rapid-prototype in Python and deploy in Java.
- Communicate with customers to understand their needs and formulate testing criteria and benchmarks. Provide a trade-off table summarizing the different approaches.
- Perform exploratory data analysis to understand relationships between sensors in the database and identify critical sensors for feature engineering.
Analysis of Air Pollution Data in ASEAN Cities
- git repository and git repository
- Modeled air pollution in Southeast Asia (ASEAN) for the UN (UN-ESCAP, Thailand). Enabled effective environmental policy change by identifying the main sources of air pollution in ASEAN cities. Built a model to predict hourly pollution levels, isolated the main pollution sources using the model’s feature importances, and predicted scenarios that would help ease air pollution. Findings were shared with city governments and agricultural companies. UN’s blog.
- Automated a pipeline of web scraping, data visualization (notebook), feature engineering and cloud-based hyperparameter optimization (notebook), and inference (notebook) for air pollution modeling that can be scaled to any ASEAN city (Requests, wget, selenium, BeautifulSoup, matplotlib.pyplot, Bokeh, geopandas, scipy, TPOT).
- Worked with various data types such as time-series air pollution data, geospatial land-use data, satellite burning hotspot data, and traffic data.
- Made extensive use of time-series feature engineering and cleaning: included time-lag features, distance-weighted pollution sources, and features extracted from weather patterns.
- Cleaned and merged survey land-use data with satellite hotspot data to identify agricultural burning areas and crop types (notebook).
- Utilized socioeconomic and health data to identify diseases caused by high PM2.5 pollution using correlation, z-tests, and linear regression (notebook).
- Built a proof-of-concept SQLite database of the air pollution data (notebook).
- Blog: Scraping Air Pollution Data from Thailand EPA. Published in Medium.com
- Blog: Identifying the Sources of Winter Air Pollution in Bangkok Part I. Published in Medium.com
- Blog: Identifying the Sources of Winter Air Pollution in Bangkok Part II. Published in Medium.com
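The time-lag and distance-weighted features mentioned in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function names, the hourly lag choices, the inverse-distance weighting, the flat kilometre coordinates, and the sample numbers are all assumptions made for the example.

```python
import math

def add_lag_features(series, lags):
    """Return one row per time step holding the value and its lagged copies.

    Time steps that would need data from before the start of the series
    are skipped, so the first row corresponds to index max(lags).
    """
    max_lag = max(lags)
    rows = []
    for t in range(max_lag, len(series)):
        row = {"value": series[t]}
        for lag in lags:
            row[f"lag_{lag}h"] = series[t - lag]
        rows.append(row)
    return rows

def distance_weighted_count(station, hotspots, min_km=1.0):
    """Sum 1/distance over hotspots so nearby fires count more than distant ones.

    Coordinates are (x, y) in km on a flat plane (an assumption for brevity).
    Distances below min_km are clamped to avoid division by zero.
    """
    total = 0.0
    for hx, hy in hotspots:
        d = math.hypot(hx - station[0], hy - station[1])
        total += 1.0 / max(d, min_km)
    return total

# Made-up hourly PM2.5 readings and hotspot positions, for illustration only.
pm25 = [35.0, 40.0, 42.0, 55.0, 61.0, 58.0]
rows = add_lag_features(pm25, lags=[1, 3])
hot = distance_weighted_count((0.0, 0.0), [(3.0, 4.0), (0.0, 10.0)])
```

Each row pairs the current reading with earlier readings, which lets a tabular model pick up delayed effects; the weighted count condenses many hotspots into a single feature per station.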
Predicting US Monthly Electricity Consumption
project page
- Predicted monthly electricity consumption in the US on a state-by-state basis using economic and weather data, with more granularity than the Short-Term Energy Outlook from the Energy Information Administration (EIA). Identified major contributing factors, which help not only with infrastructure planning and economic projections, but also with estimating electricity sales revenue and deploying more energy-efficient products.
- Project repository in GitHub.
- Obtained data by downloading and using a web API (selenium library).
- Performed extensive feature engineering and exploratory data analysis, cross-checking the accuracy and consistency of the data. Cleaned up missing data (pandas, numpy, seaborn). Visualized hierarchical relationships between features (scipy library).
- Performed feature selection and built machine learning models for the three sectors, and analyzed model performance by state (scikit-learn, TPOT libraries). Used a random forest regressor and its feature importances for feature selection. Achieved 0.99 R-squared on the test set (EIA’s prediction is 0.9999). Built a prediction pipeline and visualization from saved models (joblib library). Plotted interactive time-series predictions in Bokeh.
- Deployed model prediction on Heroku (repository).
Health Data
Automatic identification of liver patients from their blood test data (repository)
- Exploratory data analysis and feature engineering using Python (pandas, numpy, matplotlib.pyplot)
- Feature selection from feature importances. Removed redundant features using hierarchical clustering (scipy library)
- Applied different classification models. Compared performance of random forest, logistic regression, and neural network
- Achieved 78% accuracy on the validation set (baseline = 72%), loss = 0.47 (baseline = 0.69)
Image Classifications
Identified oil palm plantations from satellite images repository
- Predicted how likely a satellite image contains oil palm plantations
- Libraries: fastai, PyTorch, OpenCV
- Worked with imbalanced classes: only 6% of the images belong to the second class (with oil palm). Resolved by creating augmented images from the training set (OpenCV library)
- Used transfer learning from pretrained models
- Explored different CNN architectures: resnet34, resnext201, dn201
- Achieved a score of 0.997 on the Kaggle hold-out dataset (the leaderboard score is 0.999), which is approximately #113 on the leaderboard.
EiffelTower vs WatArun repository
- Mined images from Google Images
- Performed data loading and image augmentation on the GPU (fastai, PyTorch libraries)
- Used transfer learning with pretrained ResNet weights
- Used learning rate annealing with restarts
- Achieved 93% accuracy
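For readers unfamiliar with learning rate annealing with restarts, the idea can be sketched as below. The cosine shape, the fixed cycle length, and the function name are assumptions for illustration; the list above does not say which schedule or settings were actually used.

```python
import math

def annealed_lr_with_restarts(step, cycle_len, lr_max, lr_min=0.0):
    """Cosine-annealed learning rate that jumps back to lr_max every cycle_len steps."""
    pos = step % cycle_len  # position inside the current cycle
    cos_term = 1.0 + math.cos(math.pi * pos / cycle_len)
    return lr_min + 0.5 * (lr_max - lr_min) * cos_term

# Two cycles of four steps each: the rate decays, then restarts at step 4.
schedule = [annealed_lr_with_restarts(s, cycle_len=4, lr_max=0.1) for s in range(8)]
```

Within each cycle the rate falls smoothly from `lr_max` toward `lr_min`; the jump back up at the start of the next cycle is the restart.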
Image Segmentation
Nuclei segmentation of stained tissue images of tumor patients in MICCAI2018 challenge repository
- Trained a U-Net architecture to perform image segmentation (fast.ai library). Used various image processing libraries (OpenCV, PIL, imageio, skimage)
- Worked with a small dataset: only 30 training images, each 1000 x 1000 pixels. Augmented the training set by generating 125x125 cropped images and mask files
- Achieved 90% accuracy on the validation set
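Cropping each image and its mask into matching 125x125 tiles can be sketched as follows. This assumes non-overlapping tiles on a regular grid and uses plain nested lists in place of real image arrays; the actual project may have used overlapping or random crops, and the helper names are hypothetical.

```python
def tile_boxes(height, width, tile):
    """Return (top, left, bottom, right) boxes for non-overlapping square tiles."""
    return [
        (top, left, top + tile, left + tile)
        for top in range(0, height - tile + 1, tile)
        for left in range(0, width - tile + 1, tile)
    ]

def crop(grid, box):
    """Crop a 2-D list of pixel values (a list of rows) to the given box."""
    top, left, bottom, right = box
    return [row[left:right] for row in grid[top:bottom]]

# Synthetic 1000 x 1000 image and binary mask, for illustration only.
image = [[(r + c) % 256 for c in range(1000)] for r in range(1000)]
mask = [[1 if r < 500 else 0 for c in range(1000)] for r in range(1000)]

boxes = tile_boxes(1000, 1000, 125)
# Applying the same box to image and mask keeps each pair aligned.
pairs = [(crop(image, b), crop(mask, b)) for b in boxes]
```

With these assumptions, every 1000 x 1000 image yields an 8 x 8 grid of tiles, so the number of training samples grows by a factor of 64.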
Database:
- Build a MongoDB Database (notebook): create a MongoDB database, query documents and subdocuments, count documents, survey distinct values with filters, use the element match operator, filter with regular expressions, create indexes, and run aggregations
- Build a SQLite Database of Air Pollution Data (notebook): set up an SQLite database and populate tables, insert and delete columns from the tables, query the tables and output the results as a dataframe, filter queries using WHERE, AND, OR, IS NULL, and LIKE, aggregate using GROUP BY and ORDER BY, create indexes, and perform joins
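A few of the SQLite operations listed above are sketched below using Python's built-in `sqlite3` module. The table names, column names, and readings are made up for the example and do not reflect the notebook's actual schema or data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
cur = conn.cursor()

# Create and populate two hypothetical tables.
cur.execute("CREATE TABLE stations (station_id TEXT PRIMARY KEY, city TEXT)")
cur.execute(
    "CREATE TABLE readings ("
    "station_id TEXT, hour TEXT, pm25 REAL, "
    "FOREIGN KEY (station_id) REFERENCES stations (station_id))"
)
cur.executemany(
    "INSERT INTO stations VALUES (?, ?)",
    [("s1", "Bangkok"), ("s2", "Chiang Mai")],
)
cur.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("s1", "2020-01-01 00:00", 40.0),
        ("s1", "2020-01-01 01:00", 60.0),
        ("s2", "2020-01-01 00:00", 90.0),
        ("s2", "2020-01-01 01:00", None),  # missing reading
    ],
)
cur.execute("CREATE INDEX idx_readings_station ON readings (station_id)")
conn.commit()

# Filter with IS NULL.
missing = cur.execute(
    "SELECT COUNT(*) FROM readings WHERE pm25 IS NULL"
).fetchone()[0]

# Join, aggregate with GROUP BY, and sort with ORDER BY.
city_means = cur.execute(
    "SELECT s.city, AVG(r.pm25) AS mean_pm25 "
    "FROM readings AS r JOIN stations AS s ON r.station_id = s.station_id "
    "WHERE r.pm25 IS NOT NULL "
    "GROUP BY s.city ORDER BY mean_pm25 DESC"
).fetchall()
conn.close()
```

To get a dataframe instead of a list of tuples, the same SQL string can be passed to `pandas.read_sql_query` together with an open connection.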
Blogs
Deciphering black box air pollution data in Thailand. UN-ESCAP blog.
Scraping Air Pollution Data from Thailand EPA. Published in Medium.com
Identifying the Sources of Winter Air Pollution in Bangkok Part I. Published in Medium.com
Identifying the Sources of Winter Air Pollution in Bangkok Part II. Published in Medium.com
Research Projects and Publications
Data Science and Data Analysis
- Air Pollution: Interpolation Methods to Produce a Spatially Continuous Map of PM2.5 and Its Contribution to Policy Making. S. Han, D. Stratoulias, W. Kundhikanjana (Leader), E-poster, 5th Global Alliance of Disaster Research Institutes Summit.
Study Novel Materials
- Memory Device: Identified defect-dominated conduction behavior in resistive switching ZnO nanowire devices that can improve the growth process. Used non-linear fitting in MATLAB. O. Srikimkaew, W. Kundhikanjana (Leader), et al., Journal of Electronic Materials (2019) pdf
- Ferroelectric Material: Studied the effectiveness of Sm substitution in improving ferroelectric properties and reducing conductivity in bismuth ferrite ceramics. Identified an optimal doping level for ferroelectric device applications. Used image cross-correlation and curve fitting (Python). P. Sriboriboon, W. Kundhikanjana (Leader), et al., Physics Letters A (2019) pdf. J. Nonkumwong, W. Kundhikanjana (CO-Leader), et al., Integrated Ferroelectrics (2018) pdf
- Biological study
- Studied survival strategy of Arthrospira platensis (Spirulina) by shape transformation from spiral to rod shapes for effective food production. A. Chaiyasitdhi, W. Kundhikanjana (CO-Leader), et al., PLoS ONE, (2018) pdf
- Investigated the effect of storage media on the ultrastructure of spermatozoa. C. Riou, W. Kundhikanjana (CO-Leader), Avian Model Systems Conference, (2018)
- Strongly Correlated Oxides: Designed and conducted imaging experiments to capture microscopic glassy behavior of a metallic phase in a perovskite manganite film. Used cross-correlation analysis to register large sets of images (MATLAB) and image segmentation to calculate the areal fraction of the metallic phase. W. Kundhikanjana (main author), et al., Physical Review Letters, (2015) pdf
- Semiconductor Material:
- Failure analysis of a RAM device. Identified the cause of an unexpected surface implanted layer. Figure selected for the journal cover page. W. Kundhikanjana (main author), et al., Semiconductor Science and Technology (2013)
- Dissolution mechanism of MgO thin film shielding layer in tunneling magnetoresistance hard disk drive read head. W. Kundhikanjana (CO-Leader), et al., Materials Today Communications, (2020) link
- Surface Science:
- Effects of catalyst surfaces on adsorption revealed by atomic force microscope force spectroscopy: photocatalytic degradation of diuron over zinc oxide. W. Kundhikanjana (co-author), et al., Physical Chemistry Chemical Physics, (2020) link
Image Analysis
- Investigated nanoscale electronic properties in novel materials. Used image analysis to draw statistical inferences from the data. W. Kundhikanjana (main author), Nano Letters, (2009) pdf
- Image segmentation and edge detection (MATLAB). K. Lai, W. Kundhikanjana (co-author), Physical Review Letters, (2011) pdf
- Used image autocorrelation to calculate correlation length (MATLAB). K. Lai, W. Kundhikanjana (co-author), Science, (2010) pdf and pdf
Microfabrication
Failure analysis to help optimize the microfabrication process of cantilever probes for commercial purposes. Resulted in a spin-off company (PrimeNano Inc). Y. Yang, W. Kundhikanjana (co-author), et al., Journal of Micromechanics and Microengineering (2012) pdf, Y. Yang, W. Kundhikanjana (co-author), Journal of Micromechanics and Microengineering (2014) pdf, Y. Yang, W. Kundhikanjana (co-author), MEMS (2011) pdf, and A.J. Haemmerli, W. Kundhikanjana (co-author), MEMS (2012) pdf
Software
Designed automatic control and data acquisition software in LabVIEW for a high-vacuum, low-temperature scanning probe microscopy platform. Resulted in many follow-up publications in high-impact journals. W. Kundhikanjana (main author), et al., Review of Scientific Instruments, (2011) pdf
Hardware
- Designed and implemented calibration procedure for quantitative dielectric measurement with microwave microscopy technique. K. Lai, W. Kundhikanjana (co-author), et al, Review of Scientific Instruments, (2009) pdf and K. Lai, W. Kundhikanjana (co-author), et al, Review of Scientific Instruments, (2009) pdf
- Explored applications and provided showcases for the invented microscopy technique. K. Lai, W. Kundhikanjana (co-author), Nano Letters, (2009) pdf, S.-S. Hong, W. Kundhikanjana (co-author), Nano Letters, (2010) pdf, E.Y. Ma, W. Kundhikanjana (co-author), Nature Communications, (2015) pdf
Simulation
Developed the microwave impedance microscopy technique. Used finite element modeling (FEA) of the tip-sample interaction (COMSOL) to understand the contrast mechanism of nanoscale microwave microscopy, which provides hard-to-obtain information for material development. K. Lai, W. Kundhikanjana (co-author), et al., Review of Scientific Instruments, (2008) pdf, K. Lai, W. Kundhikanjana (co-author), Applied Nanoscience, (2011) pdf
Class Projects/Notes repository
- Deep Learning notebook
- Unsupervised Learning in Python notebook
- Fraud Detection in Python notebook
- Data Visualization notebook
- Interactive Visualization with Bokeh notebook
- Time Series Analysis notebook
- Machine Learning for Time Series Data in Python notebook
- A/B Testing: Analyzed data from the popular mobile game Cookie Cats. Used bootstrap analysis to compare the effect on user retention of a time pause at level 30 versus level 40 notebook
- Inferential Statistics notebook
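The bootstrap comparison used in the A/B testing notebook above can be sketched as follows. The retention flags here are synthetic, not the Cookie Cats data, and the function name, number of resamples, and seed are assumptions for illustration.

```python
import random

def bootstrap_diff(group_a, group_b, n_boot=1000, seed=0):
    """Bootstrap the difference in mean retention (A minus B).

    Each group is a list of 0/1 flags (1 = user retained). Every iteration
    resamples both groups with replacement and records the difference in means.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        mean_a = sum(resample_a) / len(resample_a)
        mean_b = sum(resample_b) / len(resample_b)
        diffs.append(mean_a - mean_b)
    return diffs

# Synthetic retention flags for the two variants, for illustration only.
gate_30 = [1] * 450 + [0] * 550
gate_40 = [1] * 440 + [0] * 560

diffs = bootstrap_diff(gate_30, gate_40)
# Share of resamples in which variant A retains more users than variant B.
prob_a_better = sum(d > 0 for d in diffs) / len(diffs)
# Approximate 95% percentile interval for the difference.
diffs_sorted = sorted(diffs)
ci_95 = (diffs_sorted[int(0.025 * len(diffs))], diffs_sorted[int(0.975 * len(diffs))])
```

The spread of `diffs` shows how much the observed difference could vary from sampling alone, which is the basis for deciding whether one variant really retains more users.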
Courseworks