Hâtvalues

Data Pipelines For Machine Learning

Data pipelines are essential for machine learning projects: they manage the flow of data from various sources, ensure data quality, and automate the process of data preparation. What’s the end goal for a data pipeline? The resulting clean, processed data may be used for analysis, business intelligence and reporting; another common use these days is as the input for Machine Learning (ML). While data pipelines are often hand-coded in high-level programming languages such as Java, there are plenty of configurable (point-and-click) tools available for the task. One benefit of such tools is that many have built-in components for the ML tasks of feature engineering, training, evaluating and deploying ML models. In this blog post, I will compare and contrast SAS, KNIME, Alteryx, and RapidMiner, all of which I have used extensively.
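To make the idea concrete, here is a minimal sketch of a hand-coded pipeline as a chain of stage functions, the kind of flow that point-and-click tools express as connected nodes. The stage names and field names (`hours_slept`, `day`) are illustrative assumptions, not from any of the tools discussed:

```python
from functools import reduce

# Each stage takes and returns a list of records (dicts); the pipeline is
# simply their composition, run in order.

def drop_incomplete(rows):
    """Data quality: discard records with missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def add_weekend_flag(rows):
    """Feature engineering: derive a boolean model input from a raw field."""
    return [{**r, "is_weekend": r["day"] in ("Sat", "Sun")} for r in rows]

def run_pipeline(rows, stages):
    """Thread the data through each stage in sequence."""
    return reduce(lambda data, stage: stage(data), stages, rows)

raw = [
    {"day": "Fri", "hours_slept": 5.5},
    {"day": "Sat", "hours_slept": None},  # removed by the quality stage
    {"day": "Sun", "hours_slept": 7.0},
]
clean = run_pipeline(raw, [drop_incomplete, add_weekend_flag])
```

The same structure scales to real sources and sinks: swap the in-memory list for a database read, and append training or scoring stages at the end.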

Data Pipelines and Data Engineers

Data pipelines are critical for ensuring that data is accurate, timely, and available to those who need it. Without data pipelines, organizations would struggle to process and make sense of the vast amounts of data that they collect. Data pipelines enable organizations to build machine learning models, conduct data analysis, and make data-driven decisions. In this blog post, we will discuss data pipelines and the role of data engineers in building and maintaining them.

Non-Parametric Survival Analysis of a Sleep Diary

Introduction # Earlier this month I carried out a parametric survival analysis on a self-generated dataset of my sleep times each day over the previous year. Following the scientific method, of course, I set about the task with a null hypothesis that eating certain food groups for dinner had no effect on my sleep. The findings were indeed very interesting, and I was able to reject the null hypothesis using a Weibull survival regression of the data set. You can read about that here, so I won’t repeat myself and I’ll skip the exploratory analysis. Just remember, the diagnosis is not great! My median nightly sleep time is 5.84 hours, or 05 hours 50 minutes 34 seconds.
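For readers new to the non-parametric side, the workhorse is the Kaplan-Meier estimator: the survival probability at each observed event time t_i is multiplied by (1 - d_i / n_i), where d_i is the number of events and n_i the number still at risk. This sketch implements it from scratch on hypothetical durations (my actual sleep data is not reproduced here); the median is read off as the first time the curve drops to 0.5 or below:

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] step points of the Kaplan-Meier survival curve.
    times: observed durations; events: 1 if the event occurred, 0 if censored."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    at_risk = len(times)
    surv = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = times[order[i]]
        deaths = n_here = 0
        # Group tied durations at the same time point.
        while i < len(order) and times[order[i]] == t:
            deaths += events[order[i]]
            n_here += 1
            i += 1
        if deaths:  # censored-only times leave the curve unchanged
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= n_here
    return curve

def km_median(curve):
    """Smallest time with S(t) <= 0.5, or None if the curve never reaches it."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None

# Toy example: four fully observed sleep durations in hours.
curve = kaplan_meier([4.5, 5.5, 6.0, 7.5], [1, 1, 1, 1])
```

In practice a library such as R's survival package or Python's lifelines handles censoring, confidence intervals and plotting for you; the point here is only that the estimator itself is a short product over the risk sets.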

Parametric Survival Analysis of a Sleep Diary

Introduction # As a teenager, I began to suffer from chronic insomnia that has stayed with me throughout my adult life, with very few periods of real respite. However, rather than turn this into a sob story, I’ve managed to come up with a really interesting data story! In the search for triggers and patterns, I decided to buy a wearable fitness tracker and keep a tally of my nightly quality sleep hours. Around the same time, I finally had the self-awareness to realise that a lot of my worst nights seemed to be accompanied by digestive disturbances, which led me to hypothesise that certain food groups might be acting as triggers or exacerbating the problem. So I decided to keep a food diary alongside the sleep data. I might also add that I’m proud of myself for not skipping a day during the collection period.

Round Peg in a Square Hole

Introduction # This post is a prologue to a forthcoming post on survival analytics and features the exploratory analysis of a self-generated data set that I will use for another demonstration post in the next few weeks. Let me give you the background quickly, because I will go deeper in the next post: I have had a chronic problem with insomnia since I was a teenager. I had a sense that certain foods were triggering bad nights’ sleep, so I kept a food diary of what I’d eaten for dinner and recorded my quality sleep hours with a fitness tracker for a year.