MM Thomas

A blog by a data scientist in Georgia

Coupling and Cohesion

Two concepts highlighted in Chelsea Troy’s Leveraging LLMs in Software Engineering on O’Reilly are coupling and cohesion, and both will serve young developers and data scientists well in an era where verbose yet effective code can be generated quickly. Coupling describes how much disparate parts of your code depend on one another. For example, how much trouble will it cause in your data visualization section if the data validation needs to accept a different column name? If you solve the problem with quickly generated code, you might miss that your data input step creates a data set whose column names depend on the input data, which can break visualization code that refers to one of those columns. Finding a way to avoid that problem reduces your code’s coupling. ...
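To make the column-name example concrete, here is a minimal sketch in plain Python. It assumes the data is a list of dictionaries, and every name in it (RAW_TO_CANONICAL, validate, plot_points, the column labels) is hypothetical. The idea is that only the validation step knows what the input calls its columns, so a renamed input column means editing one mapping rather than the visualization code.

```python
# Hypothetical names throughout; this is a sketch, not code from the course.

# The only place that knows what the input file calls its columns.
RAW_TO_CANONICAL = {"Temp (F)": "temperature", "Date": "date"}


def validate(rows, mapping=RAW_TO_CANONICAL):
    """Rename raw columns to canonical names and fail early if one is missing."""
    cleaned = []
    for row in rows:
        missing = [raw for raw in mapping if raw not in row]
        if missing:
            raise KeyError(f"input is missing columns: {missing}")
        cleaned.append({canonical: row[raw] for raw, canonical in mapping.items()})
    return cleaned


def plot_points(rows):
    """Stand-in for visualization code: it only ever sees canonical names."""
    return [(row["date"], row["temperature"]) for row in rows]


raw = [{"Date": "2025-01-01", "Temp (F)": 41}]
print(plot_points(validate(raw)))
```

If the input later arrives with a column called, say, "temp_f", only the mapping passed to validate changes; plot_points is untouched.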

Whatever the Post Is Named

Some things I may want to talk about on here: TidyTuesday, although that may end up becoming just part of my GitHub. The Parts that Mattered: things I learned in grad school that actually ended up being useful in my career. 31 Day/Engineer’s Log: every once in a while I try to read some programming or data science text and either implement a tiny example or talk about a useful concept once a day. Admittedly, these live on local drives, but they’re a good way to put something short out there when I need to be posting but don’t have anything long-form. Here’s an example: the other day I was trying to copy a list of files from one place to another, and I was not sure how to do this based on a .txt file that lists them. A couple ways of doing that are the following, ...
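One possible way to do this kind of copy, not necessarily one of the approaches the post goes on to list, is sketched below in Python. It assumes the .txt file holds one file name per line, relative to a single source folder; the function name copy_listed_files and its arguments are made up for illustration.

```python
import shutil
from pathlib import Path


def copy_listed_files(list_file, source_dir, dest_dir):
    """Copy every file named in list_file from source_dir to dest_dir.

    Returns the names that were listed but not found in source_dir.
    """
    source_dir, dest_dir = Path(source_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    missing = []
    for line in Path(list_file).read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if not name:  # skip blank lines
            continue
        src = source_dir / name
        if src.is_file():
            shutil.copy2(src, dest_dir / src.name)  # copy2 also keeps timestamps
        else:
            missing.append(name)
    return missing
```

Calling it with hypothetical paths, such as copy_listed_files("files.txt", "data/raw", "data/selected"), copies what it can and returns the names it could not find, which is handy for spotting typos in the list.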

The Idea of LLM PEFT

This is mostly a summary, but this week I have been trying to wrap my head around Parameter-Efficient Fine-Tuning (PEFT), a method in which Large Language Models (LLMs) are adapted by changing only a small portion of their parameters. According to the article by Mangrulkar and Paul, this applies specifically to Large Language Models based on the transformer architecture. This includes the big names of today: GPT, T5, and BERT (or at least the big ones as of 2023). ...
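To get a feel for how small that portion can be, here is a back-of-the-envelope sketch in Python. It assumes a LoRA-style update, which is one well-known PEFT method and is used here only as an illustration; the function name and the layer sizes are made up.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (frozen, trainable) parameter counts for one weight matrix.

    Assumes a LoRA-style update W + B @ A, where W (d_out x d_in) stays
    frozen and only B (d_out x rank) and A (rank x d_in) are trained.
    """
    frozen = d_out * d_in
    trainable = rank * (d_out + d_in)
    return frozen, trainable


frozen, trainable = lora_param_counts(d_out=1024, d_in=1024, rank=8)
print(f"frozen: {frozen}, trainable: {trainable}, ratio: {trainable / frozen:.2%}")
```

For this hypothetical 1024 by 1024 layer with rank 8, the trainable parameters come to 1/64 of the frozen ones, or roughly 1.6%.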

First Post 2025

New year, new theme! I just updated the theme of the site from minimal to PaperMod.

A Lot of Things

I submitted a copy of my dissertation tonight. Four years, and it took me this long to realize what it takes to write something long. I learned a ton of accessory skills, ways of approaching problems, and nomenclature too, but more than anything, the last week has made one thing very clear: if you want to write something long, you have to read a lot. If you want to finish it in a reasonable amount of time, it helps to approach it repeatedly. I watched this video during the pandemic (yes, it’s about Smash Bros.) and it reminded me of what it’s like to improve at something. You can’t force a ton of work every day. You’ll burn out. I can’t speak for people in other fields, but in fields that require mental activity, it’s extremely hard to know for sure when your mind will be the clearest. ...

First Post 2023

I hate to keep doing this, but this is the first post of 2023. I am mostly doing this to grease the posting wheels. See you soon! -emem_tee

Another Python Hiccup

The Idea: I generally work through silly pet projects to improve my fluency with a language or data analysis library. This month I’ve been trying to take things I already know how to do with R and the tidyverse and implement them in a Python script with pandas. This Week: Here is one thing that I got through this week. I’m sure this is elementary to some, but if I’ve learned one thing in statistical programming, it’s that if something has annoyed me, it’s probably also annoyed someone else. ...

How to Post

Alrighty, I have managed to fix the issue with not having the Hugo .exe file at a location where my shell (I use CMD in Windows) can find it. Pro Tip: Do not forget to click OK on ALL the dialog boxes opened from the System Properties dialog. Also make sure you restart your shell when testing. Now that I have that down, all I have to do is navigate to my site’s directory (for me that’s .\emem-thomas.com) and enter the following, ...

How I Am Learning R

Introduction: A question I get often when talking to younger statisticians and data scientists is “How did you learn R?” The real/cheating answer I want to give first is “well I took a Java class and that made everything click wayyy smoother.” And although that was an important step, taking one course in another language didn’t get me to where I am today. I should add that I have not learned R. I would argue there are very few people who really know R, especially because, like any active language, it changes every few hours. Okay, that is a little hyperbolic, but the point I’m making is that even when you get good, there will always be ways to change what you’re doing for the better. ...

SFTP Test

This is yet another test, unfortunately.