The growing popularity of ChatGPT over the past year has led to much speculation about what such a tool means for the workforce. Many believe it signals the automation of jobs currently performed by humans, programming jobs in particular.
I prefer a different perspective. While ChatGPT is powerful, it’s not omnipotent. If you work in tech, it cannot replace your job, but it can very much help you be better at it.
In fact, ChatGPT can be a particularly beneficial tool for data scientists, because data folks aren’t necessarily trained in the "polished" aspects of writing code. When training to be data scientists, people often learn how to use code as a statistical and sociological tool without being initiated into rigorous computer science practices.
ChatGPT can help you fill these gaps – making it easier to do your job and teaching you new skills along the way. Let’s take a look at a few ways it might do so.
What does this error mean?
I have been programming for close to six years now. Accordingly, for six long years, I have been continually frustrated by the lack of specificity and detail in the error messages I get when my code breaks.
I used to find solace in random Google searches. With any luck, some programming guru on Stack Overflow would have a detailed explanation regarding my error and solid advice on how to fix it. However, this was always hit or miss, never a guarantee.
Since ChatGPT came into existence, I have found it extremely useful for understanding error messages from code. Its responses do several things, all of which help improve your programming skills:
- It provides a detailed description of what the error means.
- It gives detailed guidance on how to fix the error.
- It occasionally even provides general guidance on how to approach the larger class of errors that your error falls under.
One could argue that this information is also technically available via Google. While this is true, having it consolidated in a single place makes it much easier to review and understand.
It’s best to illustrate this by example.
Recently, I ran into the following error while trying to insert some data into a PostgreSQL database via Python’s psycopg2 package:
WARN: <class 'psycopg2.errors.StringDataRightTruncation'>
Traceback (most recent call last):
File "<redacted>", line 363, in insert
cur.execute(self._insert_sql(table, set(keyval.keys())), keyval)
psycopg2.errors.StringDataRightTruncation: value too long for type character varying(32)
Typing the error into Google led to a few different results, and upon choosing the second one (Stack Overflow), I found someone with a similar issue who received the following top-rated answer:
_"Given the error: `value too long for type character varying(200)`, you should look for model fields that have a `max_length` of 200. Since you have multiple fields with a `max_length` set to 200, you need to determine which model and field are throwing the error. Check the stacktrace, run a debugger and/or insert some debugging `print(instance.__dict__)`s. Once you find the culprit, extend that field’s `max_length` to something larger or turn it into a `TextField`."_
Typing the same error into ChatGPT and asking if it could assist me, I received the following response:
"Certainly! The error message you provided indicates that there was an issue with inserting data into a PostgreSQL database using the `psycopg2` Python library. Let’s break down the error message step by step:
- `WARN: <class 'psycopg2.errors.StringDataRightTruncation'>`: This part indicates that it’s a warning rather than a critical error. The warning is related to `StringDataRightTruncation`, which means that some of the data being inserted into the database is being truncated because it exceeds the allowed length.
- `Traceback (most recent call last)`: This is the Python traceback that shows the sequence of function calls leading up to the error. It helps in understanding the context of the error.
- `File "<redacted>", line 363, in insert`: This line tells you which file and which specific line of code caused the error. The issue lies in the file `<redacted>.py` on line 363.
- `cur.execute(self._insert_sql(table, set(keyval.keys())), keyval)`: The error occurred inside the `execute` method of a cursor (`cur`) object. It seems that this is where the data insertion is taking place.
- `psycopg2.errors.StringDataRightTruncation: value too long for type character varying(32)`: This is the actual error message. It tells you that the value you are trying to insert into a column of type `character varying(32)` is too long and cannot fit within the specified length constraint of 32 characters.
In summary, the error occurred while trying to insert data into the database table, and one of the values being inserted exceeds the maximum allowed length for the corresponding column, which is set to 32 characters. To resolve the issue, you can either truncate the data to fit within the 32-character limit or modify the database schema to allow for longer strings in that particular column."
The ChatGPT response is much more detailed, and it breaks down the error step by step. If you’re new to a language or aren’t formally trained in programming or software engineering (as is true of many data scientists), this level of detail is invaluable. Beyond telling you what the error means, it shows you how to parse and interpret the message in depth, to the point where you may not need outside help with similar errors in the future.
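To make the "value too long" diagnosis a bit more concrete, here is a minimal sketch of how you might check a row for oversized strings before attempting an insert. This is not my actual code: the `MAX_LENGTHS` mapping, the column names, and the `find_oversized_values` helper are all hypothetical, and in a real project the limits would come from your own schema.

```python
# Hypothetical column limits (assumption); look these up in your own schema.
MAX_LENGTHS = {"username": 32, "city": 32}


def find_oversized_values(row, max_lengths):
    """Return {column: actual_length} for string values that exceed their limit."""
    oversized = {}
    for column, value in row.items():
        limit = max_lengths.get(column)
        # Only string values in columns with a known limit are checked.
        if limit is not None and isinstance(value, str) and len(value) > limit:
            oversized[column] = len(value)
    return oversized


# Hypothetical row: "username" is 40 characters, over the 32-character limit.
row = {"username": "a" * 40, "city": "Lisbon"}
print(find_oversized_values(row, MAX_LENGTHS))  # {'username': 40}
```

A check like this tells you which column is the culprit, which is exactly the piece of information the raw error message leaves out. From there you can decide whether to truncate the value or widen the column.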
Automating "mindless" tasks
Let’s start with a definition of the word mindless here. When programming, there are certain tasks, such as writing an algorithm to update a file’s contents or designing a web application for your employer from scratch, that require a fair bit of active problem solving. These are not the types of problems I recommend handing off to ChatGPT. If you’re expecting ChatGPT to just code everything for you, 1) you’re in for a rude awakening and 2) you’re missing the opportunity to use it as an effective tool to improve your programming.
On the other hand, there are certain tasks that simply must be done when programming that don’t require much mental effort but still eat up precious time:
- Generating basic HTML templates for a web application
- Cleaning up code to meet style guidelines
- Finding a minor bug in a file with hundreds of lines of code
The list goes on. In these situations, ChatGPT can save you a lot of time and more than a few headaches.
Once again, I will present my argument by way of example. Recently, while writing an application in Flask, I realized I needed to write a new HTML file to serve as a template for one of my pages. I then sighed in exasperation, because writing such templates is an annoying task prone to occasional little bugs. However, I then wondered if I could just ask ChatGPT to help:
"I want a simple HTML template for my Flask app that takes in a list of strings called "filepaths," and renders a page with each string in this list displayed in a series of bullet points. I want the page to be titled "List of Files." Can you generate such an HTML file for me?"
"Sure, I can help you create a simple HTML template for your Flask app. Below is the HTML code that you can use:
<!DOCTYPE html>
<html>
<head>
<title>List of Files</title>
</head>
<body>
<h1>List of Files</h1>
<ul>
{% for filepath in filepaths %}
<li>{{ filepath }}</li>
{% endfor %}
</ul>
</body>
</html>
_Save this code in a file named `files.html` or any other appropriate name. This HTML template uses the Jinja templating engine syntax, which is integrated with Flask. The `{% for filepath in filepaths %}` block iterates through the `filepaths` list, and for each string in the list, it generates a list item (`<li>`) with the string displayed._
_In your Flask app, you can use this template with the following code:_
```python
from flask import Flask, render_template

app = Flask(__name__)


@app.route('/')
def list_files():
    filepaths = [
        "File 1",
        "File 2",
        "File 3",
        # Add more files to this list as needed
    ]
    return render_template('files.html', filepaths=filepaths)


if __name__ == '__main__':
    app.run()
```
_Make sure that the `files.html` file is placed in a folder named `templates` within your Flask app directory. When you run the Flask app and navigate to the specified route (e.g., http://localhost:5000/), it will render the `files.html` template with the list of files displayed as bullet points. You can modify the `filepaths` list in the `list_files` function with the desired data you want to display on the page."_
There is much to discuss, but let’s start with an important point: This has simply gone beyond Google’s capabilities. You might be able to use a traditional search engine to procure tips and even some general outlines, but you won’t get the exact file you want conveniently generated for you.
Secondly, ChatGPT does much more than just give you the file:
- It breaks down the file’s syntax, underscoring the functionality of important lines. This information is important, and you should read it. View ChatGPT as a way to improve your programming and learn, not as a worker who does your job for you while you sit back blindly.
- It explains how to use the template correctly in your Flask application.
- It mentions where the file should be placed to ensure proper access.
This is helpful for any programmer, but particularly so for a data scientist. Most data science training involves using code as a tool for statistical analysis, model building, and so on. While you might be well versed in pandas and scikit-learn, there is a fair chance that the intricacies of building web applications, to take the example above, are not your specialty. Yet you will likely still encounter them at some point in your career.
And that’s where ChatGPT can be incredibly helpful.
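If the Jinja loop in the template example is unfamiliar, it can help to see the same idea in plain Python. The snippet below is only a rough sketch using the standard library, not how Flask renders templates internally. `render_file_list` is a hypothetical helper I made up for illustration, and the exact whitespace and escaping that Jinja emits will differ.

```python
from html import escape


def render_file_list(filepaths):
    """Build roughly the same <ul> markup that the template's for-loop produces."""
    # One <li> per entry; escape() keeps special characters from being read as HTML.
    items = "\n".join(f"  <li>{escape(path)}</li>" for path in filepaths)
    return f"<ul>\n{items}\n</ul>"


print(render_file_list(["File 1", "File 2"]))
```

The point is simply that the `{% for %}` block is an ordinary loop: one list item is produced for each string in `filepaths`.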
Finding a quick solution to a localized problem
"Hello, I need to build an end-to-end system that has full authentication capabilities, with a back-end in MySQL and a front-end written using Django. I want it to accept financial data from users and make predictions about future success. Can you write this whole thing for me? Thanks."
Wouldn’t that be nice? But as I have repeatedly mentioned, ChatGPT is not something you can use to just write an entire system. Asked the above, this is its response (I’ve removed the details to keep things concise):
"Hello! I’d be glad to help you outline the process of building such a system, but creating an entire system like this requires a significant amount of work and coding. I can guide you through the general steps and components you’ll need to build it. Let’s break down the process:
_1. Project Setup:
- Set up your development environment with the necessary tools and software, including Python, Django, MySQL, and any other dependencies you might need._
_2. Database Design:
- Design …_
_3. Authentication:
- Use Django’s built-in authentication system …_
_4. Front-End Development:
- Create the front-end interface using …_
_5. Back-End Development:
- Implement the back-end using …_
_6. Data Submission:
- Set up forms or API endpoints for users …_
_7. Prediction Model:
- Develop or integrate a machine learning model for …_
_8. Data Processing and Prediction:
- When users submit their financial data, use …_
_9. Security and Privacy:
- Implement proper security measures, including …_
_10. Testing:
- Thoroughly test your application for …_
_11. Deployment:
- Deploy your application to a production environment …_
_12. Monitoring and Maintenance:
- Set up monitoring tools to …_
Remember that building a system of this complexity requires strong development skills in both back-end (Django, database) and front-end (HTML/CSS, JavaScript) technologies, as well as knowledge of machine learning for predictions. It’s also important to keep security in mind throughout the development process. If you’re not experienced in these areas, you might want to consider working with a development team or seeking professional assistance."
Not too much more helpful than a Google search, in my opinion.
On the other hand, ChatGPT is great at helping solve localized problems.
For instance, while writing some code the other day, I had to figure out how to take a range of numbers and convert them into a string containing all the numbers separated by commas. I attempted to do this myself first, with the following line of code:
",".join(list(range(9)))
However, this raised a `TypeError`, because the `join` method only accepts strings, not integers. When I asked ChatGPT to help resolve my problem, it provided the following solution:
"You’re correct that the `join` method expects a sequence of strings, not integers. To fix the error, you can convert each integer in the list to a string before using the `join` method. Here’s the corrected code:
",".join(str(i) for i in range(9))
In this code, the `str(i)` converts each integer `i` to its string representation, and the generator expression `(str(i) for i in range(9))` creates an iterable of strings. Then, the `",".join(...)` call combines these strings using commas as separators. This will produce the output `"0,1,2,3,4,5,6,7,8"`."
This solved my problem, and it also taught me something new. When I first reviewed the code, I thought it was using a list comprehension (which would have been my first instinct), but I noticed there were no brackets. Looking back at the initial output, we can see ChatGPT mentions using a generator expression. When I asked ChatGPT to explain further, it broke down the details of this generator expression and explained how generator expressions work in general.
Thus, throughout this process, I quickly gained working code for my issue and learned to use a Python feature (generator expressions) that I had never previously encountered.
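For anyone else who hasn't run into generator expressions, here is a small side-by-side sketch of the failing call and three ways to fix it. The variable names are my own and purely illustrative. All three fixes produce the same string, so which one to use is largely a matter of taste.

```python
numbers = range(9)

# The original attempt fails because join() only accepts strings.
try:
    ",".join(list(numbers))
except TypeError as error:
    print(f"TypeError: {error}")

# List comprehension: builds the full list of strings before joining.
from_list = ",".join([str(i) for i in numbers])

# Generator expression: same idea without brackets; values are produced on demand.
from_generator = ",".join(str(i) for i in numbers)

# map() applies str to each element and also produces values on demand.
from_map = ",".join(map(str, numbers))

print(from_list)  # 0,1,2,3,4,5,6,7,8
print(from_list == from_generator == from_map)  # True
```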
Recap + Final Thoughts
Here’s a small cheat sheet of ways you can use ChatGPT to improve your data science workflow:
- Understanding errors. ChatGPT can conveniently describe your error in layman’s terms for you, as well as provide guidance on how to fix it.
- Automating busywork. Many necessary programming tasks don’t require active problem solving but still take up time. Hand them off to ChatGPT.
- Solving small problems. If you need a quick and efficient way to accomplish a small task, try asking ChatGPT. You might just learn something new.
While the title of this article and many of its points focus on the time ChatGPT can help you gain back, the benefits of using it effectively really are twofold. The examples above are all drawn from my personal experiences, and in nearly every case I learned something new about programming.
Technology is advancing, and we would do well to advance with it. Instead of feeling threatened by ChatGPT, learn to use it to your advantage.
You might just be surprised at what you accomplish.