The world’s leading publication for data science, AI, and ML professionals.

Path Representation in Python

Stop using strings to represent paths and use pathlib instead

Photo by Pawel Czerwinski on Unsplash
Photo by Pawel Czerwinski on Unsplash

Working with filesystems is one of those tasks that seem trivial at first glance, yet can easily catch even experienced developers off guard. I’ll be the first to admit – I’ve made my fair share of mistakes. One of the most common anti-patterns I’ve encountered is representing file paths as strings in Python.

It’s time to rethink this approach.

In today’s article, we’ll explore why using strings (or even the os module) for file paths is a recipe for disaster. We’ll dive into best practices and see how the Pathlib package can help you write cleaner, more maintainable code.


Why Using Strings for Paths is a Bad Idea

If you’ve ever worked on a project that needs to run on different operating systems, you know the pain of dealing with paths. Different systems have different ways of representing paths. Unix-based systems (like Linux and macOS) use a forward slash /, while Windows uses a backslash “. It’s a small detail, but one that can lead to big headaches if you’re not careful.

# Unix (e.g. Linux, OSX, etc.)
/home/this/is/a/path/to/a/directory

# Windows
C:homethisisapathtoadirectory

I once spent hours debugging a script that worked flawlessly on my Mac but completely fell apart when a colleague ran it on Windows. The issue? It was, of course, the file paths. I had hardcoded paths as strings, assuming they’d behave the same on any system.

The problem here is code portability. If you’re using strings to represent paths, you’ll need to write conditional logic to handle different operating systems. Not only does this make your code messy, but it also introduces unnecessary complexity.

Python"># This is a bad practice
import platform

if platform.system() == 'Windows':
    filepath = 'C:homethisisapathtoadirectory'
else:  # e.g. 'Darwin' for OSX or 'Linux'
    filepath = '/home/this/is/a/path/to/a/directory'

The worst part? It doesn’t stop there. Let’s say you need to concatenate two paths. Using plain string concatenation can lead to invalid paths, especially if one of the strings contains a slash at the end.

Trust me, I’ve been there.

path_1 = '/this/is/a/path/'
path_2 = '/another/path'

# filepath = '/this/is/a/path//another/path'
filepath = path_1 + path_2

Why the OS Module Isn’t the Solution

At this point, some of you might be thinking, "But what about the os module?". Yes, using os.path.join() can help you avoid some of the pitfalls of string concatenation. However, it’s far from a perfect solution.

import os

path_1 = '/this/is/a/path/'
path_2 = '/another/path'

# filepath = '/another/path'
filepath = os.path.join(path_1, path_2)

While this avoids the issue of double slashes, paths in the os module are still just strings. That means you’re still stuck dealing with all the limitations and complexities of string manipulation. Need to extract the parent directory? Want to check if a file exists? You’ll be writing a lot of extra code to get these basic tasks done.

In my early Python days, I relied heavily on the os module, thinking it was the go-to solution for path manipulation. But as my projects grew in complexity, so did my path-related bugs.


Pathlib to the rescue!

That’s where pathlib comes in. It’s part of Python’s standard library and is designed to make filesystem path manipulation a breeze. The beauty of pathlib lies in its ability to represent paths as objects—pure paths for computational operations and concrete paths for I/O operations.

Pathlib represents paths as objects - Source: Python documentation (licensed under PSF licence)
Pathlib represents paths as objects – Source: Python documentation (licensed under PSF licence)

As an example, let’s consider the following chunk of code that simply constructs a Path using Pathlib library:

import pathlib

path = pathlib.Path('folder') / 'subfolder' / 'another_subfolder'

print(path)
print(type(path))

The output of this code will vary depending on the operating system:

  • On Unix systems (Linux or macOS):
# The output when running it on Unix systems (Linux or OSX, for example)

folder/subfolder/another_subfolder
<class 'pathlib.PosixPath'>
  • On Windows
# The output when running it on Windows systems

foldersubfolderanother_subfolder
<class 'pathlib.WindowsPath'>

This automatic handling of paths based on the OS you’re running is a game-changer. And the benefits don’t stop there.


What Makes Pathlib So Powerful?

When it comes to handling file paths in Python, pathlib offers a range of features that go beyond just fixing the shortcomings of strings and the os module. Here are some key reasons why pathlib stands out as the best tool for the job.

  1. Dealing with current working directory (cwd)No more worrying about whether your script will run on Windows or Unix. Pathlib takes care of it for you.
from pathlib import Path

print(Path.cwd())
# On Unix: PosixPath(/Users/gmyrianthous/test)
# On Windows: WindowsPath(C:Usersgmyrianthoustest)

2. Programmatically creating new directory

from pathlib import Path

Path('new_directory').mkdir()

If the newly created directory exists already, the above code snippet will fail. If you want to ignore this failure, you can specify the corresponding argument when calling mkdir() method;

Path('new_directory').mkdir(exist_ok=True)

3. Checking if a file existsIn order to check if a particular file exists on the file system, you will first have to construct a Path and then make use of exists() method over the path object:

from pathlib import Path

file = Path('my_directory') / 'data.txt'
print(file.exists())

4. Listing the contents of a directoryTo list the contents of a directory you can call iterdir() that will return an iterator:

from pathlib import Path

filepath = Path('folder') / 'subfolder'
content = filepath.iterdir()

If the filepath is expected to contain a huge volume of paths and files, then I would advise you to iterate over the files within the iterator one by one, in order to avoid loading everything into memory.

Alternatively, if you are expecting a relatively small volume of contents, you can directly convert the iterator into a list:

from pathlib import Path

filepath = Path('folder') / 'subfolder'
content = list(filepath.iterdir())

Final Thoughts

It’s easy to overlook best practices when dealing with seemingly simple tasks like file path manipulation. I know I did. But as your projects grow, these "little" things can become major pain points. Investing the time to learn and use tools like pathlib will pay off in the long run, saving you from headaches and potential bugs.

So, if you’re still using strings to represent paths in Python, it’s time to make the switch. Your future self (and your teammates) will thank you.


Related Articles