Python 3 – os.walk() Method
Do you need to go through a whole directory and all its subdirectories to locate specific files? Want to include all the different files inside these directories? Good news, Python has a method for that!
Meet the os.walk() method, which simplifies the file searching process. With just a few lines of code, you can iterate through all directories and subdirectories to locate the files you want.
Understanding os.walk() Method
os.walk() is a generator function in Python's built-in os module that lets you iterate over every directory nested under a starting directory, along with the files each one contains. Here’s a demo of how it works:
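import os

# Visit the starting folder and every folder nested beneath it.
for rootDir, subDirs, files in os.walk("C:\\Users\\your_folder"):
    for filename in files:
        # files holds bare names, so join each one with its folder path.
        print(os.path.join(rootDir, filename))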
The os.walk() method takes a starting directory and, for each directory it visits, yields a tuple of three values:
- The path of the directory currently being visited, built from the starting path you passed in (str).
- A list of the names of the subdirectories directly inside that directory (list).
- A list of the names of the files directly inside that directory (list).
The code first imports the os module and then calls the walk() method to iterate through a specified starting folder, represented by “C:\Users\your_folder”.
The inner for loop prints each file's path by joining the current directory and the file name with os.path.join(). The join is needed because the file list contains bare names, not paths.
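To make the three values concrete, the following sketch builds a small throwaway tree in a temporary directory and records what os.walk() yields for each directory. The folder and file names (reports, 2024, summary.csv and so on) and the helper functions are hypothetical and exist only for this illustration.

```python
import os
import tempfile


def build_demo_tree(base):
    """Create base/reports/2024 with a few empty files (hypothetical layout)."""
    os.makedirs(os.path.join(base, "reports", "2024"))
    for relative in ("readme.txt",
                     os.path.join("reports", "summary.csv"),
                     os.path.join("reports", "2024", "q1.csv")):
        with open(os.path.join(base, relative), "w"):
            pass


def walk_snapshot(base):
    """Return one (relative dir, subdir names, file names) tuple per visited directory."""
    snapshot = []
    for root_dir, sub_dirs, files in os.walk(base):
        relative_dir = os.path.relpath(root_dir, base)
        snapshot.append((relative_dir, sorted(sub_dirs), sorted(files)))
    return snapshot


with tempfile.TemporaryDirectory() as base:
    build_demo_tree(base)
    snapshot = walk_snapshot(base)

for entry in snapshot:
    print(entry)
```

Each printed tuple corresponds to one directory: the starting folder first, then reports, then the 2024 folder inside it. The second and third elements only ever contain names from that one directory, never from deeper levels.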
Filtering Files with Extensions
If you only need to filter by file extension (say, .txt), Python offers another useful module called glob. Here’s an example of how to use glob to get all the text files directly inside a single directory (this pattern does not descend into subdirectories):
import glob

for file in glob.glob("C:\\Users\\your_folder\\*.txt"):
    print(file)
The first line imports the glob module, and the for loop then prints every *.txt file located directly in “C:\Users\your_folder”.
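If the text files may also sit in subdirectories, glob can search recursively when the pattern contains "**" and recursive=True is passed. The sketch below assumes a temporary folder with hypothetical file names, and find_text_files is a helper name chosen for this example.

```python
import glob
import os
import tempfile


def find_text_files(folder):
    """Return every .txt file under folder, at any depth."""
    # glob.escape protects any special characters in the folder path itself.
    pattern = os.path.join(glob.escape(folder), "**", "*.txt")
    # "**" only matches nested directories when recursive=True is passed.
    return sorted(glob.glob(pattern, recursive=True))


with tempfile.TemporaryDirectory() as folder:
    os.makedirs(os.path.join(folder, "nested"))
    for relative in ("top.txt", os.path.join("nested", "deep.txt"), "skip.csv"):
        with open(os.path.join(folder, relative), "w"):
            pass
    matches = [os.path.relpath(path, folder) for path in find_text_files(folder)]

print(matches)
```

Both top.txt and the file inside nested are returned, while skip.csv is ignored. Without recursive=True, "**" behaves like a single "*" and the top-level file would be missed.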
Customizing Filters
What if you’re looking to search through only specific files or directories? You can build your own filter around the walk() method by adding an if statement that checks whether the current file or directory matches your criteria. Here’s a variation of the first example that only displays .csv files:
import os

for rootDir, subDirs, files in os.walk("C:\\Users\\your_folder"):
    for filename in files:
        if filename.endswith(".csv"):
            print(os.path.join(rootDir, filename))
This code is similar to the first example, but it adds an if statement inside the inner for loop that keeps only files whose names end in .csv. Note that endswith() is case-sensitive, so a file named DATA.CSV would not match.
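The same idea extends to several extensions and to case-insensitive matching. The following is a minimal sketch, assuming a temporary folder with made-up file names. find_by_extension is a hypothetical helper, not part of the os module.

```python
import os
import tempfile


def find_by_extension(top, extensions):
    """Yield paths under top whose names end with one of the given extensions."""
    # str.endswith accepts a tuple, so several suffixes can be checked in one call.
    wanted = tuple(ext.lower() for ext in extensions)
    for root_dir, _sub_dirs, files in os.walk(top):
        for filename in files:
            # Lowercase the name so that "EXPORT.CSV" matches ".csv".
            if filename.lower().endswith(wanted):
                yield os.path.join(root_dir, filename)


with tempfile.TemporaryDirectory() as top:
    for name in ("sales.csv", "EXPORT.CSV", "table.tsv", "notes.txt"):
        with open(os.path.join(top, name), "w"):
            pass
    found = sorted(os.path.basename(p) for p in find_by_extension(top, [".csv", ".tsv"]))

print(found)
```

Writing the filter as a generator means matches are produced as the walk proceeds, so the caller can stop early without scanning the rest of the tree.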
Challenges with Large Directories
Searching for files in large directories can be quite time-consuming, especially if you need to scan through nested directories.
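One way to speed up the process is to skip directories you already know you don't need, which reduces the number of iterations. This relies on the optional topdown argument, which controls whether os.walk() starts at the top of the directory tree (topdown=True, the default) or works from the bottom up (topdown=False). Pruning only works in top-down mode, because in bottom-up mode the subdirectories have already been processed by the time their parent is reported.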
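The effect of topdown on visiting order can be seen with a small sketch. It assumes a temporary directory containing a single hypothetical chain a/b, and visit_order is a helper name invented for this example.

```python
import os
import tempfile


def visit_order(top, topdown):
    """Return directory paths relative to top, in the order os.walk visits them."""
    return [os.path.relpath(root_dir, top)
            for root_dir, _sub_dirs, _files in os.walk(top, topdown=topdown)]


with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "a", "b"))
    top_down = visit_order(top, topdown=True)
    bottom_up = visit_order(top, topdown=False)

print(top_down)    # parent directories come before their children
print(bottom_up)   # children come before their parents
```

In top-down order the starting directory is reported first, so there is still time to decide which of its subdirectories to enter. In bottom-up order it is reported last.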
Here’s an example of how to exclude specific directories from your search using a top-down approach. The code displays the files under “C:\Folder” but skips every directory named “Sub_Folder”, including “C:\Folder\Sub_Folder”:
import os

for rootDir, subDirs, files in os.walk("C:\\Folder", topdown=True):
    subDirs[:] = [d for d in subDirs if d not in ['Sub_Folder']]
    for filename in files:
        print(os.path.join(rootDir, filename))
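In this code, assigning to the slice subDirs[:] is an idiom for modifying a list in place. Because os.walk() reads that same list to decide where to go next, removing a name from it stops the walk from entering that directory. The inner for loop then prints every file that walk() finds, joining the current directory and the file name, so nothing inside “Sub_Folder” is ever listed.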
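The pruning idiom can be wrapped in a reusable generator. This is a sketch under stated assumptions: walk_files and its excluded_dirs parameter are hypothetical names, and the tree is created in a temporary directory purely for demonstration.

```python
import os
import tempfile


def walk_files(top, excluded_dirs=frozenset()):
    """Yield file paths under top, skipping any directory whose name is excluded."""
    for root_dir, sub_dirs, files in os.walk(top, topdown=True):
        # Slice assignment edits the list os.walk holds, so pruned names are never entered.
        sub_dirs[:] = [d for d in sub_dirs if d not in excluded_dirs]
        for filename in files:
            yield os.path.join(root_dir, filename)


with tempfile.TemporaryDirectory() as top:
    for folder in ("keep", "Sub_Folder", os.path.join("keep", "Sub_Folder")):
        os.makedirs(os.path.join(top, folder))
        with open(os.path.join(top, folder, "data.txt"), "w"):
            pass
    kept = sorted(os.path.relpath(p, top) for p in walk_files(top, {"Sub_Folder"}))

print(kept)
```

Only the file in keep survives, because the check compares directory names rather than full paths, so a Sub_Folder at any depth is pruned. To exclude one specific location instead, the filter could compare os.path.join(root_dir, d) against a full path.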
Conclusion
The os.walk() method is a powerful tool that takes much of the manual work out of searching for files in complex directory structures. Whether you want to search for particular types of files, filter results, or exclude directories, Python offers a variety of options to customize your search.
With a little bit of creativity and attention to detail, this method can make the file search process much more efficient!