File System
Updated on 28 Dec 2022
It is easy to open Windows Explorer, click on a folder and see the contents of that folder. In Python we can do the same thing and store the filenames in variables.
Listing contents of a folder
There are several Python modules we can use for this type of task, but in this training course we’ll keep it simple and stick with the pathlib module.
import pathlib
# define the path
currentDirectory = pathlib.Path('.')
# get list of files in current directory
for currentFile in currentDirectory.iterdir():
    print(currentFile)
We can see here that the script lists the files and directories found in the current folder. This can be a very powerful piece of code that we can reuse whenever we need to iterate over all the files in a directory and process each one.
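As a minimal sketch of storing the filenames in a variable, the loop can be wrapped in a small helper that returns a list. The helper name list_names and the sample files are assumptions made for this illustration. The example builds a temporary folder so that it behaves the same wherever it is run.

```python
import pathlib
import tempfile


def list_names(directory="."):
    """Return the sorted names of every item directly inside directory."""
    return sorted(item.name for item in pathlib.Path(directory).iterdir())


with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    # hypothetical sample content: one file and one subdirectory
    (folder / "notes.txt").write_text("hello")
    (folder / "dest").mkdir()
    names = list_names(folder)

print(names)
```

iterdir gives no guarantee about the order of the items, which is why the sketch sorts the names before returning them.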
Filtering the contents
Quite often we don’t need to know about all the files in a directory, just the ones we want to work with. Let’s say we only want to look at .txt files.
import pathlib
# define the path
currentDirectory = pathlib.Path('.')
currentPattern = "*.txt"
# get list of files that match the pattern
for currentFile in currentDirectory.glob(currentPattern):
    print(currentFile)
In this case, only the .txt files are listed.
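One detail worth knowing is that glob with a pattern like "*.txt" only looks in the directory itself. The sketch below, which uses a temporary folder and made-up file names, compares it with rglob, the pathlib method that also searches subdirectories.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    # hypothetical sample content
    (folder / "meme.txt").write_text("a")
    (folder / "image.png").write_text("b")
    (folder / "dest").mkdir()
    (folder / "dest" / "old.txt").write_text("c")

    # glob: only items directly inside folder
    top_level = sorted(p.name for p in folder.glob("*.txt"))
    # rglob: folder and all of its subdirectories
    recursive = sorted(p.name for p in folder.rglob("*.txt"))

print(top_level)
print(recursive)
```

Here top_level holds only meme.txt, while recursive also picks up old.txt from the dest subdirectory.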
Additional functions
So far we’ve seen a few pathlib functions.
- Path -> returns a path object on which we can call other functions (e.g. the two below)
- iterdir -> iterates over each item in the directory, if the path points to a directory
- glob -> returns only the files / directories that match the pattern
A few other interesting ones include
- is_file -> is the object a file
- is_dir -> is the object a dir
- exists -> does the path point to a valid file or dir
- mkdir -> make directory
- rmdir -> remove directory (the directory must be empty)
import pathlib
if pathlib.Path('dest').is_dir():
    print('dest is a directory')
In this example, dest actually exists as a subdirectory, hence the message is printed.
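The functions listed above can be tried out together. This is only a sketch: it works inside a temporary folder (an assumption made so that no real directories are touched) and stores each result in a variable.

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    dest = pathlib.Path(tmp) / "dest"

    before = dest.exists()            # nothing has been created yet
    dest.mkdir()                      # create the directory
    is_directory = dest.is_dir()      # it is a directory...
    is_regular_file = dest.is_file()  # ...and therefore not a file
    dest.rmdir()                      # works because dest is empty
    after = dest.exists()             # gone again

print(before, is_directory, is_regular_file, after)
```

Checking exists or is_dir before calling mkdir or rmdir is a simple way to avoid an exception when the directory is already there or already gone.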
Moving files around
It could be that we need to move files to different directories. This is common when we have a process that looks something like this:
pickup folder -> working folder -> done folder
To move files around, we’ll use the shutil library.
import shutil
shutil.move("meme.txt", "dest/meme.txt")
Another useful function in the shutil library is copy. Similar to move, this copies a file to a new location. copy2 does the same but also tries to preserve the file’s metadata (e.g. the date modified). Not everything can be kept: on POSIX systems, for instance, the file owner and group are not preserved.
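The pickup -> working -> done flow can be sketched with move and copy2. The folder layout, the backup file name and the file contents are assumptions for illustration, and everything happens inside a temporary folder.

```python
import pathlib
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    pickup = root / "pickup"
    working = root / "working"
    done = root / "done"
    for folder in (pickup, working, done):
        folder.mkdir()
    (pickup / "meme.txt").write_text("some text")

    # pickup -> working
    shutil.move(str(pickup / "meme.txt"), str(working / "meme.txt"))
    # keep a copy (with its modification time) before finishing
    shutil.copy2(str(working / "meme.txt"), str(root / "meme_backup.txt"))
    # working -> done
    shutil.move(str(working / "meme.txt"), str(done / "meme.txt"))

    in_pickup = (pickup / "meme.txt").exists()
    in_done = (done / "meme.txt").exists()
    backup_text = (root / "meme_backup.txt").read_text()

print(in_pickup, in_done, backup_text)
```

After the two moves the file only exists in the done folder, while the copy made with copy2 still holds the original text.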
File System - guided exercise
I want to combine my knowledge of dates, reading and writing files plus the new knowledge in this section to solve my next problem. I want to run a script that will list all the files that have changed in the current directory since the last time I ran the script.
Step 1 - try out the stat function
The first step is to try our ideas out on a single file. Let’s use meme.txt as a test case, and have a go at using the stat function.
import pathlib
import datetime
myfile = pathlib.Path('meme.txt')
stat = myfile.stat()
print(stat)
print(stat.st_mtime)
The 3 time attributes that we might find useful are:
- st_atime -> last access time
- st_mtime -> last modification time
- st_ctime -> creation time on Windows; on Unix, the time of the last metadata change
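To see what these values look like, here is a small sketch that creates a throwaway file (the temporary folder and the file contents are assumptions) and inspects its stat result. Each time value is a float counting seconds since the epoch, which makes it directly comparable with time.time().

```python
import pathlib
import tempfile
import time

with tempfile.TemporaryDirectory() as tmp:
    myfile = pathlib.Path(tmp) / "meme.txt"
    myfile.write_text("hello")

    stat = myfile.stat()
    mtime = stat.st_mtime      # float, seconds since the epoch
    size = stat.st_size        # size of the file in bytes
    age = time.time() - mtime  # seconds elapsed since the last modification

print(type(mtime).__name__, size)
print(round(age), "seconds since the file was modified")
```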
Step 2 - save the current date
We know from the date chapter that we can get the current date with code similar to this:
import datetime
today = datetime.datetime.today()
print(str(today))
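But that date is in a different format to the one returned from stat.st_mtime: today is a datetime object, while st_mtime is a number of seconds. We can convert the latter into a datetime object like this: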
modifiedDate = datetime.datetime.fromtimestamp(stat.st_mtime)
print(modifiedDate)
In the first example we get the current datetime, which is what we want to save to a file. On the next run, that saved value is compared with the modification date derived in the second example.
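The two formats can be converted in both directions, which is what makes the comparison possible. The sketch below uses a fixed, arbitrary datetime (an assumption, chosen only so the example is repeatable) instead of the real current time.

```python
import datetime

# arbitrary fixed value standing in for datetime.datetime.today()
today = datetime.datetime(2022, 12, 28, 9, 30, 0)

# datetime object -> seconds since the epoch (same format as st_mtime)
as_float = today.timestamp()

# seconds since the epoch -> datetime object
back = datetime.datetime.fromtimestamp(as_float)

print(as_float)
print(back)
```

Either format can be saved to a file as text, as long as the script converts both values to the same format before comparing them.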
Step 3 - mapping our solution
The solution will be:
- read the datefile and extract the date when the script last ran.
- compare this date with that from stat.st_mtime to determine if the file has been modified
- update the datefile with the current date.
If all of this works with our sample file, we’ll be able to update the code to use the iterdir function. Note also that the first run of this script has no datefile to read. We’ll create one using the output from the examples above.
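An alternative to creating the datefile by hand is to let the script cope with a missing file. The sketch below is one possible approach, not the one this course follows: the helper names read_last_run and write_last_run, the default of 0.0 and the sample timestamp are all assumptions.

```python
import pathlib
import tempfile
import time


def read_last_run(datefile, default=0.0):
    """Return the saved timestamp, or default if the datefile does not exist."""
    if not datefile.exists():
        return default
    return float(datefile.read_text())


def write_last_run(datefile, timestamp=None):
    """Save timestamp (or the current time if none is given) to the datefile."""
    if timestamp is None:
        timestamp = time.time()
    datefile.write_text(str(timestamp))


with tempfile.TemporaryDirectory() as tmp:
    datefile = pathlib.Path(tmp) / "datefile.txt"
    first = read_last_run(datefile)         # no datefile yet, so the default
    write_last_run(datefile, 1672219800.5)  # hypothetical timestamp
    second = read_last_run(datefile)        # the value we just saved

print(first, second)
```

With a default of 0.0, every file counts as modified on the very first run, which is a reasonable starting point for a script that has never run before.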
Implementing a trial solution with meme.txt
Using the information in an earlier chapter we can read the contents of a file into a variable like this:
f = open('datefile.txt')
lastdate_str = f.read()
f.close()
lastdate = float(lastdate_str)
This lastdate variable is in the same format as the mtime from the stat function. This means that we can make direct comparisons, as in the snippet below.
myfile = pathlib.Path('meme.txt')
stat = myfile.stat()
modifiedDate = stat.st_mtime
if modifiedDate > lastdate:
    print('meme has been modified')
else:
    print('meme has NOT been modified')
Finally we can save the current timestamp to datefile.txt.
import time
today = time.time()
with open('datefile.txt', 'w') as f:
    f.write(str(today))
Full source code for interim solution
import pathlib
import datetime
import time
# -----
# --get the datetime for when this script was last run
# -----
f = open('datefile.txt')
lastdate_str = f.read()
f.close()
lastdate = float(lastdate_str)
# -----
# -----
# --interim solution - get the mtime for meme.txt and compare to lastdate
# -----
myfile = pathlib.Path('meme.txt')
stat = myfile.stat()
modifiedDate = stat.st_mtime
if modifiedDate > lastdate:
    print('meme has been modified')
else:
    print('meme has NOT been modified')
# -----
# -----
# --update the datefile with the current datetime, and save it to file
# -----
today = time.time()
with open('datefile.txt', 'w') as f:
    f.write(str(today))
# -----
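We can run this script, change meme.txt and run it again. As long as datefile.txt was written after meme.txt was last changed, the first run prints 'meme has NOT been modified' and the second run prints 'meme has been modified'.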
Step 4 - introducing iterdir()
Now we need to replace our practice solution, which only checks meme.txt, with one that uses iterdir. This will fulfil our goal of scanning the directory and listing any files that have changed since the script last ran.
Let’s replace the entire interim solution section
# -----
# --interim solution - get the mtime for meme.txt and compare to lastdate
# -----
with this:
# -----
# --use iterdir and compare to lastdate with the mtime of the file referenced in iterdir.
# -----
currentDirectory = pathlib.Path('.')
for currentFile in currentDirectory.iterdir():
    stat = currentFile.stat()
    modifiedDate = stat.st_mtime
    if modifiedDate > lastdate:
        print('{0} has been modified'.format(currentFile))
# -----
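Now if we rerun our script, it prints a line such as 'meme.txt has been modified' for every item changed since the last run. datefile.txt will usually be listed too, because the script rewrites it at the end of every run.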
Full solution
import pathlib
import datetime
import time
# -----
# --get the datetime for when this script was last run
# -----
f = open('datefile.txt')
lastdate_str = f.read()
f.close()
lastdate = float(lastdate_str)
# -----
# -----
# --use iterdir and compare to lastdate with the mtime of the file referenced in iterdir.
# -----
currentDirectory = pathlib.Path('.')
for currentFile in currentDirectory.iterdir():
    # --skip the datefile.txt because we don't want this file to be part of the check.
    if currentFile.name == 'datefile.txt':
        continue
    stat = currentFile.stat()
    modifiedDate = stat.st_mtime
    if modifiedDate > lastdate:
        print('{0} has been modified'.format(currentFile))
# -----
# -----
# --update the datefile with the current datetime, and save it to file
# -----
today = time.time()
with open('datefile.txt', 'w') as f:
    f.write(str(today))
# -----
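For reuse, the directory scan could be wrapped in a function. The sketch below is one possible arrangement, not the course’s solution: the name changed_since and its skip parameter are made up here, and os.utime is used only to give the sample files known modification times.

```python
import os
import pathlib
import tempfile


def changed_since(directory, lastdate, skip=("datefile.txt",)):
    """Return the sorted names of items in directory modified after lastdate."""
    changed = []
    for current in pathlib.Path(directory).iterdir():
        if current.name in skip:
            continue  # the datefile is not part of the check
        if current.stat().st_mtime > lastdate:
            changed.append(current.name)
    return sorted(changed)


with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    (folder / "old.txt").write_text("unchanged")
    (folder / "meme.txt").write_text("edited")
    (folder / "datefile.txt").write_text("1650000000.0")

    # set (atime, mtime) by hand: old.txt before the last run, meme.txt after it
    os.utime(folder / "old.txt", (1600000000, 1600000000))
    os.utime(folder / "meme.txt", (1700000000, 1700000000))

    result = changed_since(folder, lastdate=1650000000.0)

print(result)
```

Returning a list instead of printing inside the loop lets the caller decide what to do with the changed files, for example moving them to a working folder.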
File System - extended exercise
Modify the guided exercise so that the date and time saved to datefile.txt is in a human-readable format.