File I/O


File handling in Python

Getting directory list:

Three methods:

1.

import os

# a raw string keeps the backslash in the Windows path literal
print(os.listdir(r'C:\sonu'))

2.

dir_name=os.scandir(r'C:\sonu')

for i in dir_name:

    print(i)

3. 

from pathlib import Path

entry=Path(r'C:\sonu')

for i in entry.iterdir():

    print(i)

Getting only files, not directories:

Method 1 (os.listdir)

import os

path=r'C:\sonu'

entry=os.listdir(path)

for i in entry:

    if os.path.isfile(os.path.join(path,i)):

        print(i)    

Method 2

entry=os.scandir(path)

for i in entry:

    if i.is_file():

        print(i)

Method 3

from pathlib import Path

entry=Path(path)

for i in entry.iterdir():

    if i.is_file():

        print(i) 

Getting a list of subdirectories:

Method 1

import os

path=r'C:\sonu'

entry=os.listdir(path)

for i in entry:

    if os.path.isdir(os.path.join(path,i)):

        print(i)

Method 2

entry=os.scandir(path)

for i in entry:

    if i.is_dir():

        print(i)

Method 3

from pathlib import Path

entry=Path(path)

for i in entry.iterdir():

    if i.is_dir():

        print(i) 

Getting file attributes:

Python makes it easy to retrieve file attributes such as file size and modification time. This is done through os.stat(), os.scandir(), or pathlib.Path.

Both os.scandir() and Path.iterdir() yield directory entries on which you can call .stat() to read those attributes.

1.

import os

path=r'C:\sonu'

entry=os.scandir(path)

for i in entry:

    info=i.stat()

    print(info,info.st_mtime)


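2.
from datetime import datetime, timezone
from pathlib import Path
entry=Path(r'C:\sonu')
for i in entry.iterdir():
    info=i.stat()
    print(info)
    # st_mtime is seconds since the epoch; convert it to a timezone-aware UTC datetime
    print(datetime.fromtimestamp(info.st_mtime, tz=timezone.utc))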


Making Directory

Function                 Description
os.mkdir()               Creates a single subdirectory
pathlib.Path.mkdir()     Creates single or multiple directories
os.makedirs()            Creates multiple directories, including intermediate directories

Single directory:

1.

import os

path=r'C:\sonu\office\uko'

os.mkdir(path)


2.

from pathlib import Path

path=r'C:\sonu\office\ukotest'

p=Path(path)

p.mkdir()

NOTE: both calls raise FileExistsError if the directory already exists. Wrap the call in a try block to handle that case.

Alternatively, you can ignore the FileExistsError by passing the exist_ok=True argument to Path.mkdir().
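
The two ways of handling an existing directory can be sketched as follows. This is only a minimal illustration that runs inside a throwaway temporary directory; the directory name demo is hypothetical.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo"   # hypothetical directory name
    target.mkdir()                # the first call creates the directory

    # Approach 1: catch the error raised because the directory already exists
    try:
        target.mkdir()
        raised = False
    except FileExistsError:
        raised = True

    # Approach 2: exist_ok=True silently accepts an existing directory
    target.mkdir(exist_ok=True)
    still_there = target.is_dir()

print(raised, still_there)
```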

Creating multiple directories:

1.

import os

path=r'C:\sonu\office\uko\test\jhg'

os.makedirs(path)

If you need to create directories with specific permissions, call os.makedirs() and pass in the mode you would like the directories to be created with:

os.makedirs('2018/10/05', mode=0o770)


2.

from pathlib import Path

path=r'C:\sonu\office\ukotest\test'

p=Path(path)

# parents=True creates any missing parent directories as well
p.mkdir(parents=True, exist_ok=True)

Passing parents=True to Path.mkdir() makes it create the target directory and any parent directories necessary to make the path valid.
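
The effect of parents=True can be seen in a small sketch. It assumes a temporary directory as the root and reuses the 2018/10/05 layout from the os.makedirs() example purely for illustration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    nested = Path(tmp) / "2018" / "10" / "05"

    # Without parents=True the call fails because 2018/10 does not exist yet
    try:
        nested.mkdir()
        plain_failed = False
    except FileNotFoundError:
        plain_failed = True

    # With parents=True the missing parent directories are created first
    nested.mkdir(parents=True, exist_ok=True)
    created = nested.is_dir()

print(plain_failed, created)
```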

File name pattern matching

Function                              Description
startswith()                          Tests if a string starts with a specified pattern and returns True or False
endswith()                            Tests if a string ends with a specified pattern and returns True or False
fnmatch.fnmatch(filename, pattern)    Tests whether the filename matches the pattern and returns True or False
glob.glob()                           Returns a list of filenames that match a pattern
pathlib.Path.glob()                   Finds patterns in path names and returns a generator object
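
The string-based checks from the table can be tried without touching the file system. The sketch below runs them on a hypothetical list of file names; it uses fnmatch.fnmatchcase(), the case-sensitive variant of fnmatch.fnmatch(), so that the result is the same on every platform.

```python
import fnmatch

# Hypothetical file names, used only for illustration
names = ["report.txt", "words.txt", "data.csv", "network.TXT"]

starts = [n for n in names if n.startswith("re")]
ends = [n for n in names if n.endswith(".txt")]

# fnmatchcase() compares case-sensitively regardless of the operating system
wild = [n for n in names if fnmatch.fnmatchcase(n, "*wor*.txt")]

print(starts, ends, wild)
```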

Method 1

import os

path=r'C:\sonu\office'

for f in os.listdir(path):

    if f.endswith('.txt'):

        print(f)


Method 2

import os

import fnmatch

path=r'C:\sonu\office'

for f in os.listdir(path):

    if fnmatch.fnmatch(f,'*.txt'):

        print(f)

More Advanced Pattern Matching


import os
import fnmatch
path=r'C:\sonu\office'
for f in os.listdir(path):
    if fnmatch.fnmatch(f,'*wor*.txt'):
        print(f)

Output: the names of the .txt files whose names contain 'wor'

File name pattern matching using glob:

import os

import glob

path=r'C:\sonu\office'

for f in glob.glob(os.path.join(path,'*.txt')):

    print(f)

Searching for files in subdirectories too:

import os

import glob

path=r'C:\sonu\office'

for f in glob.iglob(os.path.join(path,'**','*.txt'),recursive=True):

    print(f)

Path.glob() method:

Another way of listing files:

from pathlib import Path
path=Path(r'C:\sonu\office')
# '*.t*' matches any extension that starts with 't', such as .txt or .tar
for f in path.glob('*.t*'):
    print(f)

Traversing Directories and Processing Files

A common programming task is walking a directory tree and processing files in the tree.
os.walk() generates the file names in a directory tree by walking the tree either top-down or bottom-up.

os.walk() returns three values on each iteration of the loop:

  1. The name of the current folder

  2. A list of folders in the current folder

  3. A list of files in the current folder

import os
path=r'C:\sonu'
# each iteration yields the current folder, its subfolders, and its files
for dir_path,dir_names,files in os.walk(path):
    print(dir_path)
    for d in dir_names:
        print(d)
    for f in files:
        print(f)

os.walk() defaults to traversing directories in a top-down manner. Passing the topdown=False argument makes os.walk() visit the subdirectories first, so their files are yielded before those of their parent directory.
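
The difference in traversal order can be checked with a short sketch. It assumes a hypothetical two-level layout (a/b) created inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))   # hypothetical layout: root/a/b

    # Record the visited folders relative to the root for both orders
    top_down = [os.path.relpath(d, root) for d, _, _ in os.walk(root)]
    bottom_up = [os.path.relpath(d, root)
                 for d, _, _ in os.walk(root, topdown=False)]

print(top_down)   # the root ('.') comes first
print(bottom_up)  # the root ('.') comes last
```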

Making Temp File and Directory:

The tempfile module can be used to open and store data temporarily in a file or directory while your program is running. tempfile handles the deletion of the temporary files when your program is done with them.

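from tempfile import TemporaryFile

# 'w+t' opens the temporary file for reading and writing in text mode
with TemporaryFile('w+t') as fp:

    fp.write('dfjhfgyRO')

    fp.seek(0)   # rewind before reading the data back

    data=fp.read()

    print(data)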

Create a Temp Directory:

import tempfile

# the directory and its contents are removed when the with block exits
with tempfile.TemporaryDirectory() as tmpdir:

    print(tmpdir)


Deleting Files in Directory:

Function                 Description
os.remove()              Deletes a file and does not delete directories
os.unlink()              Is identical to os.remove() and deletes a single file
pathlib.Path.unlink()    Deletes a file and cannot delete directories
os.rmdir()               Deletes an empty directory
pathlib.Path.rmdir()     Deletes an empty directory
shutil.rmtree()          Deletes entire directory tree and can be used to delete non-empty directories

You can delete single files, directories, and entire directory trees using the functions found in the os, shutil, and pathlib modules.

To delete a single file, use pathlib.Path.unlink(), os.remove(), or os.unlink().

Syntax:

1.

import os

path=r'C:\sonu\psc\report rule.txt'

os.remove(path)

   

2.

import os

path=r'C:\sonu\psc\report rule.txt'

os.unlink(path)

You can also use pathlib.Path.unlink() to delete files:

from pathlib import Path

data=Path(path)

data.unlink()

Deleting Directory:

The standard library offers the following functions for deleting directories:

  • os.rmdir()
  • pathlib.Path.rmdir()
  • shutil.rmtree()
1.

import os
path=r'C:\sonu\office\uko\test\jhg'
os.rmdir(path)

2.

from pathlib import Path
data=Path(path)
data.rmdir()

3.

import shutil
path=r'C:\sonu\office\ukotest'
shutil.rmtree(path)
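
The difference between os.rmdir() and shutil.rmtree() shows up as soon as the directory is not empty. The sketch below assumes a temporary directory with one hypothetical subdirectory and file.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
nested = os.path.join(root, "sub")            # hypothetical subdirectory
os.mkdir(nested)
with open(os.path.join(nested, "note.txt"), "w") as fh:
    fh.write("x")

# os.rmdir() refuses to delete a directory that still has content
try:
    os.rmdir(nested)
    rmdir_failed = False
except OSError:
    rmdir_failed = True

# shutil.rmtree() removes the directory together with everything inside it
shutil.rmtree(root)
removed = not os.path.exists(root)

print(rmdir_failed, removed)
```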


Copying, moving, and renaming files:

shutil offers a couple of functions for copying files. The most commonly used are shutil.copy() and shutil.copy2(). Both take a source src and a destination dst. If dst is a file, the contents of that file are replaced with the contents of src. If dst is a directory, then src will be copied into that directory.

shutil.copy() only copies the file's contents and the file's permissions. Other metadata like the file's creation and modification times are not preserved. shutil.copy2() works the same way but also attempts to preserve that metadata.


1.

import shutil

path=r'C:\sonu\production.txt'

dest=r'C:\sonu\psc'

shutil.copy(path,dest)

2.

import shutil
path=r'C:\sonu\production.txt'
dest=r'C:\sonu\psc'

shutil.copy2(path,dest)
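
How the two functions treat the modification time can be checked with a small sketch. It assumes a temporary directory, hypothetical file names, and an arbitrary old timestamp set on the source file.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src.txt")        # hypothetical file names
    with open(src, "w") as fh:
        fh.write("hello")

    old = 1_000_000_000                       # arbitrary timestamp from 2001
    os.utime(src, (old, old))                 # set access and modification time

    plain = shutil.copy(src, os.path.join(tmp, "plain.txt"))
    meta = shutil.copy2(src, os.path.join(tmp, "meta.txt"))

    plain_mtime = os.stat(plain).st_mtime     # time of the copy operation
    meta_mtime = os.stat(meta).st_mtime       # carried over from the source

print(plain_mtime, meta_mtime)
```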

Copying Directories

While shutil.copy() only copies a single file, shutil.copytree() will copy an entire directory and everything contained in it. shutil.copytree(src, dest) takes two arguments: a source directory and the destination directory where files and folders will be copied to.

import shutil

path=r'C:\sonu\psc\ca'

dest=r'C:\sonu\test\dest'

shutil.copytree(path,dest)

The destination directory must not already exist. It will be created as well as missing parent directories. shutil.copytree() is a good way to back up your files.
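
Both points, the creation of missing parents and the failure when the destination already exists, can be observed in a short sketch. The directory and file names are hypothetical and everything happens inside a temporary directory.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    (src / "inner").mkdir(parents=True)
    (src / "inner" / "a.txt").write_text("data")

    # The parent "missing_parent" does not exist yet; copytree() creates it
    dest = Path(tmp) / "missing_parent" / "backup"
    shutil.copytree(src, dest)
    copied = (dest / "inner" / "a.txt").read_text()

    # A second copy into the same destination fails by default
    try:
        shutil.copytree(src, dest)
        second_failed = False
    except FileExistsError:
        second_failed = True

print(copied, second_failed)
```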

Move directory:

import shutil

path=r'C:\sonu\test'

dest=r'C:\sonu\test2'

shutil.move(path,dest)


Renaming File:

import os

path=r'C:\sonu\production.txt'

os.rename(path,r'C:\sonu\productionnew.txt')

Rename directory:

import os

path=r'C:\sonu\ca'

os.rename(path,r'C:\current')

Another way to rename files or directories is to use rename() from the pathlib module:

from pathlib import Path

datafile=Path(r'C:\sonu\psc\ca\test.txt')

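# with_name() keeps the file in its current directory; a bare 'newtest.txt' would be resolved against the current working directory
datafile.rename(datafile.with_name('newtest.txt'))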

Archiving: 

Archives are a convenient way to package several files into one. The two most common archive types are ZIP and TAR. The Python programs you write can create, read, and extract data from archives. You will learn how to read and write to both archive formats in this section.

The zipfile module is a low-level module that is part of the Python Standard Library.

To read the contents of a ZIP file, the first thing to do is to create a ZipFile object. ZipFile objects are similar to file objects created using open().

.namelist() returns a list of names of the files and directories in the archive.

import zipfile

with zipfile.ZipFile(r'C:\sonu\office\project document\Gost Specialized\file\data.zip','r') as z:

    print(z.namelist())


To retrieve information about the files in the archive, use .getinfo():

.getinfo() returns a ZipInfo object that stores information about a single member of the archive. To get information about a file in the archive, you pass its path as an argument to .getinfo(). Using getinfo(), you’re able to retrieve information about archive members such as the date the files were last modified, their compressed sizes, and their full filenames.

import zipfile

with zipfile.ZipFile(r'C:\sonu\office\project document\Gost Specialized\file\data.zip','r') as z:

    print(z.getinfo('2023-CTE-Adon.xlsx'))


import zipfile

with zipfile.ZipFile(r'C:\sonu\office\project document\Gost Specialized\file\data.zip','r') as z:

    obj=z.getinfo('2023-CTE-Adon.xlsx')

    print(obj.date_time)

    print(obj)
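
For readers without the archive used above, the following self-contained sketch builds a small ZIP file first and then reads the same kind of information through .getinfo(). The archive name demo.zip and the member notes.txt are hypothetical.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")   # hypothetical archive name

    # Write one highly compressible member into a new archive
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as z:
        z.writestr("notes.txt", "a" * 1000)

    with zipfile.ZipFile(archive) as z:
        info = z.getinfo("notes.txt")
        name = info.filename          # full name inside the archive
        size = info.file_size         # uncompressed size in bytes
        packed = info.compress_size   # compressed size in bytes
        modified = info.date_time     # (year, month, day, hour, minute, second)

print(name, size, packed, modified)
```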


Extracting Zip Archive:

The zipfile module allows you to extract one or more files from ZIP archives through .extract() and .extractall().

These methods extract files to the current directory by default. They both take an optional path parameter that allows you to specify a different directory to extract files to. If the directory does not exist, it is automatically created. 

Extract a single file into the current directory:

import zipfile
with zipfile.ZipFile(r'C:\sonu\office\project document\Gost Specialized\file\data.zip','r') as data:
    data.extract('2023-CTE-Adon.xlsx')

Extract all files into a different directory:

import zipfile
with zipfile.ZipFile(r'C:\sonu\office\project document\Gost Specialized\file\data.zip','r') as data:
    data.extractall(path=r'C:\sonu\office\project document\Gost Specialized')

Extracting data from a password-protected archive:

import zipfile
with zipfile.ZipFile(r'C:\sonu\office\project document\Gost Specialized\file\data.zip','r') as data:
    # pwd must be a bytes object, not a str
    data.extractall(path=r'C:\sonu\office\project document\Gost Specialized', pwd=b'Quish3@o')


Creating New Archives:

To create a new ZIP archive, you open a ZipFile object in write mode (w) and add the files you want to archive:

>>> import zipfile

>>> file_list = ['file1.py', 'sub_dir/', 'sub_dir/bar.py', 'sub_dir/foo.py']
>>> with zipfile.ZipFile('new.zip', 'w') as new_zip:
...     for name in file_list:
...         new_zip.write(name)


>>> # Open a ZipFile object in append mode
>>> with zipfile.ZipFile('new.zip', 'a') as new_zip:
...     new_zip.write('data.txt')
...     new_zip.write('latin.txt')


Working with TAR Files:

The table below lists the possible modes TAR files can be opened in:

Mode     Action
r        Opens archive for reading with transparent compression
r:gz     Opens archive for reading with gzip compression
r:bz2    Opens archive for reading with bzip2 compression
r:xz     Opens archive for reading with lzma compression
w        Opens archive for uncompressed writing
w:gz     Opens archive for gzip compressed writing
w:xz     Opens archive for lzma compressed writing
a        Opens archive for appending with no compression
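
As a rough illustration of "transparent compression", the sketch below writes a gzip-compressed archive with 'w:gz' and reads it back with plain 'r', which detects the compression on its own. The file and archive names are hypothetical.

```python
import os
import tarfile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    member = os.path.join(tmp, "hello.txt")       # hypothetical member file
    with open(member, "w") as fh:
        fh.write("hi")

    archive = os.path.join(tmp, "demo.tar.gz")
    # 'w:gz' writes a gzip-compressed archive
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(member, arcname="hello.txt")

    # plain 'r' works out the compression type by itself
    with tarfile.open(archive, "r") as tar:
        names = tar.getnames()

print(names)
```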

Read Tar Files:

import tarfile
import os
filelist=os.listdir(r'C:\sonu\test')
print(filelist)

with tarfile.open(r'C:\sonu\test\text.tar','r') as t:
    # getnames() returns the member names as a list
    print(t.getnames())


Create TAR Files:

import tarfile
import os
filelist=os.listdir(r'C:\sonu\test')
print(filelist)
with tarfile.open(r'C:\sonu\test\text.tar',mode='w') as tar:
    for file in filelist:
        tar.add(os.path.join(r'C:\sonu\test',file))

Adding new members to an existing archive:

import tarfile
import os
filelist=os.listdir(r'C:\sonu\test')
print(filelist)
with tarfile.open(r'C:\sonu\test\text.tar',mode='a') as tar:
    for file in filelist:
        tar.add(os.path.join(r'C:\sonu\test',file))

Extracting Files From a TAR Archive

In this section, you’ll learn how to extract files from TAR archives using the following methods:
  • .extract()
  • .extractfile()
  • .extractall()

Extract a single file from a TAR archive:

import tarfile
import os
print(os.listdir(r'C:\sonu\test'))
with tarfile.open(r'C:\sonu\test\text.tar','r') as tar:
    print(tar.getnames())
    # the member name must match an entry returned by getnames()
    tar.extract('sonu/test/2023-CTE-Archie.xlsx',path=r'C:\sonu')

Extract all files:

import tarfile
import os
print(os.listdir(r'C:\sonu\test'))
with tarfile.open(r'C:\sonu\test\text.tar','r') as tar:
    print(tar.getnames())
    tar.extractall(path=r'C:\sonu')


To read a member without extracting it to disk, use .extractfile(), which takes a filename or TarInfo object as an argument. .extractfile() returns a read-only file-like object:

import tarfile
import os
print(os.listdir(r'C:\sonu\test'))
with tarfile.open(r'C:\sonu\test\text.tar','r') as tar:
    print(tar.getnames())
    t=tar.extractfile('sonu/test/2023-CTE-Archie.xlsx')
    print(t.read())
    t.close()
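
The same idea can be tried without any files on disk. The following sketch builds a small archive in memory and reads one member back through .extractfile(); the member name greeting.txt and its content are made up for the example.

```python
import io
import tarfile

payload = b"hello from the archive"   # made-up member content
buffer = io.BytesIO()

# Build an in-memory TAR archive with a single member
with tarfile.open(fileobj=buffer, mode="w") as tar:
    info = tarfile.TarInfo(name="greeting.txt")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))

# Rewind and read the member back without writing anything to disk
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r") as tar:
    with tar.extractfile("greeting.txt") as member:
        content = member.read()

print(content)
```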

Working with compressed Archive:

For example, to read or write data to a TAR archive compressed using gzip, use the 'r:gz' or 'w:gz' modes respectively:

>>> import tarfile
>>> files = ['app.py', 'config.py', 'tests.py']
>>> with tarfile.open('packages.tar.gz', mode='w:gz') as tar:
...     for name in files:
...         tar.add(name)

>>> with tarfile.open('packages.tar.gz', mode='r:gz') as t:
...     for member in t.getmembers():
...         print(member.name)
app.py
config.py
tests.py


tarfile can also read and write TAR archives compressed using gzip, bzip2, and lzma compression. To read or write to a compressed archive, use tarfile.open(), passing in the appropriate mode for the compression type.

An easy way of creating Archive:


import shutil

# shutil.make_archive(base_name, format, root_dir)
shutil.make_archive('data/backup', 'tar', 'data/')


To extract the archive, call shutil.unpack_archive():

# make_archive() above wrote the archive to data/backup.tar
shutil.unpack_archive('data/backup.tar', 'extract_dir/')
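
A complete round trip might look like the sketch below. It assumes a temporary working directory and hypothetical names (data, a.txt, backup, extract_dir) so that it can run anywhere.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "data")          # hypothetical source directory
    os.mkdir(data_dir)
    with open(os.path.join(data_dir, "a.txt"), "w") as fh:
        fh.write("payload")

    # make_archive() appends the extension and returns the archive's path
    archive_path = shutil.make_archive(os.path.join(tmp, "backup"), "tar", data_dir)
    archive_name = os.path.basename(archive_path)

    # unpack_archive() picks the format from the file extension
    out_dir = os.path.join(tmp, "extract_dir")
    shutil.unpack_archive(archive_path, out_dir)
    restored = sorted(os.listdir(out_dir))

print(archive_name, restored)
```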

Python - Working with Files:


Read File:

with open(r'C:\sonu\office\CHAT GPT AI.txt','r') as r:
    data=r.read()
    print(data)

Search Specific string in File: 

with open(r'C:\sonu\office\CHAT GPT AI.txt','r') as r:
    data=r.read()
    word='SONU KUMAR'
    if word in data:
        print('exists')
    else:
        print('not exists')
    print(data)

Search Specific string and its line number in File: 

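The readlines() method returns all lines of a file as a list. enumerate() then gives the line number of each match:

word='complete'
with open(r'C:\sonu\office\working.txt','r') as f:
    l=f.readlines()
# start=1 so that the first line is reported as line 1
for number,line in enumerate(l,start=1):
    if word in line:
        print(number)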

Read Specific Line:

import linecache

# read the third line (linecache line numbers start at 1)
line = linecache.getline(r'C:\sonu\office\CHAT GPT AI.txt',3)
print(line)