Creating a SQLite Database from CSV with Python
When it comes to data analysis, one of the most essential steps is to have the data in a structured format. Using SQLite, we can organize our data into tables and easily query and manipulate it. But what if we have our data in a CSV file? In this article, we will discuss how to create a SQLite database from a CSV file using Python.
Getting Started
First, we need to have SQLite installed. We can check if it is already installed on our system by opening a terminal and typing in the following command:
sqlite3
If it is installed, a prompt will show up, indicating that we can start executing SQLite commands. If it is not installed, we can download the appropriate version for our operating system from the SQLite download page.
Next, we need to have the SQLite3 module installed in Python. We can check if it is installed by opening a Python terminal and typing in the following command:
import sqlite3
If it is installed, no error message will show up. If it is not installed, we can install it using pip:
pip install pysqlite3
Creating a SQLite Database
Now that we have SQLite and the SQLite3 module installed, we can start creating our SQLite database.
First, let’s create a new Python script and import the SQLite3 module:
import sqlite3
Next, we need to establish a connection to the database. We can do this using the connect
function from the SQLite3 module. This function takes a single argument, which is the name of the database file. If the file does not exist, it will be created.
conn = sqlite3.connect('mydatabase.db')
We can then create a cursor object using the cursor
method of the connection object:
cursor = conn.cursor()
The cursor object allows us to execute SQL commands on the database. For example, we can create a table called employees
with the following SQL command:
cursor.execute('''CREATE TABLE employees
(id INTEGER PRIMARY KEY,
name TEXT,
salary REAL)''')
This SQL command creates a table called employees
with three columns: id
, name
, and salary
. The id
column is an integer and serves as the primary key for the table. The name
column is a text column, and the salary
column is a real column.
After executing the command, we need to commit the changes by calling the commit
method of the connection object:
conn.commit()
We can then close the connection by calling the close
method:
conn.close()
Our script should now look something like this:
import sqlite3
conn = sqlite3.connect('mydatabase.db')
cursor = conn.cursor()
cursor.execute('''CREATE TABLE employees
(id INTEGER PRIMARY KEY,
name TEXT,
salary REAL)''')
conn.commit()
conn.close()
If we run this script, it will create a new SQLite database file called mydatabase.db
in the same directory as our script and create a table called employees
.
Importing Data from CSV
Now that we have created our SQLite database, we can start importing data from a CSV file.
Let’s say we have a CSV file called employees.csv
with the following contents:
id,name,salary
1,John Doe,50000.0
2,Jane Doe,60000.0
3,Bob Smith,70000.0
We want to import this data into our employees
table in the SQLite database.
We can open the CSV file using Python’s built-in csv
module and read the data into a list of tuples. Each tuple represents a row of data, and the elements of the tuple represent the values of the columns in that row.
Here’s the code to import the data from the CSV file:
import csv
import sqlite3
conn = sqlite3.connect('mydatabase.db')
cursor = conn.cursor()
# create table
cursor.execute('''CREATE TABLE IF NOT EXISTS employees
(id INTEGER PRIMARY KEY,
name TEXT,
salary REAL)''')
# read data from CSV file and insert into table
with open('employees.csv', 'r') as f:
reader = csv.reader(f)
next(reader) # skip header row
for row in reader:
cursor.execute('''INSERT INTO employees (id, name, salary)
VALUES (?, ?, ?)''', (row[0], row[1], row[2]))
conn.commit()
conn.close()
Let’s go through this code step by step.
First, we import the csv
and sqlite3
modules.
Next, we establish a connection to the SQLite database and create a cursor object.
conn = sqlite3.connect('mydatabase.db')
cursor = conn.cursor()
We then create the employees
table if it does not already exist.
cursor.execute('''CREATE TABLE IF NOT EXISTS employees
(id INTEGER PRIMARY KEY,
name TEXT,
salary REAL)''')
The IF NOT EXISTS
clause ensures that the table is only created if it does not already exist.
Next, we open the employees.csv
file and read its contents using the csv.reader
function. We skip the header row using the next
function, as it does not contain any data.
with open('employees.csv', 'r') as f:
reader = csv.reader(f)
next(reader) # skip header row
We then iterate over the remaining rows in the file and insert each row into the employees
table using an INSERT SQL command. We use a tuple to pass the values of each row to the execute
method of the cursor object.
for row in reader:
cursor.execute('''INSERT INTO employees (id, name, salary)
VALUES (?, ?, ?)''', (row[0], row[1], row[2]))
Finally, we commit the changes to the database and close the connection.
conn.commit()
conn.close()
If we run this script, it will create the employees
table (if it does not already exist) and import the data from the employees.csv
file into the table.
Conclusion
In this article, we learned how to create a SQLite database from a CSV file using Python. We first created a new SQLite database and a table within it. We then imported data from a CSV file into the table using Python’s csv
module and SQLite’s SQL commands. By combining these techniques, we can easily manage and manipulate structured data for our data analysis needs.