Social science researchers collect much of their data through online surveys. In many cases, they offer incentives to the participants. These incentives can take the form of lotteries for more valuable prices or individual gift card codes. We are doing the latter in our studies here at CEPA Labs at Stanford. Specifically, our survey participants receive a gift card code from Amazon.
However, sending these gift card codes to respondents is challenging. In Qualtrics, our online survey platform, we can embed a code for each potential respondent, and then trigger an email with the code attached after their survey is completed. While this is a very convenient feature, it has one substantial drawback. We need to purchase all codes up front, yet many participants may never answer. There is an option of waiting until the survey is closed and then purchasing the codes for all respondents at once. However, respondents tend to become impatient if they do not receive their code in a timely manner and start reaching out to you and possibly to the IRB office. This creates administrative work and might reduce response rates if potential respondents can talk to each other about their experience. Furthermore, we reach most of our participants with text messages and have found from experience that emails often go unnoticed or end up in the spam folder.
Given these problems, we decided to send codes using python and the Qualtrics API. This way, we can send codes immediately and do not need to purchase all codes upfront. We used the Amazon Incentives API, which allows its users to request gift card codes on demand. Codes are generated on the fly, and the amount is charged to our account. An alternative would be Giftbit, which even allows its users to send gift links with different gift card providers as option.
I believe this approach would be useful to many social science researchers. After I had developed the program, we immediately thought of a second project where we already had enough codes for the anticipated 70% response rate. We stored the codes in a .csv file.
In this post, for simplicity, I will describe a python program that gets the codes from a .csv file and sends them out by email. The other (original) program fetched the codes from the Amazon Incentives API and sent them over the EZtexting.com API. Both versions can be found on GitHub. It is also possible to send text messages per email.
The program checks Qualtrics continuously for new responses and then sends each new respondent a code. In a loop, it downloads (new) responses, writes their contact information to an SQL database, assigns a code, and adds it to the data base, and then sends the codes by email. The program is written in a way that guarantees that if the program gets interrupted, it can just be executed again without any problem.
Before I go into detail, here is a quick disclaimer. I am not a professional python developer. If you see ways how to improve the program, please let me know. I am always happy to learn new things. I also will not take any responsibility for the proper functioning of the program. If you decide to use it, it is your responsibility to adapt it to your application and thoroughly test it. Furthermore, you can find more scripts to work with the Qualtrics API on GitHub. I used python 2.7.11 (64-bit). SQLite needs to be installed.
Let’s get started with importing the required packages and setting up some parameters. The location for the survey download, the SQLite data base, the backups, and the file containing the codes need to be specified. All elements of the code to be changed are bold.
import requests
import zipfile
import pandas as pd
import os
import sqlite3
import datetime
import time
import shutil
import smtplib
import sys
from email.MIMEMultipart import MIMEMultipart
from email.MIMEText import MIMEText
from email.Utils import COMMASPACE, formatdate
# Set path of the main folder
path= 'D:/YOUR PROJECT FOLDER'
os.chdir(path)
# Setting path for data export and storage
SurveyExportPath = path + '/ DownloadFolder'
SQLDatabasePath = path + '/DataBase/'
SQLBackupPath = path + '/DataBase/Archive/'
SQLDatabaseName = 'ResponseTracker.sqlite'
# Set path for files that holds gift card codes
CODEPath = path + '/---Your file with codes---.csv'
Next, I declare which columns of the Qualtrics data will be loaded into memory later.You can adjust the program, for instance, to allow for different gift card code amounts, send the codes to phone numbers, check if the respondent has answered sufficient questions, or include their name in the email. To do so, you will have to upload that information within the contact lists, embed it within the Qualtrics survey flow, and adjust the columns here to import the columns containing the respective information.
# Columns to include when read in data from surveys
pdcolumns = ['ResponseID', 'ExternalDataReference', 'RecipientEmail']
Here, you need to declare a dictionary using Qualtrics survey IDs and survey names for all surveys that you want to include. The survey IDs are the keys. You can find the IDs in the Qualtrics account settings. Note that it is important to add a ‘.csv’ to the name. This is necessary because the survey data is downloaded from Qualtrics as “SURVEYNAME.csv”.
# List survey is Ids and file extensions in dictionary. Possible to include multiple surveys
surveyIdsDic = {'---Your survey ID 1---': '---Your survey name 1---.csv',
'---Your survey ID 2---': '---Your survey name 2---.csv',
'---Your survey ID 3---': '---Your survey name 3---.csv'}
surveyIds = surveyIdsDic.keys()
The next variables hold the number of times this script checks for new answers and sends out the codes, the time it waits between each iteration in seconds, and the number of iterations it waits before creating a backup of the SQL database. With these settings, the program would run for at least 695 days (not counting the time the program takes for each iteration) and create a backup every five minutes.
# Number of repetitions for the loop, time to wait between each iteration in seconds
reps = 1000000
waittime = 60
backupn = 300
Next, we need to set up the parameters of the Qualtrics API call. You have to declare your API Token, which you can find in the account settings in your Qualtrics account, the format of the downloaded file (.csv), and your data center. The following code declares the API URL and the information to be sent in the API call.
# Setting user Parameters for Qualtrics API
# Add you Qualtrics token and survey ids
apiToken = "---Your Token---"
fileFormat = "csv"
dataCenter = "---Your data center ---"
# Setting static parameters for Qualtrics API
baseUrl = "https://" + dataCenter + ".qualtrics.com/API/v3/responseexports/".format(dataCenter)
headers = {
"content-type": "application/json",
"x-api-token": apiToken,
}
The program defines a function to create the message to the respondent containing the gift code. In this simple version, it only takes the code as parameter. It could be easily extended to be personalized with the respondent’s name. The message is in HTML and can be styled with in-line CSS.
# Set email message to send the code in HTML
def genMessage(code):
message= """""" <html>
<head></head>
<body>
<strong>Thank you!</strong>
Thank you very much for taking the time to complete our survey!
Please accept the electronic gift certificate code below.
<span style="font-weight: bold; font-size=1.5em; text-align:center;">""" \
+ code + \
"""</span>
Thank you again
</body>
</html>"""
return message
The program then defines a function to send the message to the respondent by email. You need to specify your correspondence email address and your SMTP host. It could potentially take a list of email addresses. However, that will not be the case in this program.
# Specify you email host and email address
def sendMail(to, subject, text):
assert type(to)==list
fro = "---Your email address--- "
#add your correspondence email address to send out the code
msg = MIMEMultipart()
msg['From'] = fro
msg['To'] = COMMASPACE.join(to)
msg['Date'] = formatdate(localtime=True)
msg['Subject'] = subject
msg.attach(MIMEText(text, 'html'))
smtp = smtplib.SMTP('---Your mail host---') #add your SMTP Host
smtp.sendmail(fro, to, msg.as_string() )
smtp.close()
The next function creates a backup of the SQL database.
def createBackup(path, database):
# Check if the provdied path is valid
if not os.path.isdir(path):
raise Exception("Backup directory does not exist: {}".format(path))
# Define file name for the backup, includes date and time
backup_file = os.path.join(path, 'backup' +
time.strftime("-%Y%m%d-%H%M")+ '.sqlite')
# Lock database before making a backup
cur.execute('begin immediate')
# Make new backup file
shutil.copyfile(database, backup_file)
print ("\nCreating {}...".format(backup_file))
# Unlock database
sqlconn.rollback()
Next, the program sets up the SQL database schema. It tries to connect to the database. If the database does not exist, the program creates two tables “respondents” and “surveys.” “Respondents” tracks the responses and the gift codes. “Surveys” tracks the surveys that have been answered. If the database exists, nothing happens. This SQL code assures that no respondent is paid twice or that the same gift card code is sent to more than one respondent by accident. The attribute UNIQUE guarantees that the program only creates a new record if the response ID, the respondent ID, the email address, and the gift card code do not already exist.
# SQL schema
# database path + file name
database = SQLDatabasePath+SQLDatabaseName
# Connect to SQLite API
sqlconn = sqlite3.connect(database)
cur = sqlconn.cursor()
# Execute SQL code to create new database with a table for respondents and for surveys
# If database and these tables already exist, nothing will happen
cur.executescript('''
CREATE TABLE IF NOT EXISTS Survey (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
name TEXT UNIQUE
);
CREATE TABLE IF NOT EXISTS Respondent (
id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE,
individual_id INTEGER UNIQUE,
response_id TEXT UNIQUE,
survey_id INTEGER,
email TEXT UNIQUE,
create_date DATETIME DEFAULT (DATETIME(CURRENT_TIMESTAMP, 'LOCALTIME')),
redeem_code TEXT UNIQUE,
sentstatus INTEGER,
sent_date DATETIME
)
''')
# Commit (save) changes to database
sqlconn.commit()
After all the setup, the actual program starts here. Everything below will be repeated as many times as specified above. The program prints the date, time, and the number of the iteration to the console so that we can check if it’s running properly.
# Everything below will be repeated for the specified number of iterations
for t in xrange(reps):
# Provide some information in the console
print "----------"
print "Iteration :" , t, "Date: ", datetime.datetime.now().strftime("%m/%d/%y %H:%M:%S")
First, previously downloaded survey data files are deleted. This step is not necessary as the next download would replace these files. However, often enough surveys are not downloaded due to API connection problems. In that case, the old and already processed files would be processed again. That would slow down the program.
# Delete all files in extraction path, in case they don’t get downloaded in this iteration
deletelist = [os.path.join(subdir, file) for subdir, dirs, files in os.walk(SurveyExportPath)
for file in files]
for file in deletelist:
os.remove(file)
The next bit of code downloads the responses for each survey separately. It iterates over each survey. First, it tries to collect the last response ID from the SQL database from the last respondent of the respective survey. If someone has answered the survey before, the program passes the ID of the last response as parameters to the Qualtrics API call. In that case, the program only downloads responses that happened afterwards. If no one has answered the survey before, an error exception handler sets the parameters such that all responses will be downloaded. It then initiates the export. Once the export is complete, a zip file with the data is downloaded. The program includes several exceptions for API connection errors and only allows the export to take up to three minutes to make sure it does not get hung up in this step. If an error occurs, the survey will not be downloaded in this iteration, but the program will try again in the next iteration. The program then unzips the zip file.
# Iterate over survey IDs to download each one separately
for surveyId in surveyIds:
survey = surveyIdsDic[surveyId] # Identify the survey
try:
# Fetch last response id from database, used to download only new responses
cur.execute('''SELECT response_id FROM Respondent
WHERE id == (SELECT max(id) FROM Respondent
WHERE survey_id == (SELECT id FROM Survey WHERE name == ? )) ''', (survey,))
lastResponseId=cur.fetchone()[0]
# Set parameters to send to Qualtrics API
downloadRequestPayload = '{"format":"' + fileFormat + '","surveyId":"' + surveyId + '","lastResponseId":"' + lastResponseId +'"}'
# Set exception for case that no one has answered to this survey yet
except (TypeError, sqlconn.OperationalError) as e:
print e
# Set parameters without specifying last response id (all responses will be downloaded)
downloadRequestPayload = '{"format":"' + fileFormat + '","surveyId":"' + surveyId +'"}'
downloadRequestUrl = baseUrl
try:
# Connect to Qualtrics API and send download request
downloadRequestResponse = requests.request("POST", downloadRequestUrl, data=downloadRequestPayload, headers=headers)
progressId = downloadRequestResponse.json()["result"]["id"]
# Checking on data export progress and waiting until export is ready
startlooptime = time.time() # Record time to make sure the loop doesn't run forever
requestCheckProgress = 0
# As long as export not complete, keep checking
while requestCheckProgress > 100: # Abort if download takes more than 3 minutes
print "Download took more than three minutes. Try again next iteration."
complete = 0
break
# If export complete, download and unzip file
if complete==1:
requestDownloadUrl = baseUrl + progressId + '/file'
requestDownload = requests.request("GET", requestDownloadUrl, headers=headers, stream=True)
with open("RequestFile.zip", "wb+") as f:
for chunk in requestDownload.iter_content(chunk_size=1024):
f.write(chunk)
except (KeyError, requests.ConnectionError, ValueError, IOError) as e:
# Something went wrong with the Qualtrics API (retry)
print "Survey not downloaded: ", e
continue
try:
zipfile.ZipFile("RequestFile.zip").extractall(SurveyExportPath)
except (zipfile.BadZipfile, IOError) as e:
print SurveyExportPath
print "Zipfile not extracted: ", e
continue
Once the files have been downloaded and unzipped, the program loops over all downloaded files. It loads the data of each file into memory, and if new responses are in the data, it adds the them to the SQL database. The IGNORE in the SQL code makes sure that, if the response has already been recorded, it will not be added to the data base without causing an error. The program saves the changes to the SQL database after each record is created. This choice reduces speed but avoids problems from any possible connection errors.
SurveyFileList = [os.path.join(subdir, file) for subdir, dirs, files in os.walk(SurveyExportPath)
for file in files]
for file in SurveyFileList:
data=pd.read_csv(file, encoding = 'utf-8-sig', sep = ',',
usecols = pdcolumns,
low_memory=False, error_bad_lines = False)
data=data.iloc[2:,:].reset_index(drop=True)
survey = file.split('\\')[-1] # Identify the survey
if len(data.index)>0: # Only if new responses are recorded
for row in xrange(0, len(data.index)):
individual_id = data.loc[row, 'ExternalDataReference']
response_id = data.loc[row, 'ResponseID']
email = data.loc[row, 'RecipientEmail']
# Record the survey name
cur.execute(''' INSERT or IGNORE INTO Survey
(name)
VALUES (?)''',
(survey,))
# Fetch survey id to enter in response table
cur.execute('''SELECT id FROM Survey WHERE name == ? ''', (survey,))
survey_id=cur.fetchone()[0]
cur.execute(''' INSERT or IGNORE INTO Respondent
(email, individual_id, response_id, survey_id)
VALUES (?,?,?,?)''',
(email, individual_id, response_id, survey_id))
sqlconn.commit()
After all new responses have been added to the SQL database, the program assigns a gift card code to each new respondent. It first collects all respondents without a gift card code from the SQL database. It then loads the file with the gift card codes into memory and fetches the last gift card code that had been assigned to a respondent. It then selects the codes that come after this last code and assigns them to the new respondents. If no codes have been assigned before, an error is raised and the exception handler makes sure that the first code is used. The newly assigned codes are added to the SQL database. The program counts the number of codes assigned and logs it to the console after all new respondents are assigned a code.
# Select new respondents who need a code
cur.execute('''SELECT id FROM Respondent WHERE redeem_code IS NULL ''')
NeedGiftCards = cur.fetchall()
numCodesAssigned = 0 # Count number of the codes assigned
if len(NeedGiftCards)>0: # Only if new respondents
# Import csv file that holds the codes
allcodes=pd.read_csv(CODEPath, encoding = 'utf-8-sig', sep = ',',
low_memory=False, error_bad_lines = False)
# Identify last redeem code used
try:
cur.execute('''SELECT redeem_code FROM Respondent
WHERE id == (SELECT max(id) FROM Respondent
WHERE redeem_code IS NOT NULL)''')
lastcode = cur.fetchone()[0]
row=allcodes[allcodes.code==lastcode].index.values[0]
# Get index value for last code
except TypeError:
row = -1
usecodes=allcodes[allcodes.index>row] # Select all codes after that value
for needcard in NeedGiftCards:
row +=1
# Extract data
sqlDB_id = needcard[0]
redeem_code=usecodes.code[row]
numCodesAssigned += 1
# Add code to SQL database
cur.execute(''' UPDATE Respondent
SET redeem_code = ?
WHERE id == ?''',
(redeem_code, sqlDB_id))
sqlconn.commit()
print 'Number of gift card codes assigned:', numCodesAssigned
Finally, the program fetches all new responses for which the codes have not been sent out yet. For each respondent, it creates a message including the code and sends it via email. It counts the number of emails sent and logs it to the console to keep track.
# Getting all contacts and codes for which the code has not been sent
cur.execute('''SELECT id, email, redeem_code FROM Respondent
WHERE redeem_code IS NOT NULL and sentstatus IS NULL''')
contacts = cur.fetchall()
numCodesSent = 0 # Count the number of codes sent
if len(contacts)>0: # Only new respondents
for contact in contacts:
sqlDB_id = contact[0]
email = contact[1]
code = contact[2]
message = genMessage(code)
TOADDR = [email]
# Send message
try:
sendMail(TOADDR, "Thank you for your participation!", message)
numCodesSent += 1
sentstatus = 1
cur.execute(''' UPDATE Respondent
SET sentstatus = ?,
sent_date=datetime(CURRENT_TIMESTAMP, 'localtime')
WHERE id == ?''',
(sentstatus, sqlDB_id))
sqlconn.commit()
except:
e = sys.exc_info()[0]
print "Error:", e
continue
print 'Number of codes sent:', numCodesSent
The very last step is to create a backup if it’s time. After all iterations are done, the program closes the SQL database connection.
if t % backupn == 0:
"""Create timestamped database copy"""
createBackup(SQLBackupPath, database)
time.sleep(waittime)
# Close SQL connection
sqlconn.close()
I hope this program can help you with your projects. Feel free to contact me with questions or suggestions.
This article has first been published on the CEPA Labs blog.