Study Overview

This collection of notebooks accompanies a diary study that I assisted with. Diary studies are longitudinal studies in which participants complete short surveys at regular intervals throughout the study period. They're especially useful for tracking things such as mood, exercise and diet habits, and, for our purposes, social media usage.

In this notebook, I’ll be walking through the process of converting the raw data from wide to long format. All of the accompanying datasets can be found here.

In the initial survey (day 0), participants were asked to give several different types of ratings for 6 social media platforms: Facebook, YouTube, Snapchat, Instagram, Twitter, and Pinterest. Alongside the IBIS measure (“_percep”), several widely used market research measures were collected:

  1. Purchase intention (“_use”)
  2. Overall opinion (“_op”)
  3. Likelihood to recommend (“_rec”)

The IBIS measure is adapted from the Inclusion of Other in the Self (IOS) scale, originally developed to measure intimacy. Individuals are asked about their relationship with a partner and shown a series of increasingly overlapping circles - the idea being that more overlap indicates a stronger relationship.

[Figure: the IOS scale's series of increasingly overlapping circles]

Adapting this measure to brands was originally proposed following the rise of the brand relationships perspective - the notion that individuals develop relationships with brands much like the ones they form with other people. The bulk of the work on this measure has been conceptual, and to date, ours would be the first study to test it with actual usage data.

We had 2 main questions:

  1. Does IBIS prospectively predict usage?
  2. If so, how does it perform relative to other widely used measures?

Day 0 Transformation

First, we'll import the libraries we need. The "set_option" parameters make it easier to view all of the columns.

There were several components to the overall experiment; this cleaning/analysis focuses on the social media platforms.

import numpy as np
import pandas as pd
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)

Let’s take a look at the data.

day_zero = pd.read_csv('day_zero.csv')

day_zero.head()
Unnamed: 0 age ethnicity gender fsc_1 fsc_2 fsc_3 fsc_4 aquafina_percep dasani_percep fiji_percep starbucks_percep coffee_bean_percep dunkin_donuts_percep freudian_sip_percep twitter_percep snapchat_percep youtube_percep coke_percep sprite_percep pepsi_percep budweiser_percep miller_lite_percep corona_percep blue_moon_percep coors_percep facebook_percep instagram_percep pinterest_percep coke_percep.1 sprite_percep.1 pepsi_percep.1 budweiser_percep.1 miller_lite_percep.1 corona_percep.1 blue_moon_percep.1 coors_percep.1 facebook_percep.1 instagram_percep.1 pinterest_percep.1 aquafina_percep.1 dasani_percep.1 fiji_percep.1 starbucks_percep.1 coffee_bean_percep.1 dunkin_donuts_percep.1 freudian_sip_percep.1 twitter_percep.1 snapchat_percep.1 youtube_percep.1 corona_buy blue_moon_buy aquafina_buy dasani_buy fiji_buy starbucks_buy coffee_bean_buy dunkin_donuts_buy freudian_sip_buy coke_buy sprite_buy pepsi_buy budweiser_buy coors_buy miller_lite_buy coke_buy.1 sprite_buy.1 pepsi_buy.1 budweiser_buy.1 coors_buy.1 miller_lite_buy.1 corona_buy.1 blue_moon_buy.1 aquafina_buy.1 dasani_buy.1 fiji_buy.1 starbucks_buy.1 coffee_bean_buy.1 dunkin_donuts_buy.1 freudian_sip_buy.1 facebook_use youtube_use snapchat_use twitter_use instagram_use pinterest_use pinterest_use.1 instagram_use.1 twitter_use.1 snapchat_use.1 youtube_use.1 facebook_use.1 corona_rec blue_moon_rec aquafina_rec dasani_rec fiji_rec starbucks_rec coffee_bean_rec dunkin_donuts_rec freudian_sip_rec coke_rec sprite_rec pepsi_rec budweiser_rec coors_rec miller_lite_rec facebook_rec youtube_rec snapchat_rec twitter_rec instagram_rec pinterest_rec pinterest_rec.1 instagram_rec.1 twitter_rec.1 snapchat_rec.1 youtube_rec.1 facebook_rec.1 miller_lite_rec.1 coors_rec.1 budweiser_rec.1 pepsi_rec.1 sprite_rec.1 coke_rec.1 freudian_sip_rec.1 dunkin_donuts_rec.1 coffee_bean_rec.1 starbucks_rec.1 fiji_rec.1 dasani_rec.1 aquafina_rec.1 blue_moon_rec.1 corona_rec.1 corono_op blue_moon_op aquafina_op dasani_op fiji_op starbucks_op coffee_bean_op dunkin_donuts_op freudian_sip_op coke_op sprite_op pepsi_op budweiser_op coors_op miller_lite_op facebook_op youtube_op snapchat_op instagram_op twitter_op pinterest_op pinterest_op.1 instagram_op.1 twitter_op.1 snapchat_op.1 youtube_op.1 facebook_op.1 miller_lite_op.1 coors_op.1 budweiser_op.1 pepsi_op.1 sprite_op.1 coke_op.1 freudian_sip_op.1 dunkin_donuts_op.1 coffee_bean_op.1 starbucks_op.1 fiji_op.1 dasani_op.1 aquafina_op.1 blue_moon_op.1 corona_op pos_1 pos_2 pos_3 pos_4 pos_5 neg_1 neg_2 neg_3 neg_4 neg_5 neg_6 neg_7 neg_8 participant
0 0 18 1 1 3 3 2 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4.0 3.0 3.0 1.0 1.0 1.0 1.0 1.0 4.0 7.0 3.0 3.0 4.0 3.0 4.0 4.0 5.0 3.0 4.0 7.0 7.0 5.0 5.0 2.0 2.0 3.0 2.0 2.0 3.0 2.0 2.0 2.0 2.0 5.0 5.0 5.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2.0 1.0 5.0 1.0 1.0 1.0 6.0 6.0 4.0 4.0 4.0 3.0 3.0 3.0 3.0 3.0 3.0 3.0 6.0 6.0 6.0 3.0 1.0 1.0 2.0 1.0 2.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 2.0 1.0 1.0 1.0 1.0 1.0 3.0 3.0 3.0 2.0 1.0 2.0 2.0 1.0 2.0 2.0 4.0 3.0 3.0 3.0 3.0 1 3 3 5 3 3 2 3 3 2 2 2 1 0
1 1 18 1 2 3 2 2 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 4.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 7.0 1.0 2.0 1.0 2.0 4.0 2.0 2.0 3.0 1.0 7.0 7.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4.0 3.0 4.0 5.0 5.0 5.0 5.0 5.0 3.0 3.0 4.0 3.0 3.0 4.0 2.0 5.0 2.0 1.0 6.0 1.0 6.0 NaN NaN NaN NaN NaN NaN 7.0 7.0 3.0 4.0 3.0 2.0 2.0 2.0 3.0 5.0 2.0 5.0 7.0 7.0 7.0 5.0 2.0 2.0 5.0 2.0 6.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.0 3.0 2.0 3.0 2.0 2.0 2.0 2.0 2.0 4.0 2.0 4.0 3.0 3.0 3.0 3.0 2.0 2.0 2.0 2.0 3.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1 1 9 1 1 1 1 4 1 3 1 1 1 1
2 2 18 6 2 5 5 1 1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 1.0 2.0 1.0 1.0 1.0 1.0 1.0 1.0 2.0 5.0 1.0 5.0 6.0 4.0 1.0 4.0 1.0 6.0 6.0 4.0 5.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 5.0 4.0 5.0 5.0 5.0 5.0 5.0 5.0 1.0 2.0 3.0 5.0 2.0 5.0 1.0 NaN NaN NaN NaN NaN NaN 6.0 1.0 1.0 1.0 1.0 2.0 7.0 7.0 1.0 1.0 1.0 7.0 1.0 7.0 2.0 7.0 6.0 7.0 7.0 7.0 7.0 4.0 1.0 1.0 1.0 1.0 5.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.0 1.0 1.0 2.0 1.0 2.0 5.0 5.0 5.0 5.0 4.0 5.0 2.0 5.0 1.0 5.0 1.0 1.0 1.0 5.0 5.0 1 1 1 1 1 1 1 1 1 1 1 1 1 2
3 3 18 5 1 7 2 1 4 3.0 3.0 3.0 1.0 1.0 1.0 4.0 4.0 5.0 7.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0 4.0 1.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 5.0 5.0 5.0 5.0 5.0 5.0 5.0 5.0 3.0 3.0 3.0 5.0 5.0 5.0 2.0 NaN NaN NaN NaN NaN NaN 6.0 3.0 4.0 1.0 1.0 6.0 7.0 7.0 4.0 4.0 4.0 7.0 7.0 3.0 2.0 7.0 7.0 7.0 7.0 7.0 7.0 7.0 1.0 3.0 2.0 4.0 7.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4.0 3.0 3.0 3.0 3.0 3.0 3.0 2.0 2.0 5.0 5.0 5.0 3.0 3.0 3.0 5.0 4.0 4.0 3.0 2.0 3.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 7 6 9 6 1 3 1 1 1 1 1 1 1 3
4 4 18 4 1 4 1 3 2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4.0 3.0 2.0 1.0 1.0 1.0 1.0 1.0 3.0 7.0 1.0 3.0 4.0 3.0 6.0 4.0 4.0 4.0 3.0 5.0 7.0 4.0 4.0 3.0 3.0 2.0 1.0 2.0 3.0 3.0 4.0 3.0 3.0 4.0 4.0 4.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 4.0 1.0 1.0 3.0 1.0 5.0 NaN NaN NaN NaN NaN NaN 7.0 7.0 3.0 3.0 3.0 1.0 2.0 3.0 1.0 3.0 2.0 3.0 7.0 7.0 7.0 3.0 3.0 1.0 1.0 1.0 5.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 3.0 2.0 2.0 2.0 3.0 3.0 4.0 4.0 4.0 2.0 2.0 3.0 1.0 2.0 3.0 1.0 1.0 2.0 3.0 4.0 4.0 3 6 8 6 7 2 6 1 4 2 3 6 1 4

Now, we need to grab the relevant social media columns.

#Getting social media columns and sorting
sm_keywords = ['participant', 'facebook', 'youtube', 'snapchat',
               'instagram', 'twitter', 'pinterest']
sm_cols = sorted(col for col in day_zero.columns
                 if any(key in col for key in sm_keywords))

day_zero_sm = day_zero[sm_cols].copy()  #.copy() avoids SettingWithCopyWarning when we add columns later
day_zero_sm.head()
facebook_op facebook_op.1 facebook_percep facebook_percep.1 facebook_rec facebook_rec.1 facebook_use facebook_use.1 instagram_op instagram_op.1 instagram_percep instagram_percep.1 instagram_rec instagram_rec.1 instagram_use instagram_use.1 participant pinterest_op pinterest_op.1 pinterest_percep pinterest_percep.1 pinterest_rec pinterest_rec.1 pinterest_use pinterest_use.1 snapchat_op snapchat_op.1 snapchat_percep snapchat_percep.1 snapchat_rec snapchat_rec.1 snapchat_use snapchat_use.1 twitter_op twitter_op.1 twitter_percep twitter_percep.1 twitter_rec twitter_rec.1 twitter_use twitter_use.1 youtube_op youtube_op.1 youtube_percep youtube_percep.1 youtube_rec youtube_rec.1 youtube_use youtube_use.1
0 NaN 1.0 NaN 4.0 3.0 NaN NaN 1.0 NaN 1.0 NaN 7.0 1.0 NaN NaN 1.0 0 NaN 2.0 NaN 3.0 2.0 NaN NaN 2.0 NaN 1.0 NaN 7.0 1.0 NaN NaN 1.0 NaN 1.0 NaN 4.0 2.0 NaN NaN 5.0 NaN 1.0 NaN 7.0 1.0 NaN NaN 1.0
1 3.0 NaN NaN 1.0 5.0 NaN 5.0 NaN 2.0 NaN NaN 7.0 2.0 NaN 1.0 NaN 1 3.0 NaN NaN 1.0 6.0 NaN 6.0 NaN 2.0 NaN NaN 7.0 2.0 NaN 1.0 NaN 2.0 NaN NaN 1.0 5.0 NaN 6.0 NaN 2.0 NaN NaN 7.0 2.0 NaN 2.0 NaN
2 NaN 2.0 NaN 2.0 4.0 NaN NaN 2.0 NaN 1.0 NaN 5.0 1.0 NaN NaN 1.0 2 NaN 3.0 NaN 1.0 5.0 NaN NaN 6.0 NaN 2.0 NaN 4.0 1.0 NaN NaN 1.0 NaN 1.0 NaN 6.0 1.0 NaN NaN 1.0 NaN 1.0 NaN 5.0 1.0 NaN NaN 1.0
3 5.0 NaN 1.0 NaN 7.0 NaN NaN 6.0 3.0 NaN 4.0 NaN 4.0 NaN NaN 3.0 3 3.0 NaN 1.0 NaN 7.0 NaN NaN 6.0 4.0 NaN 5.0 NaN 3.0 NaN NaN 1.0 2.0 NaN 4.0 NaN 2.0 NaN NaN 4.0 4.0 NaN 7.0 NaN 1.0 NaN NaN 1.0
4 NaN 3.0 NaN 3.0 3.0 NaN 4.0 NaN NaN 2.0 NaN 7.0 1.0 NaN 1.0 NaN 4 NaN 3.0 NaN 1.0 5.0 NaN 5.0 NaN NaN 2.0 NaN 5.0 1.0 NaN 1.0 NaN NaN 2.0 NaN 3.0 1.0 NaN 3.0 NaN NaN 3.0 NaN 7.0 3.0 NaN 1.0 NaN

Brand/platform ratings were counterbalanced across survey blocks, which is why each rating is split across a duplicated pair of columns (e.g., “facebook_op” and “facebook_op.1”) and there are so many missing values. Since each participant answered only one column of each pair, summing each platform's columns recovers the single rating - pandas' sum skips NaNs by default, effectively treating them as 0s. (One caveat: a row of all NaNs also sums to 0, so a genuinely missing rating would silently become a 0.)

platforms = ['facebook', 'youtube', 'snapchat', 'instagram', 'twitter', 'pinterest']
measures = ['op', 'percep', 'rec', 'use']

for platform in platforms:
    for measure in measures:
        #sum skips NaNs, so each counterbalanced pair of columns collapses
        #to the one rating the participant actually gave
        day_zero_sm[f'{platform}_{measure}_final'] = (
            day_zero_sm.filter(like=platform)
                       .filter(like=measure)
                       .sum(axis=1)
        )
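
As a quick check on that counterbalancing assumption, we can confirm that no participant answered both columns of a duplicated pair (a sketch using the lists defined above; if every count prints 0, summing the pairs is safe):

#Sketch: verify the counterbalanced column pairs never overlap
for platform in platforms:
    for measure in measures:
        first, second = f'{platform}_{measure}', f'{platform}_{measure}.1'
        both = (day_zero_sm[first].notna() & day_zero_sm[second].notna()).sum()
        print(first, both)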

#Filtering and renaming 'final' columns
day_zero_sm_fin = day_zero_sm[[col for col in day_zero_sm.columns if '_final' in col
                              or 'participant' in col]]

#Note: str.rstrip('_final') would strip any trailing characters in the set
#{'_','f','i','n','a','l'} rather than the literal suffix, so we remove the
#suffix with an explicit replace instead
day_zero_sm_fin.columns = day_zero_sm_fin.columns.str.replace('_final', '', regex=False)
day_zero_sm_fin.head()
participant facebook_op facebook_percep facebook_rec facebook_use youtube_op youtube_percep youtube_rec youtube_use snapchat_op snapchat_percep snapchat_rec snapchat_use instagram_op instagram_percep instagram_rec instagram_use twitter_op twitter_percep twitter_rec twitter_use pinterest_op pinterest_percep pinterest_rec pinterest_use
0 0 1.0 4.0 3.0 1.0 1.0 7.0 1.0 1.0 1.0 7.0 1.0 1.0 1.0 7.0 1.0 1.0 1.0 4.0 2.0 5.0 2.0 3.0 2.0 2.0
1 1 3.0 1.0 5.0 5.0 2.0 7.0 2.0 2.0 2.0 7.0 2.0 1.0 2.0 7.0 2.0 1.0 2.0 1.0 5.0 6.0 3.0 1.0 6.0 6.0
2 2 2.0 2.0 4.0 2.0 1.0 5.0 1.0 1.0 2.0 4.0 1.0 1.0 1.0 5.0 1.0 1.0 1.0 6.0 1.0 1.0 3.0 1.0 5.0 6.0
3 3 5.0 1.0 7.0 6.0 4.0 7.0 1.0 1.0 4.0 5.0 3.0 1.0 3.0 4.0 4.0 3.0 2.0 4.0 2.0 4.0 3.0 1.0 7.0 6.0
4 4 3.0 3.0 3.0 4.0 3.0 7.0 3.0 1.0 2.0 5.0 1.0 1.0 2.0 7.0 1.0 1.0 2.0 3.0 1.0 3.0 3.0 1.0 5.0 5.0
To get these data into long form, we'll create one df per platform, tag each with its platform name, and then stack them.

df_dict = {}

for platform in platforms:
    #New df for each platform (.copy() so the edits below don't touch a slice)
    df_dict[platform] = day_zero_sm_fin[['participant', *[col for col in day_zero_sm_fin.columns
                                                          if platform in col]]].copy()
    #Adding in a platform name column for the eventual long form
    df_dict[platform]['platform'] = platform
    #Standardizing column names down to just the measure
    df_dict[platform].columns = [col.split('_')[-1] for col in df_dict[platform].columns]

day_zero_master = pd.concat(df_dict.values())

Now that the data are in long-form, each participant should have 6 entries (i.e., 1 entry per platform). Let’s do a quick sanity check.

day_zero_master.sort_values('participant').head(12)
participant op percep rec use platform
0 0 1.0 4.0 3.0 1.0 facebook
0 0 1.0 7.0 1.0 1.0 snapchat
0 0 1.0 7.0 1.0 1.0 youtube
0 0 2.0 3.0 2.0 2.0 pinterest
0 0 1.0 4.0 2.0 5.0 twitter
0 0 1.0 7.0 1.0 1.0 instagram
1 1 3.0 1.0 6.0 6.0 pinterest
1 1 3.0 1.0 5.0 5.0 facebook
1 1 2.0 7.0 2.0 1.0 snapchat
1 1 2.0 7.0 2.0 1.0 instagram
1 1 2.0 7.0 2.0 2.0 youtube
1 1 2.0 1.0 5.0 6.0 twitter
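
Eyeballing the head works, but we can also verify this programmatically (a quick sketch; the check should pass by construction, since we built one row per platform for every participant):

#Sketch: every participant should have exactly 6 rows (one per platform)
assert (day_zero_master.groupby('participant').size() == 6).all()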

Next, we bring in the weekly survey data. We'll need to make the same wide-to-long transformation before merging it with the initial survey data.

Weekly Survey Transformations

wk_one = pd.read_csv('week_one.csv')
wk_two = pd.read_csv('week_two.csv')
wk_three = pd.read_csv('week_three.csv')

wk_one.head()
Unnamed: 0 coke sprite pepsi other_soda bud coors miller_lite corona blue_moon other_beer other_alcohol aquafina dasani fiji other_water starbucks coffee_bean dunkin_donuts freudian_sip other_coffee twitter snapchat youtube facebook instagram pinterest participant
0 0 1 2 1 2 0 0 0 0.0 0 0 0 0 0 0 0 4 0 0 0 0 0.0 3.0 3.0 3.0 3.0 1 15
1 1 0 2 0 0 0 0 0 0.0 0 0 0 21 2 0 4 0 0 0 0 0 0.0 4.0 10.0 0.0 3.0 0 57
2 2 0 0 0 0 0 0 0 0.0 0 0 2 0 0 0 1 1 0 0 0 0 10.0 10.0 10.0 3.0 8.0 0 63
3 3 0 0 0 4 0 0 0 0.0 0 0 0 0 0 0 10 1 0 0 0 0 3.0 5.0 10.0 0.0 6.0 1 162
4 4 0 1 0 0 0 0 0 0.0 0 0 0 0 0 0 0 1 0 0 0 1 0.0 2.0 3.0 0.0 4.0 0 66
#Grabbing the social media columns from each weekly df
wk_dfs = [wk_one, wk_two, wk_three]

for i in range(len(wk_dfs)):
    wk_dfs[i] = wk_dfs[i][[col for col in wk_dfs[i]
                           if any(key in col for key in sm_keywords)]].copy()

    #Adding in a column for the week number
    wk_dfs[i]['week'] = i + 1

wk_dfs[0].head()
twitter snapchat youtube facebook instagram pinterest participant week
0 0.0 3.0 3.0 3.0 3.0 1 15 1
1 0.0 4.0 10.0 0.0 3.0 0 57 1
2 10.0 10.0 10.0 3.0 8.0 0 63 1
3 3.0 5.0 10.0 0.0 6.0 1 162 1
4 0.0 2.0 3.0 0.0 4.0 0 66 1

To convert this to long-form, we’ll use the “melt” function.

Below is a brief demo of what each week’s df will look like, as well as the annotated code.

(
    wk_dfs[0].melt(
        #The columns we want to keep as identifiers
        id_vars = ['participant', 'week'],

        #The columns to unpivot - i.e., collapse into a single column so that each
        #participant gets one row per platform. Since these are the only columns
        #besides 'participant' and 'week', we technically don't need to list them,
        #but spelling them out makes it easier to see what's happening.
        value_vars = ['snapchat', 'instagram', 'facebook',
                        'youtube', 'twitter', 'pinterest'],

        #The name of the aforementioned new column - using 'platform' to match the
        #column name on the initial survey (so we can merge later)
        var_name = 'platform',

        #The name of the column for the values; here, hours spent on that
        #platform during that week
        value_name = 'hrs_spent')
    .sort_values('participant')
    .head(12)
)
participant week platform hrs_spent
44 1 1 snapchat 8.0
314 1 1 facebook 1.0
719 1 1 pinterest 0.0
449 1 1 youtube 5.0
584 1 1 twitter 0.0
179 1 1 instagram 8.0
744 2 1 pinterest 0.0
339 2 1 facebook 5.0
204 2 1 instagram 10.0
474 2 1 youtube 5.0
609 2 1 twitter 1.0
69 2 1 snapchat 1.0
#Shortened melt code
for i in range(len(wk_dfs)):
    wk_dfs[i] = wk_dfs[i].melt(id_vars = ['participant', 'week'],
                              var_name = 'platform',
                              value_name = 'hrs_spent').sort_values('participant')
#Combining each melted week
weekly_long = pd.concat(wk_dfs).sort_values(['participant', 'week'])
#Sanity check: participant 1 should have 18 entries (6 for each week)
weekly_long.head(19)
participant week platform hrs_spent
44 1 1 twitter 0.0
314 1 1 youtube 5.0
719 1 1 pinterest 0.0
449 1 1 facebook 1.0
584 1 1 instagram 8.0
179 1 1 snapchat 8.0
516 1 2 facebook 0.0
812 1 2 pinterest 0.0
72 1 2 twitter 0.0
664 1 2 instagram 10.0
368 1 2 youtube 5.0
220 1 2 snapchat 6.0
638 1 3 instagram 6.0
497 1 3 facebook 0.0
779 1 3 pinterest 0.0
215 1 3 snapchat 8.0
356 1 3 youtube 10.0
74 1 3 twitter 0.0
744 2 1 pinterest 0.0
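
As before, here's a programmatic version of the sanity check (a sketch; any participant who missed a weekly survey would show up here with fewer than 18 rows):

#Sketch: tally rows per participant - 18 means all 3 weeks x 6 platforms present
weekly_long.groupby('participant').size().value_counts()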

Things look good! All that's left now is to merge the master weekly df with the initial survey df.

Each participant will still have 6 entries per week, and the initial ratings they gave each platform will sit on the corresponding rows.

#With no 'on' specified, merge joins on the columns the two dfs share: 'participant' and 'platform'
master = weekly_long.merge(day_zero_master)

master.head(18)
participant week platform hrs_spent op percep rec use
0 1 1 twitter 0.0 2.0 1.0 5.0 6.0
1 1 2 twitter 0.0 2.0 1.0 5.0 6.0
2 1 3 twitter 0.0 2.0 1.0 5.0 6.0
3 1 1 youtube 5.0 2.0 7.0 2.0 2.0
4 1 2 youtube 5.0 2.0 7.0 2.0 2.0
5 1 3 youtube 10.0 2.0 7.0 2.0 2.0
6 1 1 pinterest 0.0 3.0 1.0 6.0 6.0
7 1 2 pinterest 0.0 3.0 1.0 6.0 6.0
8 1 3 pinterest 0.0 3.0 1.0 6.0 6.0
9 1 1 facebook 1.0 3.0 1.0 5.0 5.0
10 1 2 facebook 0.0 3.0 1.0 5.0 5.0
11 1 3 facebook 0.0 3.0 1.0 5.0 5.0
12 1 1 instagram 8.0 2.0 7.0 2.0 1.0
13 1 2 instagram 10.0 2.0 7.0 2.0 1.0
14 1 3 instagram 6.0 2.0 7.0 2.0 1.0
15 1 1 snapchat 8.0 2.0 7.0 2.0 1.0
16 1 2 snapchat 6.0 2.0 7.0 2.0 1.0
17 1 3 snapchat 8.0 2.0 7.0 2.0 1.0
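
One last check worth doing (a sketch): merge defaults to an inner join, so any weekly rows without matching day-0 ratings would be silently dropped. Comparing the row counts shows whether anything was lost:

#Sketch: equal row counts mean every weekly entry found its initial ratings
print(len(weekly_long), len(master))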
#Uncomment to save the final long-form dataset
#master.to_csv('sm_master.csv')