Pupli – Page 65

Convert text to number with VALUE formula in Excel

December 15, 2019

=VALUE("$1,000")

=VALUE(A2)

=VALUE(RIGHT(A2,3))

References
https://www.ablebits.com/office-addins-blog/2018/07/18/excel-convert-text-to-number/
https://exceljet.net/excel-functions/excel-value-function

Append existing excel sheet with new Dataframe using Python Pandas

December 14, 2019

def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
                       truncate_sheet=False, 
                       **to_excel_kwargs):
    """
    Append a DataFrame [df] to existing Excel file [filename]
    into [sheet_name] Sheet.
    If [filename] doesn't exist, then this function will create it.

    Parameters:
      filename : File path or existing ExcelWriter
                 (Example: '/path/to/file.xlsx')
      df : dataframe to save to workbook
      sheet_name : Name of sheet which will contain DataFrame.
                   (default: 'Sheet1')
      startrow : upper left cell row to dump data frame.
                 Per default (startrow=None) calculate the last row
                 in the existing DF and write to the next row...
      truncate_sheet : truncate (remove and recreate) [sheet_name]
                       before writing DataFrame to Excel file
      to_excel_kwargs : arguments which will be passed to `DataFrame.to_excel()`
                        [can be dictionary]

    Returns: None
    """
    from openpyxl import load_workbook

    import pandas as pd

    # ignore [engine] parameter if it was passed
    if 'engine' in to_excel_kwargs:
        to_excel_kwargs.pop('engine')

    writer = pd.ExcelWriter(filename, engine='openpyxl')

    # Python 2.x: define [FileNotFoundError] exception if it doesn't exist 
    try:
        FileNotFoundError
    except NameError:
        FileNotFoundError = IOError


    try:
        # try to open an existing workbook
        writer.book = load_workbook(filename)

        # get the last row in the existing Excel sheet
        # if it was not specified explicitly
        if startrow is None and sheet_name in writer.book.sheetnames:
            startrow = writer.book[sheet_name].max_row

        # truncate sheet
        if truncate_sheet and sheet_name in writer.book.sheetnames:
            # index of [sheet_name] sheet
            idx = writer.book.sheetnames.index(sheet_name)
            # remove [sheet_name]
            writer.book.remove(writer.book.worksheets[idx])
            # create an empty sheet [sheet_name] using old index
            writer.book.create_sheet(sheet_name, idx)

        # copy existing sheets
        writer.sheets = {ws.title:ws for ws in writer.book.worksheets}
    except FileNotFoundError:
        # file does not exist yet, we will create it
        pass

    if startrow is None:
        startrow = 0

    # write out the new sheet
    df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

    # save the workbook
    writer.save()

References
https://stackoverflow.com/questions/38074678/append-existing-excel-sheet-with-new-dataframe-using-python-pandas

Using iloc & loc to Select Rows and Columns in Pandas DataFrames

December 14, 2019

Selecting pandas data using “iloc”

The iloc indexer for Pandas Dataframe is used for integer-location based indexing / selection by position.

# Single selections using iloc and DataFrame
# Rows:
data.iloc[0] # first row of data frame (Aleshia Tomkiewicz) - Note a Series data type output.
data.iloc[1] # second row of data frame (Evan Zigomalas)
data.iloc[-1] # last row of data frame (Mi Richan)
# Columns:
data.iloc[:,0] # first column of data frame (first_name)
data.iloc[:,1] # second column of data frame (last_name)
data.iloc[:,-1] # last column of data frame (id)

# Multiple row and column selections using iloc and DataFrame
data.iloc[0:5] # first five rows of dataframe
data.iloc[:, 0:2] # first two columns of data frame with all rows
data.iloc[[0,3,6,24], [0,5,6]] # 1st, 4th, 7th, 25th row + 1st 6th 7th columns.
data.iloc[0:5, 5:8] # first 5 rows and 5th, 6th, 7th columns of data frame (county -> phone1).

Selecting pandas data using “loc”

The Pandas loc indexer can be used with DataFrames for two different use cases:

a.) Selecting rows by label/index
b.) Selecting rows with a boolean / conditional lookup

# Select rows with index values 'Andrade' and 'Veness', with all columns between 'city' and 'email'
data.loc[['Andrade', 'Veness'], 'city':'email']
# Select same rows, with just 'first_name', 'address' and 'city' columns
data.loc['Andrade':'Veness', ['first_name', 'address', 'city']]
 
# Change the index to be based on the 'id' column
data.set_index('id', inplace=True)
# select the row with 'id' = 487
data.loc[487]

# Select rows with first name Antonio, # and all columns between 'city' and 'email'
data.loc[data['first_name'] == 'Antonio', 'city':'email']
 
# Select rows where the email column ends with 'hotmail.com', include all columns
data.loc[data['email'].str.endswith("hotmail.com")]   
 
# Select rows with last_name equal to some values, all columns
data.loc[data['first_name'].isin(['France', 'Tyisha', 'Eric'])]   
       
# Select rows with first name Antonio AND hotmail email addresses
data.loc[data['email'].str.endswith("gmail.com") & (data['first_name'] == 'Antonio')] 
 
# select rows with id column between 100 and 200, and just return 'postal' and 'web' columns
data.loc[(data['id'] > 100) & (data['id'] <= 200), ['postal', 'web']] 
 
# A lambda function that yields True/False values can also be used.
# Select rows where the company name has 4 words in it.
data.loc[data['company_name'].apply(lambda x: len(x.split(' ')) == 4)] 
 
# Selections can be achieved outside of the main .loc for clarity:
# Form a separate variable with your selections:
idx = data['company_name'].apply(lambda x: len(x.split(' ')) == 4)
# Select only the True values in 'idx' and only the 3 columns specified:
data.loc[idx, ['email', 'first_name', 'company']]

References
https://www.shanelynn.ie/select-pandas-dataframe-rows-and-columns-using-iloc-loc-and-ix/
https://thispointer.com/select-rows-columns-by-name-or-index-in-dataframe-using-loc-iloc-python-pandas/

Excel ROW Function

December 10, 2019

The Excel ROW function returns the row number for a reference. For example, ROW(C5) returns 5, since C5 is the fifth row in the spreadsheet. When no reference is provided, ROW returns the row number of the cell which contains the formula.

=ROW(A3) // returns 3
=ROW(D5) // returns 5
=ROW(F10:J15) // returns 10

References
https://exceljet.net/excel-functions/excel-row-function

Select fastest apt mirror on Ubuntu Linux

December 7, 2019

sudo nano /etc/apt/sources.list

replace

deb http://us.archive.ubuntu.com/ubuntu/ xenial main restricted

with

deb mirror://mirrors.ubuntu.com/mirrors.txt xenial main restricted

Manual apt mirror selection

wget -qO - mirrors.ubuntu.com/mirrors.txt

References
https://linuxconfig.org/how-to-select-the-fastest-apt-mirror-on-ubuntu-linux

Change Data Type of columns in Pandas Dataframe

December 6, 2019

Method #1: Using DataFrame.astype()

# importing pandas as pd 
import pandas as pd 

# sample dataframe 
df = pd.DataFrame({ 
  'A': [1, 2, 3, 4, 5], 
  'B': ['a', 'b', 'c', 'd', 'e'], 
  'C': [1.1, '1.0', '1.3', 2, 5] }) 

# converting all columns to string type 
df = df.astype(str) 
print(df.dtypes)

# importing pandas as pd 
import pandas as pd 

# sample dataframe 
df = pd.DataFrame({ 
  'A': [1, 2, 3, 4, 5], 
  'B': ['a', 'b', 'c', 'd', 'e'], 
  'C': [1.1, '1.0', '1.3', 2, 5] }) 

# using dictionary to convert specific columns 
convert_dict = {'A': int, 
        'C': float
      } 

df = df.astype(convert_dict) 
print(df.dtypes)

Method #2: Using DataFrame.apply()

We can pass pandas.to_numeric, pandas.to_datetime and pandas.to_timedelta as argument to apply() function to change the datatype of one or more columns to numeric, datetime and timedelta respectively.

# importing pandas as pd 
import pandas as pd 

# sample dataframe 
df = pd.DataFrame({ 
  'A': [1, 2, 3, '4', '5'], 
  'B': ['a', 'b', 'c', 'd', 'e'], 
  'C': [1.1, '2.1', 3.0, '4.1', '5.1'] }) 

# using apply method 
df[['A', 'C']] = df[['A', 'C']].apply(pd.to_numeric) 
print(df.dtypes)

Method #3: Using DataFrame.infer_objects()

# importing pandas as pd 
import pandas as pd 

# sample dataframe 
df = pd.DataFrame({ 
  'A': [1, 2, 3, 4, 5], 
  'B': ['a', 'b', 'c', 'd', 'e'], 
  'C': [1.1, 2.1, 3.0, 4.1, 5.1] 
  }, dtype ='object') 

# converting datatypes 
df = df.infer_objects() 
print(df.dtypes)

References
https://www.geeksforgeeks.org/change-data-type-for-one-or-more-columns-in-pandas-dataframe/
https://towardsdatascience.com/my-pandas-cheat-sheet-b71437ab26f

Working on a copy of Pandas DataFrame

December 6, 2019

import pandas as pd

df: pd.DataFrame = pd.read_csv("avocado.csv")

df_cp = df.copy()

References
https://pythonprogramming.net/introduction-python3-pandas-data-analysis/

Sort Index in Pandas DataFrame

December 6, 2019

import pandas as pd

df: pd.DataFrame = pd.read_csv("avocado.csv")

df.set_index("Date", inplace=True)
df.sort_index(inplace=True)

print(df.index)

References
https://pythonprogramming.net/graph-visualization-python3-pandas-data-analysis/

Set Index in Pandas DataFrame

December 6, 2019

import pandas as pd

df: pd.DataFrame = pd.read_csv("avocado.csv")

# index of dataframe
print(df.index)

# set index in dataframe
df.set_index("Date", inplace=True)

# or set index this ways
# df = df.set_index("Date")

# print index
print(df.index)

References
https://pythonprogramming.net/introduction-python3-pandas-data-analysis/

Filter by values of a column in Pandas DataFrame

December 6, 2019

import pandas as pd

df: pd.DataFrame = pd.read_csv("avocado.csv")
albany_df = df[df["region"] == "Albany"]
# albany_df = df[df["region"] == "Albany"].copy()
print(albany_df)

males = df[(df[Gender]=='Male') & (df[Year]==2014)]

To store your dataframes in a dict using a for loop:

from collections import defaultdict
dic={}
for g in ['male', 'female']:
  dic[g]=defaultdict(dict)
  for y in [2013, 2014]:
    dic[g][y]=df[(df[Gender]==g) & (df[Year]==y)] #store the DataFrames to a dict of dict

References
https://pythonprogramming.net/introduction-python3-pandas-data-analysis/
https://stackoverflow.com/questions/22086116/how-do-you-filter-pandas-dataframes-by-multiple-columns