Pandas

 Data Structures: 

Series: A one-dimensional labeled array capable of holding any data type. The axis labels are collectively known as the index.

data can be a dict, an ndarray, or a scalar value.

data=np.random.rand(5)

index=['a','b','c','d','e']

s=pd.Series(data, index=index)

using dictionary:

d={'a':1,'b':2,'c':3,'d':4}

s=pd.Series(d)

** Each key becomes an index label and each value becomes the corresponding Series value. If an index is passed, the values in data corresponding to the labels in the index will be pulled out; labels with no matching key get NaN.

s=pd.Series(d,index=['a','b','c','e','f'])
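As a minimal sketch of the behavior described above (reusing the same dict and index from these notes), labels that are missing from the dict show up as NaN, and dict keys that are not in the index are dropped:

```python
import pandas as pd

d = {'a': 1, 'b': 2, 'c': 3, 'd': 4}

# 'e' and 'f' are not keys of d, so their values are NaN.
# 'd' is a key of d but not in the index, so it is dropped.
s = pd.Series(d, index=['a', 'b', 'c', 'e', 'f'])

print(s)
print(s.dtype)  # NaN forces the values to a float dtype
```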

A pandas Series has a single data type:

s.dtype

Access series value using index:

s['a']

set value using an index:

s['f']=7

Check whether a label exists in the Series index:

'f' in s

'h' not in s

If we access a value using a label that is not in the Series, pandas raises a KeyError. If we use the get method instead, no error is raised and the result is None.

s.get('i') 

s.get('i',9)

If 'i' is not in the Series, this returns 9 instead of None; if it is present, it returns the stored value.
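The difference between bracket access and get can be sketched as follows; the small Series here is illustrative, not taken from the notes:

```python
import pandas as pd

s = pd.Series({'a': 1, 'b': 2, 'c': 3})

# Bracket access with a missing label raises KeyError.
try:
    s['i']
    raised = False
except KeyError:
    raised = True

missing = s.get('i')       # None, no error
fallback = s.get('i', 9)   # the supplied default
present = s.get('a', 9)    # the stored value; the default is ignored

print(raised, missing, fallback, present)
```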

Vectorized Operation:

ex: series1+series2

When we perform an operation between two Series, pandas automatically aligns the data by label. The result of an operation between unaligned Series will have the union of the indexes involved.

If a label is not found in one Series or the other, the result for that label will be marked as missing (NaN).

a=pd.Series((1,2,3,4))

b=pd.Series((3,4,6,7,8))

a+b gives 4.0, 6.0, 9.0, 11.0, NaN (label 4 exists only in b, so its result is NaN)
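Label alignment is easier to see with string labels. The sketch below repeats the example above and then adds two illustrative Series, x and y (names chosen here, not from the notes), whose indexes only partly overlap:

```python
import pandas as pd

a = pd.Series((1, 2, 3, 4))
b = pd.Series((3, 4, 6, 7, 8))
total = a + b   # default integer labels 0..4; label 4 is only in b

x = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
y = pd.Series([10, 20, 30], index=['b', 'c', 'd'])
z = x + y       # index is the union of both indexes: a, b, c, d

print(total)
print(z)        # 'a' and 'd' are NaN because each exists in only one Series
```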


DataFrame: A two-dimensional labeled data structure with columns of potentially different types.

We can think of it as a spreadsheet or a SQL table. Along with the data, we can optionally pass index (row labels) and columns (column labels).

It accepts many different kinds of input: 

From dict of Series:

d={"one":pd.Series([1,2,3,4],index=["a","b","c","d"]),
   "two":pd.Series([1,2,3,4,5],index=["a","b","c","d","e"])
  }
df=pd.DataFrame(d)

The resulting index will be the union of the indexes of the various Series. If there are any nested dicts, these will first be converted to Series. If no columns are passed, the columns will be the ordered list of dict keys.

Passing a dict of Series together with a specific index discards all data that does not match the passed index.

df=pd.DataFrame(d,index=["a","b","d","e"])
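A short sketch of both cases, using the same two Series as above: without an index the rows are the union of the Series labels, and with an explicit index only the listed rows are kept.

```python
import pandas as pd

d = {
    "one": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"]),
    "two": pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"]),
}

# Union of the indexes: rows a..e, with NaN in "one" for row e.
df = pd.DataFrame(d)

# Explicit index: row c is discarded because it is not listed.
df_sub = pd.DataFrame(d, index=["a", "b", "d", "e"])

print(df)
print(df_sub)
```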

From dict of ndarrays or lists: The ndarrays must all be the same length. If an index is passed, it must be the same length as the arrays.

import pandas as pd

import numpy as np

d={"one":[1,2,3,4,5],"two":[1,2,3,4,5]}

df=pd.DataFrame(d)

df1=pd.DataFrame(d,index=["a","b","c","d","e"])

df1
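To illustrate the length rule, the sketch below builds the frame with the default index and then deliberately passes an index that is too short; pandas is expected to reject it with a ValueError (the exact message text varies by version, so it is only printed here):

```python
import pandas as pd

d = {"one": [1, 2, 3, 4, 5], "two": [1, 2, 3, 4, 5]}

# No index passed: pandas uses the default integer index 0..4.
df = pd.DataFrame(d)

# Index with 3 labels for 5 values: the lengths do not match.
error = None
try:
    pd.DataFrame(d, index=["a", "b", "c"])
except ValueError as exc:
    error = exc

print(df.index.tolist())
print(type(error).__name__, error)
```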

From list of dicts:

import pandas as pd

import numpy as np

d=[{"a":1,"b":2},{"a":5,"b":10,"c":10}]

df=pd.DataFrame(d,index=["first","second"])

df
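In the list-of-dicts case each dict becomes one row, and the columns are the union of the keys. A minimal sketch with the same data; the df_ab variable and its columns argument are added here for illustration:

```python
import pandas as pd

d = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 10}]

# The first dict has no "c" key, so that cell is NaN.
df = pd.DataFrame(d, index=["first", "second"])

# Passing columns keeps only the listed keys.
df_ab = pd.DataFrame(d, columns=["a", "b"])

print(df)
print(df_ab)
```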

From dict of tuples: 

import pandas as pd

import numpy as np

pd.DataFrame({

    ("a","b"):{("A","B"):1,("A","C"):2},

    ("a","a"):{("A","C"):3,("A","B"):4},

    ("a", "c"): {("A", "B"): 5, ("A", "C"): 6},

    ("b", "a"): {("A", "C"): 7, ("A", "B"): 8},

    ("b", "b"): {("A", "D"): 9, ("A", "B"): 10}

})
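With tuple keys, the outer keys become a two-level column index and the inner keys become a two-level row index. The following is a cut-down sketch of the example above (two columns instead of five) that inspects the result:

```python
import pandas as pd

# Outer keys become column labels, inner keys become row labels.
# Tuple keys produce a MultiIndex on the corresponding axis.
df = pd.DataFrame({
    ("a", "b"): {("A", "B"): 1, ("A", "C"): 2},
    ("a", "a"): {("A", "C"): 3, ("A", "B"): 4},
})

print(df)
print(df.columns.nlevels, df.index.nlevels)

# A full tuple selects one column, then one row within it.
value = df[("a", "a")][("A", "B")]
print(value)
```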

From Series:

import pandas as pd

import numpy as np

ser=pd.Series(range(3),index=["a","b","c"])

df=pd.DataFrame(ser)

df
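When a frame is built from a single Series, the Series index becomes the row index and the Series name becomes the column label. A small sketch, where the name "value" is an arbitrary choice for illustration:

```python
import pandas as pd

# Unnamed Series: the single column gets the default label 0.
ser = pd.Series(range(3), index=["a", "b", "c"])
df = pd.DataFrame(ser)

# Named Series: the name is used as the column label.
named = pd.Series(range(3), index=["a", "b", "c"], name="value")
df_named = pd.DataFrame(named)

print(df.columns.tolist())
print(df_named.columns.tolist())
```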

Viewing Data:

df.head()

df.tail()

df.index

df.columns

df.dtypes

df.describe()

index, columns, and dtypes are attributes, so they take no parentheses; describe() is a method.
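A runnable sketch of the viewing calls, on a small illustrative frame (the 6x4 data, row labels a..f, and column labels A..D are assumptions made for this example). Note that index, columns, and dtypes are attributes, while head, tail, and describe are methods:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=list("abcdef"),
    columns=list("ABCD"),
)

first = df.head()      # first 5 rows by default
last = df.tail(2)      # last 2 rows

print(df.index)        # row labels
print(df.columns)      # column labels
print(df.dtypes)       # one dtype per column

summary = df.describe()  # count, mean, std, min, quartiles, max per column
print(summary)
```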

DataFrame.sort_index() sorts by the labels along an axis; DataFrame.sort_values() sorts by the values in a column:

df.sort_index(axis=1, ascending=False)

df.sort_values(by="A")
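The two kinds of sorting can be compared on a small frame. The data below is illustrative and deliberately out of order on both axes:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 3, 2], "C": [7, 8, 9], "B": [4, 5, 6]},
    index=["c", "a", "b"],
)

by_rows = df.sort_index()                          # row labels: a, b, c
by_cols = df.sort_index(axis=1, ascending=False)   # column labels: C, B, A
by_vals = df.sort_values(by="A")                   # rows ordered by the values in A

print(by_rows)
print(by_cols)
print(by_vals)
```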

Data Selection:

print(df[:])

print(df[2:3])

print(df.loc[:,"A"])

print(df.loc[["a","c"],["A"]]) # .loc selects by label, not by position

print(df.loc["a":"c",["A"]])

print(df.iloc[1:3])

print(df.iloc[1:4,:])

print(df.iloc[1:5][:])

print(df.iloc[1,0])
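The main thing to keep in mind with these selectors is that loc works on labels and includes the end of a slice, while iloc works on integer positions and excludes it. A sketch on an illustrative 6x4 frame (the data and labels are assumptions for this example):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.arange(24).reshape(6, 4),
    index=list("abcdef"),
    columns=list("ABCD"),
)

by_label = df.loc["a":"c", ["A"]]   # rows a, b, c: the end label is included
by_position = df.iloc[1:3]          # rows b, c: the end position is excluded
row_slice = df[2:3]                 # a plain slice selects rows by position
column = df.loc[:, "A"]             # a single column as a Series
cell = df.iloc[1, 0]                # scalar at second row, first column

print(by_label)
print(by_position)
print(cell)
```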

Boolean Indexing

df[df["A"]>2]

Selects the rows where df["A"] is greater than 2.

df[df>2]

Keeps the values greater than 2 and replaces every other value with NaN; the shape of the data frame is unchanged.
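The two forms behave differently: a condition on one column filters rows, while a condition on the whole frame masks individual cells. A minimal sketch with illustrative data:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [5, 1, 0, 3]})

# Boolean Series as the key: keeps whole rows where A > 2.
rows = df[df["A"] > 2]

# Boolean DataFrame as the key: same shape, non-matching cells become NaN.
masked = df[df > 2]

print(rows)
print(masked)
```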

isin method: select the rows whose column value is in a given list:

df[df["A"].isin([1,2])]
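As a sketch of how this works, isin returns a boolean Series that is then used as a row filter. The data below is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": ["w", "x", "y", "z"]})

mask = df["A"].isin([1, 2])   # True where A is 1 or 2
selected = df[mask]           # keeps only the rows where the mask is True

print(mask.tolist())
print(selected)
```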