Unit 2 - Operations on a Series

CBSE Revision Notes
Class-11 Informatics Practices (New Syllabus)
Unit 2: Data Handling (DH-1)


Operations on a Series

A Series is a one-dimensional labeled array capable of holding data of any type (integer, string, float, Python objects, etc.). The axis labels are collectively called the index.
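As a minimal sketch of what "labeled" means here, the snippet below builds a small Series and reads a value once by label and once by position. The subject names and marks are made-up illustration data, not part of the notes.

```python
import pandas as pd

# Hypothetical marks data; the subject names become the index (axis labels).
marks = pd.Series([82, 91, 77], index=['maths', 'physics', 'chemistry'])

print(marks['physics'])    # look up a value by its label
print(marks.iloc[0])       # look up a value by its position
print(list(marks.index))   # the axis labels themselves
```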

pandas.Series

A pandas Series can be created using the following constructor −

pandas.Series( data, index, dtype, copy)

The parameters of the constructor are as follows −

S.No  Parameter  Description
1     data       Data takes various forms such as an ndarray, list, dict, or constant (scalar).
2     index      Index values must be hashable and the same length as data; duplicate labels are allowed. Defaults to np.arange(n) if no index is passed.
3     dtype      Data type of the Series. If None, the data type is inferred.
4     copy       Whether to copy the input data. Default False.
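The parameters above can be tried out with a short sketch like the following. The variable names (values, s, t, u) are chosen only for illustration; the example shows the default index, an explicit dtype, and the effect of copy=True.

```python
import numpy as np
import pandas as pd

values = np.array([1, 2, 3])

# No index passed: the labels default to 0, 1, 2.
u = pd.Series(values)
print(list(u.index))

# dtype forces the element type instead of letting pandas infer it.
t = pd.Series([1, 2, 3], dtype='float64')
print(t.dtype)

# copy=True gives the Series its own copy of the data.
s = pd.Series(values, index=['x', 'y', 'z'], copy=True)
values[0] = 99       # changing the source array afterwards...
print(s['x'])        # ...leaves the copied Series unchanged
```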

A series can be created using various inputs like −

  • Array
  • Dict
  • Scalar value or constant

Create an Empty Series

The most basic Series that can be created is an empty Series.

Example

#import the pandas library and aliasing as pd
import pandas as pd
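# dtype is given explicitly because the default for an empty Series differs between pandas versions
s = pd.Series(dtype='float64')
print(s)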

Its output is as follows −

Series([], dtype: float64)

Create a Series from ndarray

If data is an ndarray, then any index passed must be the same length as the data. If no index is passed, the default index is range(n), where n is the array length, i.e., [0, 1, 2, ..., len(array)-1].

Example 1

#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)

Its output is as follows −

0  a
1 b
2 c
3 d
dtype: object

We did not pass any index, so by default, it assigned the indexes ranging from 0 to len(data)-1, i.e., 0 to 3.

Example 2

#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data,index=[100,101,102,103])
print(s)

Its output is as follows −

100 a
101 b
102 c
103 d
dtype: object

We passed the index values here. Now we can see the customized indexed values in the output.
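The rule that the index must match the length of the data can be checked with a small sketch. It reuses the array from the examples above and passes an index that is deliberately one label short; the flag name mismatch_rejected is just an illustrative choice.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd'])

try:
    # Four values but only three labels: pandas refuses to build the Series.
    pd.Series(data, index=[100, 101, 102])
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print("ValueError:", err)
```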

Create a Series from dict

A dict can be passed as input. If no index is specified, the dictionary keys are used to construct the index; current pandas versions keep the keys in insertion order, while older versions sorted them. If an index is passed, the values in data corresponding to the labels in the index are pulled out.

Example 1

#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)

Its output is as follows −

a 0.0
b 1.0
c 2.0
dtype: float64

Observe − Dictionary keys are used to construct index.

Example 2

#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data,index=['b','c','d','a'])
print(s)

Its output is as follows −

b 1.0
c 2.0
d NaN
a 0.0
dtype: float64

Observe − Index order is persisted and the missing element is filled with NaN (Not a Number).
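A short sketch of both dict behaviours follows, assuming Python 3.6+ and a reasonably recent pandas (where dict keys keep their insertion order). It also shows isna() and dropna(), two standard Series methods for finding and discarding the NaN entries produced by labels missing from the dict.

```python
import pandas as pd

# Keys deliberately not in alphabetical order.
data = {'c': 2.0, 'a': 0.0, 'b': 1.0}

s = pd.Series(data)
print(list(s.index))    # keys appear in the order they were inserted

# 'd' is not a key of data, so its value becomes NaN.
t = pd.Series(data, index=['b', 'c', 'd', 'a'])
print(t.isna())         # True only for the missing label
print(t.dropna())       # the same Series without the NaN entry
```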

Create a Series from Scalar

If data is a scalar value, an index must be provided. The value will be repeated to match the length of the index.

#import the pandas library and aliasing as pd
import pandas as pd
import numpy as np
s = pd.Series(5, index=[0, 1, 2, 3])
print(s)

Its output is as follows −

0 5
1 5
2 5
3 5
dtype: int64

pandas.Series.head

Series.head(n=5)

Return the first n rows.

This function returns the first n rows for the object based on position. It is useful for quickly testing if your object has the right type of data in it.

Parameters:
    n : int, default 5
        Number of rows to select.
Returns:
    obj_head : type of caller
        The first n rows of the caller object.

See also: Series.tail, which returns the last n rows.

Examples

>>> df = pd.DataFrame({'animal':['alligator', 'bee', 'falcon', 'lion',
...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra

Viewing the first 5 lines

>>> df.head()
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey

Viewing the first n lines (three in this case)

>>> df.head(3)
animal
0 alligator
1 bee
2 falcon
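The examples above call head() on a DataFrame; the same method works on a Series. The sketch below reuses the animal names in a Series (the variable name animals is an illustrative choice) and also shows a negative n, which returns everything except the last |n| values.

```python
import pandas as pd

animals = pd.Series(['alligator', 'bee', 'falcon', 'lion', 'monkey',
                     'parrot', 'shark', 'whale', 'zebra'])

print(animals.head())     # first 5 values (the default)
print(animals.head(3))    # first 3 values
print(animals.head(-2))   # all values except the last 2
```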

pandas.Series.tail

Series.tail(n=5)

Return the last n rows.

This function returns last n rows from the object based on position. It is useful for quickly verifying data, for example, after sorting or appending rows.

Parameters:
    n : int, default 5
        Number of rows to select.
Returns:
    type of caller
        The last n rows of the caller object.

See also: Series.head, which returns the first n rows.

Examples

>>> df = pd.DataFrame({'animal':['alligator', 'bee', 'falcon', 'lion',
...                    'monkey', 'parrot', 'shark', 'whale', 'zebra']})
>>> df
animal
0 alligator
1 bee
2 falcon
3 lion
4 monkey
5 parrot
6 shark
7 whale
8 zebra

Viewing the last 5 lines

>>> df.tail()
animal
4 monkey
5 parrot
6 shark
7 whale
8 zebra

Viewing the last n lines (three in this case)

>>> df.tail(3)
animal
6 shark
7 whale
8 zebra
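tail() behaves the same way on a Series as on the DataFrame shown above. In this sketch (the variable name animals is illustrative), note that the original index labels are kept, and that a negative n returns everything except the first |n| values.

```python
import pandas as pd

animals = pd.Series(['alligator', 'bee', 'falcon', 'lion', 'monkey',
                     'parrot', 'shark', 'whale', 'zebra'])

print(animals.tail())     # last 5 values (the default)
print(animals.tail(3))    # last 3 values, still labeled 6, 7, 8
print(animals.tail(-6))   # all values except the first 6
```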

This part covers essential functionality common to the pandas data structures. The objects used in the examples that follow can be created like this:

In [1]: index = pd.date_range('1/1/2000', periods=8)

In [2]: s = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])

In [3]: df = pd.DataFrame(np.random.randn(8, 3), index=index,
...: columns=['A', 'B', 'C'])
...:

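In [4]: wp = pd.Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
...: major_axis=pd.date_range('1/1/2000', periods=5),
...: minor_axis=['A', 'B', 'C', 'D'])
...:

Note: Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so In [4] runs only on older versions.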
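For readers on a current pandas release, a sketch of the same setup as a plain script is given below. It leaves out the Panel object, which recent releases no longer provide, and uses a seeded NumPy generator (the seed 0 is an arbitrary choice) so the random values are repeatable from run to run.

```python
import numpy as np
import pandas as pd

# Seeded generator: an assumption made here only for repeatable output.
rng = np.random.default_rng(seed=0)

index = pd.date_range('1/1/2000', periods=8)
s = pd.Series(rng.standard_normal(5), index=['a', 'b', 'c', 'd', 'e'])
df = pd.DataFrame(rng.standard_normal((8, 3)), index=index,
                  columns=['A', 'B', 'C'])

print(s.shape, df.shape)
```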

Head and Tail

To view a small sample of a Series or DataFrame object, use the head() and tail() methods. The default number of elements to display is five, but you may pass a custom number.

In [5]: long_series = pd.Series(np.random.randn(1000))

In [6]: long_series.head()
Out[6]:
0 0.229453
1 0.304418
2 0.736135
3 -0.859631
4 -0.424100
dtype: float64

In [7]: long_series.tail(3)
Out[7]:
997 -0.351587
998 1.136249
999 -0.448789
dtype: float64

Attributes and the raw ndarray(s)

pandas objects have a number of attributes that give you access to their metadata:

  • shape: gives the axis dimensions of the object, consistent with ndarray
  • Axis labels
    • Series: index (only axis)
    • DataFrame: index (rows) and columns
    • Panel: items, major_axis, and minor_axis

Note, these attributes can be safely assigned to!

In [8]: df[:2]
Out[8]:
A B C
2000-01-01 0.048869 -1.360687 -0.47901
2000-01-02 -0.859661 -0.231595 -0.52775

In [9]: df.columns = [x.lower() for x in df.columns]

In [10]: df
Out[10]:
a b c
2000-01-01 0.048869 -1.360687 -0.479010
2000-01-02 -0.859661 -0.231595 -0.527750
2000-01-03 -1.296337 0.150680 0.123836
2000-01-04 0.571764 1.555563 -0.823761
2000-01-05 0.535420 -1.032853 1.469725
2000-01-06 1.304124 1.449735 0.203109
2000-01-07 -1.032011 0.969818 -0.962723
2000-01-08 1.382083 -0.938794 0.669142
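The same attributes exist on a Series, which has only one axis. The following sketch uses a small made-up Series to read shape and index and then assigns new labels to index, mirroring the column renaming done on the DataFrame above.

```python
import pandas as pd

s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

print(s.shape)    # a one-element tuple, since a Series has a single axis
print(s.index)    # the axis labels

# Assigning to index relabels the Series in place; the values are untouched.
s.index = [label.upper() for label in s.index]
print(s)
```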