Construction of Hertzsprung-Russell diagrams in Python¶

The purpose of this post is to construct Hertsprung-Russel diagrams, and place in it different kinds of stars.

Imports and references¶

In [1]:

%matplotlib inline

from __future__ import division

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# This IPython magic generates a table with version information
#https://github.com/jrjohansson/version_information
%load_ext version_information
%version_information numpy, matplotlib, pandas

Out[1]:

Software	Version
Python	2.7.9 64bit [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
IPython	2.3.1
OS	Linux 3.13.0 45 generic x86_64 with debian jessie sid
numpy	1.9.1
matplotlib	1.4.2
pandas	0.15.2
Tue Feb 24 20:53:45 2015 CET

An interesting educational activity about the H-R diagrams is The Hertzsprung-Russell Diagram in the Sloan Digital Sky Survey III (SDSS III).

Some theory¶

Some relevant facts in the development of a Hertsprung-Russel diagram (H-R diagram in short) are:

On the one hand there is a relationship between the absolute magnitude of a star and its luminosity, which has been described in this post: The color of celestial objects: part I
There is also a relationship between the spectral class of the star (O B A F G K M), its surface temperature and its color index (eg. B-V). This relationship between color index and temperature can also be seen in the previous post. Regarding the relationship between spectral classes and surface temperatures, you can read my post The color of celestial objects: part II
The purpose of the H-R diagram is to show the relationship between both groups of magnitudes. The "classic" H-R diagram shows the spectral class of the stars (or the corresponding color index B-V) on the horizontal axis and the absolute magnitudes on the vertical axis.
The position of a star in the H-R diagram tells us something about its characteristics. Thus for example if a star has a spectral class indicating a "cold" surface temperature (for example classes K or M), but its absolute magnitude / luminosity is high, this indicates that the star has a large surface, whereby the big radiative surface "compensates" the low intensity of the radiation due to its low temperature. Ie it would be a "giant" star or "supergiant". As the color indices corresponding to cold temperatures tend to red, we would speak for instance of "red giants".

The HIPPARCOS catalog¶

The biggest difficulty to build an H-R diagram is to determine the luminosity or absolute magnitude of the stars, as this involves knowing their distances. In this sense, the catalog of celestial objects SDSS I've been using so far in previous posts is not the most suitable because its main purpose is the photometry of extragalactic objects, providing little information of stars in our galaxy, and certainly lacking accurate information on the distance that separates us from them.

For this reason, the catalog of stars that we will use is HIPPARCOS, given that the priomordial objective of this mission was precisely to make accurate measurements of about 120000 stars. Furthermore, the information in this catalog includes the spectral class of the vast majority of these objects, so it is the most suitable source that we can use to make an H-R diagram.

A description of the fields in the catalog can be downloaded in pdf format at the following page: Hipparcos Overview. In particular, for this post, we are interested in the following fields:

H1 Identifier (HIP)
H5 V Johnson magnitude (Vmag)
H11 Trigonometric parallax, expressed in units of milliarcsec (Plx)
H37 Colour index (B-V)
H76 Spectral type (SpType)

The absolute magnitudes can be calculated from the values of V. For this you can use the relationship between the absolute magnitude \(M\), the apparent magnitude \(m\) and the parallax \(p\) parallax in arcseconds. See for example the wikipedia article Absolute magnitude:

\[ M = m + 5 (1 + \log_{10} p )\]

If parallax is expressed in milliarcseconds (Plx), and taking as apparent magnitude the value \(V\) in the catalog, we obtain:

\[ M = Vmag + 5 (1 + \log_{10} \frac{Plx}{1000}) \]

That once simplified gives:

\[ M = Vmag + 5 * log_{10} \frac{Plx}{100} \]

To download the catalog you must access the VizieR Search Page for the catalog I/239 from the Strasbourg astronomical Data Center (CDS). Select only the fields listed above: HIP, Vmag, Plx, BV, SpType. In the left pane, under "Preferences" select "unlimited" and "; -Separated Values" Press the "Submit" button and download the file, which I called I_239_selection.tsv and occupies only 3.8 MB. Opening it with a text editor is found that it is a text file with a header of a few lines explanatory of the contents, followed by 118322 entries with the five selected fields separated by the ; character. At the end of the file there is also a blank line we should skip when reading the file.

Now I will read the file creating a Pandas DataFrame:

In [2]:

filename = '../Hipparcos/I_239_selection.tsv'
df = pd.read_table(filename, skiprows=44, sep=';', header=None, index_col=0,
                   names = ['HIP', 'Vmag', 'Plx', 'B-V', 'SpType'],
                   skipfooter=1, engine='python')

In [3]:

df.head()

Out[3]:

	Vmag	Plx	B-V	SpType
HIP
1	9.10	3.54	0.482	F5
2	9.27	21.90	0.999	K3V
3	6.61	2.81	-0.019	B9
4	8.06	7.75	0.370	F0V
5	8.55	2.87	0.902	G8III

In [4]:

df.tail()

Out[4]:

	Vmag	Plx	B-V	SpType
HIP
118318	6.99	1.92	1.595	K2
118319	8.23	10.63	0.639	G2V
118320	7.59	5.00	0.999	K0
118321	9.20	19.22	0.698	G5V
118322	4.49	8.71	-0.075	B9IV

Preparation of the DataFrame with the data¶

Now let's work on this dataframe. First, we will analyze whether there are entries in the catalog without valid data (eg with NaN values)

In [5]:

df.describe()

Out[5]:

	Vmag	Plx	B-V	SpType
count	118218	118218	118218	115184
unique	1127	5617	2437	4124
top	8.69			K0
freq	504	263	1281	8570

The "count" value indicates that in some of the 118322 entries in the catalog there are missing values, and it happens in each of the three columns. They are marked in the dataframe as nonexistent values (NaN). We will identify and eliminate those columns.

But before we clean these NaN values, we will change any field with spaces to NaN values, since blank spaces are not considered by Pandas as NaN, but nevertheless should be eliminated, because we can not perform calculations with them (the method below is taken from the following answer in Stackoverflow

In [6]:

df_clean = df.applymap(lambda x: np.nan if isinstance(x, basestring)
                       and x.isspace() else x)

In [7]:

df_clean.describe()

Out[7]:

	Vmag	Plx	B-V	SpType
count	118217	117955	116937	115184
unique	1126	5616	2436	4124
top	8.69	2.93	1.000	K0
freq	504	183	317	8570

Finally, all rows that contain a NaN value will be removed

In [8]:

df_clean= df_clean.dropna()
df_clean.describe()

Out[8]:

	Vmag	Plx	B-V	SpType
count	114472	114472	114472	114472
unique	1072	5361	2426	4070
top	8.69	2.93	1.000	K0
freq	502	182	308	8537

In [9]:

df_clean.shape

Out[9]:

(114472, 4)

It is found that the non-zero counter values now match the number of lines in the dataframe. From now on we will work with the "df_clean" dataframe.

Next we are going to calculate, by the above formula, the absolute magnitude in the V filter, which we will call 'M_V'. I will add a new column to the dataframe, but previously I will change the columns of the dataframe with which we plan to operate, converting them from strings to numeric values.

In [10]:

df_clean['Vmag'] = df_clean['Vmag'].astype(np.float)
df_clean['Plx'] = df_clean['Plx'].astype(np.float)
df_clean['B-V'] = df_clean['B-V'].astype(np.float)

In [11]:

# Add a new column with the absolute magnitude
df_clean['M_V'] = df_clean['Vmag'] + 5 * np.log10(df_clean['Plx']/100.)

In [12]:

df_clean.head()

Out[12]:

	Vmag	Plx	B-V	SpType	M_V
HIP
1	9.10	3.54	0.482	F5	1.845016
2	9.27	21.90	0.999	K3V	5.972221
3	6.61	2.81	-0.019	B9	-1.146468
4	8.06	7.75	0.370	F0V	2.506509
5	8.55	2.87	0.902	G8III	0.839409

Next let's study the spectral types present in the dataframe. Really the only information needed at this time from the SpType column are the first two characters. Furthermore we will be only interested in those cases where the first character is alphabetic and the second is a number, like G8 or F5. After removing the anomalies, a new column is created with only the first two characters:

In [13]:

# Rows that do not meet the condition alpha + num are eliminated
f = lambda s: (len(s) >= 2)  and (s[0].isalpha()) and (s[1].isdigit())
i  = df_clean['SpType'].apply(f)
df_clean = df_clean[i]

# A new column is created with the first two characters from 'SpType'
f = lambda s: s[0:2]
df_clean['SpType2'] = df_clean['SpType'].apply(f)

In [14]:

df_clean.shape

Out[14]:

(111562, 6)

In [15]:

df_clean.head()

Out[15]:

	Vmag	Plx	B-V	SpType	M_V	SpType2
HIP
1	9.10	3.54	0.482	F5	1.845016	F5
2	9.27	21.90	0.999	K3V	5.972221	K3
3	6.61	2.81	-0.019	B9	-1.146468	B9
4	8.06	7.75	0.370	F0V	2.506509	F0
5	8.55	2.87	0.902	G8III	0.839409	G8

Let's see what are the spectral classes present in the dataframe

In [16]:

f = lambda s: s[0]
clases = df_clean['SpType'].map(f)
clases.value_counts()

Out[16]:

K    31578
F    25201
G    22213
A    17651
B    10281
M     4212
O      256
C       82
N       48
R       23
S       17
dtype: int64

We'll now remove lines with Special Classes C N, R and S to keep only those in the sequence OBAFGKM

In [17]:

f = lambda s: s[0] in 'OBAFGKM'
df_clean = df_clean[df_clean['SpType'].map(f)]

Check it:

In [18]:

f = lambda s: s[0]
clases = df_clean['SpType'].map(f)
clases.value_counts()

Out[18]:

K    31578
F    25201
G    22213
A    17651
B    10281
M     4212
O      256
dtype: int64

The purpose is to make a graph where on the horizontal axis are the spectral classes included in the Morgan-Keenan system (MKK) "OBAFGKM", followed by a digit 0-9. In addition we want that the spectral classes appear ordered precisely in this order (B5 before A0...). So I'm going to replace the letters by digits in ascending sequence in the SpType2 column, in order to sort by that column.

In [19]:

orden = {'O':'0', 'B':'1', 'A':'2', 'F':'3', 'G':'4', 'K':'5', 'M':'6'}
f = lambda s: orden[s[0]]+s[1]
df_clean['SpType2'] = df_clean['SpType2'].apply(f)
df_clean.head()

Out[19]:

	Vmag	Plx	B-V	SpType	M_V	SpType2
HIP
1	9.10	3.54	0.482	F5	1.845016	35
2	9.27	21.90	0.999	K3V	5.972221	53
3	6.61	2.81	-0.019	B9	-1.146468	19
4	8.06	7.75	0.370	F0V	2.506509	30
5	8.55	2.87	0.902	G8III	0.839409	48

Chart plot¶

In [20]:

fig, ax = plt.subplots(figsize=(8,10))

ax.set_xlim(0, 70)
ax.set_ylim(15, -10)
ax.grid()
ax.set_title('H-R Diagram \n (Hipparcos catalog)')

ax.title.set_fontsize(20)
ax.set_xlabel('spectral class')
ax.xaxis.label.set_fontsize(20)
ax.set_ylabel('Absolute magnitude')
ax.yaxis.label.set_fontsize(20)

ax.scatter(df_clean['SpType2'].astype(np.int), df_clean['M_V'],
           s=50, edgecolors='none', alpha=0.015, c='k')
ax.set_xticks(range(5,75,10))
ax.set_xticklabels(['O', 'B', 'A', 'F', 'G', 'K', 'M'])
ax.tick_params(axis='both', labelsize=14)

The diagram above is not very satisfactory, in the sense that as the horizontal axis correspond to a discrete set of values (the different spectral types), vertical stripes are formed by accumulation of observations on these values. It would be preferable to use a continuous scale of decimal values. Therefore we are going to repeat the plot, this time using the color index B-V instead of the spectral classes. Furthermore, it will show that both diagrams have the same shape, confirming the relationship between spectral classes and color indexes.

In [21]:

fig, ax = plt.subplots(figsize=(8,10))

ax.set_xlim(-0.5, 2.5)
ax.set_ylim(15, -10)
ax.grid()
ax.set_title('H-R Diagram \n (Hipparcos catalog)')

ax.title.set_fontsize(20)
ax.set_xlabel('Color index B-V')
ax.xaxis.label.set_fontsize(20)
ax.set_ylabel('Absolute magnitude')
ax.yaxis.label.set_fontsize(20)

ax.scatter(df_clean['B-V'], df_clean['M_V'],
#           s=50, edgecolors='none', alpha=0.015, c='k')
           s=1, edgecolors='none', c='k')

ax.tick_params(axis='both', labelsize=14)

Luminosity classes¶

In addition to the spectral type OBAFGKM, many stars in the Hipparcos catalog contain, in the field SpType, the luminosity class coded with the Roman numerals I to VII, that indicate the type of star in question. You can read an explanation of this classification system in the Wikipedia: Stellar classification

So let's analyze the presence of these luminosity classes in the HIPPARCOS catalog:

In [22]:

f = lambda s: 'VII' in s
b = df_clean['SpType'].map(f)
print "Class VII: white dwarfs, there are %d stars" %sum(b)

f = lambda s: ('VI' in s) and ('VII' not in s)
b = df_clean['SpType'].map(f)
print "Class VI: subdwarfs, there are %d stars" %sum(b)

f = lambda s: ('V' in s) and ('VI' not in s) and ('IV' not in s)
b = df_clean['SpType'].map(f)
print "Class V: main-sequence, there are %d stars" %sum(b)

f = lambda s: 'IV' in s
b = df_clean['SpType'].map(f)
print "Class IV: subgiants, there are %d stars" %sum(b)

f = lambda s: 'III' in s
b = df_clean['SpType'].map(f)
print "Class III: giants, there are %d stars" %sum(b)

f = lambda s: ('II' in s) and ('III' not in s) and ('VII' not in s)
b = df_clean['SpType'].map(f)
print "Class II:  bright giants, there are %d stars" %sum(b)

f = lambda s: ('I' in s) and ('II' not in s) and ('V' not in s)
b = df_clean['SpType'].map(f)
print "Class I: supergiants, there are %d stars" %sum(b)

Class VII: white dwarfs, there are 1 stars
Class VI: subdwarfs, there are 16 stars
Class V: main-sequence, there are 24683 stars
Class IV: subgiants, there are 7955 stars
Class III: giants, there are 22519 stars
Class II:  bright giants, there are 1239 stars
Class I: supergiants, there are 937 stars

As you can see below, about half of the stars in the catalog have no luminosity class encoded in the SpType field:

In [23]:

f = lambda s: ('I' not in s) and ('V' not in s)
b = df_clean['SpType'].map(f)
print sum(b)

However, there's yet a significant number of entries in the catalog that do contain information of the stellar class, so it makes sense to show the position of each of these luminosity classes in the H-R diagram

In [24]:

def plot_lum_class(b,c, label):
    ''' b: boolean Series to make the selection
        c: Color
        label: for the legend
    '''
    x = df_clean['B-V'][b]
    y = df_clean['M_V'][b]
    ax.scatter(x, y, c = c, s=6, edgecolors='none', label = label)

fig = plt.figure(figsize=(8,10))
ax = fig.add_subplot(111, axisbg='0.6')

ax.set_xlim(-0.5, 2.5)
ax.set_ylim(15, -15)
ax.grid()
ax.set_title('H-R Diagram \n (Hipparcos catalog)')

ax.title.set_fontsize(20)
ax.set_xlabel('Color index B-V')
ax.xaxis.label.set_fontsize(20)
ax.set_ylabel('Absolute magnitude')
ax.yaxis.label.set_fontsize(20)

f = lambda s: 'VII' in s
b = df_clean['SpType'].map(f)
plot_lum_class(b,'white', 'VII: white dwarfs')

f = lambda s: ('VI' in s) and ('VII' not in s)
b = df_clean['SpType'].map(f)
plot_lum_class(b,'blue', 'VI: subdwarfs')

f = lambda s: ('V' in s) and ('VI' not in s) and ('IV' not in s)
b = df_clean['SpType'].map(f)
plot_lum_class(b,'black', 'V: main-sequence')

f = lambda s: 'IV' in s
b = df_clean['SpType'].map(f)
plot_lum_class(b,'grey', 'IV: subgiants')

f = lambda s: 'III' in s
b = df_clean['SpType'].map(f)
plot_lum_class(b,'green', 'III: giants')

f = lambda s: ('II' in s) and ('III' not in s) and ('VII' not in s)
b = df_clean['SpType'].map(f)
plot_lum_class(b,'orange', 'II: bright giants')

f = lambda s: ('I' in s) and ('II' not in s) and ('V' not in s)
b = df_clean['SpType'].map(f)
plot_lum_class(b,'yellow', 'I: supergiants')

ax.tick_params(axis='both', labelsize=14)
legend = ax.legend(scatterpoints=1,markerscale = 6, shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')

This concludes this article. Do not hesitate to send me your comments. Until next time!

This entry has been created entirely with IPyhon Notebook

Balbuceos astropythonicos

Etiquetas