-->

Etiquetas

Construction of Hertzsprung-Russell diagrams in Python

Construction of Hertzsprung-Russell diagrams in Python

Author: Eduardo Martín Calleja

The purpose of this post is to construct Hertsprung-Russel diagrams, and place in it different kinds of stars.

Imports and references

In [1]:
%matplotlib inline

from __future__ import division

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

# This IPython magic generates a table with version information
#https://github.com/jrjohansson/version_information
%load_ext version_information
%version_information numpy, matplotlib, pandas
Out[1]:
SoftwareVersion
Python2.7.9 64bit [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
IPython2.3.1
OSLinux 3.13.0 45 generic x86_64 with debian jessie sid
numpy1.9.1
matplotlib1.4.2
pandas0.15.2
Tue Feb 24 20:53:45 2015 CET

An interesting educational activity about the H-R diagrams is The Hertzsprung-Russell Diagram in the Sloan Digital Sky Survey III (SDSS III).

Some theory

Some relevant facts in the development of a Hertsprung-Russel diagram (H-R diagram in short) are:

  • On the one hand there is a relationship between the absolute magnitude of a star and its luminosity, which has been described in this post: The color of celestial objects: part I

  • There is also a relationship between the spectral class of the star (O B A F G K M), its surface temperature and its color index (eg. B-V). This relationship between color index and temperature can also be seen in the previous post. Regarding the relationship between spectral classes and surface temperatures, you can read my post The color of celestial objects: part II

  • The purpose of the H-R diagram is to show the relationship between both groups of magnitudes. The "classic" H-R diagram shows the spectral class of the stars (or the corresponding color index B-V) on the horizontal axis and the absolute magnitudes on the vertical axis.

  • The position of a star in the H-R diagram tells us something about its characteristics. Thus for example if a star has a spectral class indicating a "cold" surface temperature (for example classes K or M), but its absolute magnitude / luminosity is high, this indicates that the star has a large surface, whereby the big radiative surface "compensates" the low intensity of the radiation due to its low temperature. Ie it would be a "giant" star or "supergiant". As the color indices corresponding to cold temperatures tend to red, we would speak for instance of "red giants".

The HIPPARCOS catalog

The biggest difficulty to build an H-R diagram is to determine the luminosity or absolute magnitude of the stars, as this involves knowing their distances. In this sense, the catalog of celestial objects SDSS I've been using so far in previous posts is not the most suitable because its main purpose is the photometry of extragalactic objects, providing little information of stars in our galaxy, and certainly lacking accurate information on the distance that separates us from them.

For this reason, the catalog of stars that we will use is HIPPARCOS, given that the priomordial objective of this mission was precisely to make accurate measurements of about 120000 stars. Furthermore, the information in this catalog includes the spectral class of the vast majority of these objects, so it is the most suitable source that we can use to make an H-R diagram.

A description of the fields in the catalog can be downloaded in pdf format at the following page: Hipparcos Overview. In particular, for this post, we are interested in the following fields:

  • H1 Identifier (HIP)
  • H5 V Johnson magnitude (Vmag)
  • H11 Trigonometric parallax, expressed in units of milliarcsec (Plx)
  • H37 Colour index (B-V)
  • H76 Spectral type (SpType)

The absolute magnitudes can be calculated from the values of V. For this you can use the relationship between the absolute magnitude \(M\), the apparent magnitude \(m\) and the parallax \(p\) parallax in arcseconds. See for example the wikipedia article Absolute magnitude:

\[ M = m + 5 (1 + \log_{10} p )\]

If parallax is expressed in milliarcseconds (Plx), and taking as apparent magnitude the value \(V\) in the catalog, we obtain:

\[ M = Vmag + 5 (1 + \log_{10} \frac{Plx}{1000}) \]

That once simplified gives:

\[ M = Vmag + 5 * log_{10} \frac{Plx}{100} \]

To download the catalog you must access the VizieR Search Page for the catalog I/239 from the Strasbourg astronomical Data Center (CDS). Select only the fields listed above: HIP, Vmag, Plx, BV, SpType. In the left pane, under "Preferences" select "unlimited" and "; -Separated Values" Press the "Submit" button and download the file, which I called I_239_selection.tsv and occupies only 3.8 MB. Opening it with a text editor is found that it is a text file with a header of a few lines explanatory of the contents, followed by 118322 entries with the five selected fields separated by the ; character. At the end of the file there is also a blank line we should skip when reading the file.

Now I will read the file creating a Pandas DataFrame:

In [2]:
filename = '../Hipparcos/I_239_selection.tsv'
df = pd.read_table(filename, skiprows=44, sep=';', header=None, index_col=0,
                   names = ['HIP', 'Vmag', 'Plx', 'B-V', 'SpType'],
                   skipfooter=1, engine='python')
In [3]:
df.head()
Out[3]:
Vmag Plx B-V SpType
HIP
1 9.10 3.54 0.482 F5
2 9.27 21.90 0.999 K3V
3 6.61 2.81 -0.019 B9
4 8.06 7.75 0.370 F0V
5 8.55 2.87 0.902 G8III
In [4]:
df.tail()
Out[4]:
Vmag Plx B-V SpType
HIP
118318 6.99 1.92 1.595 K2
118319 8.23 10.63 0.639 G2V
118320 7.59 5.00 0.999 K0
118321 9.20 19.22 0.698 G5V
118322 4.49 8.71 -0.075 B9IV

Preparation of the DataFrame with the data

Now let's work on this dataframe. First, we will analyze whether there are entries in the catalog without valid data (eg with NaN values​​)

In [5]:
df.describe()
Out[5]:
Vmag Plx B-V SpType
count 118218 118218 118218 115184
unique 1127 5617 2437 4124
top 8.69 K0
freq 504 263 1281 8570

The "count" value indicates that in some of the 118322 entries in the catalog there are missing values, and it happens in each of the three columns. They are marked in the dataframe as nonexistent values (NaN). We will identify and eliminate those columns.

But before we clean these NaN values, we will change any field with spaces to NaN values, since blank spaces are not considered by Pandas as NaN, but nevertheless should be eliminated, because we can not perform calculations with them (the method below is taken from the following answer in Stackoverflow

In [6]:
df_clean = df.applymap(lambda x: np.nan if isinstance(x, basestring)
                       and x.isspace() else x)
In [7]:
df_clean.describe()
Out[7]:
Vmag Plx B-V SpType
count 118217 117955 116937 115184
unique 1126 5616 2436 4124
top 8.69 2.93 1.000 K0
freq 504 183 317 8570

Finally, all rows that contain a NaN value will be removed

In [8]:
df_clean= df_clean.dropna()
df_clean.describe()
Out[8]:
Vmag Plx B-V SpType
count 114472 114472 114472 114472
unique 1072 5361 2426 4070
top 8.69 2.93 1.000 K0
freq 502 182 308 8537
In [9]:
df_clean.shape
Out[9]:
(114472, 4)

It is found that the non-zero counter values now match the number of lines in the dataframe. From now on we will work with the "df_clean" dataframe.

Next we are going to calculate, by the above formula, the absolute magnitude in the V filter, which we will call 'M_V'. I will add a new column to the dataframe, but previously I will change the columns of the dataframe with which we plan to operate, converting them from strings to numeric values.

In [10]:
df_clean['Vmag'] = df_clean['Vmag'].astype(np.float)
df_clean['Plx'] = df_clean['Plx'].astype(np.float)
df_clean['B-V'] = df_clean['B-V'].astype(np.float)
In [11]:
# Add a new column with the absolute magnitude
df_clean['M_V'] = df_clean['Vmag'] + 5 * np.log10(df_clean['Plx']/100.)
In [12]:
df_clean.head()
Out[12]:
Vmag Plx B-V SpType M_V
HIP
1 9.10 3.54 0.482 F5 1.845016
2 9.27 21.90 0.999 K3V 5.972221
3 6.61 2.81 -0.019 B9 -1.146468
4 8.06 7.75 0.370 F0V 2.506509
5 8.55 2.87 0.902 G8III 0.839409

Next let's study the spectral types present in the dataframe. Really the only information needed at this time from the SpType column are the first two characters. Furthermore we will be only interested in those cases where the first character is alphabetic and the second is a number, like G8 or F5. After removing the anomalies, a new column is created with only the first two characters:

In [13]:
# Rows that do not meet the condition alpha + num are eliminated
f = lambda s: (len(s) >= 2)  and (s[0].isalpha()) and (s[1].isdigit())
i  = df_clean['SpType'].apply(f)
df_clean = df_clean[i]

# A new column is created with the first two characters from 'SpType'
f = lambda s: s[0:2]
df_clean['SpType2'] = df_clean['SpType'].apply(f)
In [14]:
df_clean.shape
Out[14]:
(111562, 6)
In [15]:
df_clean.head()
Out[15]:
Vmag Plx B-V SpType M_V SpType2
HIP
1 9.10 3.54 0.482 F5 1.845016 F5
2 9.27 21.90 0.999 K3V 5.972221 K3
3 6.61 2.81 -0.019 B9 -1.146468 B9
4 8.06 7.75 0.370 F0V 2.506509 F0
5 8.55 2.87 0.902 G8III 0.839409 G8

Let's see what are the spectral classes present in the dataframe

In [16]:
f = lambda s: s[0]
clases = df_clean['SpType'].map(f)
clases.value_counts()
Out[16]:
K    31578
F    25201
G    22213
A    17651
B    10281
M     4212
O      256
C       82
N       48
R       23
S       17
dtype: int64

We'll now remove lines with Special Classes C N, R and S to keep only those in the sequence OBAFGKM

In [17]:
f = lambda s: s[0] in 'OBAFGKM'
df_clean = df_clean[df_clean['SpType'].map(f)]

Check it:

In [18]:
f = lambda s: s[0]
clases = df_clean['SpType'].map(f)
clases.value_counts()
Out[18]:
K    31578
F    25201
G    22213
A    17651
B    10281
M     4212
O      256
dtype: int64

The purpose is to make a graph where on the horizontal axis are the spectral classes included in the Morgan-Keenan system (MKK) "OBAFGKM", followed by a digit 0-9. In addition we want that the spectral classes appear ordered precisely in this order (B5 before A0...). So I'm going to replace the letters by digits in ascending sequence in the SpType2 column, in order to sort by that column.

In [19]:
orden = {'O':'0', 'B':'1', 'A':'2', 'F':'3', 'G':'4', 'K':'5', 'M':'6'}
f = lambda s: orden[s[0]]+s[1]
df_clean['SpType2'] = df_clean['SpType2'].apply(f)
df_clean.head()
Out[19]:
Vmag Plx B-V SpType M_V SpType2
HIP
1 9.10 3.54 0.482 F5 1.845016 35
2 9.27 21.90 0.999 K3V 5.972221 53
3 6.61 2.81 -0.019 B9 -1.146468 19
4 8.06 7.75 0.370 F0V 2.506509 30
5 8.55 2.87 0.902 G8III 0.839409 48

Chart plot

In [20]:
fig, ax = plt.subplots(figsize=(8,10))

ax.set_xlim(0, 70)
ax.set_ylim(15, -10)
ax.grid()
ax.set_title('H-R Diagram \n (Hipparcos catalog)')

ax.title.set_fontsize(20)
ax.set_xlabel('spectral class')
ax.xaxis.label.set_fontsize(20)
ax.set_ylabel('Absolute magnitude')
ax.yaxis.label.set_fontsize(20)

ax.scatter(df_clean['SpType2'].astype(np.int), df_clean['M_V'],
           s=50, edgecolors='none', alpha=0.015, c='k')
ax.set_xticks(range(5,75,10))
ax.set_xticklabels(['O', 'B', 'A', 'F', 'G', 'K', 'M'])
ax.tick_params(axis='both', labelsize=14)

The diagram above is not very satisfactory, in the sense that as the horizontal axis correspond to a discrete set of values (the different spectral types), vertical stripes are formed by accumulation of observations on these values. It would be preferable to use a continuous scale of decimal values. Therefore we are going to repeat the plot, this time using the color index B-V instead of the spectral classes. Furthermore, it will show that both diagrams have the same shape, confirming the relationship between spectral classes and color indexes.

In [21]:
fig, ax = plt.subplots(figsize=(8,10))

ax.set_xlim(-0.5, 2.5)
ax.set_ylim(15, -10)
ax.grid()
ax.set_title('H-R Diagram \n (Hipparcos catalog)')

ax.title.set_fontsize(20)
ax.set_xlabel('Color index B-V')
ax.xaxis.label.set_fontsize(20)
ax.set_ylabel('Absolute magnitude')
ax.yaxis.label.set_fontsize(20)

ax.scatter(df_clean['B-V'], df_clean['M_V'],
#           s=50, edgecolors='none', alpha=0.015, c='k')
           s=1, edgecolors='none', c='k')

ax.tick_params(axis='both', labelsize=14)

Luminosity classes

In addition to the spectral type OBAFGKM, many stars in the Hipparcos catalog contain, in the field SpType, the luminosity class coded with the Roman numerals I to VII, that indicate the type of star in question. You can read an explanation of this classification system in the Wikipedia: Stellar classification

So let's analyze the presence of these luminosity classes in the HIPPARCOS catalog:

In [22]:
f = lambda s: 'VII' in s
b = df_clean['SpType'].map(f)
print "Class VII: white dwarfs, there are %d stars" %sum(b)

f = lambda s: ('VI' in s) and ('VII' not in s)
b = df_clean['SpType'].map(f)
print "Class VI: subdwarfs, there are %d stars" %sum(b)

f = lambda s: ('V' in s) and ('VI' not in s) and ('IV' not in s)
b = df_clean['SpType'].map(f)
print "Class V: main-sequence, there are %d stars" %sum(b)

f = lambda s: 'IV' in s
b = df_clean['SpType'].map(f)
print "Class IV: subgiants, there are %d stars" %sum(b)

f = lambda s: 'III' in s
b = df_clean['SpType'].map(f)
print "Class III: giants, there are %d stars" %sum(b)

f = lambda s: ('II' in s) and ('III' not in s) and ('VII' not in s)
b = df_clean['SpType'].map(f)
print "Class II:  bright giants, there are %d stars" %sum(b)

f = lambda s: ('I' in s) and ('II' not in s) and ('V' not in s)
b = df_clean['SpType'].map(f)
print "Class I: supergiants, there are %d stars" %sum(b)
Class VII: white dwarfs, there are 1 stars
Class VI: subdwarfs, there are 16 stars
Class V: main-sequence, there are 24683 stars
Class IV: subgiants, there are 7955 stars
Class III: giants, there are 22519 stars
Class II:  bright giants, there are 1239 stars
Class I: supergiants, there are 937 stars

As you can see below, about half of the stars in the catalog have no luminosity class encoded in the SpType field:

In [23]:
f = lambda s: ('I' not in s) and ('V' not in s)
b = df_clean['SpType'].map(f)
print sum(b)
55605

However, there's yet a significant number of entries in the catalog that do contain information of the stellar class, so it makes sense to show the position of each of these luminosity classes in the H-R diagram

In [24]:
def plot_lum_class(b,c, label):
    ''' b: boolean Series to make the selection
        c: Color
        label: for the legend
    '''
    x = df_clean['B-V'][b]
    y = df_clean['M_V'][b]
    ax.scatter(x, y, c = c, s=6, edgecolors='none', label = label)

fig = plt.figure(figsize=(8,10))
ax = fig.add_subplot(111, axisbg='0.6')

ax.set_xlim(-0.5, 2.5)
ax.set_ylim(15, -15)
ax.grid()
ax.set_title('H-R Diagram \n (Hipparcos catalog)')

ax.title.set_fontsize(20)
ax.set_xlabel('Color index B-V')
ax.xaxis.label.set_fontsize(20)
ax.set_ylabel('Absolute magnitude')
ax.yaxis.label.set_fontsize(20)

f = lambda s: 'VII' in s
b = df_clean['SpType'].map(f)
plot_lum_class(b,'white', 'VII: white dwarfs')

f = lambda s: ('VI' in s) and ('VII' not in s)
b = df_clean['SpType'].map(f)
plot_lum_class(b,'blue', 'VI: subdwarfs')

f = lambda s: ('V' in s) and ('VI' not in s) and ('IV' not in s)
b = df_clean['SpType'].map(f)
plot_lum_class(b,'black', 'V: main-sequence')

f = lambda s: 'IV' in s
b = df_clean['SpType'].map(f)
plot_lum_class(b,'grey', 'IV: subgiants')

f = lambda s: 'III' in s
b = df_clean['SpType'].map(f)
plot_lum_class(b,'green', 'III: giants')

f = lambda s: ('II' in s) and ('III' not in s) and ('VII' not in s)
b = df_clean['SpType'].map(f)
plot_lum_class(b,'orange', 'II: bright giants')

f = lambda s: ('I' in s) and ('II' not in s) and ('V' not in s)
b = df_clean['SpType'].map(f)
plot_lum_class(b,'yellow', 'I: supergiants')

ax.tick_params(axis='both', labelsize=14)
legend = ax.legend(scatterpoints=1,markerscale = 6, shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')

This concludes this article. Do not hesitate to send me your comments. Until next time!

This entry has been created entirely with IPyhon Notebook