Construction of Hertzsprung-Russell diagrams in Python¶
Author: Eduardo Martín Calleja
The purpose of this post is to construct Hertsprung-Russel diagrams, and place in it different kinds of stars.
Imports and references¶
%matplotlib inline
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# This IPython magic generates a table with version information
#https://github.com/jrjohansson/version_information
%load_ext version_information
%version_information numpy, matplotlib, pandas
An interesting educational activity about the H-R diagrams is The Hertzsprung-Russell Diagram in the Sloan Digital Sky Survey III (SDSS III).
Some theory¶
Some relevant facts in the development of a Hertsprung-Russel diagram (H-R diagram in short) are:
On the one hand there is a relationship between the absolute magnitude of a star and its luminosity, which has been described in this post: The color of celestial objects: part I
There is also a relationship between the spectral class of the star (O B A F G K M), its surface temperature and its color index (eg. B-V). This relationship between color index and temperature can also be seen in the previous post. Regarding the relationship between spectral classes and surface temperatures, you can read my post The color of celestial objects: part II
The purpose of the H-R diagram is to show the relationship between both groups of magnitudes. The "classic" H-R diagram shows the spectral class of the stars (or the corresponding color index B-V) on the horizontal axis and the absolute magnitudes on the vertical axis.
The position of a star in the H-R diagram tells us something about its characteristics. Thus for example if a star has a spectral class indicating a "cold" surface temperature (for example classes K or M), but its absolute magnitude / luminosity is high, this indicates that the star has a large surface, whereby the big radiative surface "compensates" the low intensity of the radiation due to its low temperature. Ie it would be a "giant" star or "supergiant". As the color indices corresponding to cold temperatures tend to red, we would speak for instance of "red giants".
The HIPPARCOS catalog¶
The biggest difficulty to build an H-R diagram is to determine the luminosity or absolute magnitude of the stars, as this involves knowing their distances. In this sense, the catalog of celestial objects SDSS I've been using so far in previous posts is not the most suitable because its main purpose is the photometry of extragalactic objects, providing little information of stars in our galaxy, and certainly lacking accurate information on the distance that separates us from them.
For this reason, the catalog of stars that we will use is HIPPARCOS, given that the priomordial objective of this mission was precisely to make accurate measurements of about 120000 stars. Furthermore, the information in this catalog includes the spectral class of the vast majority of these objects, so it is the most suitable source that we can use to make an H-R diagram.
A description of the fields in the catalog can be downloaded in pdf format at the following page: Hipparcos Overview. In particular, for this post, we are interested in the following fields:
- H1 Identifier (HIP)
- H5 V Johnson magnitude (Vmag)
- H11 Trigonometric parallax, expressed in units of milliarcsec (Plx)
- H37 Colour index (B-V)
- H76 Spectral type (SpType)
The absolute magnitudes can be calculated from the values of V. For this you can use the relationship between the absolute magnitude \(M\), the apparent magnitude \(m\) and the parallax \(p\) parallax in arcseconds. See for example the wikipedia article Absolute magnitude:
\[ M = m + 5 (1 + \log_{10} p )\]
If parallax is expressed in milliarcseconds (Plx), and taking as apparent magnitude the value \(V\) in the catalog, we obtain:
\[ M = Vmag + 5 (1 + \log_{10} \frac{Plx}{1000}) \]
That once simplified gives:
\[ M = Vmag + 5 * log_{10} \frac{Plx}{100} \]
To download the catalog you must access the VizieR Search Page for the catalog I/239 from the Strasbourg astronomical Data Center (CDS). Select only the fields listed above: HIP, Vmag, Plx, BV, SpType. In the left pane, under "Preferences" select "unlimited" and "; -Separated Values" Press the "Submit" button and download the file, which I called I_239_selection.tsv and occupies only 3.8 MB. Opening it with a text editor is found that it is a text file with a header of a few lines explanatory of the contents, followed by 118322 entries with the five selected fields separated by the ; character. At the end of the file there is also a blank line we should skip when reading the file.
Now I will read the file creating a Pandas DataFrame:
filename = '../Hipparcos/I_239_selection.tsv'
df = pd.read_table(filename, skiprows=44, sep=';', header=None, index_col=0,
names = ['HIP', 'Vmag', 'Plx', 'B-V', 'SpType'],
skipfooter=1, engine='python')
df.head()
df.tail()
Preparation of the DataFrame with the data¶
Now let's work on this dataframe. First, we will analyze whether there are entries in the catalog without valid data (eg with NaN values)
df.describe()
The "count" value indicates that in some of the 118322 entries in the catalog there are missing values, and it happens in each of the three columns. They are marked in the dataframe as nonexistent values (NaN). We will identify and eliminate those columns.
But before we clean these NaN values, we will change any field with spaces to NaN values, since blank spaces are not considered by Pandas as NaN, but nevertheless should be eliminated, because we can not perform calculations with them (the method below is taken from the following answer in Stackoverflow
df_clean = df.applymap(lambda x: np.nan if isinstance(x, basestring)
and x.isspace() else x)
df_clean.describe()
Finally, all rows that contain a NaN value will be removed
df_clean= df_clean.dropna()
df_clean.describe()
df_clean.shape
It is found that the non-zero counter values now match the number of lines in the dataframe. From now on we will work with the "df_clean" dataframe.
Next we are going to calculate, by the above formula, the absolute magnitude in the V filter, which we will call 'M_V'. I will add a new column to the dataframe, but previously I will change the columns of the dataframe with which we plan to operate, converting them from strings to numeric values.
df_clean['Vmag'] = df_clean['Vmag'].astype(np.float)
df_clean['Plx'] = df_clean['Plx'].astype(np.float)
df_clean['B-V'] = df_clean['B-V'].astype(np.float)
# Add a new column with the absolute magnitude
df_clean['M_V'] = df_clean['Vmag'] + 5 * np.log10(df_clean['Plx']/100.)
df_clean.head()
Next let's study the spectral types present in the dataframe. Really the only information needed at this time from the SpType column are the first two characters. Furthermore we will be only interested in those cases where the first character is alphabetic and the second is a number, like G8 or F5. After removing the anomalies, a new column is created with only the first two characters:
# Rows that do not meet the condition alpha + num are eliminated
f = lambda s: (len(s) >= 2) and (s[0].isalpha()) and (s[1].isdigit())
i = df_clean['SpType'].apply(f)
df_clean = df_clean[i]
# A new column is created with the first two characters from 'SpType'
f = lambda s: s[0:2]
df_clean['SpType2'] = df_clean['SpType'].apply(f)
df_clean.shape
df_clean.head()
Let's see what are the spectral classes present in the dataframe
f = lambda s: s[0]
clases = df_clean['SpType'].map(f)
clases.value_counts()
We'll now remove lines with Special Classes C N, R and S to keep only those in the sequence OBAFGKM
f = lambda s: s[0] in 'OBAFGKM'
df_clean = df_clean[df_clean['SpType'].map(f)]
Check it:
f = lambda s: s[0]
clases = df_clean['SpType'].map(f)
clases.value_counts()
The purpose is to make a graph where on the horizontal axis are the spectral classes included in the Morgan-Keenan system (MKK) "OBAFGKM", followed by a digit 0-9. In addition we want that the spectral classes appear ordered precisely in this order (B5 before A0...). So I'm going to replace the letters by digits in ascending sequence in the SpType2 column, in order to sort by that column.
orden = {'O':'0', 'B':'1', 'A':'2', 'F':'3', 'G':'4', 'K':'5', 'M':'6'}
f = lambda s: orden[s[0]]+s[1]
df_clean['SpType2'] = df_clean['SpType2'].apply(f)
df_clean.head()
Chart plot¶
fig, ax = plt.subplots(figsize=(8,10))
ax.set_xlim(0, 70)
ax.set_ylim(15, -10)
ax.grid()
ax.set_title('H-R Diagram \n (Hipparcos catalog)')
ax.title.set_fontsize(20)
ax.set_xlabel('spectral class')
ax.xaxis.label.set_fontsize(20)
ax.set_ylabel('Absolute magnitude')
ax.yaxis.label.set_fontsize(20)
ax.scatter(df_clean['SpType2'].astype(np.int), df_clean['M_V'],
s=50, edgecolors='none', alpha=0.015, c='k')
ax.set_xticks(range(5,75,10))
ax.set_xticklabels(['O', 'B', 'A', 'F', 'G', 'K', 'M'])
ax.tick_params(axis='both', labelsize=14)
The diagram above is not very satisfactory, in the sense that as the horizontal axis correspond to a discrete set of values (the different spectral types), vertical stripes are formed by accumulation of observations on these values. It would be preferable to use a continuous scale of decimal values. Therefore we are going to repeat the plot, this time using the color index B-V instead of the spectral classes. Furthermore, it will show that both diagrams have the same shape, confirming the relationship between spectral classes and color indexes.
fig, ax = plt.subplots(figsize=(8,10))
ax.set_xlim(-0.5, 2.5)
ax.set_ylim(15, -10)
ax.grid()
ax.set_title('H-R Diagram \n (Hipparcos catalog)')
ax.title.set_fontsize(20)
ax.set_xlabel('Color index B-V')
ax.xaxis.label.set_fontsize(20)
ax.set_ylabel('Absolute magnitude')
ax.yaxis.label.set_fontsize(20)
ax.scatter(df_clean['B-V'], df_clean['M_V'],
# s=50, edgecolors='none', alpha=0.015, c='k')
s=1, edgecolors='none', c='k')
ax.tick_params(axis='both', labelsize=14)
Luminosity classes¶
In addition to the spectral type OBAFGKM, many stars in the Hipparcos catalog contain, in the field SpType, the luminosity class coded with the Roman numerals I to VII, that indicate the type of star in question. You can read an explanation of this classification system in the Wikipedia: Stellar classification
So let's analyze the presence of these luminosity classes in the HIPPARCOS catalog:
f = lambda s: 'VII' in s
b = df_clean['SpType'].map(f)
print "Class VII: white dwarfs, there are %d stars" %sum(b)
f = lambda s: ('VI' in s) and ('VII' not in s)
b = df_clean['SpType'].map(f)
print "Class VI: subdwarfs, there are %d stars" %sum(b)
f = lambda s: ('V' in s) and ('VI' not in s) and ('IV' not in s)
b = df_clean['SpType'].map(f)
print "Class V: main-sequence, there are %d stars" %sum(b)
f = lambda s: 'IV' in s
b = df_clean['SpType'].map(f)
print "Class IV: subgiants, there are %d stars" %sum(b)
f = lambda s: 'III' in s
b = df_clean['SpType'].map(f)
print "Class III: giants, there are %d stars" %sum(b)
f = lambda s: ('II' in s) and ('III' not in s) and ('VII' not in s)
b = df_clean['SpType'].map(f)
print "Class II: bright giants, there are %d stars" %sum(b)
f = lambda s: ('I' in s) and ('II' not in s) and ('V' not in s)
b = df_clean['SpType'].map(f)
print "Class I: supergiants, there are %d stars" %sum(b)
As you can see below, about half of the stars in the catalog have no luminosity class encoded in the SpType field:
f = lambda s: ('I' not in s) and ('V' not in s)
b = df_clean['SpType'].map(f)
print sum(b)
However, there's yet a significant number of entries in the catalog that do contain information of the stellar class, so it makes sense to show the position of each of these luminosity classes in the H-R diagram
def plot_lum_class(b,c, label):
''' b: boolean Series to make the selection
c: Color
label: for the legend
'''
x = df_clean['B-V'][b]
y = df_clean['M_V'][b]
ax.scatter(x, y, c = c, s=6, edgecolors='none', label = label)
fig = plt.figure(figsize=(8,10))
ax = fig.add_subplot(111, axisbg='0.6')
ax.set_xlim(-0.5, 2.5)
ax.set_ylim(15, -15)
ax.grid()
ax.set_title('H-R Diagram \n (Hipparcos catalog)')
ax.title.set_fontsize(20)
ax.set_xlabel('Color index B-V')
ax.xaxis.label.set_fontsize(20)
ax.set_ylabel('Absolute magnitude')
ax.yaxis.label.set_fontsize(20)
f = lambda s: 'VII' in s
b = df_clean['SpType'].map(f)
plot_lum_class(b,'white', 'VII: white dwarfs')
f = lambda s: ('VI' in s) and ('VII' not in s)
b = df_clean['SpType'].map(f)
plot_lum_class(b,'blue', 'VI: subdwarfs')
f = lambda s: ('V' in s) and ('VI' not in s) and ('IV' not in s)
b = df_clean['SpType'].map(f)
plot_lum_class(b,'black', 'V: main-sequence')
f = lambda s: 'IV' in s
b = df_clean['SpType'].map(f)
plot_lum_class(b,'grey', 'IV: subgiants')
f = lambda s: 'III' in s
b = df_clean['SpType'].map(f)
plot_lum_class(b,'green', 'III: giants')
f = lambda s: ('II' in s) and ('III' not in s) and ('VII' not in s)
b = df_clean['SpType'].map(f)
plot_lum_class(b,'orange', 'II: bright giants')
f = lambda s: ('I' in s) and ('II' not in s) and ('V' not in s)
b = df_clean['SpType'].map(f)
plot_lum_class(b,'yellow', 'I: supergiants')
ax.tick_params(axis='both', labelsize=14)
legend = ax.legend(scatterpoints=1,markerscale = 6, shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
This concludes this article. Do not hesitate to send me your comments. Until next time!
This entry has been created entirely with IPyhon Notebook