The color of celestial objects: part III¶
Author: Eduardo Martín Calleja
This post is the third that I dedicate to the color theme (I hope that you are not bored yet of the subject). You can find the two previous ones at:
The color of celestial objects: part I
The color of celestial objects: part II
In this third post we will explore the database of the Sloan Digital Sky Survey SDSS, whose photometric system is different from the photometric UBV system, which is what we have been using until now. We will use the techniques to send SQL queries from Python to the SDSS database that were explained in the post An easy way to make SQL queries from Python to the SDSS database. If you are going to follow this post, you should maybe have a look at it before.
The aim I pursue is to access all the stars in the SDSS database having a spectral classification in the MK system (Morgan-Keenan). The reason is that the catalog we used in the previous post, the "Bright Star Catalogue" was limited to cover only the stars that can be visible to the naked eye, thus there were a large number of stars left outside the scope of our study, and the proportion of each class of stars was biased. On the other hand, the SDSS database is an endless source of information, so I'm interested in getting to know it better.
However, to make this post not very extensive, I will focus only on the main sequence stars.
This entry was written entirely, like all previous ones, with the IPython Notebook and in it we will see in detail the process of production, debugging and graphical representation of data.
Imports and references¶
The SDSS uses a system of five color filters whose main features, which I found summarized in this thesis are:
\[ \begin{array}{|l|c|c|} \text{Filter} & \lambda_{eff} (A) & \Delta \lambda \\\\ \hline \\\\ \text{Ultraviolet (u)} & 3540 & 599 \\\\ \text{Green (g)} & 4770 & 1379 \\\\ \text{Red (r)} & 6222 & 1382 \\\\ \text{Near Infrared (i)} & 7632 & 1535 \\\\ \text{Infrared (z)} & 9049 & 1370 \\\\ \end{array} \]
The SDSS color filters in the visible light band are "g" and "r", so the magnitude "g-r" is often used to indicate the color in the visible spectrum.
These are the Python libraries that we are going to use:
%matplotlib inline
from __future__ import division
import numpy as np
import matplotlib.pyplot as plt
import mechanize
import StringIO
import pandas as pd
import quantities as pq
# This IPython magic generates a table with version information
#https://github.com/jrjohansson/version_information
%load_ext version_information
%version_information numpy, matplotlib, pandas, quantities
Preparation to the access to the SDSS database¶
# URL SQL Search DR10
url = "http://skyserver.sdss3.org/dr10/en/tools/search/sql.aspx"
# Open a Browser instance to make the queries
br = mechanize.Browser()
Let's define a function to make the database SQL queries more easily. It will return a Pandas dataframe
def SDSS_select(sql):
br.open(url)
br.select_form(name="sql")
br['cmd'] = sql
br['format']=['csv']
response = br.submit()
file_like = StringIO.StringIO(response.get_data())
return pd.read_csv(file_like, skiprows=1)
Creating a dataframe with stars on the main sequence¶
As we want to obtain both photometric and spectrum (the spectral classification of the star) data, the most appropriate table in the SDSS database can be SpecPhoto view. From it we will be interested in particular in the data fields with the magnitudes in each filter, the "class" and "type" to select stars only, and the "subclass" with the information on the spectral classification of the star.
We will prepare a SQL query to get a dataframe with the stars which, according to their spectra data, were classified in the SDSS database as belonging to the main sequence.
Keep in mind that the "type" field sets a classification of objects according to their morphology: type = 6 means "STAR". Instead, "class" classifies objects according to their spectrum into three categories: "GALAXY", "QSO" or "STAR". Thus, objects with a "type" equal to 6 in the database can be classified spectrographically as stars, galaxies or quasars. Conversely, although the spectral classification is safer, in the database there are also some object with "class" equal to "STAR" and with "types" such as 0 (UNKNOWN) or 3 (GALAXY). So the best to make sure that we are selecting only stars is to search with type = 6 and class = 'STAR'
We know from the previous post that in the Morgan-Keenan system the stars of the main sequence are those that have a luminosity class of "V". So, let's prepare a SQL query to find precisely the stars of this class:
SQL_V = ' \
SELECT \
objID, subclass, flags, zwarning, \
modelMag_u, modelMag_g, modelMag_r, \
modelMag_i, modelMag_z \
FROM SpecPhoto \
WHERE \
class = "STAR" \
AND \
subclass LIKE "%V%" \
AND \
subclass NOT LIKE "%IV%" \
AND \
subclass NOT LIKE "%VI%" \
AND \
type = 6'
And now we can run the query to the database:
df_V = SDSS_select(SQL_V)
df_V.shape
Let's change the column names and have a look:
df_V.columns = ['objID','subclass','flags','zwarning','u','g','r','i','z']
df_V.head()
In this selection I have included, as you can see, the "zwarning" field just in case we are interested to filter by its values. In the description of the table in the Schema Browser it is indicated that a value of zero in this field indicates absence of problems. Let's see how many entries there are in this field different from zero:
df_V[df_V['zwarning']!=0].shape
This is a respectable amount. To be sure that the data have no problems, we will remove all these cases from our dataframe .
df_V = df_V[df_V['zwarning']==0]
And we remove the "zwarning" column, no longer needed.
df_V = df_V.drop('zwarning', axis=1)
df_V.shape
Let's assign the first column as the dataframe index:
df_V = df_V.set_index('objID')
df_V.head()
Now, we will examine below the flags to see if we have to exclude more entries from our dataframe
Remove dubious data using the flags¶
In this section we'll become familiar with the way to deal with the flags field, and use them to clean the dataframe. As an example we will analyze now the flags of one object in the dataframe
flg = df_V.loc[1237662640123281773,'flags']
# Show in hexa
hex(flg)
Let's see what they mean. To do this it is best to make a query directly on the database. Go to the page SQL Search and enter the following query:
We will obtain as a result the names of the three flags that correspond to the three bits with value 1 in the "flags" field of the object:
Come now to the description of the flags: for this we return to the SQL Search page and write:
In response a description of each flag is obtained. In particular:
You can check that these hexadecimal characters match the value that we obtained from the "flags" field of the first object in the dataframe
The flags are also explained in the page Summary table, grouped by concepts, which is quite useful. To interpret which bit corresponds to each flag note that there are two 32-bit flag fields, that in the SpecPhoto table are grouped into a single 64-bit value. The 32-bit field to which each bit flag belongs is indicated with the notation 1/2 or 2/2. According to the summary table, the flags that indicate possible problems with the photometry are:
So we're going to build a mask to filter all these cases:
mask = 2**0 + 2**5 + 2**(14+32) + 2**7 + 2**19 + 2**(26+32) + 2**24 + 2**22
Let's see how many cases are there in our dataframe having either of these bits activated:
f = lambda s: (s & mask) == 0
b =df_V['flags'].map(f)
b.value_counts()
We see that there are 4660 cases with any of these flags on. We will delete them:
df_V_clean =df_V[b]
df_V_clean.shape
Main sequence g-r vs. u-g color-color diagram¶
Let's see first what are the spectral classes represented in the dataframe. It suffices to take the first character of the field "subclass"
f = lambda s: s[0]
clases = df_V_clean['subclass'].map(f)
clases.value_counts()
We see that the "O" class is missing and there is apparently a class that starts with "C". Let's see what spectral class is this:
b = df_V_clean['subclass'].apply(lambda s : s[0] == 'C')
df_V_clean['subclass'][b].value_counts()
Oh: There are 4006 objects whose spectrum has been classified as "CV". They are "cataclysmic variable stars", which must be excluded from our dataframe, because we want it only to contain main sequence stars.
df_V_clean = df_V_clean[~b]
df_V_clean.shape
Furthermore, in order to have more complete color-color diagrams, we will incorporate the class "O" stars to our dataframe
# We prepare the query
SQL_O = ' \
SELECT \
objID, subclass, flags, \
modelMag_u, modelMag_g, modelMag_r, \
modelMag_i, modelMag_z \
FROM SpecPhoto \
WHERE \
class = "STAR" \
AND \
subclass LIKE "O%" \
AND \
type = 6 \
AND \
zwarning = 0'
# And we execute it
df_O = SDSS_select(SQL_O)
df_O.columns = ['objID','subclass','flags','u','g','r','i','z']
df_O = df_O.set_index('objID')
df_O.shape
# remove the stars with "bad" flags:
f = lambda s: (s & mask) == 0
b =df_O['flags'].map(f)
df_O_clean =df_O[b]
df_O_clean.shape
Finally we'll concatenate the two dataframes, creating a new dataframe "df" that is the one we will work with:
df = pd.concat([df_V_clean, df_O_clean])
df.shape
Let's expand the dataframe with new columns with the color indices:
df['u-g'] = df['u'] - df['g']
df['g-r'] = df['g'] - df['r']
df['r-i'] = df['r'] - df['i']
df['i-z'] = df['i'] - df['z']
To distinguish the various spectral classes in the color-color diagram , we create now a dictionary of colors:
colors = {'O':'#0000FF', 'B':'#CCCCFF', 'A':'#FFFFFF',
'F':'#FFFFB2', 'G':'#FFFF00', 'K':'#FFA500',
'M':'#FF0000'}
And now we are ready to generate the color-color diagram
fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(111, axisbg='0.6')
ax.set_xlim(-1, 6)
ax.set_ylim(-1, 2.5)
ax.grid()
ax.set_title('Main sequence: g-r vs. u-g')
ax.title.set_fontsize(20)
ax.set_xlabel('u-g')
ax.xaxis.label.set_fontsize(20)
ax.set_ylabel('g-r')
ax.yaxis.label.set_fontsize(20)
for cls in 'OBAFGKM':
b = df['subclass'].map(lambda s: s[0] ==cls)
x = df[b]['u-g']
y = df[b]['g-r']
c = colors[cls]
ax.scatter(x, y, c = c, s=6, edgecolors='none', label = cls)
legend = ax.legend(scatterpoints=1,markerscale = 6, shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
As the colors overlap, it is not easy to see where each spectral class is located, so we will make a mosaic of plots, one for each spectral class. Let's start writing a function to plot each class:
def plot_gr_ug(df, cls, axes):
x = df['u-g']
y = df['g-r']
axes.scatter(x, y, c = colors[cls], s=6, edgecolors='none')
axes.set_xlim(-1, 6)
axes.set_ylim(-1,2.5)
axes.set_xlabel('u-g')
axes.set_ylabel('g-r')
axes.grid()
axes.set_title('Main sequence: class %s' %cls);
Furthermore, we also want to represent the line that indicates the color-color diagram of a black body, to see how it is positioned relative to the locus of each class. For that, we'll need some additional functions, as seen in previous posts:
# Defining the SDSS photometric system constants
lambda_u = 3540 * pq.angstrom
delta_u = 599 * pq.angstrom
lambda_g = 4770 * pq.angstrom
delta_g = 1379 * pq.angstrom
lambda_r = 6222 * pq.angstrom
delta_r = 1382 * pq.angstrom
lambda_i = 7632 * pq.angstrom
delta_i = 1535 * pq.angstrom
lambda_z = 9049 * pq.angstrom
delta_u = 1370 * pq.angstrom
def B(wl,T):
'''wl is an array of wavelengths with units of length
T is a temperature in Kelvin
the result is an array of spectral radiances
with units W/(m**2 * nm * sr)
'''
I = 2 * pq.constants.h * (pq.c)**2 / wl**5 * \
1 / (np.exp((pq.constants.h*pq.c \
/ (wl*pq.constants.k*T)).simplified)-1)
return I.rescale(pq.watt/(pq.m**2 * pq.nm *pq.sr))
We now define the functions that will allow us to calculate the color indices in the SDSS photometric system for a black body at a given temperature:
# We approximate the received flux as the product of th spectral radiance
# B(lambda, T) by the width delta lambda
def get_UG(T):
F = B(lambda_g, T) * delta_g/(B(lambda_u, T)* delta_u)
return 2.5 * np.log10(F)
def get_GR(T):
F = B(lambda_r, T) * delta_r/(B(lambda_g, T)* delta_g)
return 2.5 * np.log10(F)
def get_RI(T):
F = B(lambda_i, T) * delta_i/(B(lambda_r, T)* delta_r)
return 2.5 * np.log10(F)
And finally we will generate a mosaic of color-color diagrams "g-r" over "r-i" for each of the spectral classes present in the main sequence, in which we will also represent the line that corresponds to a perfect black body.
fig = plt.figure(figsize=(10,20))
for i in range(0,7):
c ='OBAFGKM'
ax1 = fig.add_subplot(4,2,i+1, axisbg='0.4')
cls = c[i]
b = df['subclass'].map(lambda s: s[0] ==cls)
plot_gr_ug(df[b],cls=cls,axes=ax1)
T1 = 2000 * pq.kelvin
T2 = 15000 * pq.kelvin
GmenosR_1 = get_GR(T1)
UmenosG_1 = get_UG(T1)
GmenosR_2 = get_GR(T2)
UmenosG_2 = get_UG(T2)
# Plot the "blackbody" line
ax1.plot([UmenosG_1, UmenosG_2], [GmenosR_1, GmenosR_2], c='k',ls='--')
It can be seen how the main sequence stars have a range of characteristic values in its color index in the visible region, the "g-r". As we move forward in the "OBAFGKM" sequence, the color index is taking larger values, ie, the stars become less bright in that area of the spectrum.
However, its color index in the ultraviolet region, "u-g" presents a much greater dispersion. Just look at the elongated shape of the locus, and also note that the range of the scale in the abscissa is also wider. It becomes really difficult to distinguish between the classes F, G, K and M if we look only to that index.
Another remark is that the "O" class stars, being the hottest, present values of the color indices very close to what is the black body radiation patern.
Main sequence r-i vs. g-r color-color diagram¶
Let's define the function to create the mosaic:
def plot_ri_gr(df, cls, axes):
x = df['g-r']
y = df['r-i']
axes.scatter(x, y, c = colors[cls], s=6, edgecolors='none')
axes.set_xlim(-1, 2.5)
axes.set_ylim(-1, 2)
axes.set_xlabel('g-r')
axes.set_ylabel('r-i')
axes.grid()
axes.set_title('Main sequence: class %s' %cls);
fig = plt.figure(figsize=(10,20))
for i in range(0,7):
c ='OBAFGKM'
ax2 = fig.add_subplot(4,2,i+1, axisbg='0.4')
cls = c[i]
b = df['subclass'].map(lambda s: s[0] ==cls)
plot_ri_gr(df[b],cls=cls,axes=ax2)
T1 = 2000 * pq.kelvin
T2 = 15000 * pq.kelvin
GmenosR_1 = get_GR(T1)
RmenosI_1 = get_RI(T1)
GmenosR_2 = get_GR(T2)
RmenosI_2 = get_RI(T2)
ax2.plot([GmenosR_1, GmenosR_2], [RmenosI_1, RmenosI_2], c='k', ls='--')
In this case I find it interesting to note that the center of the locus of each spectral class is moving along the line of the blackbody from the area of higher temperature to the lower temperature region. Again the class O" stars exhibit color index paterns very close to a perfect blackbody.
I will end here this third entry on the color of stars and color-color diagrams. Until the next post ...