-->

Etiquetas

The color of celestial objects: part III

The color of celestial objects: part III

Author: Eduardo Martín Calleja

This post is the third that I dedicate to the color theme (I hope that you are not bored yet of the subject). You can find the two previous ones at:

The color of celestial objects: part I

The color of celestial objects: part II

In this third post we will explore the database of the Sloan Digital Sky Survey SDSS, whose photometric system is different from the photometric UBV system, which is what we have been using until now. We will use the techniques to send SQL queries from Python to the SDSS database that were explained in the post An easy way to make SQL queries from Python to the SDSS database. If you are going to follow this post, you should maybe have a look at it before.

The aim I pursue is to access all the stars in the SDSS database having a spectral classification in the MK system (Morgan-Keenan). The reason is that the catalog we used in the previous post, the "Bright Star Catalogue" was limited to cover only the stars that can be visible to the naked eye, thus there were a large number of stars left outside the scope of our study, and the proportion of each class of stars was biased. On the other hand, the SDSS database is an endless source of information, so I'm interested in getting to know it better.

However, to make this post not very extensive, I will focus only on the main sequence stars.

This entry was written entirely, like all previous ones, with the IPython Notebook and in it we will see in detail the process of production, debugging and graphical representation of data.

Imports and references

The SDSS uses a system of five color filters whose main features, which I found summarized in this thesis are:

\[ \begin{array}{|l|c|c|} \text{Filter} & \lambda_{eff} (A) & \Delta \lambda \\\\ \hline \\\\ \text{Ultraviolet (u)} & 3540 & 599 \\\\ \text{Green (g)} & 4770 & 1379 \\\\ \text{Red (r)} & 6222 & 1382 \\\\ \text{Near Infrared (i)} & 7632 & 1535 \\\\ \text{Infrared (z)} & 9049 & 1370 \\\\ \end{array} \]

The SDSS color filters in the visible light band are "g" and "r", so the magnitude "g-r" is often used to indicate the color in the visible spectrum.

These are the Python libraries that we are going to use:

In [1]:
%matplotlib inline

from __future__ import division

import numpy as np
import matplotlib.pyplot as plt
import mechanize
import StringIO
import pandas as pd
import quantities as pq

# This IPython magic generates a table with version information
#https://github.com/jrjohansson/version_information
%load_ext version_information
%version_information numpy, matplotlib, pandas, quantities
Out[1]:
SoftwareVersion
Python2.7.9 64bit [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
IPython2.3.1
OSLinux 3.13.0 45 generic x86_64 with debian jessie sid
numpy1.9.1
matplotlib1.4.2
pandas0.15.2
quantities0.10.1
Mon Feb 23 23:37:19 2015 CET

Preparation to the access to the SDSS database

In [2]:
# URL SQL Search DR10
url = "http://skyserver.sdss3.org/dr10/en/tools/search/sql.aspx"
# Open a Browser instance to make the queries
br = mechanize.Browser()

Let's define a function to make the database SQL queries more easily. It will return a Pandas dataframe

In [3]:
def SDSS_select(sql):
    br.open(url)
    br.select_form(name="sql")
    br['cmd'] = sql
    br['format']=['csv']
    response = br.submit()
    file_like = StringIO.StringIO(response.get_data())
    return pd.read_csv(file_like,  skiprows=1)

Creating a dataframe with stars on the main sequence

As we want to obtain both photometric and spectrum (the spectral classification of the star) data, the most appropriate table in the SDSS database can be SpecPhoto view. From it we will be interested in particular in the data fields with the magnitudes in each filter, the "class" and "type" to select stars only, and the "subclass" with the information on the spectral classification of the star.

We will prepare a SQL query to get a dataframe with the stars which, according to their spectra data, were classified in the SDSS database as belonging to the main sequence.

Keep in mind that the "type" field sets a classification of objects according to their morphology: type = 6 means "STAR". Instead, "class" classifies objects according to their spectrum into three categories: "GALAXY", "QSO" or "STAR". Thus, objects with a "type" equal to 6 in the database can be classified spectrographically as stars, galaxies or quasars. Conversely, although the spectral classification is safer, in the database there are also some object with "class" equal to "STAR" and with "types" such as 0 (UNKNOWN) or 3 (GALAXY). So the best to make sure that we are selecting only stars is to search with type = 6 and class = 'STAR'

We know from the previous post that in the Morgan-Keenan system the stars of the main sequence are those that have a luminosity class of "V". So, let's prepare a SQL query to find precisely the stars of this class:

In [4]:
SQL_V = '                            \
SELECT                               \
 objID, subclass, flags, zwarning,   \
 modelMag_u, modelMag_g, modelMag_r, \
 modelMag_i, modelMag_z              \
FROM SpecPhoto                       \
WHERE                                \
 class = "STAR"                      \
AND                                  \
 subclass LIKE "%V%"                 \
AND                                  \
 subclass NOT LIKE "%IV%"            \
AND                                  \
 subclass NOT LIKE "%VI%"            \
AND                                  \
 type = 6'

And now we can run the query to the database:

In [5]:
df_V = SDSS_select(SQL_V)
df_V.shape
Out[5]:
(93061, 9)

Let's change the column names and have a look:

In [6]:
df_V.columns = ['objID','subclass','flags','zwarning','u','g','r','i','z']
df_V.head()
Out[6]:
objID subclass flags zwarning u g r i z
0 1237662341093851576 M0V 35253360132368 0 22.92463 19.55098 17.77137 17.11437 16.70092
1 1237662666966827586 CV 68987912192 0 20.51780 20.07429 20.11576 20.08739 19.65723
2 1237661463312008226 M0V 35253360001040 4 22.54791 20.90471 20.14267 19.89848 19.72298
3 1237662641195647695 M0V 35253360136464 0 24.59499 21.81905 20.08079 19.36764 18.95300
4 1237661087495749970 M0V 68987912960 0 25.16913 21.48537 19.66987 18.91488 18.45601

In this selection I have included, as you can see, the "zwarning" field just in case we are interested to filter by its values. In the description of the table in the Schema Browser it is indicated that a value of zero in this field indicates absence of problems. Let's see how many entries there are in this field different from zero:

In [7]:
df_V[df_V['zwarning']!=0].shape
Out[7]:
(13668, 9)

This is a respectable amount. To be sure that the data have no problems, we will remove all these cases from our dataframe .

In [8]:
df_V = df_V[df_V['zwarning']==0]

And we remove the "zwarning" column, no longer needed.

In [9]:
df_V = df_V.drop('zwarning', axis=1)
df_V.shape
Out[9]:
(79393, 8)

Let's assign the first column as the dataframe index:

In [10]:
df_V = df_V.set_index('objID')
df_V.head()
Out[10]:
subclass flags u g r i z
objID
1237662341093851576 M0V 35253360132368 22.92463 19.55098 17.77137 17.11437 16.70092
1237662666966827586 CV 68987912192 20.51780 20.07429 20.11576 20.08739 19.65723
1237662641195647695 M0V 35253360136464 24.59499 21.81905 20.08079 19.36764 18.95300
1237661087495749970 M0V 68987912960 25.16913 21.48537 19.66987 18.91488 18.45601
1237661126151045397 CV 68987912704 20.01366 20.22576 20.17420 20.05289 19.52105

Now, we will examine below the flags to see if we have to exclude more entries from our dataframe

Remove dubious data using the flags

In this section we'll become familiar with the way to deal with the flags field, and use them to clean the dataframe. As an example we will analyze now the flags of one object in the dataframe

In [11]:
flg = df_V.loc[1237662640123281773,'flags']
# Show in hexa
hex(flg)
Out[11]:
'0x1010000200L'

Let's see what they mean. To do this it is best to make a query directly on the database. Go to the page SQL Search and enter the following query:

SELECT objID, dbo.fPhotoFlagsN(flags) FROM SpecPhoto WHERE objID= "1237662640123281773"

We will obtain as a result the names of the three flags that correspond to the three bits with value 1 in the "flags" field of the object:

1237662640123281773,STATIONARY BINNED1 MANYPETRO

Come now to the description of the flags: for this we return to the SQL Search page and write:

Select * from PhotoFlags

In response a description of each flag is obtained. In particular:

STATIONARY,0x0000001000000000,This object was consistent with being stationary BINNED1,0x0000000010000000,Object was detected in 1x1 binned image MANYPETRO,0x0000000000000200,More than one Petrosian radius was found.

You can check that these hexadecimal characters match the value that we obtained from the "flags" field of the first object in the dataframe

The flags are also explained in the page Summary table, grouped by concepts, which is quite useful. To interpret which bit corresponds to each flag note that there are two 32-bit flag fields, that in the SpecPhoto table are grouped into a single 64-bit value. The 32-bit field to which each bit flag belongs is indicated with the notation 1/2 or 2/2. According to the summary table, the flags that indicate possible problems with the photometry are:

- CANONICAL_CENTER: bit 0 (1/2) - PEAKCENTER: bit 5 (1/2) - DEBLEND_NOPEAK: bit 14 (2/2) - NOPROFILE: bit 7 (1/2) - NOTCHECKED: bit 19 (1/2) - NOTCHECKED_CENTER: bit 26 (2/2) - TOO_LARGE: bit 24 (1/2) - BADSKY: bit 22 (1/2)

So we're going to build a mask to filter all these cases:

In [12]:
mask = 2**0 + 2**5 + 2**(14+32) + 2**7 + 2**19 + 2**(26+32) + 2**24 + 2**22

Let's see how many cases are there in our dataframe having either of these bits activated:

In [13]:
f = lambda s: (s & mask) == 0
b =df_V['flags'].map(f)
b.value_counts()
Out[13]:
True     74733
False     4660
dtype: int64

We see that there are 4660 cases with any of these flags on. We will delete them:

In [14]:
df_V_clean =df_V[b]
df_V_clean.shape
Out[14]:
(74733, 7)

Main sequence g-r vs. u-g color-color diagram

Let's see first what are the spectral classes represented in the dataframe. It suffices to take the first character of the field "subclass"

In [15]:
f = lambda s: s[0]
clases = df_V_clean['subclass'].map(f)
clases.value_counts()
Out[15]:
F    40908
K    11501
G     8485
A     6184
C     4006
M     2421
B     1228
dtype: int64

We see that the "O" class is missing and there is apparently a class that starts with "C". Let's see what spectral class is this:

In [16]:
b = df_V_clean['subclass'].apply(lambda s : s[0] == 'C')
df_V_clean['subclass'][b].value_counts()
Out[16]:
CV    4006
dtype: int64

Oh: There are 4006 objects whose spectrum has been classified as "CV". They are "cataclysmic variable stars", which must be excluded from our dataframe, because we want it only to contain main sequence stars.

In [17]:
df_V_clean = df_V_clean[~b]
df_V_clean.shape
Out[17]:
(70727, 7)

Furthermore, in order to have more complete color-color diagrams, we will incorporate the class "O" stars to our dataframe

In [18]:
# We prepare the query
SQL_O = '                            \
SELECT                               \
 objID, subclass, flags,             \
 modelMag_u, modelMag_g, modelMag_r, \
 modelMag_i, modelMag_z              \
FROM SpecPhoto                       \
WHERE                                \
 class = "STAR"                      \
AND                                  \
 subclass LIKE "O%"                  \
AND                                  \
 type = 6                            \
AND                                  \
 zwarning = 0'
In [19]:
# And we execute it
df_O = SDSS_select(SQL_O)
df_O.columns = ['objID','subclass','flags','u','g','r','i','z']
df_O = df_O.set_index('objID')
df_O.shape
Out[19]:
(1748, 7)
In [20]:
# remove the stars with "bad" flags:
f = lambda s: (s & mask) == 0
b =df_O['flags'].map(f)
df_O_clean =df_O[b]
df_O_clean.shape
Out[20]:
(1694, 7)

Finally we'll concatenate the two dataframes, creating a new dataframe "df" that is the one we will work with:

In [21]:
df = pd.concat([df_V_clean, df_O_clean])
df.shape
Out[21]:
(72421, 7)

Let's expand the dataframe with new columns with the color indices:

In [22]:
df['u-g'] = df['u'] - df['g']
df['g-r'] = df['g'] - df['r']
df['r-i'] = df['r'] - df['i']
df['i-z'] = df['i'] - df['z']

To distinguish the various spectral classes in the color-color diagram , we create now a dictionary of colors:

In [23]:
colors = {'O':'#0000FF', 'B':'#CCCCFF', 'A':'#FFFFFF',
          'F':'#FFFFB2', 'G':'#FFFF00', 'K':'#FFA500',
          'M':'#FF0000'}

And now we are ready to generate the color-color diagram

In [24]:
fig = plt.figure(figsize=(10,10))
ax = fig.add_subplot(111, axisbg='0.6')

ax.set_xlim(-1, 6)
ax.set_ylim(-1, 2.5)
ax.grid()
ax.set_title('Main sequence: g-r vs. u-g')

ax.title.set_fontsize(20)
ax.set_xlabel('u-g')
ax.xaxis.label.set_fontsize(20)
ax.set_ylabel('g-r')
ax.yaxis.label.set_fontsize(20)

for cls in 'OBAFGKM':
    b = df['subclass'].map(lambda s: s[0] ==cls)
    x = df[b]['u-g']
    y = df[b]['g-r']
    c = colors[cls]
    ax.scatter(x, y, c = c, s=6, edgecolors='none', label = cls)
legend = ax.legend(scatterpoints=1,markerscale = 6, shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')

As the colors overlap, it is not easy to see where each spectral class is located, so we will make a mosaic of plots, one for each spectral class. Let's start writing a function to plot each class:

In [25]:
def plot_gr_ug(df, cls, axes):
    x = df['u-g']
    y = df['g-r']
    axes.scatter(x, y, c = colors[cls], s=6, edgecolors='none')
    axes.set_xlim(-1, 6)
    axes.set_ylim(-1,2.5)
    axes.set_xlabel('u-g')
    axes.set_ylabel('g-r')
    axes.grid()
    axes.set_title('Main sequence: class %s' %cls);

Furthermore, we also want to represent the line that indicates the color-color diagram of a black body, to see how it is positioned relative to the locus of each class. For that, we'll need some additional functions, as seen in previous posts:

In [26]:
#  Defining the SDSS photometric system constants
lambda_u = 3540 * pq.angstrom
delta_u = 599 * pq.angstrom
lambda_g = 4770 * pq.angstrom
delta_g = 1379 * pq.angstrom
lambda_r = 6222 * pq.angstrom
delta_r = 1382 * pq.angstrom
lambda_i = 7632 * pq.angstrom
delta_i = 1535 * pq.angstrom
lambda_z = 9049 * pq.angstrom
delta_u = 1370 * pq.angstrom
In [27]:
def B(wl,T):
    '''wl is an array of wavelengths with units of length
    T is a temperature in Kelvin
    the result is an array of spectral radiances
    with units W/(m**2 * nm * sr)
    '''
    I = 2 * pq.constants.h * (pq.c)**2 / wl**5 *  \
        1 / (np.exp((pq.constants.h*pq.c \
        / (wl*pq.constants.k*T)).simplified)-1)
    return I.rescale(pq.watt/(pq.m**2 * pq.nm *pq.sr))

We now define the functions that will allow us to calculate the color indices in the SDSS photometric system for a black body at a given temperature:

In [28]:
# We approximate the received flux as the product of th spectral radiance
# B(lambda, T) by the width delta lambda

def get_UG(T):    
    F = B(lambda_g, T) * delta_g/(B(lambda_u, T)* delta_u)
    return 2.5 * np.log10(F)

def get_GR(T):
    F = B(lambda_r, T) * delta_r/(B(lambda_g, T)* delta_g)
    return 2.5 * np.log10(F)

def get_RI(T):
    F = B(lambda_i, T) * delta_i/(B(lambda_r, T)* delta_r)
    return 2.5 * np.log10(F)

And finally we will generate a mosaic of color-color diagrams "g-r" over "r-i" for each of the spectral classes present in the main sequence, in which we will also represent the line that corresponds to a perfect black body.

In [29]:
fig = plt.figure(figsize=(10,20))
for i in range(0,7):
    c ='OBAFGKM'
    ax1 = fig.add_subplot(4,2,i+1, axisbg='0.4')
    cls = c[i]
    b = df['subclass'].map(lambda s: s[0] ==cls)
    plot_gr_ug(df[b],cls=cls,axes=ax1)
    T1 = 2000 * pq.kelvin
    T2 = 15000 * pq.kelvin
    GmenosR_1 = get_GR(T1)
    UmenosG_1 = get_UG(T1)
    GmenosR_2 = get_GR(T2)
    UmenosG_2 = get_UG(T2)
    # Plot the "blackbody" line
    ax1.plot([UmenosG_1, UmenosG_2], [GmenosR_1, GmenosR_2], c='k',ls='--')

It can be seen how the main sequence stars have a range of characteristic values in its color index in the visible region, the "g-r". As we move forward in the "OBAFGKM" sequence, the color index is taking larger values, ie, the stars become less bright in that area of the spectrum.

However, its color index in the ultraviolet region, "u-g" presents a much greater dispersion. Just look at the elongated shape of the locus, and also note that the range of the scale in the abscissa is also wider. It becomes really difficult to distinguish between the classes F, G, K and M if we look only to that index.

Another remark is that the "O" class stars, being the hottest, present values of the color indices very close to what is the black body radiation patern.

Main sequence r-i vs. g-r color-color diagram

Let's define the function to create the mosaic:

In [30]:
def plot_ri_gr(df, cls, axes):
    x = df['g-r']
    y = df['r-i']
    axes.scatter(x, y, c = colors[cls], s=6, edgecolors='none')
    axes.set_xlim(-1, 2.5)
    axes.set_ylim(-1, 2)
    axes.set_xlabel('g-r')
    axes.set_ylabel('r-i')
    axes.grid()
    axes.set_title('Main sequence: class %s' %cls);
In [31]:
fig = plt.figure(figsize=(10,20))
for i in range(0,7):
    c ='OBAFGKM'
    ax2 = fig.add_subplot(4,2,i+1, axisbg='0.4')
    cls = c[i]
    b = df['subclass'].map(lambda s: s[0] ==cls)
    plot_ri_gr(df[b],cls=cls,axes=ax2)
    T1 = 2000 * pq.kelvin
    T2 = 15000 * pq.kelvin
    GmenosR_1 = get_GR(T1)
    RmenosI_1 = get_RI(T1)
    GmenosR_2 = get_GR(T2)
    RmenosI_2 = get_RI(T2)

    ax2.plot([GmenosR_1, GmenosR_2], [RmenosI_1, RmenosI_2], c='k', ls='--')

In this case I find it interesting to note that the center of the locus of each spectral class is moving along the line of the blackbody from the area of higher temperature to the lower temperature region. Again the class O" stars exhibit color index paterns very close to a perfect blackbody.

I will end here this third entry on the color of stars and color-color diagrams. Until the next post ...