In this section I hope to maintain a list of useful links for Python data diggers. If you have useful links to add, ping @4u2c or @Pmason and I will add them to the listing. Please advise me of any broken links, thanks! Unless otherwise stated, everything here is for Python 3.4 or later.
Any reference to a commercial site is NOT an endorsement of that site and there are no hidden agendas here - someone put forth the link in good faith and I present it here. Unless noted, I have no personal experience with most of these!
In order of increasing complexity and completeness
The Python Tutorial This tutorial introduces the reader informally to the basic concepts and features of the Python language and system. [This is my normal starting point for most questions.]
The Python Standard Library This library reference manual describes the standard library that is distributed with Python. It also describes some of the optional components that are commonly included in Python distributions.
The Python Language Reference This reference manual describes the syntax and “core semantics” of the language. It is terse, [an understatement if I have ever heard one] but attempts to be exact and complete.
Python 2.7 Quick Reference Though written for the earlier version of Python, this has a handy reference for environment variables and the basic stuff. If anyone is aware of a Python 3 version of this I will add it as well!
A Byte of Python Worth working through all the examples, plus useful info for getting started with a working interpreter. This is where I started. - Pmason
PyCharm Edu This is a free Python teaching aid and an easy-to-use editor for developing and running code. You can work through the tutorials or skip them and use this as a complete IDE. It has a package handler that helps find and load packages - most of which will work for Windows, which is saying something.... If you need help, I am most familiar with this editor. - Pmason
Book - Print: "Statistics, Data Mining, and Machine Learning in Astronomy: A Practical Python Guide for the Analysis of Survey Data", by Željko Ivezić and colleagues.
zooniverse/Data-digging
This is the main Python code repository for Zooniverse. It is somewhat ramshackle as far as indexing goes, since it was created by merging many individuals' repositories and it changes frequently. Rely on the READMEs in each directory under "example scripts" to find what you need.
Panoptes Client’s documentation All the goods on Panoptes Client, in a fairly terse format. You will likely need to look at working examples in the Git above to really get a handle on this. The Git for the Client is here.
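To give a flavour of the Client, here is a minimal sketch of logging in and listing a project's subject sets; the username, password and project slug below are placeholders, so substitute your own.

from panoptes_client import Panoptes, Project

# Log in with your Zooniverse credentials (placeholders here).
Panoptes.connect(username='your_username', password='your_password')

# Find a project by its slug - 'zooniverse/example-project' is a placeholder.
project = Project.find(slug='zooniverse/example-project')
print(project.display_name)

# List the subject sets linked to the project.
for subject_set in project.links.subject_sets:
    print(subject_set.id, subject_set.display_name)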
Panoptes API Much of the detail you need to really use the Client is in here. Not for the faint of heart.
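If you want to see what the Client is doing under the hood, you can hit the API directly with requests. A rough sketch, assuming the public read-only endpoint and the JSON-API version header described in the API docs (the project slug is a placeholder):

import requests

# Panoptes JSON-API endpoint; the Accept header carries the API version.
url = 'https://www.zooniverse.org/api/projects'
headers = {'Accept': 'application/vnd.api+json; version=1'}

# Look up a project by slug (placeholder slug - substitute your own).
response = requests.get(url, headers=headers,
                        params={'slug': 'zooniverse/example-project'})
response.raise_for_status()

for project in response.json()['projects']:
    print(project['id'], project['display_name'])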
zooniverse/panoptes-cli The starting point for the Command Line Interface.
panoptes_aggregation General purpose aggregation tool for projects built on the Zooniverse. This package provides command line tools for processing a project's data dump files.
Windows packages Getting some packages to load in Windows can be a challenge. This site might save your bacon... Alternatively, I have had excellent results using PyCharm to load packages on a Windows platform.
Matplotlib This is the starting point for merging your subject images and data, or just pretty plots in general...
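As a taste of what that looks like for Zooniverse work, here is a minimal sketch that overlays some (made-up) aggregated click positions on a downloaded subject image; the file name and coordinates are purely illustrative.

import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Load a previously downloaded subject image (file name is a placeholder).
img = mpimg.imread('subject_12345.jpeg')

# Example aggregated marker positions in pixel coordinates (made-up data).
xs = [120, 250, 310]
ys = [80, 140, 200]

fig, ax = plt.subplots()
ax.imshow(img)                                                  # subject image as background
ax.scatter(xs, ys, s=60, facecolors='none', edgecolors='red')   # overlay the markers
ax.set_axis_off()
plt.savefig('subject_12345_overlay.png', dpi=150, bbox_inches='tight')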
Notes for Nature This is a repeat of the link in the Zooniverse Data-digging repository, but this should be your first port of call for any transcription reconciliation needs. While it is somewhat specialized for museum-label type text from NfN's Zooniverse downloads, it can be set up to run on any csv transcription download. It is easy to trick it into putting attention on specific data formats (e.g. all floating point numbers, or columnar data of a fixed format) by a little judicious pre-formatting of the data before you run reconcile.py.
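A little pandas is often enough for that pre-formatting step before the file goes to reconcile.py. This sketch assumes a transcription column named 'decimal_latitude' that should always be a fixed-precision float; the column name, file names and rounding are purely illustrative.

import pandas as pd

# Load the raw transcription download (file name is a placeholder).
df = pd.read_csv('transcriptions.csv')

# Force one free-text column into a consistent floating point format so the
# reconciler sees '12.35' and '12.350' as the same answer (illustrative only).
df['decimal_latitude'] = (
    pd.to_numeric(df['decimal_latitude'], errors='coerce')
      .round(2)
      .map(lambda v: '' if pd.isna(v) else f'{v:.2f}')
)

df.to_csv('transcriptions_preformatted.csv', index=False)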
Reconcile-Editor for working with NfN's reconcile.py. This script provides a GUI for editing and correcting the reconciled results for transcriptions and simple question tasks. The Git repository and readme are here, and a description of what it does is in this comment.
Pmason's building blocks This is a repository of some of the more useful scripts I have written for the over 60 projects I have worked with. Many projects use minor modifications of the scripts here. I will support this effort as long as I can, so if you see something here that almost does what you need, contact @Pmason by DM.
Jean Tate - FIRST contour overlay on SDSS backgrounds Using Python to create overlays. Many of the details of obtaining and fitting the images and creating the contours are specific to RGZ, but the plot overlay section has general application to many projects.
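The general pattern for that kind of overlay in Matplotlib is imshow for the background plus contour on top. A bare-bones sketch with a synthetic 2D array standing in for the radio data - everything below is made-up illustration, not the RGZ code:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Background image (placeholder file name) and a synthetic 2D intensity map
# standing in for the radio data - a real project would load a FITS cutout here.
background = mpimg.imread('sdss_cutout.png')
ny, nx = background.shape[:2]
yy, xx = np.mgrid[0:ny, 0:nx]
radio = np.exp(-((xx - nx / 2) ** 2 + (yy - ny / 2) ** 2) / (0.05 * nx * ny))

fig, ax = plt.subplots()
ax.imshow(background)
ax.contour(radio, levels=5, colors='cyan', linewidths=1)  # contours drawn on top
ax.set_axis_off()
plt.savefig('contour_overlay.png', dpi=150, bbox_inches='tight')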
Finding exoplanets with Python A quite useful tutorial on finding an exoplanet in a database.
Philip Fowler's pyniverse A Python package to analyse generic user stats of Zooniverse volunteers, with neat graphs for classifications against time, users trying the project for the first time, and the cumulative user distribution. This package does require a number of other packages to be installed.
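If you just want a quick look without installing pyniverse, a few lines of pandas on the standard classification export will get you a classifications-per-day plot. A sketch assuming the usual 'created_at' and 'user_name' columns in the data dump (the file names are placeholders):

import pandas as pd
import matplotlib.pyplot as plt

# Standard Zooniverse classification export (file name is a placeholder).
df = pd.read_csv('my-project-classifications.csv', parse_dates=['created_at'])

# Classifications per day and the number of distinct users seen each day.
daily = df.set_index('created_at').resample('D')
counts = daily['user_name'].count()
users = daily['user_name'].nunique()

fig, ax = plt.subplots()
counts.plot(ax=ax, label='classifications per day')
users.plot(ax=ax, label='distinct users per day')
ax.legend()
ax.set_xlabel('date')
plt.savefig('daily_classifications.png', dpi=150)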
The total shown on the home page isn’t a cumulative total of classifications for this week, so the label is a little misleading. It’s actually a rolling total over the past 7 days, so it can go up or down, depending how many classifications you made 7 days ago.
Newer projects, like Planet Hunters TESS, do show a cumulative total for this week on the Classify page. That total resets to 0 every Monday, then increases over the course of the week.
It is confusing that the site uses two different methods of calculating weekly classifications, both with the label This Week.
Perhaps you are a project team member and have access to classification data, so you know this, but why assume that classifiers who are logged in make fewer errors than those who are not logged in? Or that classifiers who are not logged in are short term participants?
Also, it seems to me that a "path of errors" created by a short term participant must necessarily be a minor part of the classification data... although if a project has a large number of error-making short term participants, cumulatively they could contribute many errors.
I have seen quite a lot of comments from logged-in participants who clearly have not read and/or not followed the instructions in the field help, tutorial, FAQ, Field Guide. I expect there are many more who aren't sufficiently concerned to submit "how do I classify ...?" or "I answered .., is that correct?" comments, as well as those who aren't logged in so cannot submit comments. But IMO forcing every classifier to log in won't solve the problem of those who make many errors.