Well, maybe you are missing out ... because Brazil has even more results than we do: currently 1,970,000 results for "projetos de ciência cidadã" (citizen science projects): https://www.google.com.br/?gws_rd=ssl#q=projetos+de+ciência+cidadã It might take some elbow grease, but if you go country by country, Googling in each country's language, there might be some awesome surprises and results.
Edit:
Weird, Google changes the results count every time the page is accessed anew. There are now 2,190,000 results for Brazil ...
There are certainly cases where real-time reconciliation of free-transcription fields, performed as volunteers complete each classification, could be used to eliminate a significant number of classifications. However, this is not yet practical. Zooniverse has a limited ability to compare volunteer responses in near real time using a tool called Caesar, and to take various actions based on the result (such as retiring a subject and/or adding it to a second subject set used in a different workflow), but the response comparison does not currently allow much processing, nor does it handle the minor differences so common in transcription.
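To make the difficulty concrete, here is a minimal sketch (in Python, purely illustrative - the threshold is arbitrary and this is not anything Caesar actually supports today) of the kind of fuzzy comparison a near-real-time reducer would need in order to tolerate those minor differences:

```python
# A minimal sketch of a "close enough" comparison for two transcriptions.
# Exact string equality fails on the small spacing/punctuation differences
# common in transcription; difflib's similarity ratio tolerates them.
# The 0.95 threshold is an arbitrary choice for illustration.
from difflib import SequenceMatcher

def transcriptions_agree(a: str, b: str, threshold: float = 0.95) -> bool:
    """True if two transcriptions are similar enough to count as a match."""
    a, b = a.strip().casefold(), b.strip().casefold()
    return SequenceMatcher(None, a, b).ratio() >= threshold

print(transcriptions_agree("John A. Smith", "John A Smith"))   # True: punctuation only
print(transcriptions_agree("John A. Smith", "Jane B. Smith"))  # False: real disagreement
```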
It is possible to export the classifications every 24 hours and attempt reconciliation of the subjects that have hit some limit in the last day, but this requires significant management and becomes very difficult as the workflow nears completion.
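As a rough illustration, the daily check might amount to little more than counting classifications per subject in the standard classification export (the file name and the limit of 3 here are assumptions):

```python
# Count classifications per subject in a Zooniverse classification export
# and flag the subjects that have reached the retirement limit. The
# "subject_ids" column is part of the standard export format.
import csv
from collections import Counter

RETIREMENT_LIMIT = 3  # assumed limit for this illustration

counts = Counter()
with open("project-classifications.csv", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        counts[row["subject_ids"]] += 1

ready = [sid for sid, n in counts.items() if n >= RETIREMENT_LIMIT]
print(f"{len(ready)} subjects have {RETIREMENT_LIMIT}+ classifications to reconcile")
```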
There is also the danger that while two transcriptions agree to some level, BOTH may be wrong. Note this is not made any better with a retirement limit of three: the two matching but incorrect text strings are still accepted as good. Setting the retirement limit higher does reduce this risk, and is the reason we chose 5 in some phases. It is a double-edged sword: more transcriptions increase the computational effort and allow more opportunity for variation, and if the reconciliation algorithm tries to keep additional text when presented with two versions, one of which has added text (as reconcile.py does), then many of the final reconciled texts will contain material that may not be valid.
One problem is that, normally, transcriptions of several fields are combined in the same workflow. To retire a subject would require that all the fields have been transcribed and successfully reconciled (i.e. an exact match, or "close enough", with only small differences in spacing or punctuation that are judged to be acceptable).
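A minimal sketch of such a per-field test follows; the normalisation rules (ignore case, spacing, and punctuation) are just one possible definition of "close enough":

```python
# Retire a subject only when every field on it has reconciled, where
# "reconciled" means all transcriptions match after normalisation.
import re

def normalise(text: str) -> str:
    text = re.sub(r"[^\w\s]", "", text.casefold())  # drop punctuation
    return re.sub(r"\s+", " ", text).strip()        # collapse spacing

def field_reconciled(versions: list[str]) -> bool:
    """All transcriptions of one field agree after normalisation."""
    return len({normalise(v) for v in versions}) == 1

def subject_retirable(fields: dict[str, list[str]]) -> bool:
    return all(field_reconciled(versions) for versions in fields.values())

print(subject_retirable({
    "fullname": ["John A. Smith", "John A Smith", "john a. smith"],
    "address":  ["12 High St.", "12 High St", "12 High Street"],  # genuine mismatch
}))  # False: the address field has not reconciled
```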
We have considered this for the WWI burial cards... not so much for reducing classifications where there were good matches, but for those cases where, after three classifications, there was NOT a match. For a number of reasons, so far we have proceeded in the normal pattern: setting a fairly low retirement limit (generally 3, occasionally 5), completing the workflow, then pulling out the remaining issues and feeding those back into secondary workflows or private review for resolution.
So far we have only run a few of the verification and resolution workflows, and as we have proceeded through the various phases (around twelve so far) we are building a rather daunting pile of work that remains. Below is the summary for one phase of the project, the Emergency address section. This phase is fairly typical, except for the "other" field, which I am still trying to sort out (most of the "other" single transcripts are bits found somewhere else in the other volunteers' transcriptions). As you can see, if all the single-transcript and no-match fields remain to be resolved, there is a significant amount of work that still needs to be done:
**Reconciliation Summary** (the columns from Unanimous Matches through One Transcript were grouped as "Reconciled"; Mean Mode Range is blank since all of these fields are text)

| Field | Type | Unanimous Matches | Majority Matches | Mean Mode Range | Fuzzy Matches | All Blank | One Transcript | Total | No Matches |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| fullname | text | 42,666 | 29,528 | | 1,215 | 1,179 | 40 | 78,442 | 3 |
| address | text | 40,475 | 30,707 | | 1,494 | 1,238 | 41 | 78,440 | 5 |
| other | text | 3,708 | 5,350 | | 2,224 | 61,181 | 3,773 | 78,293 | 152 |
| notified_raw | text | 47,525 | 20,350 | | 1,172 | 6,411 | 190 | 78,435 | 10 |
| notified_regex | text | 53,540 | 15,276 | | 1,772 | 6,407 | 194 | 78,434 | 11 |
| sketch | text | 31,135 | 1,674 | | 22 | 45,178 | 412 | 78,445 | 0 |
| photo | text | 57,848 | 5,855 | | 81 | 14,204 | 289 | 78,443 | 2 |
Almost all "one transcript" and "no match" cases indicate a transcription error of some sort - often simply things put in the wrong place, or information from some other area mistakenly transcribed into a field - but what has become fairly obvious is that, in many cases, there is an issue with the card itself. For the WWI burial cards, issues include erasures, information out of place on the card, various typographical errors (corrected or not by volunteers), and odd punctuation/spacing/formats/short forms. Resolving some issues requires the card to be located and viewed, and some editorial authority, and is not something that will be easy to set up in a workflow for volunteers to resolve.
I've run into something like this when using the Panoptes JS client, in NodeJS, to make hundreds of API requests, so maybe this isn't a Python problem? In these cases, retrying a failed request always succeeds.
In the logs for a job that reads ~11,000 subjects from ~190 subject sets (one request per subject set, I think), I see four failed requests. These are requests where www.zooniverse.org either timed out or dropped the connection. In each case, retrying the failed request once succeeds.
#30 215.0 retrying /subjects?subject_set_id=98241&page_size=100&page=1, attempt: 1
#30 215.0 retrying /subjects?subject_set_id=98908&page_size=100&page=1, attempt: 1
#30 217.3 { id: '98908', subjects: 43 }
#30 217.4 { id: '98241', subjects: 94 }
#30 285.7 retrying /subjects?subject_set_id=98904&page_size=100&page=1, attempt: 1
#30 285.9 retrying /subjects?subject_set_id=111058&page_size=100&page=1, attempt: 1
#30 287.8 { id: '98904', subjects: 41 }
#30 288.2 { id: '111058', subjects: 36 }
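For comparison, the same retry-once idea is easy to sketch with the Python panoptes-client too. This is only a sketch: the subject-set ID comes from the log above, and exactly which exceptions the client raises on a timeout or dropped connection is an assumption:

```python
# Fetch the subjects in a subject set, retrying once on failure, in the
# spirit of the log output above. Anonymous access works for public data;
# authentication via Panoptes.connect() is omitted here.
from panoptes_client import SubjectSet
from panoptes_client.panoptes import PanoptesAPIException

def fetch_subjects(subject_set_id: str, retries: int = 1):
    for attempt in range(retries + 1):
        try:
            subject_set = SubjectSet.find(subject_set_id)
            return list(subject_set.subjects)
        except (PanoptesAPIException, ConnectionError) as err:
            if attempt == retries:
                raise
            print(f"retrying subject set {subject_set_id}, attempt: {attempt + 1} ({err})")

subjects = fetch_subjects("98241")
print({"id": "98241", "subjects": len(subjects)})
```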