Much more difficult is recognizing unsaturated colors. Nearly everything in most scenes is off-gray or off-white, or dirty black, which comes off as dark gray inside the camera (shiny black reflects the scene around it, with the result that it looks the same as dirty black). The exceptions are a few red sports cars or lemon-yellow chick cars and the occasional fire engine or school bus. Why do you think they chose those colors?
So if we restrict our first attempt to saturated colors, a blob of color 4-5 pixels square is sufficient to distinguish it from digital noise and small things like flowers and birds.
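One way to test for such a blob is sketched below in Java. This is only an illustration, not code from the project: the class and method names are hypothetical, pixels are assumed to be packed as 0xRRGGBB integers, saturation is taken in the HSV sense as (max - min) / max, and the threshold is a guess you would need to tune against real frames.

```java
final class SaturatedBlob {
    /** HSV-style saturation in the range 0..1 for one packed 0xRRGGBB pixel. */
    static double saturation(int rgb) {
        int r = (rgb >> 16) & 0xFF;
        int g = (rgb >> 8) & 0xFF;
        int b = rgb & 0xFF;
        int max = Math.max(r, Math.max(g, b));
        int min = Math.min(r, Math.min(g, b));
        // Pure black has no defined saturation; treat it as unsaturated.
        return max == 0 ? 0.0 : (max - min) / (double) max;
    }

    /**
     * True if every pixel in the size-by-size window whose top-left corner
     * is (x0, y0) has saturation of at least minSat (an assumed threshold).
     * The image is row-major, width pixels per row.
     */
    static boolean isSaturatedBlock(int[] rgb, int width,
                                    int x0, int y0, int size, double minSat) {
        int height = rgb.length / width;
        if (x0 < 0 || y0 < 0 || x0 + size > width || y0 + size > height) {
            return false; // window falls outside the image
        }
        for (int y = y0; y < y0 + size; y++) {
            for (int x = x0; x < x0 + size; x++) {
                if (saturation(rgb[y * width + x]) < minSat) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Calling `SaturatedBlob.isSaturatedBlock(frame, 320, x, y, 5, 0.5)` asks whether the 5-pixel-square window at (x, y) is uniformly colorful. Very dark pixels can score a high saturation from noise alone, so a real detector would probably also want a minimum brightness.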
Assume a fixed-focus lens (so pedestrian size in the image can give a reliable distance estimate, if we so choose) of normal field of view, perhaps the equivalent of a 50mm lens on a 35mm camera. It is not hard to calculate that a six-foot (2m) pedestrian at 50 meters would be 2mm tall on the film of that 35mm camera, and correspondingly 0.7mm on a 1/3" (8mm) standard C-mount video camera sensor chip. His shirt is a little less than half that, in round numbers 0.3mm or 300 microns. If the 8mm-diagonal chip resolves 320x240 color pixels (so the pixel size is about 20 microns square), that shirt image on the sensor chip is about 15 pixels square at 150 feet (50m).
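The arithmetic in this paragraph can be checked with a few lines of Java. This is a sketch only: the class and method names are hypothetical, the lens is treated with the simple distant-subject approximation (image size = subject size x focal length / distance), and the 0.3mm shirt size on the sensor is taken from the round numbers in the text rather than derived.

```java
final class SensorGeometry {
    /**
     * Image size in mm for a distant subject, using
     * size = subject * focalLength / distance. The subject and the
     * distance are both in meters, so the result is in mm.
     */
    static double imageSizeMm(double subjectMeters, double focalMm, double distanceMeters) {
        return subjectMeters * focalMm / distanceMeters;
    }

    /** Pixel pitch in mm from the sensor diagonal, assuming square pixels. */
    static double pixelPitchMm(double diagonalMm, int widthPx, int heightPx) {
        return diagonalMm / Math.hypot(widthPx, heightPx);
    }

    /** How many pixels an image of the given size spans on the sensor. */
    static double pixelsAcross(double sizeOnSensorMm, double pitchMm) {
        return sizeOnSensorMm / pitchMm;
    }

    /** Image size shrinks in proportion to distance. */
    static double scaleToDistance(double pixelsAtRef, double refMeters, double newMeters) {
        return pixelsAtRef * refMeters / newMeters;
    }
}
```

With the numbers from the text, `imageSizeMm(2, 50, 50)` gives the 2mm film image, `pixelPitchMm(8, 320, 240)` gives 0.02mm (20 microns), and `pixelsAcross(0.3, 0.02)` comes out at about 15 pixels for the shirt.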
Why 50 meters? According to the Oregon State driver's instruction booklet, a car going 20mph (roughly 10m/s, the standard downtown speed limit in Oregon) needs 65 feet (20m) to stop, so detecting the pedestrian at 50m gives better than a 2x margin of safety. That's 15 pixels square; at 150m he is only 5 pixels square, which we guessed is the minimum for detection, another 3x margin of safety.
A car driving the nominal Oregon downtown speed limit (10m/s), at a frame rate of 10fps, moves one meter every frame, which gives you 20 frames (two seconds at that speed) to get the car stopped in the 20m we are told we need to do so.
The numbers are credible and consistent.
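A quick Java sketch of the same budget, for anyone who wants to try other speeds or frame rates. The class and method names are hypothetical, and the frame count assumes the car covers the stopping distance at constant speed, which is the simplification used here (a braking car takes longer, so this is the pessimistic figure).

```java
final class FrameBudget {
    /** Exact conversion factor: 1 mph = 0.44704 m/s. */
    static double mphToMps(double mph) {
        return mph * 0.44704;
    }

    /** Distance the car travels between two consecutive frames. */
    static double metersPerFrame(double speedMps, double fps) {
        return speedMps / fps;
    }

    /** Frames captured while covering distanceM at constant speed. */
    static double framesToCover(double distanceM, double speedMps, double fps) {
        return distanceM / metersPerFrame(speedMps, fps);
    }

    /** Safety margin: detection distance divided by stopping distance. */
    static double margin(double detectM, double stopM) {
        return detectM / stopM;
    }
}
```

At 10m/s and 10fps, `metersPerFrame(10, 10)` is one meter and `framesToCover(20, 10, 10)` is 20 frames. `margin(50, 20)` is 2.5, and `mphToMps(20)` shows that 20mph is really a little under 9m/s, so rounding up to 10m/s errs on the safe side.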
The PointGrey (now a division of FLIR) Firefly (320x240) and Chameleon (640x480) cameras both work with their FlyCapture2 API and driver software on Windows 10 (and also on Linux). The API is defined for C/C++/C# but not Java. I wrote a Java wrapper class to encapsulate the API calls necessary to start the camera and capture frames at 15fps (or 30fps, if you can handle the data rate). This has been tested and works reasonably well at 15fps on a 2.4GHz Win10 computer, with ample time for processing 320x240 images in Java. You can download the zip file (wrapper class + DLL + test code) here. If you (meaning your browser) know the secret password, it's also available on GitHub.
These cameras have encoding firmware for a variety of popular image compression formats, but my wrapper class delivers the data in the native raw (unprocessed) Bayer8 encoding, where each color pixel must be extracted from four (non-contiguous) sensor data bytes. There is example code included with the wrapper class. You can also look at their API information directly, to better understand the wrapper class.
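To make the "four non-contiguous bytes" idea concrete, here is a minimal Java sketch of one common way to decode a Bayer8 buffer. It is not the example code that ships with the wrapper class. It assumes an RGGB cell order (red and green on even rows, green and blue on odd rows), which may not match what your camera actually delivers, and it produces one color pixel per 2x2 cell, so the output is half the raw width and height. The class and method names are hypothetical.

```java
final class BayerSketch {
    /**
     * Converts a row-major Bayer8 buffer into packed 0xRRGGBB pixels,
     * one per 2x2 cell. ASSUMES RGGB order; swap the four reads below
     * if your sensor uses a different layout.
     */
    static int[] decodeRggb(byte[] raw, int width, int height) {
        int outW = width / 2;
        int outH = height / 2;
        int[] rgb = new int[outW * outH];
        for (int y = 0; y < outH; y++) {
            int top = (2 * y) * width;   // start of the even (R G) row
            int bottom = top + width;    // start of the odd (G B) row
            for (int x = 0; x < outW; x++) {
                int col = 2 * x;
                // Java bytes are signed, so mask to get 0..255.
                int r  = raw[top + col] & 0xFF;
                int g1 = raw[top + col + 1] & 0xFF;
                int g2 = raw[bottom + col] & 0xFF;
                int b  = raw[bottom + col + 1] & 0xFF;
                int g = (g1 + g2) / 2;   // average the two green samples
                rgb[y * outW + x] = (r << 16) | (g << 8) | b;
            }
        }
        return rgb;
    }
}
```

The two bytes of each row sit next to each other, but the second row of the cell is a full image width further along in the buffer, which is why the four bytes are not contiguous.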
Some additional (longer) clips, all in my By8 format (see downloads above), of a single person in a bright shirt walking across the street:
Carol4x.zip, Carol2x.zip, and Carol3x.zip
Any questions or comments? This is your project.
Next time: Software Components
Rev. 2017 July 12