
Computer Vision Research: The deep “depression”

Well, I am not that old, but I have been involved with computer vision for almost two decades now. I started publishing papers when about 250 papers were submitted per year to the major and most selective conferences in computer vision (ICCV, CVPR, ECCV). At that time the conference boards had approximately 60-80 people and there were 300-400 participants.

Computer vision conferences (even up to 2010) were organized into a number of thematic areas that were reasonably well represented in terms of both content and approaches. Early vision, grouping/segmentation, motion analysis/tracking, recognition and 3D vision are some examples. Statistics, geometry and optimization were present in almost all of these areas, and one could get a global view of the field by participating in such a conference. Entering the vision field required a reasonable understanding of physics, math, statistics and geometry. Participating in the conference gave you exposure to computer vision challenges as well as to approaches.

There were always trends and dominant topics in the field. I guess the eighties were all about stereo, the nineties were all about continuous methods and segmentation/grouping, while the turn of the century brought in discrete methods and refocused the community on recognition and descriptors. In parallel, the machine learning community stepped in and its recent developments made their way into the computer vision field. That said, despite the presence of dominant topics, the field was still quite diverse and alternative ideas could still sneak in in almost all sub-domains of computer vision.

Well, I have the impression that this is far from being the case anymore. Research is now focused on using complex deep learning engineering pipelines to address computer vision tasks. 80-90% of the papers published in conferences, and almost all oral papers, come from this area. There is absolutely nothing wrong with having such papers, and their performance definitely justifies their value; however, one can question what the "added" scientific value is. Other than a handful of people doing fundamental research towards understanding the theoretical concepts behind these methods, almost all of the community now seems to target the development of ever more complex pipelines (which most likely cannot be reproduced from the elements presented in the paper), which in most cases have almost no theoretical reasoning behind them and can add 0.1% of performance on a given benchmark. Is this the objective of academic research? Putting in place highly complex engineering models that simply exploit computing power and massive annotated data? The community (and I guess all communities) was running after benchmarks and low-hanging fruit in the past as well, but at that time there was room for other directions too, which doesn’t seem to be the case anymore. This holds not only for conferences but also for funding, and the direct consequence is a rapid decrease in the "theoretical depth" of research in the field or, I could say instead, in research diversity.

It might simply be that deep learning on highly complex graphs with a huge number of degrees of freedom, once endowed with massive amounts of annotated data and computing power that was unthinkable until very recently, can solve all computer vision problems. If this is the case, then it is simply a matter of time before industry takes over, research in computer vision becomes a marginal academic objective and the field follows the "declining" path of computer graphics (in terms of activity and volume of academic research).

If not, though, one can question how computer vision will move to the next level. How will new ideas emerge from a community where fresh incoming PhD students have never heard, and most likely never will hear, about statistical learning, pattern recognition, Euclidean geometry, continuous and discrete optimization, etc.? I am a believer in a "broad" and rich scientific culture, and I have the impression that this is in the process of disappearing from the field. One can envision two possible interpretations. The highly positive one is that we are converging towards David Marr’s famous theory, which assumes that a single computational framework can address visual perception. This would be a great accomplishment for a field that was at 5% accomplishment in 1995 (recall Prof. Thomas Huang’s presentation at the ICPR’95 conference). The less positive interpretation is that we are putting all our efforts, while excluding alternatives, into an area that shows great promise but still will not be able to address on its own the rich variety of problems in computer vision.

A very good friend once mentioned to me that there are three deep learning stages: denial, doubt, and acceptance/adoption! I guess I am navigating the ocean between the last two stages without a compass.
