This paper proposes an alternative approach to the object detection problem. Also, it uses Euclidean loss for probabilities. Any benefit from context rescoring would be orthogonal. We thank the reviewers for their insightful comments and encouraging remarks. I have two main concerns regarding this paper. Motivations, methods, and results all make sense.

To alleviate concerns on originality, we emphasize that YOLO is among the first to model object detection as an end-to-end, unified process. One note is that YOLO's main advantage is its efficiency. I think adding more analysis on the grid size can be helpful for better understanding this algorithm. But visual search has weaknesses. However, the similarity ends there. I think it is important to see how sensitive the algorithm is to these tricks. Some reviewers liked the paper, some voted for rejection, which made this manuscript a borderline paper. I didn't understand why Eq. 2 uses a square root. We are trying to explore and demonstrate to the vision community that there are alternatives to sliding-window or region proposal methods for object detection. The reviewers, however, agreed that this is an interesting research direction, and I do encourage the authors to continue this work. The NIPS audience is interested in fast object detection algorithms. It cannot predict full bounding boxes, only adjust them. Since two of the long reviews raised some important concerns about the paper, and this year we got many very good submissions, I cannot recommend this paper for publication at NIPS. Unfortunately, the key idea of this paper has already been proposed by [20], which also divides the image into a regular grid and predicts a bounding box for each cell, in a single CNN pass. This is neither mentioned nor discussed in the submission. In this manner, multiple objects can be detected in the same image, without region proposals and also without resorting to simplistic regression from the whole image to a bounding box. Table 2 should add one more row for this fused one. One criticism regards references.
In contrast to state-of-the-art systems such as R-CNN, the proposed technique does not rely on region proposals and instead detects objects with a single pass through the CNN.

