Ticket #6784 (new defect)

Opened 5 years ago

Network.HTTP.Browser hangs on redirect

Reported by: jmdurr@… Owned by: somebody
Priority: major Milestone:
Component: component1 Version:
Keywords: redirects, browser Cc:

Description

While writing a crawler using Network.HTTP.Browser, I ran into an issue where a request built with defaultGETRequest_ hangs on a 301 redirect and never returns.

I have tried changing maxredirects and setting allow redirects to true; neither seems to have an effect.

If I try to enable the debugLog, I get another error about the debug log already being locked, and the debug log only contains the results of the robots.txt request.

It can be seen on multiple sites, but here is an example output:

Sending:
GET /robots.txt HTTP/1.1
Host: en.wikipedia.org
User-Agent: LearningForest
Content-Length: 0

Creating new connection to en.wikipedia.org
Received:
HTTP/1.0 200 OK
Date: Mon, 08 Mar 2010 15:36:13 GMT
Server: Apache
Cache-Control: s-maxage=3600, must-revalidate, max-age=0
X-Article-ID: 19292575
X-Language: en
X-Site: wikipedia
Last-Modified: Fri, 29 Jan 2010 22:29:50 GMT
Vary: Accept-Encoding
Content-Length: 26147
Content-Type: text/plain
X-Cache: HIT from sq66.wikimedia.org
X-Cache-Lookup: HIT from sq66.wikimedia.org:3128
Age: 12
X-Cache: HIT from sq61.wikimedia.org
X-Cache-Lookup: HIT from sq61.wikimedia.org:80
Connection: close

"checking url: http://en.wikipedia.org"
"trying request"
Sending:
GET / HTTP/1.1
Host: en.wikipedia.org
User-Agent: LearningForest
Content-Length: 0

Recovering connection to en.wikipedia.org
Sending:
GET / HTTP/1.1
Host: en.wikipedia.org
User-Agent: LearningForest
Content-Length: 0

Creating new connection to en.wikipedia.org
Received:
HTTP/1.0 301 Moved Permanently
Date: Mon, 08 Mar 2010 15:36:25 GMT
Server: Apache
Cache-Control: s-maxage=1200, must-revalidate, max-age=0
Vary: Accept-Encoding,Cookie
Last-Modified: Mon, 08 Mar 2010 15:36:25 GMT
Location: http://en.wikipedia.org/wiki/Main_Page
Content-Length: 0
Content-Type: text/html; charset=utf-8
Age: 1
X-Cache: HIT from sq61.wikimedia.org
X-Cache-Lookup: HIT from sq61.wikimedia.org:3128
X-Cache: MISS from sq65.wikimedia.org
X-Cache-Lookup: MISS from sq65.wikimedia.org:80
Connection: close

301 - redirect using GET
Redirecting to http://en.wikipedia.org/wiki/Main_Page ...
Sending:
GET /wiki/Main_Page HTTP/1.1
Host: en.wikipedia.org
User-Agent: LearningForest
Content-Length: 0

Recovering connection to en.wikipedia.org
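
The log stops after the last "Recovering connection" line, and the call never returns. Until the cause is found, a crawler could at least bound how long a single fetch is allowed to block. The following is a minimal sketch of that workaround, assuming only System.Timeout from base. The names fetchWithTimeout and neverReturns are hypothetical and are not part of the HTTP package, and neverReturns merely stands in for the hanging request. It does not fix the hang, and it only helps if the blocked call can be interrupted by an asynchronous exception.

```haskell
import Control.Concurrent (threadDelay)
import Control.Monad (forever)
import System.Timeout (timeout)

-- | Run an IO action, giving up after the given number of seconds.
-- Returns Nothing if the action did not finish in time,
-- otherwise Just the action's result.
-- (Hypothetical helper, not part of the HTTP package.)
fetchWithTimeout :: Int -> IO a -> IO (Maybe a)
fetchWithTimeout seconds action = timeout (seconds * 1000000) action

-- | Hypothetical stand-in for a request that hangs forever.
-- It sleeps in one-second steps and never produces a value.
neverReturns :: IO String
neverReturns = forever (threadDelay 1000000)
```

In a crawler, the hanging browser call would take the place of neverReturns, and a Nothing result could be logged so the URL is skipped rather than stalling the whole run.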
