Colly

Lightning Fast and Elegant Scraping Framework for Gophers

Colly provides a clean interface to write any kind of crawler/scraper/spider.

With Colly you can easily extract structured data from websites, which can be used for a wide range of applications, like data mining, data processing or archiving.

Features

  • Clean API
  • Fast (>1k requests/sec on a single core)
  • Manages request delays and maximum concurrency per domain
  • Automatic cookie and session handling
  • Sync/async/parallel scraping
  • Caching
  • Automatic encoding of non-unicode responses
  • Robots.txt support
  • Distributed scraping
  • Configuration via environment variables
  • Extensions

Example

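package main

import (
	"fmt"

	"github.com/gocolly/colly/v2"
)

func main() {
	c := colly.NewCollector()

	// Find and visit all links
	c.OnHTML("a[href]", func(e *colly.HTMLElement) {
		e.Request.Visit(e.Attr("href"))
	})

	// Log every request before it is sent
	c.OnRequest(func(r *colly.Request) {
		fmt.Println("Visiting", r.URL)
	})

	// Start crawling from the seed URL
	c.Visit("http://go-colly.org/")
}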

See the _examples folder for more detailed examples.

Installation

Add colly to your go.mod file:

module github.com/x/y

go 1.14

require (
        github.com/gocolly/colly/v2 latest
)

Bugs

Bugs or suggestions? Visit the issue tracker or join #colly on freenode.

Other Projects Using Colly

Below is a list of public, open source projects that use Colly:

If you are using Colly in a project please send a pull request to add it to the list.

Contributors

This project exists thanks to all the people who contribute. [Contribute].

Backers

Thank you to all our backers! 🙏 [Become a backer]

Sponsors

Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [Become a sponsor]

License

See LICENSE.txt in the repository root for the license terms.