
Performance improvement? #86

Open

Shooter3k opened this issue Oct 13, 2021 · 9 comments

Comments

@Shooter3k

Is there any way to improve the performance? When using -d 2 or higher, the crawl time seems to balloon; runs take days and always end with me killing the task.

I'm not 100% sure what it's doing, but perhaps having it make incremental check-ins to the output file, or report its current progress (of what it has found so far), might solve the issue? Or perhaps the index just gets too big for it to process efficiently?
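To show what I mean by an incremental check-in, here is a rough sketch (made-up names and numbers, not CeWL's actual code) of flushing whatever has been collected so far every so many pages:

```ruby
# Hypothetical sketch of an incremental check-in; not CeWL's actual code.
# Assumes a `words` hash of word => count and a `crawled` page counter.

CHECKIN_INTERVAL = 50 # pages between flushes (made-up value)

def checkin(words, crawled, out_path)
  return unless (crawled % CHECKIN_INTERVAL).zero?

  # Overwrite the output file with everything collected so far, so a
  # killed run still leaves a usable (if partial) wordlist behind.
  File.open(out_path, "w") do |f|
    words.sort_by { |_word, count| -count }.each do |word, count|
      f.puts "#{word}, #{count}"
    end
  end
  warn "[check-in] #{crawled} pages crawled, #{words.size} unique words written to #{out_path}"
end
```

Even a flush every 50 pages would mean that a days-long run that gets killed still leaves something useful behind.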

In any case, I'm looking forward to what people suggest

@digininja
Owner

digininja commented Oct 13, 2021 via email

@Shooter3k
Author

I had a couple of thoughts about this that I'd like to throw out there:

  1. I use an application called "Screaming Frog SEO Spider" and it has a progress bar that shows how much it has crawled out of how much the spider has found so far. Even in a single thread the bar bounces around a bit, but it lets you judge (roughly) where things are going. In other words, if the spider finds 10,000 pages in 2 seconds and then 30,000 in 6 seconds, you know it's going to take a really long time; whereas if it finds 6 pages in 2 seconds and then 12 pages in 6 seconds, it probably won't take very long. Hopefully that makes sense, but here is a little screenshot of what their progress bar looks like a few seconds after starting a crawl, so you can sort of guess it's probably going to take 'a long time'.
     [screenshot: Screaming Frog SEO Spider progress bar]

  2. The second thought would be to add an optional parameter and have the spider dump the results to a file instead of indexing/crawling them. That would give the user the option to crawl them individually (likely running CeWL multiple times) on their own. If you really wanted to go the extra mile, you could also add an option for CeWL to crawl the results from the created file at a later time. (See the sketch after this list.)
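To make those two suggestions concrete, here's the kind of thing I'm picturing. It's only a rough sketch with made-up names (`queue`, `dump_file`, the seed URL, a hypothetical --dump-urls flag), not CeWL's actual code:

```ruby
# Rough sketch of both ideas above; hypothetical names, not CeWL's actual code.
require "set"

queue      = ["https://example.com/"]    # made-up seed URL
discovered = Set.new(queue)              # every URL the spider has seen so far
crawled    = 0
dump_file  = nil                         # e.g. opened when a hypothetical --dump-urls flag is given

until queue.empty?
  url = queue.shift
  crawled += 1

  # ... fetch `url` and extract its links into `links` here ...
  links = []   # placeholder for whatever the spider returns

  links.each do |link|
    next unless discovered.add?(link)    # Set#add? returns nil if the link was already seen
    if dump_file
      dump_file.puts(link)               # idea 2: just record it for a later, separate run
    else
      queue << link                      # normal behaviour: queue it for crawling
    end
  end

  # Idea 1: a Screaming-Frog-style progress line on stderr.
  warn format("crawled %d / discovered %d (%.0f%%)",
              crawled, discovered.size, 100.0 * crawled / discovered.size)
end
```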

Overall, (IMO) any option that provides some sort of rough progress, an 'I'm still running, this is how much I've done so far and this is how much I think I still have to do', would be helpful. Right now, using -v or --debug is the only way to validate that it's still crawling and not hung up somewhere.

@digininja
Owner

digininja commented Oct 13, 2021 via email

@Shooter3k
Author

Well, any options you're willing to add would be greatly appreciated. I love the app and use it a lot.

@w4po

w4po commented Oct 14, 2023

Hello @digininja,
I am trying to scrape the Ironman website to solve the last challenge of Cracking JWT keys (Obscure).

But CeWL is really slow. In fact, it just sits there without making any requests, then after an hour or so it continues for a bit and goes idle again, over and over.

I had to hibernate my PC twice instead of shutting it down to keep the tool working.

I am using the latest version, CeWL 6.1 (Max Length), on Windows 11.

I had to use a proxy to monitor the work, as there is no indication of progress in the tool itself (it would be CeWL to show any kind of progress).

Command: [screenshot]

Task Manager: [screenshot]

Proxy: [screenshot]

Thanks for the Awesome Auth lab challenges.

@digininja
Owner

digininja commented Oct 14, 2023 via email

@w4po

w4po commented Oct 14, 2023

The same thing happens in WSL 2.0 Ubuntu.
It's extremely slow: it starts at 1 or 2 requests per second, then the more requests it gathers, the slower it becomes.
It's now doing ~1 request every 30 minutes or something like that.

Maybe it's doing some comparison of the new words with the old ones to handle duplicates?
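If that guess is right, the slowdown pattern would match a linear duplicate check rather than a hash lookup. A rough sketch of the difference (made-up names, I haven't read CeWL's source, so this is only a guess at the cause):

```ruby
# Sketch of the suspected problem; made-up names, not taken from CeWL's source.
require "set"

# Linear scan: every new word is compared against every word seen so far,
# so the total work grows roughly with the square of the word count.
def add_word_slow(words_array, word)
  words_array << word unless words_array.include?(word)  # O(n) per word
end

# Hash-backed set: membership checks stay roughly constant time,
# so the crawl should not get slower as the wordlist grows.
def add_word_fast(words_set, word)
  words_set.add(word)  # O(1) amortised per word
end
```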

I've never used it on Windows so don't know the base performance levels, but it shouldn't be that slow. I'll see if I can give it a run against the site later and see what speed I get.

@digininja
Owner

digininja commented Oct 14, 2023 via email

@w4po

w4po commented Oct 16, 2023

It might be an issue with my system, even though I have a reasonably good one.

I've conducted some additional tests on https://www.ironman.com.
Initially it maintains a rate of 1 request per second, but after around 110 requests it begins to slow down.
By the time it reaches approximately 200 requests, the rate drops to about half a request per second.

I also tested it on your site, https://digi.ninja/.
Initially it runs at about 3 requests per second for the first 10 requests.
After that there's a pause of a few seconds where it doesn't make any requests and instead prints "Offsite link, not following:...".

During this "Offsite link, not following:..." phase, I attempted to stop the process using CTRL + C. It took a few seconds to stop, even though it wasn't actively making requests, just printing the message. This happened after only 10 requests, so it shouldn't have accumulated a significant amount of data (only 2500 lines).

So I think the bottleneck is somewhere in the checking phase.
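For what it's worth, I'd expect the offsite test itself to be cheap if it is just comparing hosts. Something like this guess at how it might work (made-up names and example URLs, not CeWL's actual code) stays fast even with thousands of links, which is why I suspect the slow part is whatever bookkeeping or printing happens around it:

```ruby
# A guess at what an offsite check could look like; not CeWL's actual code.
require "uri"

def offsite?(url, base_host, host_cache)
  host = URI.parse(url).host
  return host_cache[host] if host_cache.key?(host)

  host_cache[host] = (host != base_host)   # cache the verdict per host
rescue URI::InvalidURIError
  true   # treat unparseable links as offsite and skip them
end

base_host  = URI.parse("https://digi.ninja/").host
host_cache = {}
puts offsite?("https://twitter.com/digininja", base_host, host_cache)  # => true
```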

PS: I'm struggling with the JWT cracking Obscure level. Can you provide any hints?
