I want to scrape product data with owner details.
eBay is an extremely popular marketplace. Small sellers often use it to sell goods, much as they do on Amazon. Its data can therefore be used to assess a trading niche before entering a market with a new product.
The eBay site is quite simple and does not need JavaScript to display its pages, so scraping it should not cause technical problems. However, you need to know that eBay limits the number of listings it shows per query. If you want to collect all the results, you have to build your search queries and filters so that each one returns no more listings than that limit.
Let's say we want to sell e-readers. Let's find the category we need and configure the filters. eBay has the Tablets & eReaders category, which is what we need as a starting point. Open this page in Google Chrome.
However, this category also shows tablets, and we are only interested in e-readers, so we need to filter by product type. Unfortunately, there is no such filter in the main set, so you need to click on the “More Filters” button, which will open a window with a list of all filters.
There we need to find the “Type” filter, select the “e Reader” option in it and click on the “Apply” button.
We see that there are significantly fewer products, but we are only interested in new devices. Therefore, we need to select the “New” option in the “Condition” filter.
We are not interested in auctions either, so we will select the “Buy It Now” option above the listings block.
Now, to save on page requests, increase the number of results displayed per page. By default, eBay displays 50 listings. We can choose 200, which cuts the number of page requests needed to go through the entire filtered category by a factor of four. To do this, select 200 in the results-per-page control under the block with the results. The resulting URL is:
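To make the saving concrete, here is a small sketch of the arithmetic. The total of 1000 listings is a hypothetical figure chosen for illustration, not a number taken from eBay.

```python
import math


def pages_needed(total_listings, per_page):
    """Number of result pages required to see every listing."""
    return math.ceil(total_listings / per_page)


total = 1000  # hypothetical size of the filtered category

default_pages = pages_needed(total, 50)   # eBay's default page size
large_pages = pages_needed(total, 200)    # the page size we selected

print(default_pages, large_pages)
```

With these assumed numbers, the larger page size needs a quarter of the requests.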
https://www.ebay.com/sch/Tablets-eBook-Readers/171485/i.html?_dcat=171485&_fsrp=1&_sacat=171485&rt=nc&Type=eBook%2520Reader&LH_ItemCondition=1000&LH_BIN=1&_ipg=200
This will be our starting URL.
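To see how the filters we clicked ended up in the URL, we can split its query string. This is only a sketch using Python's standard library. The mapping of parameters to filters in the comments is my reading of the steps above, not documented eBay behavior.

```python
from urllib.parse import urlsplit, parse_qs

START_URL = (
    "https://www.ebay.com/sch/Tablets-eBook-Readers/171485/i.html"
    "?_dcat=171485&_fsrp=1&_sacat=171485&rt=nc&Type=eBook%2520Reader"
    "&LH_ItemCondition=1000&LH_BIN=1&_ipg=200"
)


def query_params(url):
    """Return the query string as a flat dict (first value per key)."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}


params = query_params(START_URL)

# Assumed meaning, inferred from the filters chosen in the walkthrough:
#   Type             -> the "Type" filter
#   LH_ItemCondition -> the "Condition" filter ("New")
#   LH_BIN           -> "Buy It Now"
#   _ipg             -> results per page
for name in ("Type", "LH_ItemCondition", "LH_BIN", "_ipg"):
    print(name, "=", params[name])
```

Note that the Type value in the URL is percent-encoded twice (%2520), so decoding it once still leaves %20 in place of the space.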
Next, we need to disable JS on the page. We will do this, as usual, using the extension for Google Chrome: Quick JavaScript Switcher. Next, open the developer tools in Google Chrome by pressing Ctrl + Shift + I. Then, using the tool to select elements, we will find the blocks we need on the page and CSS selectors for them. Firstly, we are interested in the listing block, and secondly, the link to the next page.
First, let us collect all the blocks with listings, that is, define a CSS selector for them.
CSS selector: ul > li.s-item. To check it, search for the selector in the “Elements” section and make sure that it selects all the listings. We will see that more than 200 elements have been selected, although there should be exactly 200. This happens because eBay, in addition to regular listings, also shows Sponsored ads.
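The same check can be done outside the browser. Below is a minimal sketch that counts elements matching ul > li.s-item using only Python's standard html.parser. The sample markup, including the "results" and "notice" class names, is made up for illustration, and a real scraper would more likely use a proper CSS selector library.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ListingCounter(HTMLParser):
    """Counts <li class="... s-item ..."> elements that are direct children of a <ul>."""

    def __init__(self):
        super().__init__()
        self._stack = []  # names of the currently open tags
        self.count = 0

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and self._stack and self._stack[-1] == "ul" and "s-item" in classes:
            self.count += 1
        if tag not in VOID_TAGS:
            self._stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self._stack:
            # Pop up to and including the matching open tag.
            while self._stack.pop() != tag:
                pass


def count_listings(html):
    counter = ListingCounter()
    counter.feed(html)
    counter.close()
    return counter.count


# Hypothetical markup, only loosely shaped like a results page.
SAMPLE = (
    '<ul class="results">'
    '<li class="s-item extra"><div><a href="#">Reader A</a><br></div></li>'
    '<li class="s-item">Reader B</li>'
    '<li class="notice">Not a listing</li>'
    '</ul>'
    '<ol><li class="s-item">Not under a ul</li></ol>'
)

listing_count = count_listings(SAMPLE)
print(listing_count)
```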
Filtering them out will not be easy, but we will try to do it. Click on one of the letters in the word “SPONSORED” and see which element opens in the “Elements” window. We will see a set of span elements containing a random set of characters.
We see that these elements have different classes. Clearly, the word “SPONSORED” appears on the page because spans with one of the classes are displayed while spans with the other are hidden. But we can't just take the class we need as a constant because, judging by its name, it is not static but dynamically generated. Therefore, we cannot rely on its name. However, if one class is displayed and the other is not, somewhere on the page there must be CSS that sets this rule.
span.s-m1yuhh { display: inline; } span.s-o2xlx7k { display: none; }
Our task now is to construct a selector for the style element that contains this rule and check it in the “Elements” window, making sure that the selector selects only 1 element on the page.
style:contains – such a selector will work just fine for our task.
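As a rough equivalent of that check in code, the sketch below collects the text of every style element and keeps only those containing a given substring. The substring "display: inline" is my assumption about what to search for, and the sample page is invented.

```python
from html.parser import HTMLParser


class StyleCollector(HTMLParser):
    """Collects the text content of every <style> element."""

    def __init__(self):
        super().__init__()
        self._in_style = False
        self.styles = []

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
            self.styles.append("")

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        if self._in_style:
            self.styles[-1] += data


def styles_containing(html, needle):
    """Return the CSS text of each <style> block that contains `needle`."""
    collector = StyleCollector()
    collector.feed(html)
    collector.close()
    return [css for css in collector.styles if needle in css]


# Hypothetical page with two style blocks; only one holds the rule we want.
PAGE = (
    "<html><head>"
    "<style>body { margin: 0; }</style>"
    "<style>span.s-m1yuhh { display: inline; } span.s-o2xlx7k { display: none; }</style>"
    "</head><body></body></html>"
)

matches = styles_containing(PAGE, "display: inline")  # assumed marker text
print(len(matches))
```

As in the manual check, exactly one match is what we want to see before going further.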
Now that we have found the desired element, we need to pull out the class that is shown from there. To do this, in the parse command we will use the filter option.
span\.
Using this regular expression, we will extract the class name: s-m1yuhh.
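For illustration, here is how such an extraction could look in Python. The regular expression shown in the text above appears truncated, so the pattern in this sketch is my own assumption and not the original one. It simply looks for a span rule whose body is display: inline.

```python
import re

CSS = "span.s-m1yuhh { display: inline; } span.s-o2xlx7k { display: none; }"

# Assumed pattern: capture the class name of the span rule that is displayed.
VISIBLE_RULE = re.compile(r"span\.([\w-]+)\s*\{\s*display:\s*inline\s*;?\s*\}")


def visible_class(css):
    """Return the class whose rule is 'display: inline', or None if absent."""
    match = VISIBLE_RULE.search(css)
    return match.group(1) if match else None


shown = visible_class(CSS)
print(shown)
```

Because the class names change between page loads, this lookup has to run on every page rather than once.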
Now that we have the class, we can go into the listing's label element and remove all span elements whose class is different from ours. Then we use the parse command to read the remaining text and compare the result in the register with the string “SPONSORED” using the if command.
This way we can skip commercial listings.
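The last step can be sketched in plain Python as well. This is not the scraper's own parse and if commands, just the same logic under assumptions: the label markup below is invented, and the visible class is taken as already known.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Keeps only the text inside <span> elements that carry the visible class."""

    def __init__(self, visible_class):
        super().__init__()
        self.visible_class = visible_class
        self._stack = []  # one flag per open <span>: does it have the visible class?
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            self._stack.append(self.visible_class in classes)

    def handle_endtag(self, tag):
        if tag == "span" and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text counts only when the innermost open span is a visible one.
        if self._stack and self._stack[-1]:
            self.parts.append(data)


def visible_label(html, visible_class):
    parser = VisibleTextParser(visible_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


def is_sponsored(label_html, visible_class):
    return visible_label(label_html, visible_class) == "SPONSORED"


# Hypothetical label markup: visible letters mixed with hidden decoy characters.
AD_LABEL = (
    '<div>'
    '<span class="s-m1yuhh">S</span><span class="s-o2xlx7k">Q</span>'
    '<span class="s-m1yuhh">PON</span><span class="s-o2xlx7k">7</span>'
    '<span class="s-m1yuhh">SORED</span>'
    '</div>'
)
ORGANIC_LABEL = '<div><span class="s-o2xlx7k">SPONSORED</span></div>'

print(is_sponsored(AD_LABEL, "s-m1yuhh"), is_sponsored(ORGANIC_LABEL, "s-m1yuhh"))
```

Listings for which is_sponsored returns True would be skipped, and the rest would be parsed as usual.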
Besides, if you have any questions, give me a call: https://clarity.fm/joy-brotonath