
24 May'11

Scrapy 0.12 Parsing with Python

Based on Scrapy Tutorial (dead link: doc.scrapy.org/intro/tutorial.html)

  1. Install scrapy and dependencies
sudo apt-get install python-lxml
sudo easy_install -U Scrapy
  2. Create project
scrapy startproject dmoz
  3. Create item models
from scrapy.item import Item, Field

class DmozItem(Item):
    title = Field()
    link = Field()
    desc = Field()
  4. Create spiders (in projname/spiders/)
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from dmoz.items import DmozItem

class DmozSpider(BaseSpider):
    name = "dmoz.org"
    allowed_domains = ["dmoz.org"]
    start_urls = [
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Books/",
        "http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/",
    ]

    def parse(self, response):
        hxs …


24 May'11

BeautifulSoup Parsing

1. Import dependencies

import urllib
from BeautifulSoup import BeautifulSoup

2. Settings

site = "http://****.in.ua"
base = site + "/url/****.html"
parse_urls = ["?page=1",]
parsed = []
urls = []

3. Prepopulate the URL bank by following the paging links

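def parser():
    # Take the next unvisited page and remember that it was parsed.
    element = parse_urls.pop()
    parsed.append(element)
    page = urllib.urlopen(base + element)
    soup = BeautifulSoup(page.read())
    # Collect the topic link from every element with class "right_block".
    for topic in soup.findAll(True, 'right_block'):
        urls.append(topic.p.a["href"])
    # Queue paging links that are neither queued nor parsed yet.
    for link in soup.find(id="page_list").findAll('li'):
        href = link.a["href"]
        if href not in parse_urls and href not in parsed:
            parse_urls.append(href)

# Keep going until there are no pages left to visit.
while parse_urls:
    parser()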
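
The loop above is a plain worklist crawl: pop a page, record it as parsed, queue any paging links not seen yet. Here is a minimal sketch of the same idea in Python 3 using only the standard library. The PAGES dict, the LinkCollector class and the crawl function are hypothetical names made up for illustration; PAGES stands in for the real site so the example runs offline, and its markup only assumes the "right_block" class and "page_list" id used above.

```python
from html.parser import HTMLParser

# Hypothetical in-memory "site" (query string -> HTML); replaces urlopen().
_PAGER = ('<ul id="page_list"><li><a href="?page=1">1</a></li>'
          '<li><a href="?page=2">2</a></li></ul>')
PAGES = {
    "?page=1": '<div class="right_block"><p><a href="/t/1">one</a></p></div>' + _PAGER,
    "?page=2": '<div class="right_block"><p><a href="/t/2">two</a></p></div>' + _PAGER,
}


class LinkCollector(HTMLParser):
    """Sorts hrefs into topic links and paging links.

    Deliberately simplistic: it only tracks which marker element was
    opened most recently and never resets on closing tags.
    """

    def __init__(self):
        super().__init__()
        self.section = None
        self.topics = []
        self.pages = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("class") == "right_block":
            self.section = "topic"
        elif attrs.get("id") == "page_list":
            self.section = "pages"
        elif tag == "a" and "href" in attrs:
            if self.section == "topic":
                self.topics.append(attrs["href"])
            elif self.section == "pages":
                self.pages.append(attrs["href"])


def crawl(start, fetch):
    """Visit every page reachable through paging links exactly once."""
    queue, seen, topics = [start], set(), []
    while queue:
        current = queue.pop()
        seen.add(current)
        collector = LinkCollector()
        collector.feed(fetch(current))
        topics.extend(collector.topics)
        for href in collector.pages:
            if href not in seen and href not in queue:
                queue.append(href)
    return topics, seen


topics, seen = crawl("?page=1", PAGES.__getitem__)
print(sorted(topics), sorted(seen))
```

Passing the fetch function in as an argument keeps the crawl logic separate from the network, so it can be swapped for a real HTTP call later.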

4. Parse

pages = []
for url …


24 May'11

Installing MongoDB 1.8.1 on Ubuntu 11.04 and PyMongo

Install everything you need:

sudo apt-key adv --keyserver keyserver.ubuntu.com --recv 7F0CEB10
sudo nano /etc/apt/sources.list

Next, add a line to sources.list:

  • on Ubuntu
deb http://downloads-distro.mongodb.org/repo/ubuntu-upstart dist 10gen
  • on Debian
deb http://downloads-distro.mongodb.org/repo/debian-sysvinit dist 10gen

Then update the package index and install MongoDB and PyMongo:

sudo apt-get update
sudo apt-get install mongodb-10gen
sudo apt-get install python-setuptools
sudo easy_install pymongo

Next, test the connection:

from pymongo.connection import Connection
from pymongo import ASCENDING
connection = Connection("localhost", 27017)
db = connection.test
db.my_collection.save({"x": 10})
db.my_collection.save({"x": 10, "y": "good"})
for item in db.my_collection.find …


05 Apr'11

Ruby on Rails 3 installation on Debian 6 Squeeze

apt-get install libsqlite3-dev curl git build-essential zlib1g-dev libssl-dev
bash < <( curl http://rvm.beginrescueend.com/releases/rvm-install-head )
if [[ -s "$HOME/.rvm/scripts/rvm" ]] ; then source "$HOME/.rvm/scripts/rvm" ; fi
rvm install 1.9.2
rvm --default ruby-1.9.2
gem install rails
rails new testapp

In $HOME/.profile, add:

export PATH=$PATH:/var/lib/gems/1.8/bin

Comment out the sqlite dependencies in the Gemfile.

Then run the Rails server: rails s

UPD 02.03.2012 obsolete replacement:

Package libreadline5-dev is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is …


02 Apr'11

April Fools Prank on Squid

https://help.ubuntu.com/community/Upside-Down-TernetHowTo


19 Mar'11

SSH passwordless login

On your machine

local $ ssh-keygen -t rsa

Do not enter a passphrase; just press Enter twice. Next,

local $ scp ~/.ssh/id_rsa.pub root@ipaddress:/root/.ssh/authorized_keys
local $ rm ~/.ssh/id_rsa.pub
local $ ssh root@ipaddress

That should take you straight to the command prompt, with no password asked.

UPD 20.04.2014: There is a great Linux program, ssh-copy-id, that does everything mentioned above in a single line!
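
One practical difference worth knowing: the scp command above replaces the remote authorized_keys file, while ssh-copy-id appends to it, so keys that are already installed survive. Below is a minimal Python sketch of that append-if-missing step, run against a local temporary file. The function name append_public_key and the sample key strings are made up for illustration; this is not how ssh-copy-id itself is implemented.

```python
import os
import tempfile


def append_public_key(authorized_keys_path, public_key):
    """Append public_key unless it is already listed. Returns True if added."""
    key = public_key.strip()
    content = ""
    if os.path.exists(authorized_keys_path):
        with open(authorized_keys_path) as fh:
            content = fh.read()
    if key in (line.strip() for line in content.splitlines()):
        return False
    with open(authorized_keys_path, "a") as fh:
        # Do not glue the new key onto an unterminated last line.
        if content and not content.endswith("\n"):
            fh.write("\n")
        fh.write(key + "\n")
    # sshd is usually picky about permissions on this file.
    os.chmod(authorized_keys_path, 0o600)
    return True


# Demo on a throwaway file with placeholder (not real) keys.
workdir = tempfile.mkdtemp()
path = os.path.join(workdir, "authorized_keys")
with open(path, "w") as fh:
    fh.write("ssh-rsa AAAAexisting other@host")  # no trailing newline on purpose

first = append_public_key(path, "ssh-rsa AAAAnew me@laptop\n")
second = append_public_key(path, "ssh-rsa AAAAnew me@laptop\n")
with open(path) as fh:
    lines = fh.read().splitlines()
print(first, second, lines)
```

Calling it twice with the same key leaves the file unchanged the second time, which is the behavior you want from a script that may be re-run.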


18 Mar'11

Django nginx Debian

How to do a simple install of Django on a small Debian 6 VPS.
I'll stick with flup, which lets Python serve FastCGI and some
other protocols.
I use it in conjunction with nginx, which in turn is used to save memory.

Literature:

http://library.linode.com/using-linux/administration-basics#system_diagnostics
http://docs.djangoproject.com/en/dev/howto/deployment/fastcgi/

http://www.mindinmotion.ru/post/django-postgresql-nginx-on-debian-server

1 upgrade the system

apt-get upgrade

2 install required dependencies

apt-get install nginx-light postgresql python-django python-psycopg2 python-flup python-imaging

3 configure nginx

You may want to use emacs, vim, or nano. In the case of the last, you should …


18 Mar'11

nano configuration

In ~/.nanorc you may add

set tabsize 3
set autoindent

if you want to make editing config files easier.

P.S. Be careful with Python and tabs:
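
As an illustration of the kind of trouble meant here (my own sketch, not taken from the original post): with tabsize 3, a tab and three spaces look identical in nano, but Python 3 refuses to compile a block that mixes them. The helper name check_indentation and the sample sources are made up for the example.

```python
def check_indentation(source):
    """Return None if source compiles, else the indentation error's class name."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return type(exc).__name__
    return None


# First body line is indented with a tab, the second with three spaces.
mixed = "if True:\n\tx = 1\n   y = 2\n"
# Both body lines are indented with three spaces.
consistent = "if True:\n   x = 1\n   y = 2\n"

print(check_indentation(mixed))
print(check_indentation(consistent))
```

If you edit Python in nano, it may be safer to have it insert spaces instead of tabs; check your nano version's documentation for the relevant option.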


10 Mar'11

How to capitalize a word in C#

This can easily be done via the TextInfo class:

name = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(name);

Additionally, List<T>.ForEach with a lambda makes it easy to apply this to a whole collection:

lst.ForEach(ci => ci.Name = CultureInfo.CurrentCulture.TextInfo.ToTitleCase(ci.Name));


25 Feb'11

Windows Server 2008 configuration

You may want to disable the auto-start of both Initial Configuration and Server Manager for some time, and then enable them again later. The first is especially useful when the machine is used as a desktop system.

Open the registry at HKLM\Software\Microsoft\ServerManager and change the DoNotOpenServerManagerAtLogon value from 0 to 1.

To enable Initial configuration, run oobe.
