Freelance Client Project

Automated Address Collection

Powered by Spring Batch, Rundeck, and Google Maps API


This solution leverages backend automation and scheduled orchestration to process business addresses and generate static maps at scale.

Project Overview

A client-focused tool that automates the collection and enrichment of business address data using modern backend technologies and scheduled job orchestration.

Technologies Used

  • Spring Batch: Batch processing of business data
  • Rundeck: Job scheduling and management
  • MySQL: Persistent storage of addresses
  • Google Maps API: Address geocoding and location info
  • Amazon S3: Storage for generated map images
  • Web Scraper: Pulls data from Yelp and YellowPages

Project Sources

  • Yelp business listings
  • YellowPages search results
  • Google Maps API lookups

Purpose

This tool streamlines business address discovery and enrichment for marketing, analysis, or integration with client workflows, demonstrating scalable backend automation.

Automated Business Address Processing with Spring Batch

This solution uses Spring Batch to orchestrate the enrichment and transformation of business addresses into downloadable map assets. Below are the building blocks and code examples that make this system scalable and maintainable.

1. Job Configuration

Purpose: Defines the orchestration and flow of steps.

@Bean
public Job addressJob() {
    return jobBuilderFactory.get("addressJob")
        .start(partitionedStep())
        .next(finalizeStep())
        .build();
}

2. Smart Partitioning for Scalable Processing

RangePartitioner dynamically divides work based on the job quota and last processed ID, ensuring even distribution:

// Ceiling division so the whole quota is covered even when it does not divide evenly
int chunkSize = quota / gridSize + (quota % gridSize > 0 ? 1 : 0);
Map<String, ExecutionContext> partitions = new HashMap<>();
for (int i = 0; i < gridSize; i++) {
  ExecutionContext context = new ExecutionContext(); // one context per partition
  int fromId = lastProcessedKey + (i * chunkSize) + 1;
  int toId = Math.min(lastProcessedKey + ((i + 1) * chunkSize), lastProcessedKey + quota);
  context.putInt("fromId", fromId);
  context.putInt("toId", toId);
  partitions.put("partition" + i, context);
}

This enables parallelized worker steps to fetch and process business entries within well-defined ID ranges.
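The range arithmetic above can be exercised in isolation. Below is a minimal, framework-free sketch of the same math; the class and method names are illustrative, not taken from the project:

```java
import java.util.ArrayList;
import java.util.List;

public class RangePartitionMath {

    /** One [fromId, toId] range handed to a worker step. */
    public record IdRange(int fromId, int toId) {}

    /**
     * Splits a quota of records following lastProcessedKey into up to
     * gridSize contiguous ID ranges, using ceiling division so the whole
     * quota is covered even when it does not divide evenly.
     */
    public static List<IdRange> partition(int lastProcessedKey, int quota, int gridSize) {
        int chunkSize = quota / gridSize + (quota % gridSize > 0 ? 1 : 0);
        List<IdRange> ranges = new ArrayList<>();
        for (int i = 0; i < gridSize; i++) {
            int fromId = lastProcessedKey + (i * chunkSize) + 1;
            int toId = Math.min(lastProcessedKey + ((i + 1) * chunkSize), lastProcessedKey + quota);
            if (fromId > toId) break; // quota exhausted before all partitions were used
            ranges.add(new IdRange(fromId, toId));
        }
        return ranges;
    }
}
```

For example, with lastProcessedKey=500, quota=1000, and gridSize=4, the workers receive the ranges 501-750, 751-1000, 1001-1250, and 1251-1500.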

3. ItemReader

Purpose: Reads business records by ID range.

JdbcPagingItemReader<StaticMapStore> reader = new JdbcPagingItemReader<>();
reader.setDataSource(dataSource);
reader.setPageSize(pageSize);
reader.setQueryProvider(...);

4. Geolocation Transformation with Static Maps

In StaticMapItemProcessor, a latitude and longitude pair is translated into a ready-to-download image request:

GoogleInfo meta = new GoogleMetadataBuilder()
  .setAccessKey(accessKey)
  .setGoogleServiceEndpoint(serviceEndpoint)
  .setPlaceholder(Map.of("{longitude}", store.getLng(), "{latitude}", store.getLat()))
  .execute();

ImageInfo imageInfo = new ImageInfo();
imageInfo.setImageName(prefix + store.getStoreId() + suffix);

item.addMetadata("GoogleInfo", meta);
item.addMetadata("ImageInfo", imageInfo);

This prepares the necessary metadata to later fetch and label the map image for each store.
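The internals of GoogleMetadataBuilder are not shown, but the placeholder substitution it performs can be sketched with plain string replacement. The template, helper names, and key parameter below are illustrative assumptions:

```java
import java.util.Map;

public class StaticMapRequest {

    // URL template mirroring the Static Map endpoint used by the job
    static final String TEMPLATE =
        "https://maps.googleapis.com/maps/api/staticmap"
        + "?center={latitude},{longitude}&zoom=15&size=400x125&key={key}";

    /** Replaces each {placeholder} token in the template with its value. */
    public static String buildUrl(String template, Map<String, String> placeholders) {
        String url = template;
        for (Map.Entry<String, String> e : placeholders.entrySet()) {
            url = url.replace(e.getKey(), e.getValue());
        }
        return url;
    }

    /** Derives the image name the writer later uploads, e.g. "map_42.png". */
    public static String imageName(String prefix, long storeId, String suffix) {
        return prefix + storeId + suffix;
    }
}
```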

5. Task-Based Map Download and Upload

StaticMapWriter delegates work to reusable task components, decoupling infrastructure concerns:

googleServiceFactory.getTask("GoogleDownloadTask")
  .execute(googleInfo, imageInfo);

googleServiceFactory.getTask("S3UploadTask")
  .execute(s3Info, imageInfo);

Each task encapsulates a clear responsibility: downloading from Google, uploading to S3 — making the workflow modular and testable.
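The factory lookup shown above can be modeled as a map-backed registry. The interface and registry below are illustrative sketches, not the project's actual types:

```java
import java.util.HashMap;
import java.util.Map;

public class TaskRegistry {

    /** A unit of work with a single responsibility, e.g. download or upload. */
    public interface Task {
        String execute(String input);
    }

    private final Map<String, Task> tasks = new HashMap<>();

    public void register(String name, Task task) {
        tasks.put(name, task);
    }

    /** Looks up a task by name, as in googleServiceFactory.getTask("..."). */
    public Task getTask(String name) {
        Task task = tasks.get(name);
        if (task == null) {
            throw new IllegalArgumentException("Unknown task: " + name);
        }
        return task;
    }
}
```

Registering tasks by name lets the writer stay ignorant of concrete download and upload implementations, which is what makes each task independently testable.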

6. Job Parameters

Purpose: Makes job reusable with dynamic input.

--job.name=addressJob lastProcessedKey=500 quota=1000
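Parameters in the key=value style above can be parsed with a few lines of stdlib code. This is a simplified sketch, not the project's actual launcher:

```java
import java.util.HashMap;
import java.util.Map;

public class JobParams {

    /**
     * Parses command-line arguments of the form key=value (the style the
     * job is launched with) into a parameter map; other tokens are ignored.
     */
    public static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (String arg : args) {
            int eq = arg.indexOf('=');
            if (eq > 0) {
                params.put(arg.substring(0, eq), arg.substring(eq + 1));
            }
        }
        return params;
    }
}
```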

7. Skip/Retry Handling

Purpose: Ensures recoverability from partial failures.

<chunk skip-limit="50">
  <skippable-exception-classes>
    <include class="SkippableException"/>
  </skippable-exception-classes>
</chunk>
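Spring Batch enforces the skip limit itself; the semantics can be illustrated with a small framework-free counter. The class name is illustrative, and IllegalStateException stands in for the project's SkippableException:

```java
public class SkipPolicySketch {

    private final int skipLimit;
    private int skipCount = 0;

    public SkipPolicySketch(int skipLimit) {
        this.skipLimit = skipLimit;
    }

    /**
     * Returns true if the failure may be skipped: the exception type is
     * skippable and the running skip count is still under the limit.
     */
    public boolean shouldSkip(Throwable t) {
        boolean skippable = t instanceof IllegalStateException; // stand-in for SkippableException
        if (skippable && skipCount < skipLimit) {
            skipCount++;
            return true;
        }
        return false;
    }

    public int getSkipCount() {
        return skipCount;
    }
}
```

Once the limit is exceeded, or a non-skippable exception is thrown, the step fails and the job can be restarted from the last committed chunk.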

8. Rundeck Integration

Purpose: Automates and schedules job execution.

java -jar batch-job.jar --job.name=addressJob quota=500

9. Job Listener

Purpose: Tracks success, failure, and job metrics.

@Override
public ExitStatus afterStep(StepExecution stepExecution) {
  log.info("Success: " + jobState.getSuccess());
  return stepExecution.getExitStatus();
}

Code Index

Explore key components of the github.com/khimu/api repository through the sample code snippets below.

Batch Jobs

  • ScrapperJob.java
    // Job definition and step configuration
    @Bean
    public Job scrapperJob() {
        return jobBuilderFactory.get("scrapperJob")
            .start(scrapperStep())
            .build();
    }
  • MainJob.java
    // Main job orchestrating multiple steps
    @Bean
    public Job mainJob() {
        return jobBuilderFactory.get("mainJob")
            .start(stepOne())
            .next(stepTwo())
            .build();
    }

Task Components

  • S3UploadTask.java
    // Uploading file to S3
    public void uploadToS3(String filePath) {
        s3Client.putObject(bucketName, keyName, new File(filePath));
    }
  • GoogleGeocodeTask.java
    // Making a request to Google Geocode API
    String url = "https://maps.googleapis.com/maps/api/geocode/json?address=" + address + "&key=" + apiKey;
    HttpResponse response = httpClient.execute(new HttpGet(url));

Readers and Writers

  • StoreKeywordItemReader.java
    // Reading store keywords from database
    public StoreKeyword read() {
        return jdbcTemplate.queryForObject("SELECT * FROM store_keywords WHERE ...", new StoreKeywordRowMapper());
    }
  • StaticMapWriter.java
    // Writing static map images to S3
    public void write(List<? extends StaticMap> items) {
        for (StaticMap map : items) {
            s3UploadTask.upload(map.getImagePath());
        }
    }

Partitioners

  • RangePartitioner.java
    // Partitioning data range for parallel processing
    public Map<String, ExecutionContext> partition(int gridSize) {
        Map<String, ExecutionContext> result = new HashMap<>();
        int range = endId - startId + 1;
        int chunkSize = range / gridSize;
        for (int i = 0; i < gridSize; i++) {
            ExecutionContext context = new ExecutionContext();
            context.putInt("startId", startId + (i * chunkSize));
            context.putInt("endId", (i == gridSize - 1) ? endId : startId + ((i + 1) * chunkSize) - 1);
            result.put("partition" + i, context);
        }
        return result;
    }


Writers

  • CustomKeywordItemWriter.java
    // Writing custom keywords to database
    public void write(List<? extends CustomKeyword> items) {
        for (CustomKeyword keyword : items) {
            jdbcTemplate.update("INSERT INTO custom_keywords ...", keyword.getValue());
        }
    }

Processors

  • StaticMapItemProcessor.java
    // Processing store data into static map requests
    public StaticMap process(Store store) {
        StaticMap map = new StaticMap();
        map.setCoordinates(store.getLatitude(), store.getLongitude());
        return map;
    }


Google API Endpoints Used

These APIs enable the transformation and visualization of location data within the tool.

📍 Geocode API

https://maps.googleapis.com/maps/api/geocode/json?address={address}

Converts an address into geographic coordinates.
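Address values must be URL-encoded before being substituted into the {address} placeholder. A minimal stdlib sketch (the helper class and key parameter are illustrative):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class GeocodeUrl {

    static final String ENDPOINT =
        "https://maps.googleapis.com/maps/api/geocode/json?address=";

    /** Builds a geocode request URL with the address safely URL-encoded. */
    public static String forAddress(String address, String apiKey) {
        return ENDPOINT
            + URLEncoder.encode(address, StandardCharsets.UTF_8)
            + "&key=" + apiKey;
    }
}
```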

🔍 Places API

https://maps.googleapis.com/maps/api/place/nearbysearch/json?location={latitude},{longitude}&radius={radius}

Finds nearby businesses based on location and radius.

🗺️ Static Map API

https://maps.googleapis.com/maps/api/staticmap?center={latitude},{longitude}&zoom=15&size=400x125

Generates map images for preview and storage.

Next Steps

Potential improvements include integrating more data sources, enhancing map visualizations, and optimizing performance and error handling.